Bigdata Project

Music Recommendation System

Dataset and data models:

The Million Song Dataset(http://millionsongdataset.com/) will be used in this project. It consists of two dataset, 1 million songs and 1 million users. The dataset containing 55 features which we are going to extract some of these features and create our data models as follows:

Songs(1M): <Track_id, title, song_id, release, artist_id, artist_name, artist_familiarity, artist_hotttnesss , year>
Artists(44745): <artist_id>
ArtistsSimilarity(2201916): <artist_id, artist_id>
ArtistMBtags(24777): <artist_id, MBtags>
MusicBrainzTags(2321): <MBtags>
ArtistGenre(1109381): <artist_id, genre>
Users(1M): <user_id, song_id, played_count>

The song dataset is accessible in two ways, by reading HDF5 hierarchical directories and SQLite database. The aforementioned data models can be created by using pandas and sqlite3 libraries with connecting to SQLite database. The models will be converted from pandas dataframe to pyspark datafram and data extrapolation and data pruning will be handled by pyspark library.

Algorithms:

Content-Based filtering
Collaborative filtering(user-user)

The two algorithms we will use and compare are the Content-Based Filtering and the Collaborative Filtering. We will use the Content-Based Filtering to recommend music based on a user’s profile. For instance, we will recommend music with the same artist, album, genre and so on, such that we recommend music that has similar content. We will also use the Collaborative Filtering to recommend music based on similar users(user-user). For instance, recommending music from a user’s music history. Both of these algorithms will allow us to compare and analyze measure of success. (If the time allows we will implement Latent-Factor algorithm)

Research Questions:

In Content-based techniques:

What genre, song, artist or album would a user prefer to be recommended based on user profile?
How to find an appropriate feature to be considered in recommendation?

In collaborative techniques:

How can we use the data set to identify users with similar musical preferences?
How can features be linked with patterns in users' listening behaviour? (How similar users will be found)
How can we personalise next-track music recommendations by extracting long-term preference signals from users' long-term listening behaviour, such as favourite songs or favourite artists?
How to handle the cold-start issue?

Metrics to measure the accuracy:

Prediction accuracy is generally independent of the user interface and can be estimated offline by implementing different supervised machine learning evaluation metrics. In this part, the different algorithms could be compared. The metrics give insight into how relevant the list of recommended items is. K is chosen as the top number of recommended documents.

Precision(P@k): the percentage of relevant documents among the top-k documents.
Recall@k : the percentage of the relevant documents that are succesfully recommended.
Average precision(Ap@K): the mean of AP@i for i = 1, ..., K.
Mean average precision(MAP@K): the mean of Ap@K where i = 1, ..., K.
Personalization (average cosine similarity): gives the dissimilarity between users' lists of recommendations

Libraries:

pyspark
pandas
numPy
sqlite3
sklearn
matplotlib
seaborn
h5py
pytables

Name		Name	Last commit message	Last commit date
Latest commit History 70 Commits
README.md		README.md
music-recommendation-system.ipynb		music-recommendation-system.ipynb
music-recommendation-system.pdf		music-recommendation-system.pdf

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Bigdata Project

Music Recommendation System

Dataset and data models:

Algorithms:

Research Questions:

Metrics to measure the accuracy:

Libraries:

About

Releases

Packages

Contributors 4

Languages

reza-nikzad/bigdata

Folders and files

Latest commit

History

Repository files navigation

Bigdata Project

Music Recommendation System

Dataset and data models:

Algorithms:

Research Questions:

Metrics to measure the accuracy:

Libraries:

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 4

Languages

Packages