GitHub - inrsta/spotify-prediction-api

Predicting a song's popularity on Spotify.

The dataset used for this project is located here. The idea behind this project was to create a REST API that predicts how successful a song will be on Spotify, given some technical details about the song such as genre, danceability, and energy.

The API endpoint is https://spotifypredictapi-dbtolisexq-ew.a.run.app/predict. To test it, send a request to this endpoint structured like this:

song_features = {
    "danceability": 0.5,
    "energy": 0.5,
    "loudness": 0.5,
    "speechiness": 0.5,
    "acousticness": 0.5,
    "duration_minutes": 3,
    "valence": 0.5,
    "tempo": 0.5,
    "liveness": 0.5,
    "genre": 1,
}

The endpoint is live, so you can try it directly. The Python file rest_api/test_google_run_actions.py sends the same request if you prefer to run a script.
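As a rough illustration of such a request, here is a minimal sketch using only the Python standard library. It assumes the endpoint accepts a JSON body sent with POST; the README does not state the HTTP method, and rest_api/test_google_run_actions.py may do this differently. The helper names build_predict_request and predict are hypothetical.

```python
import json
import urllib.request

# Endpoint URL as given in this README.
PREDICT_URL = "https://spotifypredictapi-dbtolisexq-ew.a.run.app/predict"

# Same payload as shown above.
song_features = {
    "danceability": 0.5,
    "energy": 0.5,
    "loudness": 0.5,
    "speechiness": 0.5,
    "acousticness": 0.5,
    "duration_minutes": 3,
    "valence": 0.5,
    "tempo": 0.5,
    "liveness": 0.5,
    "genre": 1,
}


def build_predict_request(features, url=PREDICT_URL):
    """Build the HTTP request without sending it.

    Assumption: the API expects a JSON body sent with POST.
    """
    body = json.dumps(features).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(features, url=PREDICT_URL, timeout=10):
    """Send the request and return the decoded JSON response."""
    request = build_predict_request(features, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (this performs a real network call):
# print(predict(song_features))
```

Building the request separately from sending it makes it easy to inspect the payload before anything goes over the network.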

I have put the dataset in a GCS bucket. In order for you to do the same you have to follow these steps:

Authentication

Make sure you set up gcloud on your machine by running: gcloud init

Downloading the data and storing it in a Google Cloud Storage (GCS) bucket.

Make sure you have a Python environment activated, then install kaggle using pip: pip install kaggle. We will need kaggle to download the dataset programmatically.
- To download datasets from Kaggle programmatically, we need a Kaggle API token (a kaggle.json file), which you can get from the settings page on Kaggle.
- Create a .kaggle folder in your home directory: mkdir ~/.kaggle
- Download the JSON file from Kaggle and put it in the newly created .kaggle folder.
- Restrict the file's permissions so that only your user can read and write it: chmod 600 ~/.kaggle/kaggle.json

From the directory download_data, run:

make load_to_bucket dataset=amitanshjoshi/spotify-1million-tracks bucket=spotify-popularity-prediction-inri

This command downloads the dataset and uploads it to the bucket. Make sure the account you are authenticated as has permission to access and write to the bucket.
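The Makefile itself is not shown here, so the following is only a sketch of what a download-then-upload step could look like in Python, under the assumption that it wraps the kaggle and gsutil command-line tools. The function names and the workdir parameter are hypothetical, and the real load_to_bucket target may work differently.

```python
import subprocess
from pathlib import Path


def build_load_commands(dataset, bucket, workdir="data"):
    """Return the two CLI invocations: Kaggle download, then GCS upload.

    Assumption: the kaggle and gsutil CLIs are installed and authenticated.
    """
    download = [
        "kaggle", "datasets", "download",
        "-d", dataset,   # dataset slug, e.g. owner/name
        "-p", workdir,   # local download directory
        "--unzip",       # extract the downloaded archive
    ]
    upload = ["gsutil", "cp", "-r", workdir, f"gs://{bucket}/"]
    return [download, upload]


def load_to_bucket(dataset, bucket, workdir="data"):
    """Run both commands in order, stopping at the first failure."""
    Path(workdir).mkdir(parents=True, exist_ok=True)
    for command in build_load_commands(dataset, bucket, workdir):
        subprocess.run(command, check=True)


# Usage (downloads the data and writes to the bucket):
# load_to_bucket(
#     "amitanshjoshi/spotify-1million-tracks",
#     "spotify-popularity-prediction-inri",
# )
```

Passing check=True makes a failed download abort the run before anything is uploaded.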

Note: The GCS bucket was created through the UI because it only needs to be created once.

MLflow server

Please cd into the mlflow-server folder. There you will find the Terraform main.tf and variables.tf files. Run the following commands in order to set up a Google Cloud Platform (GCP) virtual machine (VM) where MLflow will be running.

terraform init
terraform plan
terraform apply

The output of Terraform will be the external IP of the newly created VM. To log in to the MLflow server running on that VM, open a browser and enter the following in the address bar:

<VM-EXTERNAL-IP>:5000 #for example http://23.37.165.44:5000

To authenticate, use admin as the username and password as the password.
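The same address and credentials are also needed by any script that logs to this server. As a sketch, the standard MLflow client environment variables MLFLOW_TRACKING_URI, MLFLOW_TRACKING_USERNAME, and MLFLOW_TRACKING_PASSWORD could be set as below. The helper name mlflow_client_env is hypothetical, the IP is the example value from above, and whether the project's own code configures the client this way is an assumption.

```python
import os


def mlflow_client_env(vm_external_ip, username="admin", password="password", port=5000):
    """Return the environment variables an MLflow client reads.

    The defaults mirror the credentials and port mentioned in this README.
    """
    return {
        "MLFLOW_TRACKING_URI": f"http://{vm_external_ip}:{port}",
        "MLFLOW_TRACKING_USERNAME": username,
        "MLFLOW_TRACKING_PASSWORD": password,
    }


# Example IP from the README; replace it with your own Terraform output.
os.environ.update(mlflow_client_env("23.37.165.44"))
```

Setting these before importing or calling MLflow lets the tracking code stay free of hard-coded addresses.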

Prefect

Inside the directory prefect_deployment there is a file called orchestrate_gcs.py. This file contains the code for the Prefect flow, which trains a model and tracks it with MLflow. You can either run it with python orchestrate_gcs.py from inside the directory, or deploy it as a Prefect flow using Prefect Cloud.

The flow is tested with unit tests.

The model is also deployed as a registered model in MLflow.

The flow is deployed to Prefect Cloud for better automation.

REST API and Google Cloud Run

Inside the directory rest_api you can find the code to deploy a REST API. To run the code, you need to set two environment variables:

- BUCKET_NAME: the name of the bucket where the MLflow artifacts are stored
- RUN_ID: the run ID of the best-performing model
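To illustrate how the two environment variables above might be consumed, here is a small sketch that reads them at startup and fails early with a clear message if either is missing. The ApiSettings class and load_settings function are hypothetical names, not taken from the rest_api code.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ApiSettings:
    bucket_name: str  # bucket holding the MLflow artifacts
    run_id: str       # MLflow run ID of the model to serve


def load_settings(env=os.environ):
    """Read BUCKET_NAME and RUN_ID, raising if either is unset or empty."""
    missing = [name for name in ("BUCKET_NAME", "RUN_ID") if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return ApiSettings(bucket_name=env["BUCKET_NAME"], run_id=env["RUN_ID"])


# Usage:
# settings = load_settings()
```

Validating the configuration once at startup avoids a less obvious failure later, when the model is first loaded.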

There is a Pipfile available which you can use to install the dependencies.

The code is deployed on Google Cloud Run through a CI/CD pipeline that runs automatically every time new code is pushed to the main branch of the repository.

There are also pre-commit hooks.

