Getting Data From Spotify

Acquiring and Processing Audio Feature Data From the Spotify API for the First 100 Playlists of the Million Playlist Database

As input to our neural network, we used audio features for the songs on each of the first 100 playlists of the Million Playlist Database. These features were downloaded for each song from the Spotify API using a bash script (m_rsa.sh), which yielded json files. A python script (Convert_Spot_Audio_Features_json_to_df) was then used to extract the relevant features from these files and to create a feature dataframe for each playlist.

The features in each dataframe, all of which were used in subsequent modeling, are as follows.

track name — the name of the track
track id — the Spotify ID for the track.
acousticness — 0.0 to 1.0, a confidence measure of whether the track is acoustic.
danceability — 0.0 to 1.0, a measure of how suitable a track is for dancing.
speechiness — 0.0 to 1.0, the presence of spoken words in a track. Values between 0.33 and 0.66 describe tracks that may contain both music and speech, such as rap music.
instrumentalness — 0.0 to 1.0, higher values indicate less vocal content.
liveness — 0.0 to 1.0, the presence of an audience in the recording. The higher the value, the more likely the track was performed live.
energy — 0.0 to 1.0, a perceptual measure of intensity and activity; energetic tracks feel fast, loud, and noisy.
valence — 0.0 to 1.0, tracks with high valence sound more positive (e.g. happy, cheerful, euphoric).
duration_ms — The duration of the track in milliseconds.
key — musical key, integers that map to pitches using standard pitch class notation. E.g. 0 = C, 1 = C♯/D♭, 2 = D, and so on.
mode — the musical modality (major or minor) of a track. Major is represented by 1 and minor by 0.
loudness — the overall loudness of a track in decibels (dB).
tempo — beats per minute
time_signature — an integer giving the estimated number of beats per bar (measure).
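The extraction step can be sketched as follows. This is a minimal illustration, not the code in Convert_Spot_Audio_Features_json_to_df: it assumes each json file has the shape of Spotify's batch audio-features response (a top-level "audio_features" list), and the names FEATURE_COLUMNS, features_to_rows, and describe_key are hypothetical. The track name is left out because we assume it comes from a separate source and is joined later.

```python
import json

# Columns kept for each track, mirroring the feature list above.
# Assumption: "track name" is joined from elsewhere, so it is not listed here.
FEATURE_COLUMNS = [
    "id", "acousticness", "danceability", "speechiness", "instrumentalness",
    "liveness", "energy", "valence", "duration_ms", "key", "mode",
    "loudness", "tempo", "time_signature",
]

# Pitch class notation: 0 = C, 1 = C♯/D♭, 2 = D, and so on.
PITCH_CLASSES = [
    "C", "C♯/D♭", "D", "D♯/E♭", "E", "F",
    "F♯/G♭", "G", "G♯/A♭", "A", "A♯/B♭", "B",
]


def features_to_rows(payload):
    """Return one dict per track, keeping only FEATURE_COLUMNS."""
    rows = []
    for item in payload.get("audio_features", []):
        if item is None:  # skip null entries (tracks with no features)
            continue
        rows.append({col: item.get(col) for col in FEATURE_COLUMNS})
    return rows


def describe_key(key, mode):
    """Translate the integer key/mode pair into a readable label."""
    if key is None or not 0 <= key < len(PITCH_CLASSES):
        return "unknown"
    return f"{PITCH_CLASSES[key]} {'major' if mode == 1 else 'minor'}"


# Hypothetical sample payload, standing in for one downloaded json file.
sample = json.loads(
    '{"audio_features": [{"id": "abc", "key": 1, "mode": 0,'
    ' "tempo": 120.0, "uri": "x"}, null]}'
)
rows = features_to_rows(sample)
label = describe_key(rows[0]["key"], rows[0]["mode"])
```

The resulting list of dicts could be passed directly to a dataframe constructor such as pandas.DataFrame(rows) to obtain one dataframe per playlist.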

Acquiring and Processing New-Release Data From the Spotify API

We were particularly interested in creating a recommender system that would perform well on newly released songs not in the Million Playlist Database. To do this, we first needed to acquire audio-feature information for newly released songs from the Spotify API. This required several steps involving bash and python scripts, which are listed below.

Spotify has an API endpoint that returns the names of, and some information on, newly released albums. It returns at most 50 albums per request and only 100 albums in all.
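Because of the 50-album page size, covering all 100 albums takes two requests. The sketch below only builds the request URLs and is not taken from nr.sh. It assumes the new-releases endpoint of the Spotify Web API with its limit and offset query parameters, and the function name new_release_page_urls is hypothetical. Each request would also need an authorization header carrying an access token, which is omitted here.

```python
from urllib.parse import urlencode

# Assumed endpoint: Spotify Web API "new releases" listing.
NEW_RELEASES_URL = "https://api.spotify.com/v1/browse/new-releases"


def new_release_page_urls(total=100, page_size=50):
    """Build one URL per page so that `total` albums are covered."""
    urls = []
    for offset in range(0, total, page_size):
        # The last page may be shorter than page_size.
        limit = min(page_size, total - offset)
        query = urlencode({"limit": limit, "offset": offset})
        urls.append(f"{NEW_RELEASES_URL}?{query}")
    return urls


page_urls = new_release_page_urls()
```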

A bash script (nr.sh) was run to get json files for each of these albums.
A python script (New_Release_Albums_to_Dataframe) was run to extract the album ids from these json files.
A list of these album ids was generated.
A bash script (gt.sh) was run to get json files containing track ids for each album.
A python script (Album_info_to_track_ids_list) was run to extract from these json files the track ids for each album and a list of these track ids was made.
A bash script (m_rsa.sh) was run to get json files of the audio features for each track.
A python script (Convert_Spot_Audio_features_new_release_to_df) was run to convert the audio features in the json files to dataframes, one for each album, and write them to csv files.
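The id-extraction steps above can be sketched roughly as follows. The json shapes are assumptions based on the Spotify Web API (new releases nested under "albums" then "items", album tracks under "items"), and the actual files produced by nr.sh and gt.sh may differ. All function names are hypothetical. The batch size of 100 reflects the limit on ids per request for the batch audio-features endpoint.

```python
def album_ids_from_new_releases(payload):
    """Collect album ids from a new-releases response (assumed shape)."""
    return [album["id"] for album in payload.get("albums", {}).get("items", [])]


def track_ids_from_album_tracks(payload):
    """Collect track ids from an album-tracks response (assumed shape)."""
    return [track["id"] for track in payload.get("items", []) if track.get("id")]


def chunked(ids, size=100):
    """Split a list of ids into batches of at most `size` entries."""
    return [ids[i:i + size] for i in range(0, len(ids), size)]


# Hypothetical sample payloads standing in for the downloaded json files.
new_releases = {"albums": {"items": [{"id": "album1"}, {"id": "album2"}]}}
album_tracks = {"items": [{"id": "t1"}, {"id": "t2"}, {"id": None}]}

album_ids = album_ids_from_new_releases(new_releases)
track_ids = track_ids_from_album_tracks(album_tracks)
batches = chunked([f"t{i}" for i in range(250)])
```

Each batch could then be joined with commas and passed as the ids parameter of one audio-features request.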

Acquiring Audio Features for Unpopular Songs

We were also interested in having our recommender system perform well on less popular songs, for which Last.fm tags are unlikely to be available. A list of unpopular songs and their audio features was generated for this purpose as follows.

A bash script (pop.sh) was used to acquire track info, including popularity, for each song on each playlist in the first 100 playlists of the Million Playlist Database.
A python script (Converting_json_track_files_to_unpopular_df) was then used to create a dataframe of track info for all songs with a popularity score of 10 or lower.
Another python script (Creating_Dataframe_of_Unpopular_Songs) was used to extract the features for just the unpopular songs from the audio-feature dataframes generated earlier, and a new dataframe was generated with these songs and features.
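The filtering logic in these steps can be illustrated with a small sketch. It is not the code from the scripts named above: it works on plain lists of dicts rather than dataframes, assumes each track record carries an "id" and a "popularity" field, and uses the hypothetical names unpopular_track_ids and select_feature_rows.

```python
POPULARITY_CUTOFF = 10  # songs at or below this score count as unpopular


def unpopular_track_ids(tracks, cutoff=POPULARITY_CUTOFF):
    """Return the ids of tracks whose popularity is at or below the cutoff."""
    return {
        track["id"]
        for track in tracks
        if track.get("popularity") is not None and track["popularity"] <= cutoff
    }


def select_feature_rows(feature_rows, wanted_ids):
    """Keep only the audio-feature rows that belong to the wanted tracks."""
    return [row for row in feature_rows if row["id"] in wanted_ids]


# Hypothetical sample data.
tracks = [
    {"id": "a", "popularity": 3},
    {"id": "b", "popularity": 10},
    {"id": "c", "popularity": 57},
    {"id": "d"},  # no popularity recorded, so it is not selected
]
feature_rows = [
    {"id": "a", "tempo": 98.0},
    {"id": "c", "tempo": 120.0},
]

unpopular_ids = unpopular_track_ids(tracks)
unpopular_features = select_feature_rows(feature_rows, unpopular_ids)
```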

Acquiring and Processing Data of Tags as Labels From the Last.fm API

We retrieved the tags associated with each song, along with their weights, from Last.fm. Because there are 522366 unique tags in all on Last.fm, and a large number of them are associated with only a few songs, we truncated the list of tags so that every tag used as a label in our model occurs reasonably often among the songs. The tags and tag weights for a playlist are obtained by averaging the tag weights of all the songs in the playlist.
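A minimal sketch of the truncation and averaging follows. Several details are assumptions rather than facts from our pipeline: the occurrence threshold (min_songs), the treatment of a tag that a song lacks as weight 0, and the function names frequent_tags and playlist_tag_weights.

```python
from collections import Counter


def frequent_tags(song_tags, min_songs):
    """Return the tags attached to at least `min_songs` songs."""
    counts = Counter(tag for tags in song_tags.values() for tag in tags)
    return {tag for tag, n in counts.items() if n >= min_songs}


def playlist_tag_weights(playlist_songs, song_tags, kept_tags):
    """Average each kept tag's weight over all songs in the playlist.

    Assumption: a song without a given tag contributes weight 0 for it.
    """
    if not playlist_songs:
        return {}
    totals = dict.fromkeys(kept_tags, 0.0)
    for song in playlist_songs:
        for tag, weight in song_tags.get(song, {}).items():
            if tag in totals:
                totals[tag] += weight
    return {tag: total / len(playlist_songs) for tag, total in totals.items()}


# Hypothetical sample data: song id -> {tag: weight}.
song_tags = {
    "s1": {"rock": 100, "indie": 50},
    "s2": {"rock": 60},
    "s3": {"jazz": 80},
}

kept = frequent_tags(song_tags, min_songs=2)
weights = playlist_tag_weights(["s1", "s2"], song_tags, kept)
```

In this sample only "rock" appears on two songs, so it is the only tag kept, and its playlist weight is the mean of 100 and 60.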

Code: Links to Scripts Used

Bash Scripts

https://github.com/weiru-chen-15801/spotify_final_project/blob/Dan/Bash%20Scripts/Get%20Audio%20Features/m_rsa.sh

https://github.com/weiru-chen-15801/spotify_final_project/blob/Dan/Bash%20Scripts/Get%20Audio%20Features/rsa.sh

https://github.com/weiru-chen-15801/spotify_final_project/blob/Dan/Bash%20Scripts/Get%20new%20releases/gt.sh

https://github.com/weiru-chen-15801/spotify_final_project/blob/Dan/Bash%20Scripts/Get%20new%20releases/nr.sh

Python Scripts

https://github.com/weiru-chen-15801/spotify_final_project/blob/Dan/Python%20Scripts/Before_Download_Spotify_Audio_Features.ipynb

https://github.com/weiru-chen-15801/spotify_final_project/blob/Dan/Python%20Scripts/Convert_Spot_Audio_Features_json_to_df.ipynb

https://github.com/weiru-chen-15801/spotify_final_project/blob/Dan/Python%20Scripts/Album_info_to_track_ids_list.ipynb

https://github.com/weiru-chen-15801/spotify_final_project/blob/Dan/Python%20Scripts/New_Release_Albums_to_Dataframe.ipynb

https://github.com/weiru-chen-15801/spotify_final_project/blob/Dan/Python%20Scripts/Convert_Spot_Audio_features_new_release_to_df.ipynb

https://github.com/weiru-chen-15801/spotify_final_project/blob/Dan/Python%20Scripts/Creating_Dataframe_of_Unpopular_Songs.ipynb
