The input to our neural network consisted of audio features for the songs on each of the first 100 playlists of the Million Playlist Database. These features were downloaded for each song from the Spotify API using a bash script (m_rsa.sh), which yielded JSON files. A Python script (Convert_Spot_Audio_Features_json_to_df) was then used to extract the relevant features from these files and to create a feature dataframe for each playlist. The following audio features were acquired and used in subsequent modeling.
The features in each dataframe are as follows.
We were particularly interested in creating a recommender system that would perform well on newly released songs not in the Million Playlist Database. This first required acquiring audio-feature information for newly released songs from the Spotify API, which took several steps involving bash and Python scripts. They are listed below.
Spotify provides an API that returns the names of newly released albums along with some information about them. It returns at most 50 albums per request and no more than 100 albums in all.
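The two limits above mean the full set of new releases has to be collected in pages. The following is a minimal Python sketch of that paging logic, not the bash script actually used. It assumes the API accepts `limit` and `offset` style paging parameters; `page_params`, `collect_albums`, and the `fetch_page` callable are hypothetical names introduced for illustration, and the actual HTTP request and authentication are left to the caller.

```python
from typing import Callable, Dict, List

PAGE_SIZE = 50    # per-request cap stated in the text
MAX_ALBUMS = 100  # overall cap stated in the text


def page_params(total: int = MAX_ALBUMS, page_size: int = PAGE_SIZE) -> List[Dict[str, int]]:
    """Return the limit/offset pairs needed to cover `total` items."""
    return [
        {"limit": min(page_size, total - offset), "offset": offset}
        for offset in range(0, total, page_size)
    ]


def collect_albums(fetch_page: Callable[[int, int], List[dict]]) -> List[dict]:
    """Call `fetch_page(limit, offset)` once per page and concatenate the results.

    `fetch_page` is a hypothetical caller-supplied function that performs
    the real API request and returns a list of album records.
    """
    albums: List[dict] = []
    for params in page_params():
        page = fetch_page(params["limit"], params["offset"])
        albums.extend(page)
        if len(page) < params["limit"]:  # short page: nothing more to fetch
            break
    return albums
```

With the default limits, `page_params()` produces two requests, one at offset 0 and one at offset 50. Passing the request function in as an argument keeps the paging logic testable without network access.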
We were also interested in having our recommender system perform well on less popular songs, for which Last.fm tags are unlikely to be available. A list of unpopular songs and their audio features was generated for this purpose as follows.
We retrieved the tags and their weights for each song from Last.fm. Because Last.fm has 522366 unique tags in all, a large number of which are associated with only a few songs, we truncated the tag list so that every tag used as a label in our model occurs reasonably often among the songs. The tags and tag weights for a playlist are obtained by averaging the tag weights of all the songs in the playlist.
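The truncation and averaging steps can be illustrated with a short Python sketch. It is one possible reading of the procedure, not the project's actual script. The assumptions are that each song's tags are held as a dictionary mapping tag to weight, that truncation keeps the `top_k` tags appearing on the most songs (the cutoff actually used is not stated here), and that a song lacking a tag contributes a weight of 0 to the playlist average. The names `frequent_tags` and `playlist_tag_weights` are hypothetical.

```python
from collections import Counter
from typing import Dict, List

SongTags = Dict[str, float]  # tag -> weight for a single song


def frequent_tags(songs: List[SongTags], top_k: int) -> List[str]:
    """Return the `top_k` tags that occur on the most songs."""
    counts = Counter(tag for song in songs for tag in song)
    return [tag for tag, _ in counts.most_common(top_k)]


def playlist_tag_weights(playlist: List[SongTags], vocabulary: List[str]) -> Dict[str, float]:
    """Average each vocabulary tag's weight over all songs in the playlist.

    Assumption: a song without a given tag contributes a weight of 0.
    """
    if not playlist:
        return {tag: 0.0 for tag in vocabulary}
    return {
        tag: sum(song.get(tag, 0.0) for song in playlist) / len(playlist)
        for tag in vocabulary
    }
```

For example, a playlist of two songs tagged `{"rock": 100, "indie": 50}` and `{"rock": 60}` averages to a weight of 80 for `rock` and 25 for `indie` under the zero-for-missing assumption.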
Python Scripts