For song recommendation, we considered two approaches. The first uses the acoustic features of a given playlist to recommend songs; the second uses the moods detected in the lyrics to divide songs into mood categories and recommends songs by mood. We implemented both approaches in our recommender system.
While investigating song recommendation approaches, we realized that some people are more attracted to acoustic features like melody and tempo, while others pay more attention to the lyrics. So, in addition to our original model, which recommends songs based on acoustic features, we added a second algorithm that detects the moods of the lyrics and recommends songs based on the moods the user selects. The goal is to benefit not only people who are into acoustic features, but also people who attach importance to lyrics.
To view more details about the models and their explanations, please click the button below to check out the "Approach" page.
When designing a system that makes song recommendations in real time, two important considerations are the need to make recommendations quickly and the need to make high-quality recommendations that the user will like. These two factors are naturally at odds, since the first precludes the on-the-fly searches of large databases that might otherwise produce the best recommendations. Indeed, we found that a kNN classifier using tag attributes that must be extracted in real time from a very large database produces high-quality recommendations, but it is too slow to work in real time. An important discovery we made, however, is that a two-stage approach, first a neural network and then a kNN classifier, balances these considerations very well. In particular, we found that a large number of song-character tags can be predicted very accurately from a limited number of readily available Spotify audio features: our ANN predicts song tags from audio features with an accuracy above 94%, and once trained it does so very quickly. Further, we found that, to our ears, a kNN classifier applied to the ANN-generated tags produces high-quality recommendations. For example, on the "Results Acoustic" page, an input playlist is shown on the left and a set of recommendations on the right. If you play the audio clips from both sides, we think you will find that the songs sound similar and would go well together on a single playlist. Thus, our two-stage approach may be something others want to emulate when building song recommenders, particularly when recommending newly released or unpopular songs that have not yet been assigned tags in any database.
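As a rough illustration of this two-stage idea (a sketch, not our production code), the snippet below trains a small multi-label neural network to map audio features to tags and then builds a kNN index over the predicted tag vectors of a candidate song pool. The random placeholder data, layer sizes, tag vocabulary, and function names are all assumptions made for the example.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)

# Placeholder data standing in for the real training set: Spotify audio
# features (danceability, energy, tempo, ...) and binary song-character tags.
n_songs, n_features, n_tags = 2000, 10, 25
X_train = rng.random((n_songs, n_features))
Y_train = (rng.random((n_songs, n_tags)) > 0.7).astype(int)

# Stage 1: train an ANN (offline) to predict tags from audio features.
tag_model = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=300)
tag_model.fit(X_train, Y_train)

# Predict tag vectors for the candidate song pool once, offline, and index
# them with kNN so recommendation-time lookups stay fast.
X_candidates = rng.random((500, n_features))
candidate_tags = tag_model.predict_proba(X_candidates)
knn = NearestNeighbors(n_neighbors=10).fit(candidate_tags)

def recommend(profile, n=10):
    """Return indices of candidate songs whose predicted tags sit closest
    to the playlist's predicted tag profile."""
    # Stage 2: map the playlist's (averaged) audio features to tag space,
    # then look up the nearest candidates.
    playlist_tags = tag_model.predict_proba(np.asarray(profile).reshape(1, -1))
    _, idx = knn.kneighbors(playlist_tags, n_neighbors=n)
    return idx[0]
```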
Another virtue of our system is that the input to our network is a set of audio features averaged over the songs on the input playlist. This means that if the playlist contains only a single song, the system can use that song's features to make a reasonable recommendation; as more songs are added, their features are averaged in, and the recommendations increasingly reflect the character of the whole input playlist. Thus, our system works well for both 'cold-start' and large-playlist-based recommending.
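Continuing the sketch above, the profile fed to the network is simply the per-song feature average, so a single-song playlist (cold start) and a long playlist go through the same code path. The helper name and example calls here are illustrative.

```python
def playlist_profile(song_feature_rows):
    """Average per-song audio features into one input vector for the ANN.
    Works the same for a single song (cold start) or a long playlist."""
    return np.asarray(song_feature_rows).mean(axis=0)

# Cold start: one song already yields a usable profile ...
print(recommend(playlist_profile(X_candidates[:1])))
# ... and as songs are added, the averaged profile (and therefore the
# recommendations) drifts toward the character of the whole playlist.
print(recommend(playlist_profile(X_candidates[:20])))
```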
Finally, we recognize that it would be good to have a quantitative measure of our recommender system's performance. This is of course difficult given the qualitative nature of music appreciation, but with more time we would take the following approach. We would examine how frequently the songs on our input playlists and the subsequently recommended songs appear together on playlists in the Million Playlist Dataset, normalizing for each song's individual frequency, and compare this co-occurrence rate to that of songs selected at random from the same dataset. We would expect the input and recommended songs to appear together more frequently than randomly selected songs, in a way we could quantify. We did begin to implement this approach, but with our limited time and computing power we could not complete the analysis.
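For completeness, here is a sketch of the kind of metric we have in mind: a lift-style score that counts how often input/recommended song pairs appear on the same playlist, normalized by each song's individual frequency. The toy playlists and function name are placeholders; running this over the full Million Playlist Dataset is exactly the part we could not complete.

```python
from collections import Counter

def cooccurrence_lift(song_pairs, playlists):
    """Average co-appearance rate of (input song, recommended song) pairs,
    normalized by each song's individual appearance rate."""
    n = len(playlists)
    song_count = Counter(s for pl in playlists for s in set(pl))
    lifts = []
    for a, b in song_pairs:
        together = sum(1 for pl in playlists if a in pl and b in pl)
        p_a, p_b = song_count[a] / n, song_count[b] / n
        if p_a and p_b:
            lifts.append((together / n) / (p_a * p_b))
    return sum(lifts) / len(lifts) if lifts else 0.0

# Toy stand-in for the Million Playlist Dataset.
playlists = [["a", "b", "c"], ["a", "c", "d"], ["b", "d", "e"], ["a", "b", "e"]]

# Compare pairs produced by the recommender against arbitrary baseline pairs;
# a higher average lift would indicate the recommendations tend to co-occur
# with the inputs on real playlists.
print(cooccurrence_lift([("a", "b"), ("a", "c")], playlists))   # recommended pairs
print(cooccurrence_lift([("a", "e"), ("c", "d")], playlists))   # baseline pairs
```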