Our initial idea for picking out relevant songs was matrix factorization: use a sparse [playlist x song] matrix to learn which songs co-occur across playlists. This method lets similar playlists inform what might be relevant for the current playlist, and it is a common tool in many recommendation algorithms. As a baseline, we used a simple popularity-based model. We followed tutorials on creating sparse matrices and performing matrix factorization by Ethan Rosenthal (https://www.ethanrosenthal.com/2016/10/19/implicit-mf-part-1/) and Jesse Steinweg-Woods (https://jessesw.com/Rec-System/).
In all of our models, the train/test split was done differently from how we've done it in class, at the recommendation of our sources. Since matrix factorization needs every playlist-song interaction in order to learn the proper latent factors, we can't just hold out a chunk of playlists as a test set. Instead, we masked a certain percentage of interactions in the train set and revealed them at evaluation time, comparing the real interactions to the interactions we predicted. For our split, we masked 10 songs in each playlist that was longer than 50 songs and contained at least 20 unique songs, meaning that at most 50% of a playlist's unique songs were masked. Our validation set was the set of songs that were masked in the training set; our test set was the entire dataset without any masked data.
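A masking split like this can be sketched as follows, assuming the data is already in a scipy CSR [playlist x song] matrix (the function name, thresholds-as-parameters, and toy data are our own illustration, not our actual pipeline code):

```python
import numpy as np
import scipy.sparse as sp

def mask_interactions(interactions, n_mask=10, min_unique=20, seed=0):
    """Hide n_mask random song interactions per qualifying playlist.

    Returns (train, masked): `train` is the matrix with those
    interactions hidden; `masked` maps playlist row -> hidden columns.
    """
    rng = np.random.default_rng(seed)
    train = interactions.tolil(copy=True)  # LIL is cheap to edit in place
    masked = {}
    for row in range(interactions.shape[0]):
        songs = interactions[row].indices  # unique songs in this playlist
        if len(songs) >= min_unique:
            hidden = rng.choice(songs, size=n_mask, replace=False)
            for col in hidden:
                train[row, col] = 0  # setting to 0 drops the stored entry
            masked[row] = hidden
    return train.tocsr(), masked

# Toy example: 3 playlists that each contain all 30 songs
full = sp.csr_matrix(np.ones((3, 30)))
train, masked = mask_interactions(full)
```

The masked entries then serve as the validation targets: a recommendation counts as a hit if it recovers one of the hidden songs.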
To evaluate our models' performance, we used precision@K, which measures the proportion of the K recommended items that are relevant to the user (https://medium.com/@m_n_malaeb/recall-and-precision-at-k-for-recommender-systems-618483226c54). In our case, this is the proportion of the K recommended songs that are actually present in the playlist. For all models we set K=10, meaning we recommended 10 songs for each playlist.
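The metric itself is a one-liner; a minimal sketch (argument names are our own):

```python
def precision_at_k(recommended, in_playlist, k=10):
    """Proportion of the top-k recommended songs that actually
    appear in the playlist (the held-out songs, in our split)."""
    return len(set(recommended[:k]) & set(in_playlist)) / k

# 3 of the 10 recommended song ids are in the playlist -> 0.3
score = precision_at_k(recommended=list(range(10)), in_playlist=[2, 5, 9, 41])
```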
In generating the sparse matrix needed for matrix factorization, however, we ran into our first modeling problem. To get our data into a compatible form, we built a sparse [playlist x unique song] matrix whose entries are +1 for each time the playlist contains the song and 0 otherwise. Our references warned that too sparse a matrix may not work, since there wouldn't be enough playlist-song interactions to learn any latent factors, so we aimed for a sparsity of 99.5% or less. Unfortunately, our matrix's sparsity was 99.996%, so we decided to filter out songs that did not appear frequently enough and playlists that weren't long enough. After that filtering step, we got the sparsity down to 99.7979%, which we deemed acceptable since it was within range of one of the tutorial examples (99.86% sparsity).
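The sparsity figure is just the percentage of empty cells, and the filtering step drops low-count columns and rows. A sketch of both, with the cutoffs left as hypothetical parameters (our actual thresholds were tuned to hit the sparsity target above):

```python
import numpy as np
import scipy.sparse as sp

def sparsity(mat):
    """Percent of cells in a [playlist x song] matrix with no interaction."""
    return 100.0 * (1.0 - mat.nnz / (mat.shape[0] * mat.shape[1]))

def filter_matrix(mat, min_song_count=5, min_playlist_len=10):
    """Drop rare songs, then short playlists (illustrative thresholds)."""
    # Column sums = how often each song appears across playlists
    keep_cols = np.where(np.asarray(mat.sum(axis=0)).ravel() >= min_song_count)[0]
    mat = mat.tocsc()[:, keep_cols].tocsr()
    # Row sums = how many (remaining) songs each playlist has
    keep_rows = np.where(np.asarray(mat.sum(axis=1)).ravel() >= min_playlist_len)[0]
    return mat[keep_rows, :]
```

Note that filtering songs can shorten playlists below the length cutoff, so in practice the two filters interact; applying songs-then-playlists in one pass was enough to reach 99.7979% for us.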
A very simple song recommender would recommend the top K most popular songs in the dataset, regardless of which playlist we want recommendations for. Since we are dealing with implicit ratings (i.e., the presence or absence of a song in a playlist), a song's popularity can be measured by how many times it is featured. According to one of our tutorial sources (https://jessesw.com/Rec-System/), popularity is a difficult baseline to beat, so this served as our baseline model.
Our popularity-based model, which simply recommended the top 10 most popular songs in the dataset for every playlist, had a precision@K of 6.5% on the train set and 0.75% on the validation set. This is clearly not very good, and we hoped to do better with our collaborative filtering model, which learns latent factors from the sparse matrix in order to make suggestions.
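The baseline falls straight out of the column sums of the interaction matrix; a sketch:

```python
import numpy as np
import scipy.sparse as sp

def top_k_popular(interactions, k=10):
    """Recommend the k songs appearing in the most playlists,
    for every playlist alike (no personalization)."""
    counts = np.asarray(interactions.sum(axis=0)).ravel()
    return np.argsort(-counts)[:k]  # song indices, most popular first

# Song 2 is in all three playlists, song 0 in two, song 1 in one
mat = sp.csr_matrix(np.array([[1, 0, 1], [1, 1, 1], [0, 0, 1]]))
recs = top_k_popular(mat, k=2)  # -> [2, 0]
```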
For our better model, we used the Alternating Least Squares recommender from the implicit Python library. This model performs Weighted Regularized Matrix Factorization, using alternating least squares to find the factorization of the sparse matrix.
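We used the library's implementation directly, but the underlying computation can be sketched in plain numpy: hold one factor matrix fixed, solve a regularized weighted least-squares problem for the other, and alternate. Below is a dense toy version using the confidence scheme from Hu et al. (confidence = 1 + alpha * count); it illustrates the algorithm, and is not the implicit library's actual (sparse, optimized) code:

```python
import numpy as np

def als_step(C, P, Y, reg):
    """Solve for one factor matrix while the other (Y) stays fixed."""
    k = Y.shape[1]
    X = np.zeros((C.shape[0], k))
    for u in range(C.shape[0]):
        Cu = np.diag(C[u])
        # x_u = (Y^T Cu Y + reg*I)^-1  Y^T Cu p_u
        X[u] = np.linalg.solve(Y.T @ Cu @ Y + reg * np.eye(k),
                               Y.T @ Cu @ P[u])
    return X

def weighted_als(R, k=2, alpha=50, reg=0.1, iters=10, seed=0):
    """Weighted regularized matrix factorization on a dense toy matrix R."""
    rng = np.random.default_rng(seed)
    C = 1 + alpha * R          # confidence in each observation
    P = (R > 0).astype(float)  # binary preference (song in playlist?)
    U = rng.normal(scale=0.1, size=(R.shape[0], k))
    V = rng.normal(scale=0.1, size=(R.shape[1], k))
    for _ in range(iters):
        U = als_step(C, P, V, reg)          # update playlist factors
        V = als_step(C.T, P.T, U, reg)      # update song factors
    return U, V

# Toy 4x4 matrix with two clear playlist "groups"
R = np.array([[1., 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]])
U, V = weighted_als(R)
scores = U @ V.T  # high where playlist-song interactions were observed
```

Recommendations come from scoring every song against a playlist's factor vector and taking the top K unseen ones.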
The parameters alpha, factors, and regularization needed to be tuned, so we fit the model on the train set and chose parameters by validation performance (precision@K on the validation set). We tried to perform a full grid search, but this proved difficult as Google Colab kept running out of RAM. We ended up performing a very constrained grid search and found that alpha=50, factors=64, and regularization=10.0 gave the best results, with a precision@K of 45.58% on the train set and 5.96% on the validation set. A large improvement over the popularity baseline!
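The constrained grid search amounts to scoring each parameter combination on the validation set and keeping the best; `fit_and_score` here is a hypothetical callback that would train a model with the given parameters and return its validation precision@K (the dummy scorer below just stands in for it):

```python
from itertools import product

def grid_search(fit_and_score, alphas, factors_list, regs):
    """Return the (alpha, factors, regularization) triple with the
    highest validation score; keeping the grid small kept memory use down."""
    return max(product(alphas, factors_list, regs),
               key=lambda params: fit_and_score(*params))

# Dummy scorer that happens to peak at the parameters our search chose
best = grid_search(lambda a, f, r: -abs(a - 50) - abs(f - 64) - abs(r - 10),
                   alphas=[10, 50], factors_list=[32, 64], regs=[1.0, 10.0])
# best == (50, 64, 10.0)
```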
Next, we fit our model (with the parameters from the grid search on train data) on the test set (the whole matrix with no masked information) and found a precision@K of 51.08%, meaning that over half of the recommended songs per playlist were actually in the playlist!