This project applies machine learning and deep learning methods to movie recommendation using the MovieLens 25M dataset. It will compare and interpret the accuracy and performance of singular value decomposition (SVD), neural network-based collaborative filtering (NCF), and, if time allows, Facebook AI Similarity Search (Faiss).
With the rise of YouTube, Amazon, Netflix, and many other similar web services, recommendation systems play an ever larger role in our lives. From e-commerce to advertising to entertainment, we interact with them constantly. For example, when you shop on Amazon, its recommendation system suggests products related to the ones you are currently browsing, and Netflix recommends movies you may like based on your viewing history. Recommendation systems are critical in these industries because recommending the right product generates a huge amount of revenue.
The dataset will be downloaded from GroupLens Research. It contains 25,000,095 ratings on a 5-star scale and 1,093,360 tag applications across 62,423 movies. These data were created by 162,541 users between January 9, 1995 and November 21, 2019. Users were selected at random for inclusion, and all selected users had rated at least 20 movies. No demographic information is included: each user is represented only by an id. The data are contained in the files genome-scores.csv, genome-tags.csv, links.csv, movies.csv, ratings.csv, and tags.csv.
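As a minimal sketch of what working with the ratings file looks like, the snippet below parses a tiny in-memory stand-in for ratings.csv. The column schema (userId, movieId, rating, timestamp) matches the real file; the rows themselves are made up for illustration.

```python
import io
import pandas as pd

# Tiny stand-in for ratings.csv; schema matches the real file,
# but these three rows are invented for the example
sample = io.StringIO(
    "userId,movieId,rating,timestamp\n"
    "1,296,5.0,1147880044\n"
    "1,306,3.5,1147868817\n"
    "2,296,4.0,1141415820\n"
)
ratings = pd.read_csv(sample)
```

In the actual project, `pd.read_csv("ratings.csv")` on the downloaded file yields the same four columns, just with 25 million rows.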
In phase 2, I am going to introduce the SVD and SVD++ algorithms and their limitations. Then, we will discuss the architecture of NCF.
SVD and SVD++
SVD and SVD++ factorize a matrix into the product of three matrices (A = UΣVᵀ). Conceptually, this is closely related to running principal component analysis on both the users and the items, and it returns the latent-factor matrices we need. SVD is known to produce very accurate results and was used in the Netflix Prize competition. It also has interesting algebraic properties and conveys important geometric and theoretical insights about linear transformations. In the video, we introduced both a biased and an unbiased SVD algorithm. SVD++ is an extension of SVD that takes implicit ratings into account: an implicit rating captures the fact that a user rated an item at all, regardless of the rating value. It abstracts each user with a factor vector and is reported to achieve a lower RMSE than plain SVD.
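The factorization A = UΣVᵀ can be demonstrated directly with NumPy. This is a minimal sketch on a made-up toy rating matrix, not the biased/regularized variant Surprise actually trains:

```python
import numpy as np

# Toy user-item rating matrix (3 users x 4 movies); values are invented
A = np.array([[5.0, 3.0, 0.0, 1.0],
              [4.0, 0.0, 0.0, 1.0],
              [1.0, 1.0, 0.0, 5.0]])

# Factorize: A = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Keeping all singular values reconstructs A exactly
assert np.allclose(A, U @ np.diag(s) @ Vt)

# Keeping only k latent factors gives the best rank-k approximation:
# the "latent space" that users and items are projected into
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
```

Recommendation-oriented SVD methods learn the truncated factors directly from the observed entries rather than decomposing the full (mostly empty) matrix.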
Limitation of SVD and SVD++
Essentially, SVD and SVD++ project each user and item onto a latent space, representing each as a latent vector. The more similar two latent vectors are, the more related the corresponding users' preferences. We can measure the similarity of any two latent vectors with cosine similarity or the dot product. However, the dot product limits the expressiveness of the user and item latent vectors.
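The two similarity measures can behave quite differently. In this small sketch (with made-up latent vectors), two users point in exactly the same direction, so cosine similarity treats their preferences as identical, while the dot product also depends on vector magnitude:

```python
import numpy as np

def cosine_similarity(u, v):
    # Dot product normalized by the vector magnitudes
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical 3-d latent vectors: same direction, different magnitude
u1 = np.array([1.0, 2.0, 0.5])
u2 = np.array([2.0, 4.0, 1.0])

# Cosine similarity sees identical preferences...
assert abs(cosine_similarity(u1, u2) - 1.0) < 1e-9

# ...while the raw dot product conflates direction with magnitude
assert np.dot(u1, u2) != np.dot(u1, u1)
```

Because both measures are fixed, simple functions of the latent vectors, neither can capture arbitrary interaction patterns, which is the gap NCF aims to fill.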
NCF
NCF overcomes this limitation of SVD and SVD++ by using a neural architecture to learn the interaction function directly from data.
The input layer binarises a sparse vector identifying the user and the item, where:
Item (i): 1 means that user u has interacted with item i
User (u): identifies the user
The embedding layer is a fully connected layer that projects the sparse representation to a dense vector. The obtained user/item embeddings are the latent user/item vectors.
The neural CF layers use a multi-layered neural architecture to map the latent vectors to prediction scores. The embedded user and item vectors are concatenated before passing through a series of fully connected layers, which map the concatenated embeddings into a prediction vector as output.
The final output layer returns the probability of interaction using the logistic sigmoid function.
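The architecture described above can be sketched as a plain PyTorch module (our project wraps this kind of model in pytorch_lightning; the layer sizes here are illustrative, not our tuned configuration):

```python
import torch
import torch.nn as nn

class NCF(nn.Module):
    """Sketch: embedding layers -> concatenation -> MLP -> sigmoid."""
    def __init__(self, num_users, num_items, dim=8):
        super().__init__()
        # Embedding layers project sparse user/item ids to dense latent vectors
        self.user_emb = nn.Embedding(num_users, dim)
        self.item_emb = nn.Embedding(num_items, dim)
        # Neural CF layers: fully connected stack over concatenated embeddings
        self.mlp = nn.Sequential(
            nn.Linear(2 * dim, 16), nn.ReLU(),
            nn.Linear(16, 8), nn.ReLU(),
            nn.Linear(8, 1),
        )

    def forward(self, user_ids, item_ids):
        x = torch.cat([self.user_emb(user_ids), self.item_emb(item_ids)], dim=-1)
        # Logistic sigmoid turns the raw score into an interaction probability
        return torch.sigmoid(self.mlp(x)).squeeze(-1)

model = NCF(num_users=100, num_items=50)
probs = model(torch.tensor([0, 1]), torch.tensor([3, 7]))
```

An untrained model just produces arbitrary probabilities; training minimizes binary cross-entropy between these outputs and the 0/1 interaction labels.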
As mentioned above, we are exploring the performance and accuracy of SVD, SVD++, and NCF in collaborative filtering. Both SVD and SVD++ models are evaluated using root mean squared error (RMSE). We use a library called Surprise to train and test the models. We used nearly 5.5 million rows of the MovieLens data for training and testing SVD and SVD++, with 80% of the data used to train both models and 20% to test them; cross-validation was performed to prevent overfitting. The results show that SVD has an average RMSE of 0.8126, while SVD++ has an average of 0.8620. In other words, SVD performs better than SVD++ in terms of RMSE (lower is better). In terms of performance, SVD took an average of 287.42 seconds to train and 16.35 seconds to test, while SVD++ took an average of 424.99 seconds to train and 8.04 seconds to test.
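As a concrete illustration of the RMSE metric used to score both models, the snippet below computes it by hand on a handful of made-up ratings and predictions (Surprise's `accuracy.rmse` computes the same quantity over the test fold):

```python
import math

# Hypothetical true ratings and model predictions on a 5-star scale
y_true = [4.0, 3.5, 5.0, 2.0]
y_pred = [3.8, 3.0, 4.5, 2.5]

# RMSE = sqrt(mean of squared errors); lower means better predictions
mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
rmse = math.sqrt(mse)
```

On a 0.5-5 star scale, an RMSE around 0.81 (as SVD achieved) means predictions are off by less than a star on average.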
For NCF, we used the pytorch_lightning library as our deep learning framework. Since we require negative samples to indicate movies that a user has not interacted with, we generated 4 negative samples for each row of data, giving a 4:1 ratio of negative to positive samples. We then fed the user and item input vectors to the user embedding and item embedding respectively, producing smaller, denser user and item vectors. The embedded user and item vectors are concatenated before passing through a series of fully connected layers, which map the concatenated embeddings into a prediction vector as output. Finally, we apply a logistic sigmoid function to obtain the most probable class. In terms of performance, NCF took the longest to train and test (30 minutes to train and 11 minutes to test). However, it achieved the best accuracy of all three models (0.90).
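The negative-sampling step can be sketched as follows. This is a simplified stand-in for what we did: the user/movie ids are invented, and a production version would sample over the full movie catalogue.

```python
import random

random.seed(0)  # deterministic for the example

# Hypothetical positive interactions: (user_id, movie_id) pairs from ratings
positives = {(0, 10), (0, 11), (1, 12)}
all_movie_ids = list(range(20))

def sample_negatives(user_id, num_neg=4):
    """Draw movies the user has NOT interacted with (label 0)."""
    negatives = []
    while len(negatives) < num_neg:
        m = random.choice(all_movie_ids)
        if (user_id, m) not in positives:
            negatives.append((user_id, m))
    return negatives

# 4 negatives per positive gives the 4:1 ratio described above
dataset = []
for u, m in positives:
    dataset.append((u, m, 1))
    dataset.extend((uu, mm, 0) for uu, mm in sample_negatives(u))
```

Each positive row thus contributes five training examples: itself, labeled 1, plus four sampled non-interactions labeled 0.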
Although there is some trade-off between performance and accuracy, we recommend using NCF for collaborative filtering if the resources are available. We expected SVD++ to perform better because it takes implicit feedback into account; however, our results show that SVD actually achieves a better RMSE than SVD++, in addition to a shorter training time.
There are several limitations in this project:
Unfortunately, the MovieLens data lacks implicit data such as user interaction records for the movie items. Since the NCF model requires implicit data, we generated synthetic negative-interaction data in order to have a comprehensive dataset.
Model training and testing take a significant amount of time.
Due to time limitations, we did not have time to explore the FAISS model.
Further work & Improvements:
We would like to get e-commerce data that has users, products, item ratings, and interaction records, and fit the new dataset to the model. It is important to see if this model can be implemented across different industries where collaborative filtering is needed for recommendations.
We expect performance to improve given more computing resources, such as additional CPUs, RAM, and GPUs.
Investigate and explore the performance and accuracy of FAISS, and compare the results to the rest of the models.
Banerjee, S. (n.d.). Collaborative Filtering for Movie Recommendations. Keras. https://keras.io/examples/structured_data/collaborative_filtering_movielens/.
Berwa, R. A. (n.d.). Surprise: Movie Recommender System Example. Nextjournal. https://nextjournal.com/berwa/surprise-movie-recommender-system-example.
F. Maxwell Harper and Joseph A. Konstan. 2015. The MovieLens Datasets: History and Context. ACM Transactions on Interactive Intelligent Systems (TiiS) 5, 4: 19:1– 19:19. https://doi.org/10.1145/2827872
Huang, K. (n.d.). Paper Review: Neural Collaborative Filtering Explanation & Implementation. Retrieved April 12, 2021, from https://towardsdatascience.com/paper-review-neural-collaborative-filtering-explanation-implementation-ea3e031b7f96
Hug, N. (n.d.). Matrix Factorization-based algorithms. Surprise documentation. https://surprise.readthedocs.io/en/stable/matrix_factorization.html#surprise.prediction_algorithms.matrix_factorization.SVD.
Koren, Y. (n.d.). Factorization Meets the Neighborhood: A Multifaceted Collaborative Filtering Model. Retrieved from https://people.engr.tamu.edu/huangrh/Spring16/papers_course/matrix_factorization.pdf
Loy, J. (2020, October 18). Deep Learning based Recommender Systems. Kaggle. https://www.kaggle.com/jamesloy/deep-learning-based-recommender-systems/notebook?scriptVersionId=44973117.
Sharma, A. (n.d.). Neural Collaborative Filtering. Retrieved April 12, 2021, from https://towardsdatascience.com/neural-collaborative-filtering-96cef1009401
Spark, C. (2019, November 28). Tutorial: Practical Introduction to Recommender Systems. Medium. https://blog.cambridgespark.com/tutorial-practical-introduction-to-recommender-systems-dbe22848392b.
TwinPenguins. (n.d.). Interpreting the Root Mean Squared Error (RMSE)! Data Science Stack Exchange. https://datascience.stackexchange.com/questions/36945/interpreting-the-root-mean-squared-error-rmse.
Wood, T. (2020, September 27). Sigmoid Function. DeepAI. https://deepai.org/machine-learning-glossary-and-terms/sigmoid-function.