Amatriain, X. (2013)
As a video-streaming recommender system, Netflix aims to maximize the probability that a user selects an item and enjoys it, in a way that fosters customer loyalty. The company has done so by (1) providing personalized experiences to its members through data mining of user ratings, interactions, and content metadata; and (2) continuously adapting its Super Crunching models, metrics, and system architectures to keep up with market changes and scale. Amatriain (2013) discusses the different components Netflix uses to extract the information needed to create a personalized service.
First, what forms the basis of Netflix’s item recommendations?
Netflix’s personalization starts on the homepage, where recommended content is organized in horizontal rows, each covering a different content category. Which titles appear in a row, and in what order, is based on the algorithms’ best guess of the titles a user is most likely to enjoy. In fact, personalization at Netflix is intended to cater not just to a single user but to a household, and thus to generate recommendations that appeal to different and evolving tastes and moods. Each row represents three layers of personalization: the choice of genre, the recommended subset of titles within that genre, and the ranking of those titles. These choices are predicted from the user’s implicit content preferences, inferred from her recent plays, ratings, and other interactions, as well as from explicit feedback gathered through surveys. The system thus aims to optimize for accuracy, diversity, and freshness of content, and to make users aware that it is continuously adapting to their preferences. Another important source of personalization is similarity, which can be computed between movies or between members. This is where collaborative filtering data comes into play.
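To make the idea of similarity between movies concrete, the following is a minimal sketch of item-to-item similarity computed from member ratings. It assumes cosine similarity over sparse rating vectors, which is one common choice and not necessarily the measure Netflix uses. The data, the titles, and the function names (item_vectors, cosine_similarity, most_similar) are hypothetical and chosen only for illustration.

```python
from math import sqrt

# Hypothetical toy data: member -> {title: rating on a 1-5 scale}.
ratings = {
    "member_a": {"Title 1": 5, "Title 2": 4, "Title 3": 1},
    "member_b": {"Title 1": 4, "Title 2": 5},
    "member_c": {"Title 2": 1, "Title 3": 5},
}


def item_vectors(ratings):
    """Invert member -> title ratings into title -> {member: rating}."""
    vectors = {}
    for member, rated in ratings.items():
        for title, score in rated.items():
            vectors.setdefault(title, {})[member] = score
    return vectors


def cosine_similarity(a, b):
    """Cosine similarity between two sparse rating vectors stored as dicts."""
    shared = set(a) & set(b)
    dot = sum(a[m] * b[m] for m in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(title, ratings, top_n=2):
    """Return the top_n titles most similar to `title`, best first."""
    vectors = item_vectors(ratings)
    target = vectors[title]
    scores = [
        (other, cosine_similarity(target, vec))
        for other, vec in vectors.items()
        if other != title
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]


if __name__ == "__main__":
    for other, score in most_similar("Title 1", ratings):
        print(f"{other}: {score:.2f}")
```

In this toy data, "Title 1" and "Title 2" are rated highly by the same members, so they come out as more similar to each other than either is to "Title 3". The same function applied to member vectors instead of title vectors would give similarity between members.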
Second, how are item recommendations ranked?
To present an attractive set of titles in each row, Netflix uses a ranking model that optimizes the selection and ordering of items for a user in real time. One baseline factor in the ranking function is item popularity because, on average, a user is most likely to watch what the majority of other users are watching. Popularity is then combined with Netflix’s predicted rating and with additional features that are fed to the machine learning algorithms.
Figure 5. Performance of Netflix ranking system when adding features.
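The combination of popularity and predicted rating described above can be sketched as a simple scoring function. The example below assumes a linear combination of the two signals. The weights, the bias, the Candidate class, and the sample titles are all hypothetical. In practice the weights would be learned from data, and further features would be added to the score.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    title: str
    popularity: float        # assumed to be normalized to [0, 1]
    predicted_rating: float  # assumed to be on a 1-5 scale


def rank_score(c, w_pop=1.0, w_rating=0.5, bias=0.0):
    """Linear score: w_pop * popularity + w_rating * predicted rating + bias."""
    return w_pop * c.popularity + w_rating * c.predicted_rating + bias


def rank(candidates, **weights):
    """Sort candidates by descending score under the given (hypothetical) weights."""
    return sorted(candidates, key=lambda c: rank_score(c, **weights), reverse=True)


candidates = [
    Candidate("Blockbuster", popularity=0.9, predicted_rating=3.0),
    Candidate("Niche favorite", popularity=0.2, predicted_rating=4.8),
    Candidate("Average title", popularity=0.5, predicted_rating=3.2),
]

if __name__ == "__main__":
    print([c.title for c in rank(candidates)])
    # Ignoring the predicted rating reduces the ranking to pure popularity.
    print([c.title for c in rank(candidates, w_rating=0.0)])
```

With the default weights the personalized signal is strong enough to lift the niche title above the blockbuster. Setting w_rating to zero recovers the popularity-only baseline, which is the kind of comparison a feature-by-feature evaluation of a ranking system relies on.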
Third, what type of data is collected for the predictive models?
Netflix highlights the importance of both having large quantities of high-quality data and using appropriate data models to optimize recommendations. The data sources used include the following:
Finally, what data modeling approaches and software architectures does Netflix use?
Having a large volume of data is of no use unless the right data mining models are used to train, test, and generate predictions. Netflix uses a variety of machine learning approaches, ranging from unsupervised methods, such as clustering, affinity propagation, and feature extraction, to supervised methods, including linear and logistic regression, matrix factorization, and restricted Boltzmann machines. In terms of software architecture, Netflix focuses on designs that scale and adapt efficiently to large streams of data. They combine offline, online, and model-training computations, drawing, for example, on both stored historical data and real-time online events to maintain a seamless, responsive recommendation system.
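As an illustration of one of the supervised methods listed above, the following is a minimal sketch of matrix factorization trained with stochastic gradient descent. It is a textbook formulation, not Netflix's implementation. The toy ratings, the hyperparameters, and the function names (factorize, predict, rmse) are assumptions made for the example.

```python
import random
from math import sqrt

# Hypothetical toy data: (member, title, rating on a 1-5 scale).
observed = [
    ("member_a", "Title 1", 5), ("member_a", "Title 2", 4),
    ("member_a", "Title 3", 1), ("member_b", "Title 1", 4),
    ("member_b", "Title 2", 5), ("member_c", "Title 2", 1),
    ("member_c", "Title 3", 5),
]


def factorize(ratings, n_factors=2, n_epochs=500, lr=0.02, reg=0.02, seed=0):
    """Learn member and title factor vectors whose dot product approximates ratings."""
    rng = random.Random(seed)
    members = sorted({m for m, _, _ in ratings})
    titles = sorted({t for _, t, _ in ratings})
    P = {m: [rng.gauss(0, 0.1) for _ in range(n_factors)] for m in members}
    Q = {t: [rng.gauss(0, 0.1) for _ in range(n_factors)] for t in titles}
    for _ in range(n_epochs):
        for m, t, r in ratings:
            err = r - sum(p * q for p, q in zip(P[m], Q[t]))
            for k in range(n_factors):
                p, q = P[m][k], Q[t][k]
                # Gradient step on squared error with L2 regularization.
                P[m][k] += lr * (err * q - reg * p)
                Q[t][k] += lr * (err * p - reg * q)
    return P, Q


def predict(P, Q, member, title):
    """Predicted rating is the dot product of the two factor vectors."""
    return sum(p * q for p, q in zip(P[member], Q[title]))


def rmse(P, Q, ratings):
    """Root mean squared error of the model on the given ratings."""
    errors = [(r - predict(P, Q, m, t)) ** 2 for m, t, r in ratings]
    return sqrt(sum(errors) / len(errors))


if __name__ == "__main__":
    P, Q = factorize(observed)
    print(f"training RMSE: {rmse(P, Q, observed):.3f}")
    print(f"member_b on Title 3: {predict(P, Q, 'member_b', 'Title 3'):.2f}")
```

In terms of the architecture described above, a training step like factorize would typically run offline on stored historical data, while a cheap function like predict is the kind of computation that can be served online. That split is an illustrative reading of the text, not a description of Netflix's actual pipeline.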