Reward Optimizing Recommender Systems using Deep Learning and Fast Maximum Inner Product Search

Abstract

How can we build and optimize a recommender system that rapidly fills slates (e.g., banners) of personalized recommendations? The combination of deep learning stacks with fast maximum inner product search (MIPS) algorithms has shown that flexible models can be deployed in production and can rapidly deliver personalized recommendations to users. Although promising, this methodology is not by itself sufficient to build a recommender system that maximizes the reward, e.g., the probability of a click. In practice, a proxy loss is often optimized, and A/B testing is then used to check whether the system actually improved. This tutorial takes participants through the steps needed to model the reward and to optimize it directly in recommendation engines built upon fast search algorithms, yielding high-performance reward-optimizing recommender systems.

Tutorial Outline

The task of recommendation involves finding, often at high speed, a small number of relevant items for a user from a massive catalog. This tutorial covers state-of-the-art methods for designing recommender systems, building specifically on the following technologies:

Deep learning for flexible definitions of the objective to be optimized.
Fast (approximate) maximum inner product search (MIPS) to allow very rapid large-scale recommendation.
Reward-optimizing recommendation methods that align the optimization problem with the metrics of interest at A/B test time. This is done either with state-of-the-art modeling approaches or with the Horvitz-Thompson estimator. We are acutely aware that in real-world settings multiple recommendation slates are typically shown simultaneously.
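
To make two of the ingredients above concrete, the following is a minimal sketch, not the tutorial's own implementation. It shows an exact (brute-force) maximum inner product search, which approximate MIPS libraries speed up at scale, and a Horvitz-Thompson (inverse propensity) estimate of the expected reward of a new policy from logged data. The function names, the array shapes, and the assumption that each logged action has a known, strictly positive logging propensity are all illustrative choices, not taken from the source.

```python
import numpy as np


def top_k_inner_product(user_vec, item_matrix, k):
    """Exact MIPS: indices of the k items with the largest inner product.

    user_vec:    shape (d,)   hypothetical user embedding
    item_matrix: shape (n, d) hypothetical item embeddings, with k <= n
    """
    scores = item_matrix @ user_vec
    # argpartition places the k largest scores first (in arbitrary order)...
    top = np.argpartition(-scores, k - 1)[:k]
    # ...which are then sorted by descending score.
    return top[np.argsort(-scores[top])]


def horvitz_thompson_value(rewards, logging_propensities, target_propensities):
    """Horvitz-Thompson (inverse propensity) estimate of expected reward.

    Each logged interaction i contributes
        rewards[i] * target_propensities[i] / logging_propensities[i],
    where both propensities are the probabilities that the respective
    policy assigns to the action that was actually logged.
    Assumes logging_propensities > 0 for every logged action.
    """
    rewards = np.asarray(rewards, dtype=float)
    weights = np.asarray(target_propensities, dtype=float) / np.asarray(
        logging_propensities, dtype=float
    )
    return float(np.mean(weights * rewards))


if __name__ == "__main__":
    # Toy catalog of four items in two dimensions (made-up numbers).
    items = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [-1.0, 0.0]])
    user = np.array([1.0, 0.5])
    print(top_k_inner_product(user, items, k=2))

    # Four logged interactions with click (1) / no-click (0) rewards.
    clicks = [1, 0, 1, 0]
    logged_p = [0.5, 0.5, 0.25, 0.25]
    target_p = [0.25, 0.25, 0.5, 0.5]
    print(horvitz_thompson_value(clicks, logged_p, target_p))
```

When the target policy equals the logging policy, all weights are 1 and the estimate reduces to the average logged reward. The estimator is unbiased under the positivity assumption stated above, but its variance grows when the target policy places much more probability on an action than the logging policy did.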