Invited speakers

Minmin Chen (Google Brain)

Off-Policy Correction for a REINFORCE Recommender System


While reinforcement learning (RL) has achieved impressive advances in games and robotics, it has not been widely adopted in recommender systems. Framing recommendation as an RL problem offers new perspectives, but also faces significant challenges in practice. Industrial recommender systems deal with extremely large action spaces (many millions of items to recommend) and complex user state spaces (billions of users, each unique at any point in time). In this talk, I will discuss our work on scaling up a policy-gradient-based algorithm, REINFORCE, to a production recommender system at YouTube. We proposed algorithms to address data biases when deriving policy updates from logged implicit feedback. I will also discuss some follow-up work and outstanding research questions in applying RL, in particular off-policy optimization, to recommender systems.
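To make the idea of correcting policy updates from logged feedback concrete, here is a minimal sketch of an importance-weighted REINFORCE gradient. It is an illustration under strong assumptions, not the production method described in the talk: the policy is a context-free softmax over a handful of items, the logging (behavior) policy's probability for each logged action is assumed to be known, and the function name `off_policy_reinforce_grad` and the weight cap `cap` are hypothetical.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    shifted = logits - logits.max()
    exp = np.exp(shifted)
    return exp / exp.sum()


def off_policy_reinforce_grad(theta, logged, cap=10.0):
    """Importance-weighted REINFORCE gradient for a context-free softmax policy.

    theta:  1-D array of logits, one per item (hypothetical toy policy).
    logged: iterable of (action, reward, behavior_prob) tuples, where
            behavior_prob is the logging policy's probability of `action`.
    cap:    upper bound on the importance weight, to limit variance.
    """
    pi = softmax(theta)
    grad = np.zeros_like(theta)
    count = 0
    for action, reward, behavior_prob in logged:
        # Reweight the logged sample by how much more (or less) likely
        # the current policy is to pick this action than the logger was.
        weight = min(pi[action] / behavior_prob, cap)
        # Gradient of log softmax w.r.t. the logits: one_hot(action) - pi.
        grad_log_pi = -pi.copy()
        grad_log_pi[action] += 1.0
        grad += weight * reward * grad_log_pi
        count += 1
    return grad / max(count, 1)
```

A gradient-ascent step would then be `theta += learning_rate * off_policy_reinforce_grad(theta, logged)`. Without the weight, items that the logging policy favored would be over-represented in the update.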


Minmin Chen is a Research Scientist at Google Brain. Her main research interests are in machine learning, currently focusing on sequence modeling and reinforcement learning for recommendation systems. Before that, she was a research scientist at Criteo Lab, building computational models for online advertising, and at Amazon, working on the Amazon Go project. She completed her PhD at Washington University in St. Louis on representation learning and domain adaptation. She has published over 20 papers at top machine learning conferences such as NIPS, ICML, ICLR and AISTATS.

Tao Ye, Mohit Singh (Pandora)

Offline and Online Performance of Contextual Bandit Algorithm in Personalized Ranking


Since 2005, Pandora has built a powerful music recommendation platform that powers its well-known machine-learning-driven personalized radio product, and a fully personalized Browse feature to expand users’ music discovery. In our experience of building several key recommendation product features, we have encountered many different scenarios that utilize offline evaluation prior to deployment, ranging from dogfooding a new feature, to refining core recommendation algorithms, to experimenting with visual placements. We will briefly introduce the diverse set of strategies and metrics used in these evaluations, then dive deep into a case study where the relationship between offline and online performance is observed.

Browse is aimed at providing users with a discovery experience. Different types of items, such as albums, curated music stations, playlists, and artists, are grouped into modules and presented to users on one scrollable page. The personalized ranking of these modules combines user-item relevance scores with a contextual multi-armed bandit algorithm that learns click preferences online. In the offline evaluation, we mainly used ranking metrics (MRR@K, Recall@K) to evaluate the model on an intentionally collected randomized dataset. The evaluation protocol we chose is replay, which updates the model's estimates after each data instance in sequential order. In the online A/B testing, our key metrics are instead business-driven. We discuss the implications of the evaluation protocol, how offline metrics determined the implementation order of algorithms, the observed correlation between online and offline results, and interesting surprises during A/B testing and post-deployment ramp-up. We hope this talk will provide other practitioners with a case study of an online algorithm’s performance in music discovery, and stimulate discussion on algorithm evaluations in general.
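The replay protocol and the two ranking metrics can be sketched as follows. This is a simplified illustration, not Pandora's implementation: `ClickRateModel` is a hypothetical, non-contextual stand-in for the bandit, each logged event is assumed to hold the modules shown and the single module clicked, and with one clicked module per event Recall@K reduces to a hit indicator.

```python
from collections import defaultdict


def reciprocal_rank_at_k(ranking, clicked, k):
    """1/rank of the clicked module if it is in the top k, else 0."""
    top = ranking[:k]
    return 1.0 / (top.index(clicked) + 1) if clicked in top else 0.0


def recall_at_k(ranking, clicked, k):
    """With a single clicked module, Recall@K is 1 if it is in the top k."""
    return 1.0 if clicked in ranking[:k] else 0.0


class ClickRateModel:
    """Hypothetical stand-in model: ranks modules by smoothed click rate."""

    def __init__(self):
        self.shows = defaultdict(int)
        self.clicks = defaultdict(int)

    def rank(self, candidates):
        # Smoothed click rate (clicks + 1) / (shows + 2); ties keep input order.
        return sorted(
            candidates,
            key=lambda m: -(self.clicks[m] + 1) / (self.shows[m] + 2),
        )

    def update(self, shown, clicked):
        for module in shown:
            self.shows[module] += 1
        self.clicks[clicked] += 1


def replay_evaluate(model, log, k):
    """Score each logged event with the current model, then update on it."""
    rr_total, recall_total = 0.0, 0.0
    for shown, clicked in log:
        ranking = model.rank(shown)  # predict before seeing the outcome
        rr_total += reciprocal_rank_at_k(ranking, clicked, k)
        recall_total += recall_at_k(ranking, clicked, k)
        model.update(shown, clicked)  # then learn from this instance
    n = max(len(log), 1)
    return rr_total / n, recall_total / n
```

Calling `replay_evaluate(ClickRateModel(), log, k=2)` returns the pair (MRR@K, Recall@K) averaged over the log. Because the model is updated only after each event is scored, every prediction uses only data that precedes it in the log.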


Dr. Tao Ye is a Principal Scientist and Sr. Manager of Science at Pandora. She is a founding member of the Pandora science team, and has been working on personalized recommendation systems, measurements, and user modeling since 2010. Most recently, she has been leading the personalization and discovery science team that advances machine learning and data-driven innovations in search, voice interface, and many personalized product features at Pandora. She has two decades of experience in the software industry, holding research scientist and lead engineer positions in social media, networking and mobile systems. She holds 14 granted patents and has published 12 peer-reviewed papers.

She received her PhD from the University of Melbourne in Electrical and Electronic Engineering, her MS from UC Berkeley in EECS, and dual BS degrees from Stony Brook University in CS and Engineering Chemistry.


Mohit Singh is a Scientist on the Personalization and Discovery team at Pandora. At Pandora, he has worked on various projects, including personalizing music for radio, genre classification, artist recommendation, and solving "ranking" problems in Browse. Prior to Pandora, Mohit worked at Rdio as a data scientist, where he led music personalization based on listener context and features extracted from audio. He has over 5 years of industry experience, specifically in large-scale recommendation systems, at companies such as Intel Research, American Express and One Kings Lane. He received his first Master's in Robotics from CMU and his second Master's in CS from Georgia Tech.

Yves Raimond (Netflix)

Correlation vs Causation in Recommender Systems


What is a truly impactful recommendation? Most recommender systems are built around correlation: trying to model the probability of a particular action being taken by a user, and recommending an item when that probability is maximized. These algorithms are very powerful, but they miss a key aspect: the act of recommending something will have a non-trivial impact on the outcome. This can cause all sorts of issues, from uncontrolled feedback loops in the system to offline/online evaluation mismatches. In this talk we are going to provide some insights into how to solve these problems by building causal recommender systems that model the actual impact of a given recommendation.
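The difference between the two views can be illustrated with a toy calculation. This is only a sketch under assumptions, not the approach presented in the talk: it assumes a log in which recommendations were shown at random, so the difference in play rates between the two groups estimates the incremental effect of recommending an item. The function name `incremental_effect` and the event format are hypothetical.

```python
def incremental_effect(events):
    """Per-item play rate when recommended, and uplift from recommending.

    events: iterable of (item, recommended, played) tuples, where
            `recommended` and `played` are booleans and exposure is
            assumed to have been randomized.
    """
    # counts[item][recommended] = [number of events, number of plays]
    counts = {}
    for item, recommended, played in events:
        per_item = counts.setdefault(item, {True: [0, 0], False: [0, 0]})
        per_item[recommended][0] += 1
        per_item[recommended][1] += int(played)

    effects = {}
    for item, per_item in counts.items():
        n_rec, plays_rec = per_item[True]
        n_ctl, plays_ctl = per_item[False]
        if n_rec == 0 or n_ctl == 0:
            continue  # cannot compare without both groups
        p_rec = plays_rec / n_rec
        p_ctl = plays_ctl / n_ctl
        effects[item] = {
            "p_play_if_recommended": p_rec,  # what a correlational score targets
            "uplift": p_rec - p_ctl,         # what the recommendation adds
        }
    return effects
```

An item that users would have played anyway can have a high play probability but little uplift, so ranking by `uplift` can order items differently from ranking by `p_play_if_recommended`.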


Dr. Yves Raimond is a Research/Engineering Director at Netflix, where he leads the Promotion & Growth Algorithm Engineering team: a mixed team of researchers and engineers building the next generation of machine learning algorithms used to drive the Netflix experience. Before that, he was a Lead Research Engineer at BBC R&D, working on information extraction from multimedia content. He holds a PhD from Queen Mary, University of London.