Programme

REVEAL '22
Thursday September 22nd 2022

Program

9:00-9:10: Welcome & Introduction

9:10-9:40: Invited talk #1 – User Intent Modeling in Recommender Systems // Yuyan Wang and Bo Chang (Google Brain)

9:40-10:00: Contributed talk #1 – Causal Adaptive Learning for Recommendations // Maria Dimakopoulou

10:00-10:20: Contributed talk #2 – Modelling User Preferences using a Partially Observed Markov Decision Problem for a Reinforcement Learning Sequence-Aware Recommender // Aayush S Roy

10:20-10:40: Contributed talk #3 – Sales Channel Optimization via Simulations based on Observational Data with Delayed Rewards: A Case Study at LinkedIn // Diana Negoescu

10:40-11:30: Coffee break & posters


11:30-12:00: Invited talk #2 – Enabling Reinforcement Learning for RecSys with unit-test problems // Ehtsham Elahi (Netflix)

12:00-12:20: Contributed talk #4 – A Contextual Bandit Problem with a Bounded (O(1)) Regret Policy // Hyunwook Kang

12:20-12:40: Contributed talk #5 – Control Variate Diagnostics for Detecting Problems in Logged Bandit Feedback // Ben London

12:40-13:40: Lunch break


13:40-14:10: Invited talk #3 – Optimizing Audio Recommendations for the Long-Term // Daniel Russo (Columbia University)

14:10-14:30: Contributed talk #6 – OFRL: Designing an Offline Reinforcement Learning and Policy Evaluation Platform from Practical Perspectives // Haruka Kiyohara

14:30-14:50: Contributed talk #7 – Extending Open Bandit Pipeline to Simulate Industry Challenges // Bram van den Akker

14:50-15:10: Contributed talk #8 – SkipAwareRec: A Sequential and Interactive Music Recommendation System

15:10-16:20: Coffee break & posters

16:20-17:10: Panel discussion and Closing


Abstracts

Invited talk #1 – User Intent Modeling in Recommender Systems // Yuyan Wang and Bo Chang (Google Brain)

Abstract: Existing recommendation solutions heavily rely on user-item level interactions to decide what to recommend. While these approaches are effective for users who are interested in “continuing their last watches”, we argue that they fail to optimize long-term user experience on the platform. The underlying intents and journeys of our users largely drive their behaviors on recommendation platforms. As a result, focusing on item-level interactions without a more abstract, higher-level understanding of our users limits our planning horizon. In this talk, we share our research findings on modeling user intents, both explicitly and implicitly, in large-scale recommender systems, and demonstrate how the extracted user intents can help the system improve user engagement and plan over longer time horizons.


Invited talk #2 – Enabling Reinforcement Learning for RecSys with unit-test problems // Ehtsham Elahi (Netflix)

Abstract: We show how reinforcement learning can be used to construct an optimal list of recommendations when the user has a finite time budget to make a decision from the list of recommendations. Working within the time budget introduces an extra resource constraint for the recommender system. It is similar to many other decision problems (e.g., in economics and operations research) where the decision maker has to find tradeoffs in the face of finite resources and multiple, possibly conflicting, objectives. Although time is perhaps the most important finite resource, we think it is an often-ignored aspect of recommendation problems. We present a Markov Decision Process formulation of this problem and show how on-policy learning can be used to solve it. Finally, we compare this solution with the one obtained by off-policy learning.
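
Purely as an illustration of the kind of "unit-test" problem the title refers to (and not the speaker's actual formulation), a minimal budget-constrained recommendation MDP might look like the sketch below; the item costs, engagement probabilities, and episode dynamics are all assumptions made up for the example.

import numpy as np

class BudgetedRecEnv:
    """Toy list-construction MDP (illustrative only): the agent appends items
    to a slate until the user's time budget is exhausted; each item costs
    inspection time and yields a stochastic engagement reward."""

    def __init__(self, n_items=5, budget=3.0, seed=0):
        self.rng = np.random.default_rng(seed)
        self.cost = self.rng.uniform(0.5, 1.5, size=n_items)      # assumed time to inspect each item
        self.p_engage = self.rng.uniform(0.1, 0.9, size=n_items)  # assumed engagement probabilities
        self.budget = budget
        self.n_items = n_items

    def reset(self):
        self.remaining = self.budget
        return self.remaining  # state: remaining time budget

    def step(self, item):
        self.remaining -= self.cost[item]
        done = self.remaining <= 0
        reward = 0.0 if done else float(self.rng.random() < self.p_engage[item])
        return self.remaining, reward, done

# Roll out a random policy for a few episodes.
env = BudgetedRecEnv()
for ep in range(3):
    state, done, total = env.reset(), False, 0.0
    while not done:
        state, r, done = env.step(env.rng.integers(env.n_items))
        total += r
    print(f"episode {ep}: return = {total}")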


Invited talk #3 – Optimizing Audio Recommendations for the Long-Term // Daniel Russo (Columbia University)

Abstract: We study the problem of optimizing recommender systems for outcomes that are realized over several weeks or months. Successfully addressing this problem requires overcoming difficult statistical and organizational challenges. We begin by drawing on reinforcement learning to formulate a comprehensive model of users' recurring relationship with a recommender system. We then identify a few key assumptions that lead to simple, testable recommender system prototypes that explicitly optimize for the long-term. We apply our approach to a podcast recommender system at a large online audio streaming service, and we demonstrate that purposefully optimizing for long-term outcomes can lead to substantial performance gains over approaches optimizing for short-term proxies.


CONSEQUENCES '22
Friday September 23rd 2022

Program


  • 10:35-11:05: Coffee Break with Poster Session (30 minutes)


  • 11:05-12:05: Invited Talk #1: Lihong Li, "Decision making in recommendation: An RL perspective" (45 minutes + 15 minutes Q&A)

  • 12:05-12:45: Paper Session A (15 minutes + 5 minutes Q&A for each)

    • 12:05-12:25: Contributed talk #1 (in-person) – “Off-policy evaluation for learning-to-rank via interpolating the item-position model and the position-based model” // Alexander Buchholz (Amazon); Ben London (Amazon); Giuseppe Di Benedetto (Amazon); Thorsten Joachims (Cornell)

    • 12:25-12:45: Contributed talk #2 (virtual) – “VAE-IPS: A Deep Generative Recommendation Method for Unbiased Learning From Implicit Feedback” // Shashank Gupta (University of Amsterdam); Harrie Oosterhuis (Radboud University); Maarten de Rijke (University of Amsterdam)


  • 12:45-14:15: Lunch Break (1 hour and 30 minutes)


  • 14:15-15:15: Invited Talk #2: Guido Imbens, "Multiple Randomization Designs" (45 minutes + 15 minutes Q&A)

  • 15:15-16:15: Paper Session B (15 minutes + 5 minutes Q&A for each)

    • 15:15-15:35: Contributed talk #3 (in-person) – “Improving Accuracy of Off-Policy Evaluation via Policy Adaptive Estimator Selection” // Takuma Udagawa (Sony Group Corporation); Haruka Kiyohara (Tokyo Institute of Technology); Yusuke Narita (Yale University); Kei Tateno (Sony Group Corporation)

    • 15:35-15:55: Contributed talk #4 (in-person) – “Adaptive Experimental Design and Counterfactual Inference” // Tanner Fiez (Amazon); Lalit Jain (University of Washington); Houssam Nassif (Amazon); Sergio Gamez (Amazon); Arick Chen (Amazon)

    • 15:55-16:15: Contributed talk #5 (in-person) – “Are Neural Click Models Pointwise IPS Rankers?” // Philipp K Hager (University of Amsterdam); Maarten de Rijke (University of Amsterdam); Onno Zoeter (Booking)


  • 16:15-16:45: Coffee Break with Poster Session (30 minutes)


  • 16:45-17:25: Paper Session C (15 minutes + 5 minutes Q&A for each)

    • 16:45-17:05: Contributed talk #6 (virtual) – “CLEAR: Causal Explanations from Attention in Neural Recommenders” // Shami Nisimov (Intel Labs); Raanan Y. Rohekar (Intel Labs); Yaniv Gurwicz (Intel Labs); Guy Koren (Intel Labs); Gal Novik (Intel Labs)

    • 17:05-17:25: Contributed talk #7 (in-person) – “Causal Evaluation of Item Fairness in Impression Delivery” // Winston Chou (Netflix); Nathan Kallus (Cornell University)

  • 17:25-17:30: Closing by the organisers



Abstracts

Invited Talk #1 – Decision making in recommendation: An RL perspective // Lihong Li

Recommender systems can be modeled as decision makers that maximize a utility function by making good recommendations. They can then be optimized with a rich set of algorithmic tools from reinforcement learning (RL). In this talk, I will give an overview of four key steps in this approach, using examples from recent research: (1) which RL setting is appropriate for a particular application scenario; (2) how to collect data for an effective exploration/exploitation tradeoff; (3) how to optimize the recommendation policy; and (4) how to evaluate the recommendation policy. The focus will be on the last problem, where the goal is to estimate the long-term utility of a recommender system without having to run long experiments. Our methods are inspired by recent advances in infinite-horizon off-policy RL, and are adapted to deal with the nonstationary user responses common in recommender systems.
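
As background for the off-policy evaluation problem highlighted in this abstract, the sketch below shows the basic inverse-propensity-scoring (IPS) estimator of a target policy's value from logged bandit feedback. It is a minimal illustration under assumed data fields and a hypothetical uniform logging policy, not the infinite-horizon methods the talk describes.

import numpy as np

def ips_estimate(contexts, actions, rewards, logging_probs, target_policy):
    """Inverse-propensity-scoring estimate of a target policy's value.

    contexts, actions, rewards, logging_probs: arrays logged under the
    behaviour policy; target_policy(context, action) returns the target
    policy's probability of choosing `action` in `context`.
    """
    weights = np.array([
        target_policy(x, a) / p
        for x, a, p in zip(contexts, actions, logging_probs)
    ])
    return float(np.mean(weights * rewards))

# Toy usage (all data simulated): a uniform logging policy over 3 items
# and a target policy that always recommends item 0.
rng = np.random.default_rng(0)
n, k = 1000, 3
contexts = rng.normal(size=(n, 2))
actions = rng.integers(0, k, size=n)
rewards = (actions == 0).astype(float) * rng.binomial(1, 0.6, size=n)
logging_probs = np.full(n, 1.0 / k)
target = lambda x, a: 1.0 if a == 0 else 0.0
print(ips_estimate(contexts, actions, rewards, logging_probs, target))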

Invited Talk #2 – Multiple Randomization Designs // Guido Imbens

In a classical randomized controlled trial (RCT), or A/B test, the starting point is a population of units, individuals, plots of land, shopping trips, or visits to a website. A randomly selected subset of the population is assigned to a treatment (treatment A), and the remainder of the population is assigned to the control treatment (treatment B). The difference in average outcome by treatment group is the standard estimator for the average effect of the treatment. A key assumption underlying the typical analysis of such experimental designs is the absence of interactions between units, or the stable unit treatment value assumption. However, the setting for modern online experimentation is often different, with complex interactions between units an intrinsic feature. Such interactions can invalidate the simple comparison of means as an estimator for the average effect of the treatment in classical RCTs, and more generally make classical experimental designs ineffective for estimating the causal effects of interest. I will discuss novel experimental designs for settings in which interactions are present. A key feature common to many of these designs is the presence of multiple layers of randomization within the same experiment and we discuss a particular experimental design, Multiple Randomization Designs or MRDs, that provides a general framework for such experiments. Through these complex designs, we can study questions about causal effects in the presence of interference that cannot be answered by classical RCTs.