REVEAL 2020: Bandit and Reinforcement Learning from User Interactions
Video recording of the workshop
Important dates:
Submission deadline: July 29th
Author notification: August 21st
Camera-ready deadline: September 4th
Workshop: September 26th
State-of-the-art recommender systems are notoriously hard to design and improve upon because they are interactive and dynamic: they involve a multi-step decision-making process in which a stream of interactions occurs between the user and the system. Leveraging reward signals from these interactions while building a scalable and performant recommendation inference model is a key challenge. Traditionally, the interactions are treated as independent to make the problem tractable. To improve recommender systems further, however, models will need to take into account the delayed effects of each recommendation and start reasoning and planning for longer-term user satisfaction. To this end, our workshop invites contributions that enable recommender systems to adapt effectively to diverse forms of user feedback and to optimize the quality of each user's long-term experience.
Due to their interactive nature, recommender systems are also notoriously hard to evaluate. When evaluating their systems, practitioners often observe significant differences between a new algorithm’s offline and online results, and therefore tend to mostly rely on online methods, such as A/B testing. This is unfortunate, since online evaluation is not always possible and often expensive. Offline evaluation, on the other hand, provides a scalable way of comparing recommender systems and enables the participation of academic research in an industry-relevant problem.
In the past, recommender systems have been evaluated using proxy offline metrics coming from supervised methods, such as regression metrics (mean squared error, log likelihood), classification metrics (area under precision/recall curve) or ranking metrics (precision@k, normalized discounted cumulative gain). Recent research on recommender systems draws on counterfactual inference for offline A/B testing, which reuses logged interaction data, as well as on simulators that entirely avoid the use of potentially privacy-sensitive user data.
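As an illustration of what counterfactual evaluation on logged data can look like, the sketch below implements a basic inverse propensity scoring (IPS) estimator, one common choice among several. It is a minimal example under stated assumptions, not a method prescribed by the workshop: the log format, the function name `ips_estimate` and the toy policies are all hypothetical, and the logging policy's propensities are assumed to be recorded and strictly positive.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical log format: (context, action shown, observed reward,
# probability with which the logging policy chose that action).
LoggedEvent = Tuple[str, str, float, float]


def ips_estimate(
    logs: Sequence[LoggedEvent],
    target_policy: Callable[[str, str], float],
) -> float:
    """Estimate the average reward of target_policy from logged data.

    target_policy(context, action) returns the probability that the new
    policy would have shown `action` in `context`.
    """
    if not logs:
        raise ValueError("logs must not be empty")
    total = 0.0
    for context, action, reward, propensity in logs:
        if propensity <= 0.0:
            raise ValueError("logging propensities must be positive")
        # Reweight each logged reward by how much more (or less) likely
        # the target policy is to take the logged action.
        weight = target_policy(context, action) / propensity
        total += weight * reward
    return total / len(logs)


# Toy data: the logging policy chose uniformly between items "a" and "b".
logs = [
    ("u1", "a", 1.0, 0.5),
    ("u2", "b", 0.0, 0.5),
    ("u3", "a", 0.0, 0.5),
    ("u4", "b", 1.0, 0.5),
]


def always_a(context: str, action: str) -> float:
    """Hypothetical target policy that always recommends item "a"."""
    return 1.0 if action == "a" else 0.0


def uniform(context: str, action: str) -> float:
    """Target policy identical to the logging policy."""
    return 0.5


value_always_a = ips_estimate(logs, always_a)
value_uniform = ips_estimate(logs, uniform)
```

When the target policy equals the logging policy, every weight is 1 and the estimate reduces to the plain average of logged rewards. In practice, plain IPS can have high variance when propensities are small, which is one motivation for the more robust estimators the workshop lists among its topics.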
In this context, we believe it is timely to organize a workshop that revisits the problem of designing and evaluating recommender systems and makes sure the community, spanning academic and industrial backgrounds, is working on the right problem: finding, for each user, the most impactful recommendation.
This workshop is the follow-up to REVEAL’18 and REVEAL’19, which had strong participation at RecSys 2018 in Vancouver and RecSys 2019 in Copenhagen, respectively. Our proposal is to keep pushing forward the boundary of research on the following topics:
Reinforcement learning and bandits for recommendation
Robust estimators and counterfactual evaluation
Using simulation for recommender systems evaluation
Open datasets and new offline metrics
The benefits of this workshop will be:
To bridge the gap between academia and industry through new datasets and methods
To increase the productivity of all practitioners in their development of recommender systems
Organizers:
Thorsten Joachims, Information Science and Computer Science, Cornell University
Adith Swaminathan, Deep Learning Technology Center, Microsoft Research
Maria Dimakopoulou and Yves Raimond, Netflix Research
Olivier Koch and Flavian Vasile, R&D, Criteo