WebConf 2021 Tutorial:
Recommender Systems through
the Lens of Decision Theory
Unifying Policy- and Value-based Approaches to Recommendation
Abstract
Decision theory is a century-old science that explicitly separates states of nature, decision rules, utility functions, and models to address the universal problem of decision making under uncertainty. In the context of recommender systems, this separation allows us to formalise different approaches to learning from bandit feedback. Policy-based approaches use an inverse propensity score (IPS) estimator to directly optimise a decision rule that maps the user context to a recommendation. In contrast, value-based approaches use bandit feedback to learn a model of the reward, treating the choice of decision rule as a separate step. This tutorial uses the richer language of decision theory to present policy- and value-based methods in a common framework. Through extensive examples, we explore how these methods can be applied to recommendation problems, with an emphasis on settings with low reward probabilities and very large action spaces. We offer side-by-side comparisons of these methods, outlining their strengths and weaknesses in terms of estimator variance, model mis-specification, tractability, and ease of use. By identifying the failure modes of each class of methods, we provide practical guidelines for practitioners on which method to apply in which type of environment. The use of bandit feedback to improve recommender system performance has become a linchpin of modern recommendation. This tutorial unifies the major classes of methods, providing a thorough overview of a timely and important topic.
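To make the distinction concrete, the toy simulation below (a hedged sketch with hypothetical propensities and reward probabilities, not taken from the tutorial's notebooks) contrasts the two strategies: a policy-based IPS estimate of a target policy's value, computed directly from logged bandit feedback, versus a value-based estimate obtained by first fitting a reward model and then evaluating the target policy against it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: 3 actions, a logging policy pi0 and a target policy pi.
n_actions = 3
pi0 = np.array([0.5, 0.3, 0.2])          # logging policy (known propensities)
pi = np.array([0.2, 0.3, 0.5])           # target policy we want to evaluate
true_reward = np.array([0.1, 0.2, 0.3])  # unknown true expected reward per action

# Simulate bandit feedback logged under pi0: one action and one binary reward per round.
n = 100_000
actions = rng.choice(n_actions, size=n, p=pi0)
rewards = rng.binomial(1, true_reward[actions])

# Policy-based: inverse propensity scoring (IPS) reweights logged rewards
# by pi(a)/pi0(a) to estimate the value of the target policy directly.
ips_weights = pi[actions] / pi0[actions]
v_ips = np.mean(ips_weights * rewards)

# Value-based: fit a reward model (here, simple per-action empirical means),
# then evaluate the target policy against the fitted model as a separate step.
r_hat = np.array([rewards[actions == a].mean() for a in range(n_actions)])
v_model = pi @ r_hat

true_value = pi @ true_reward  # ground truth for comparison
print(f"IPS: {v_ips:.4f}  model: {v_model:.4f}  truth: {true_value:.4f}")
```

Both estimators recover the target policy's value here; in practice, IPS is unbiased but can have high variance when propensities are small, while the value-based route trades variance for potential model mis-specification bias — the trade-off the tutorial examines in depth.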
Tutorial Outline
Core concepts in Decision Theory
Optimising the model (value-based)
Optimising the decision rule (policy-based)
Case studies in Decision Theory for Recommendation
Hedging Risk and Sample Variance Penalisation (Google Colaboratory Notebook)
Hedging in Slate Recommendation (Google Colaboratory Notebook)
Bayes vs. Inverse Propensity Scoring (Google Colaboratory Notebook)
Convergence Properties of Value- and Policy-based Models
Policy & Value: A Love Story (Google Colaboratory Notebook)
Schedule and Programme
April 13th, 2PM - 5PM CEST; more information on the programme can be found here.
Tutorial materials
All tutorial materials (slides and hands-on notebooks) can be found in this GitHub repository.
Direct links to the notebooks in Google Colaboratory are provided in the outline above.
Speakers
Flavian Vasile
Principal Scientist
Criteo AI Lab
David Rohde
Research Scientist
Criteo AI Lab
Olivier Jeunen
PhD Student
University of Antwerp
Amine Benhalloum
Senior Machine Learning Engineer
Criteo AI Lab
Otmane Sakhi
PhD Student
Criteo AI Lab
ENSAE