WebConf 2021 Tutorial:
Recommender Systems through
the Lens of Decision Theory

Unifying Policy- and Value-based Approaches to Recommendation


Decision theory is a century-old science that explicitly separates states of nature, decision rules, utility functions and models to address the universal problem of decision making under uncertainty. In the context of recommender systems, this separation allows us to formalise different approaches to learning from bandit feedback. Policy-based approaches use an inverse propensity score estimator to directly optimise a decision rule that maps the user context to a recommendation. In contrast, value-based approaches use bandit feedback to learn a model of the reward, and treat the choice of decision rule as a separate step. This tutorial uses the richer language of decision theory to present policy- and value-based methods in a common framework. Through extensive examples, we explore how these methods can be applied to recommendation problems, emphasising settings with low reward probabilities and very large action spaces. We offer side-by-side comparisons of the two families, outlining their strengths and weaknesses in terms of estimator variance, model mis-specification, tractability and ease of use. By identifying the failure modes of each class of methods, we provide practical guidelines on which method to apply in which type of environment. The use of bandit feedback to improve recommender system performance has become a linchpin of modern recommendation; this tutorial unifies the major classes of methods, providing a thorough overview of a timely and important topic.
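To make the separation concrete, the two routes from logged bandit feedback to a recommendation can be sketched as follows. This is a minimal toy illustration, not the tutorial's own notebook code: the uniform logging policy, the per-action click rates and all variable names are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical logged bandit feedback from a uniform logging policy:
# for each impression we store the action shown, its propensity under
# the logging policy, and the observed binary reward (click).
n, n_actions = 10_000, 5
true_ctr = np.array([0.02, 0.05, 0.01, 0.03, 0.04])  # unknown in practice
actions = rng.integers(0, n_actions, size=n)
propensities = np.full(n, 1.0 / n_actions)
rewards = rng.binomial(1, true_ctr[actions]).astype(float)

# Value-based route: learn a model of the reward from the logs (here,
# a per-action click-rate estimate), then apply the decision rule
# (an argmax over the fitted model) as a separate step.
reward_model = np.array([rewards[actions == a].mean() for a in range(n_actions)])
greedy_action = int(np.argmax(reward_model))

# Policy-based route: score a candidate decision rule directly with the
# inverse propensity score (IPS) estimator of its expected reward.
def ips_value(policy_probs):
    return np.mean(policy_probs[actions] / propensities * rewards)

candidate = np.eye(n_actions)[greedy_action]  # deterministic policy
estimate = ips_value(candidate)
```

The value-based route optimises the model and derives the decision rule from it; the policy-based route evaluates (and, with a parametrised policy, would optimise) the decision rule itself.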

Tutorial Outline

  1. Core concepts in Decision Theory

    1. Optimising the model (value-based)

    2. Optimising the decision rule (policy-based)

  2. Recommendation for Real Systems

  3. Case studies in Decision Theory for Recommendation

    1. Hedging Risk and Sample Variance Penalisation
      (Google Colaboratory Notebook)

    2. Hedging in Slate Recommendation
      (Google Colaboratory Notebook)

    3. Bayes vs. Inverse Propensity Scoring
      (Google Colaboratory Notebook)

  4. Convergence Properties of Value- and Policy-based Models

    1. Policy & Value: A Love Story
      (Google Colaboratory Notebook)
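The first case study hedges against the variance of the IPS estimator via sample variance penalisation: rather than maximising the raw IPS estimate, one maximises the estimate minus a penalty proportional to its sample standard error. A minimal sketch of the idea, with a uniform logging policy, a toy reward rate and an illustrative penalty weight, all assumptions made for this example:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical logs: uniform logging policy over 4 actions, binary rewards.
n, n_actions = 5_000, 4
actions = rng.integers(0, n_actions, size=n)
propensities = np.full(n, 1.0 / n_actions)
rewards = rng.binomial(1, 0.05, size=n).astype(float)

def svp_objective(policy_probs, lam=1.0):
    """IPS estimate of the policy's value, penalised by its sample standard error."""
    w = policy_probs[actions] / propensities * rewards
    return w.mean() - lam * w.std(ddof=1) / np.sqrt(n)

# Compare deterministic policies; the penalty hedges against policies whose
# IPS estimate is high only because of a few large-weight samples.
scores = [svp_objective(np.eye(n_actions)[a]) for a in range(n_actions)]
best = int(np.argmax(scores))
```

With a larger penalty weight, selection shifts further towards policies whose estimated value is also estimated reliably.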

Schedule and Programme

April 13th, 2 PM - 5 PM CEST. More information on the programme can be found here.

Tutorial materials

All tutorial materials (slides and hands-on notebooks) can be found in this GitHub repository.
Direct links to the notebooks in Google Colaboratory are provided in the agenda above.


Presenters

Flavian Vasile
Principal Scientist
Criteo AI Lab

David Rohde
Research Scientist
Criteo AI Lab

Olivier Jeunen
PhD Student
University of Antwerp

Amine Benhalloum
Senior Machine Learning Engineer
Criteo AI Lab

Otmane Sakhi
PhD Student
Criteo AI Lab