Do You Prefer Learning with Preferences?

NeurIPS'23 - Dec 11, New Orleans, USA

Aadirupa Saha & Aditya Gopalan

TUTORIAL OVERVIEW

This tutorial covers the development of, and recent progress on, machine learning with preference-based feedback, where the goal is to sequentially learn the best action of a decision set from preference feedback over an actively chosen subset of items. We will first cover the fundamentals of the classical (reward-based) multi-armed bandit problem and the limitations of this framework when rewards are unknown or hard to obtain. Motivated by these limitations, we will give a brief overview of the preference-based problem formulation and then cover the breakthrough results for the simplest pairwise preference setting (where the subsets are of size 2), famously studied as the `Dueling Bandit' problem in the literature. We will further generalize this to the `Battling Bandits' framework (general subsetwise preference-based bandits) for subsets of arbitrary size and examine the tradeoff between learning rates and increasing subset sizes.
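As a concrete (hypothetical) illustration of the pairwise setting, and not part of the tutorial material itself, the following minimal Python sketch simulates dueling-bandit feedback under an assumed Bradley-Terry preference model, with a naive uniform-exploration learner that estimates empirical Borda scores; the arm count, utilities, and horizon are arbitrary placeholders.

```python
# Minimal sketch of pairwise (dueling) preference feedback.
# Assumption: Bradley-Terry model, P(i beats j) = u_i / (u_i + u_j).
import numpy as np

rng = np.random.default_rng(0)

K = 5                                  # number of arms (placeholder)
utilities = rng.uniform(1.0, 2.0, K)   # hidden arm utilities, unknown to the learner

def duel(i, j):
    """Environment: return True if arm i wins the comparison against arm j."""
    return rng.random() < utilities[i] / (utilities[i] + utilities[j])

wins = np.zeros((K, K))                # wins[i, j] = number of times i beat j
plays = np.zeros((K, K))               # plays[i, j] = number of times (i, j) was compared

T = 5000                               # horizon (placeholder)
for _ in range(T):
    # Naive exploration: duel a uniformly random pair of distinct arms.
    i, j = rng.choice(K, size=2, replace=False)
    if duel(i, j):
        wins[i, j] += 1
    else:
        wins[j, i] += 1
    plays[i, j] += 1
    plays[j, i] += 1

# Empirical Borda score: average estimated probability of beating the other arms.
with np.errstate(invalid="ignore"):
    p_hat = np.where(plays > 0, wins / plays, 0.5)
borda = p_hat.mean(axis=1)
print("estimated best arm:", int(borda.argmax()), "| true best arm:", int(utilities.argmax()))
```

A practical dueling-bandit algorithm would of course choose the pairs adaptively rather than uniformly; the sketch only shows the feedback structure the tutorial builds on.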

Schedule

Tutorial Content

Target Audience (Prerequisites)

The tutorial is meant to be accessible to the entire machine learning community, and especially useful for bandit and reinforcement learning researchers.

Prerequisites: A basic knowledge of probability theory and linear algebra should be enough. Familiarity with standard concentration inequalities and state-of-the-art multi-armed bandit (MAB) algorithms would be helpful (only to understand the algorithmic technicalities) but is not necessary; as mentioned, we will cover the basics of classical MAB techniques at the beginning of the talk. The tutorial will be self-contained with all the basic definitions.

Most of the target audience is likely to be machine learning oriented, cutting across grad students, postdocs, and faculty. Overall, any first-year grad student should be comfortable. The tutorial intends to give the audience enough exposure to build a basic understanding of bandit problems, the need for their preference-based counterpart, existing results, and exciting open challenges.

Some References

You are also welcome to check some of our recent publications on online/bandit learning from preference feedback in complex environments.