This website gives information about part I of the course Advanced Topics in Stochastic Operations Research: Multi-armed bandit theory and applications. LNMB info: https://www.lnmb.nl/pages/courses/phdcourses/ATS2324.html
When and where:
November 20 on location
November 27 online
December 4 online
December 11 online
December 18 online.
Material: notes, and the book [BA] Bandit Algorithms, Tor Lattimore and Csaba Szepesvári, Cambridge University Press
Assignment: To pass this part of the course, you need to write a research proposal about a topic related to multi-armed bandit type algorithms, and give feedback to other proposals. Deadlines are January 31, 2024 for the proposal, and February 21, 2024 for the feedback. See below for details.
Contents:
Lecture 1: introduction to MAB, explore-then-commit policies
Learning goals:
know the formal definition of a finite-armed stochastic bandit problem, including the definitions of policy and regret.
know to use concentration inequalities by Markov, Chebyshev, Hoeffding, and understand Chernoff's method and its use of moment-generating functions and sub-gaussianity.
know definition and underlying intuition of explore-then-commit, epsilon-greedy, adaptive-elimination, follow-the-leader.
know and understand regret analysis of explore-then-commit and related policies.
know the concept of anytime algorithms and understand the doubling trick.
Material: chapter 1, 4.1-4.5, 5, 6 from [BA], and notes.
Slides (password = first name of the LNMB boss, lowercase)
Lecture 2: upper-confidence bound policies, EXP3, adversarial bandits
Learning goals:
know formally and intuitively the UCB policy, and know how to prove regret upper bounds.
understand the differences between adversarial and stochastic bandit.
understand the proof and intuition of the EXP3 algorithm.
Material: chapter 7,8,9 and 11 from [BA]. Note: chapters 8,9 high-level only.
Lecture 3: Regret lower bounds
Learning goals:
know and understand definitions and (proofs of) key properties of KL divergence.
know and understand the Bretagnolle-Huber-Carol inequality (note: we provide a different proof than [BA]).
understand the proof and intuition of both worst-case and instance-dependent regret lower bounds.
Material: notes on KL divergence (and on Vogel 1960); chapter 14.2 (page 189), 15, and 16.1 from [BA].
Lecture 4: Bayesian bandits, Gittins index, Thompson sampling
Learning goals:
know the concepts of a Bayesian bandit, defined as a Markov decision problem.
know the idea and intuition of Gittins Index theorem
know the idea and intuition of Thomson Sampling including the proof of its Bayesian Regret upper bound.
Material: chapter 35.4 (high level ideas) and 36.1 from [BA]. We discuss proofs from R. Weber (1992). On the Gittins Index for multiarmed bandit problems. Ann. Appl. Probab. 2(4), 1024-1033, and from Slivkins, A., Introduction to Multi-armed bandits, 2022, section 3.3.
Lecture 5: dynamic pricing and learning
Learning goals:
know the formulation of the standard dynamic-pricing-and-learning problem.
understand the main intuition of (the regret upper bound of) controlled variance pricing
understand the intricasies of data-driven price policies that learn to collude
Material:
Assignment details:
To pass this part of the course, the assignment is to write a research proposal about a topic related to multi-armed bandit type algorithms, and give feedback to other proposals.
The proposal should describe:
Societal and scientific context (i.e. literature)
Research questions of the proposal
Proposed (mathematical) methods and techniques
Expected results
Scientific and societal relevance of your results
Remarks:
The proposal can be related to your own PhD research, but it should be a novel project, i.e. not something that you or somebody else is already working on.
Make a convincing case that your research proposal is novel i.e. has not been solved yet by existing papers.
Make a convincing case that your research is scientifically relevant
Try to be concrete with your proposed (mathematical) methods and techniques and expected results
The `size' of the research project should be, say, 2 papers.
There is no word limit, but try to have no more than 5 pages excl. bibliography.
This is an individual assignment, not a group assignment. You are allowed to informally give each other feedback, but the writing and thinking should be done on a 100% individual basis.
Put your name and email address on the first page.
Hand-in as pdf-file via email to boer@uva.nl.
Deadline for handing in the proposals: January 31, 2024.
You will then be asked to give constructive feedback for ≤3 other proposals. For each proposal assigned to you, write ≤1 page in which you evaluate the proposal w.r.t. the following criteria: novelty/originality, scientific/societal relevance, feasibility, quality of writing. Discuss strengths and weaknessnes, and give concrete suggestions for improvement.
Please be kind and constructive to each other; this is meant to be a learning experience.
Send your feedback as pdf via email to me, with the writer of the proposal in the CC.
Deadline for handing in the feedback: February 21, 2024.
Your grade for this part of the course will be based both on your proposal and the quality of your feedback.