Dear all, I'm Alessio Russo, a postdoc at BU (PLAIA group).
I’m planning a PhD-level short course on sequential hypothesis testing, focused on deriving lower and upper bounds for fixed-confidence problems in best-arm identification (BAI, in multi-armed bandits) and best-policy identification (BPI, in tabular reinforcement learning). If time permits, we will also cover quickest change detection and Bayesian sequential hypothesis testing.
Core techniques include:
- Change-of-measure/KL lower bounds
- Stopping and recommendation rules
- Sample-complexity analysis
What the course covers (concise overview):
We develop an information-theoretic view of pure exploration in bandits and tabular RL. Students will learn to (i) formalize fixed-confidence BAI/BPI, (ii) derive instance-dependent lower bounds via change of measure, (iii) design δ-correct stopping and recommendation rules, and (iv) match lower bounds with algorithms whose sample complexity is instance-optimal up to constant or logarithmic factors. Optional modules introduce quickest change detection and Bayesian sequential testing to contrast frequentist and Bayesian objectives.
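To preview the change-of-measure technique, here is the canonical fixed-confidence BAI lower bound of Garivier and Kaufmann (2016; see the references below), written in LaTeX. The notation follows that paper: kl is the binary relative entropy, Σ_K the simplex over the K arms, and Alt(μ) the set of instances whose best arm differs from that of μ.

```latex
% Any \delta-correct strategy on a bandit instance \mu must satisfy
\mathbb{E}_{\mu}[\tau_\delta] \;\geq\; T^{*}(\mu)\,\mathrm{kl}(\delta,\, 1-\delta),
\qquad
T^{*}(\mu)^{-1} \;=\; \sup_{w \in \Sigma_K}\; \inf_{\lambda \in \mathrm{Alt}(\mu)}\;
\sum_{a=1}^{K} w_a\, \mathrm{KL}(\mu_a, \lambda_a).
% The maximizing weights w^*(\mu) prescribe the optimal sampling proportions.
```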
By the end, students will be able to:
Specify fixed-confidence BAI/BPI problem statements and correctness criteria.
Derive KL-based change-of-measure lower bounds and interpret instance complexity terms.
Construct sequential tests (stopping + recommendation) and prove δ-correctness (see the sketch after this list).
Analyze sample complexity (upper bounds) for canonical algorithms and compare to lower bounds.
Extend BAI tools to tabular RL BPI via confidence sets and information propagation across transitions.
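As a taste of the stopping-rule material, here is a minimal Python sketch for unit-variance Gaussian arms: round-robin sampling plus a GLR (Chernoff-type) stopping rule. Everything here (the names glr_bai and beta, and the threshold itself) is illustrative; thresholds with a proven δ-correctness guarantee require mixture-martingale arguments of the kind developed in Kaufmann and Koolen (2021).

```python
# Sketch: fixed-confidence BAI for unit-variance Gaussian arms, using
# round-robin sampling and a GLR (Chernoff-type) stopping rule.
# The threshold beta() is illustrative only; it is NOT a proven
# delta-correct threshold (see Kaufmann & Koolen, 2021, for valid ones).
import math
import random

def beta(t: int, delta: float, K: int) -> float:
    # Illustrative threshold of order log(K * log(t) / delta).
    return math.log((1 + math.log(max(t, 2))) * K / delta)

def glr_bai(means, delta=0.05, max_samples=1_000_000, seed=0):
    """Sample arms until the GLR statistic clears the threshold;
    return (recommended arm, number of samples used)."""
    rng = random.Random(seed)
    K = len(means)
    counts, sums = [0] * K, [0.0] * K
    best, t = 0, 0
    while t < max_samples:
        a = t % K                          # round-robin sampling
        sums[a] += rng.gauss(means[a], 1.0)
        counts[a] += 1
        t += 1
        if min(counts) == 0:
            continue
        mu_hat = [s / n for s, n in zip(sums, counts)]
        best = max(range(K), key=lambda i: mu_hat[i])
        # GLR statistic between the empirical best arm and its closest
        # rival: Z = N_a * N_b / (N_a + N_b) * (mu_a - mu_b)^2 / 2.
        z = min(
            counts[best] * counts[b] / (counts[best] + counts[b])
            * (mu_hat[best] - mu_hat[b]) ** 2 / 2.0
            for b in range(K) if b != best
        )
        if z > beta(t, delta, K):
            break                          # stop: confident in `best`
    return best, t

if __name__ == "__main__":
    arm, n = glr_bai([0.0, 0.3, 0.5], delta=0.05)
    print(f"recommended arm {arm} after {n} samples")
```

Round-robin sampling is used purely for readability; instance-optimal algorithms such as Track-and-Stop (Garivier and Kaufmann, 2016) instead steer the sampling proportions toward the optimal weights w*(μ̂) from the lower bound above.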
The course will run in person at the Faculty of Computing & Data Sciences at Boston University over the last two weeks of November: three sessions per week, two hours per session.
Tentative schedule (4–6 pm):
- Friday 14/11, CDS 164
- Monday 17/11, PHO 117A
- Tuesday 18/11, STH 319 (Confirmed)
- Wednesday 19/11, PHO 117A
- Monday 24/11, PHO 117A
- Tuesday 25/11, CAS 424 (Confirmed)
Prerequisites: working knowledge of probability theory (e.g., at the level of a course on multi-armed bandit problems).
If you’re interested, please fill out this form. I will use the email address you provide to send further information.
Disclaimer: the content and schedule of the course may change.
References:
Garivier, Aurélien, and Emilie Kaufmann. "Optimal best arm identification with fixed confidence." Conference on Learning Theory. PMLR, 2016.
Kaufmann, Emilie, and Wouter M. Koolen. "Mixture martingales revisited with applications to sequential tests and confidence intervals." Journal of Machine Learning Research 22.246 (2021): 1-44.
Al Marjani, Aymen, Aurélien Garivier, and Alexandre Proutiere. "Navigating to the best policy in markov decision processes." Advances in Neural Information Processing Systems 34 (2021): 25852-25864.
Russo, Alessio, and Filippo Vannella. "Multi-reward best policy identification." Advances in Neural Information Processing Systems 37 (2024): 105583-105662.
Russo, Alessio, Ryan Welch, and Aldo Pacchiano. "Learning to Explore: An In-Context Learning Approach for Pure Exploration." arXiv preprint arXiv:2506.01876 (2025).