Dear all, I'm Alessio Russo, a postdoc at BU (PLAIA group).
I’m planning a PhD-level short course on sequential hypothesis testing, focused on deriving lower and upper bounds for fixed-confidence problems in best-arm identification (BAI, in multi-armed bandits) and best-policy identification (BPI, in tabular reinforcement learning). If time permits, we will also cover quickest change detection and Bayesian sequential hypothesis testing.
Core techniques include:
- Change-of-measure/KL lower bounds
- Stopping and recommendation rules
- Sample-complexity analysis
What the course covers (concise overview):
We develop an information-theoretic view of pure exploration in bandits and tabular RL. Students will learn to (i) formalize fixed-confidence BAI/BPI, (ii) derive instance-dependent lower bounds via change of measure, (iii) design δ-correct stopping and recommendation rules, and (iv) match lower bounds with algorithms whose sample complexity is instance-optimal up to constant or logarithmic factors. Optional modules introduce quickest change detection and Bayesian sequential testing to contrast frequentist and Bayesian objectives.
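To preview the change-of-measure technique, here is the canonical fixed-confidence BAI lower bound of Garivier and Kaufmann (2016; see the references below), written in LaTeX. The notation follows that paper: kl is the binary relative entropy, Σ_K the simplex over the K arms, and Alt(μ) the set of instances whose best arm differs from that of μ.

```latex
% Any \delta-correct strategy on a bandit instance \mu must satisfy
\mathbb{E}_{\mu}[\tau_\delta] \;\geq\; T^{*}(\mu)\,\mathrm{kl}(\delta,\, 1-\delta),
\qquad
T^{*}(\mu)^{-1} \;=\; \sup_{w \in \Sigma_K}\; \inf_{\lambda \in \mathrm{Alt}(\mu)}\;
\sum_{a=1}^{K} w_a\, \mathrm{KL}(\mu_a, \lambda_a).
% The maximizing weights w^*(\mu) prescribe the optimal sampling proportions.
```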
By the end, students will be able to:
Specify fixed-confidence BAI/BPI problem statements and correctness criteria.
Derive KL-based change-of-measure lower bounds and interpret instance complexity terms.
Construct sequential tests (stopping + recommendation) and prove δ-correctness (see the sketch after this list).
Analyze sample complexity (upper bounds) for canonical algorithms and compare to lower bounds.
Extend BAI tools to tabular RL BPI via confidence sets and information propagation across transitions.
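As a taste of the stopping-rule material, here is a minimal Python sketch for unit-variance Gaussian arms: round-robin sampling plus a GLR (Chernoff-type) stopping rule. Everything here (the names glr_bai and beta, and the threshold itself) is illustrative; thresholds with a proven δ-correctness guarantee require mixture-martingale arguments of the kind developed in Kaufmann and Koolen (2021).

```python
# Sketch: fixed-confidence BAI for unit-variance Gaussian arms, using
# round-robin sampling and a GLR (Chernoff-type) stopping rule.
# The threshold beta() is illustrative only; it is NOT a proven
# delta-correct threshold (see Kaufmann & Koolen, 2021, for valid ones).
import math
import random

def beta(t: int, delta: float, K: int) -> float:
    # Illustrative threshold of order log(K * log(t) / delta).
    return math.log((1 + math.log(max(t, 2))) * K / delta)

def glr_bai(means, delta=0.05, max_samples=1_000_000, seed=0):
    """Sample arms until the GLR statistic clears the threshold;
    return (recommended arm, number of samples used)."""
    rng = random.Random(seed)
    K = len(means)
    counts, sums = [0] * K, [0.0] * K
    best, t = 0, 0
    while t < max_samples:
        a = t % K                          # round-robin sampling
        sums[a] += rng.gauss(means[a], 1.0)
        counts[a] += 1
        t += 1
        if min(counts) == 0:
            continue
        mu_hat = [s / n for s, n in zip(sums, counts)]
        best = max(range(K), key=lambda i: mu_hat[i])
        # GLR statistic between the empirical best arm and its closest
        # rival: Z = N_a * N_b / (N_a + N_b) * (mu_a - mu_b)^2 / 2.
        z = min(
            counts[best] * counts[b] / (counts[best] + counts[b])
            * (mu_hat[best] - mu_hat[b]) ** 2 / 2.0
            for b in range(K) if b != best
        )
        if z > beta(t, delta, K):
            break                          # stop: confident in `best`
    return best, t

if __name__ == "__main__":
    arm, n = glr_bai([0.0, 0.3, 0.5], delta=0.05)
    print(f"recommended arm {arm} after {n} samples")
```

Round-robin sampling is used purely for readability; instance-optimal algorithms such as Track-and-Stop (Garivier and Kaufmann, 2016) instead steer the sampling proportions toward the optimal weights w*(μ̂) from the lower bound above.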
The course will run in person at the Faculty of Computing & Data Sciences at Boston University over the last two weeks of November: three sessions per week, two hours per session.
Tentative schedule (4–6 pm):
- Friday 14/11, CDS 164
- Monday 17/11, PHO 117A
- Tuesday 18/11, STH 319 (Confirmed)
- Wednesday 19/11, PHO 117A
- Monday 24/11, PHO 117A
- Tuesday 25/11, CAS 424 (Confirmed)
Prerequisites: working knowledge of probability theory (e.g., at the level of a course on multi-armed bandit problems).
If you’re interested, please fill out this form. I will use the email address you provide to send further information.
Disclaimer: the content and schedule of the course may change.
References:
Garivier, Aurélien, and Emilie Kaufmann. "Optimal best arm identification with fixed confidence." Conference on Learning Theory. PMLR, 2016.
Kaufmann, Emilie, and Wouter M. Koolen. "Mixture martingales revisited with applications to sequential tests and confidence intervals." Journal of Machine Learning Research 22.246 (2021): 1-44.
Al Marjani, Aymen, Aurélien Garivier, and Alexandre Proutiere. "Navigating to the best policy in markov decision processes." Advances in Neural Information Processing Systems 34 (2021): 25852-25864.
Russo, Alessio, and Filippo Vannella. "Multi-reward best policy identification." Advances in Neural Information Processing Systems 37 (2024): 105583-105662.
Russo, Alessio, Ryan Welch, and Aldo Pacchiano. "Learning to Explore: An In-Context Learning Approach for Pure Exploration." arXiv preprint arXiv:2506.01876 (2025).