Speaker: Dorian Baudry (INRIA)
Title: Does Stochastic Gradient really succeed for bandits?
Paper: link
More Details:
Authors: Dorian Baudry, Emmeran Johnson, Simon Vary, Ciara Pike-Burke, Patrick Rebeschini
Abstract: Recent works have deepened the theoretical understanding of the Stochastic Gradient Bandit (SGB) policy, showing that using a constant learning rate guarantees asymptotic convergence to the optimal policy, and that sufficiently small learning rates can yield logarithmic regret. However, whether logarithmic regret holds beyond small learning rates remains unclear. In this work, we take a step towards characterizing the regret regimes of SGB as a function of its learning rate. For two-armed bandits, we identify a sharp threshold on the learning rate, below which SGB achieves logarithmic regret on all instances, and above which it can incur polynomial regret on some instances. This result highlights the necessity of knowing (or estimating) the minimum gap to ensure logarithmic regret with a constant learning rate. For general K-armed bandits, we further show that the learning rate must additionally scale inversely with K to avoid polynomial regret. We introduce novel techniques to derive regret upper bounds for SGB, laying the groundwork for future advances in the theory of gradient-based bandit algorithms.
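For readers less familiar with the algorithm under study, below is a minimal sketch of the SGB policy: a softmax distribution over per-arm logits, updated by REINFORCE-style stochastic gradient ascent on the expected reward with a constant learning rate. The Gaussian reward model, the function name, and all parameter choices are illustrative assumptions for this sketch, not the exact setting of the paper.

```python
import numpy as np

def sgb_regret(means, eta, horizon, seed=0):
    """Sketch of the Stochastic Gradient Bandit (SGB) policy.

    Plays a softmax policy over logits theta and updates theta by
    stochastic gradient ascent with a constant learning rate eta.
    Returns the cumulative pseudo-regret over `horizon` rounds.
    The Gaussian reward model is an illustrative assumption.
    """
    rng = np.random.default_rng(seed)
    K = len(means)
    theta = np.zeros(K)            # policy logits, one per arm
    best, regret = max(means), 0.0
    for _ in range(horizon):
        pi = np.exp(theta - theta.max())
        pi /= pi.sum()             # softmax policy
        a = rng.choice(K, p=pi)    # sample an arm from the policy
        r = rng.normal(means[a])   # observe a noisy reward (unit variance)
        # Unbiased REINFORCE gradient estimate: grad_theta log pi(a) = 1{a} - pi
        theta += eta * r * (np.eye(K)[a] - pi)
        regret += best - means[a]
    return regret
```

As a quick empirical probe of the threshold phenomenon described in the abstract, one could compare, say, sgb_regret([0.0, 0.5], eta, 10**5) for a small and a large eta and track how the regret grows with the horizon; the two-armed instance and the learning rates here are arbitrary choices, not values from the paper.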
Speaker Bio: I am a researcher at Inria Grenoble & Université Grenoble Alpes, working on the theory of sequential decision-making problems, in particular multi-armed bandits. This work was conducted while I was a postdoctoral researcher in the statistics department of Oxford University, working with Patrick Rebeschini. Before that, I held a postdoctoral position at ENSAE & IP Paris, working with Vianney Perchet, and I received my PhD from Université de Lille in 2022, supervised by Emilie Kaufmann and Odalric-Ambrym Maillard.