Beyond UCB: The curious case of non-linear ridge bandits
Authors: Nived Rajaraman, Yanjun Han, Jiantao Jiao, and Kannan Ramchandran
Abstract: There is a large body of work on bandits and reinforcement learning when the reward/value function satisfies some form of linearity. But what happens if the reward is non-linear? In this talk, I will present some of our work on the setting where the non-linear reward function takes the form of a ridge function / single index model. Two curious phenomena arise for ridge bandits: first, in addition to the "learning phase" with a standard regret, there is an "initialization phase" with a fixed sample cost determined by the nature of the reward function; second, achieving the smallest sample cost in the initialization phase requires new learning algorithms beyond traditional ones such as UCB. For a special family of non-linear bandits, where the reward is a "ridge" function, i.e., a non-linear monotone link $f$ applied to a linear function of the action, we derive upper and lower bounds on the optimal fixed cost of learning and, in addition, on the entire "learning trajectory" via differential equations. Our work proposes a two-stage exploration algorithm that first finds a good initialization and subsequently exploits local linearity in the learning phase. In several important special cases (convex $f$ and concave $f$), we prove that the optimal guarantees are also achieved by a very simple algorithm: an interactive variant of stochastic gradient descent. In contrast, several classical and celebrated algorithms, such as UCB and algorithms relying on online/offline regression oracles, are proven to be suboptimal.
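To make the setting concrete, below is a minimal toy sketch of a ridge bandit, assuming the standard single index form in which the mean reward is a known monotone link $f$ applied to the inner product between the chosen action and an unknown unit direction. The learner shown is a generic two-point zeroth-order gradient-ascent heuristic, not the two-stage algorithm or the interactive SGD variant from the talk; the link $f(z) = z^3$, the dimension, the noise level, and the step sizes are arbitrary illustrative choices. Because this link is flat near zero, the slow early progress of the loop loosely mirrors why a separate initialization phase, with its own fixed sample cost, matters.

```python
# Toy ridge bandit: mean reward f(<theta_star, a>) with additive noise.
# Illustrative sketch only; not the algorithm from the talk. The link f,
# dimension d, noise level, and step sizes are arbitrary choices.
import numpy as np

rng = np.random.default_rng(0)
d = 5
theta_star = rng.standard_normal(d)
theta_star /= np.linalg.norm(theta_star)       # unknown unit direction

def f(z):
    return z ** 3                              # example monotone link, flat near 0

def pull(a, noise=0.05):
    """Noisy bandit feedback for action a."""
    return f(theta_star @ a) + noise * rng.standard_normal()

# Generic two-point zeroth-order gradient ascent over the unit sphere
# (a stand-in SGD-style learner, not the paper's method).
a = rng.standard_normal(d)
a /= np.linalg.norm(a)                         # crude random initialization
delta = 0.2                                    # perturbation size for the two-point estimate
for t in range(1, 10_001):
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)
    # Two-point estimate of the directional derivative of the reward along u.
    g_hat = (pull(a + delta * u) - pull(a - delta * u)) / (2 * delta) * u
    a = a + (1.0 / np.sqrt(t)) * g_hat         # decaying step size
    a /= np.linalg.norm(a)                     # project back onto the sphere

print("alignment <a, theta_star>:", a @ theta_star)  # should drift toward 1 as learning succeeds
```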
Speaker Bio: Nived Rajaraman is currently a postdoctoral researcher at Microsoft Research NYC. He recently finished his PhD at Berkeley, advised by Jiantao Jiao and Kannan Ramchandran. His research interests lie at the intersection of sequential decision making and large language models. He was named a CPAL Rising Star for 2025.