Talk Date and Time: October 29, 2024 at 5:00 pm - 5:45 pm EDT, followed by 15 minutes of Q&A on Google Meet
Topic: Restless Bandits with Global Rewards
Abstract:
Restless multi-armed bandits (RMABs) extend multi-armed bandits so that pulling an arm affects future states. Despite the success of RMABs, a key limiting assumption is the separability of rewards into a sum across arms. We address this deficiency by proposing restless multi-armed bandits with global rewards (RMAB-G), a generalization of RMABs to global, non-separable rewards. To solve RMAB-G, we develop the Linear- and Shapley-Whittle indices, which extend Whittle indices from RMABs to RMAB-Gs. We prove approximation bounds, but also show how these indices can fail when reward functions are highly non-linear. To overcome this, we propose two sets of adaptive policies: the first computes indices iteratively, and the second combines indices with Monte Carlo Tree Search (MCTS). Empirically, we demonstrate that our proposed policies outperform baselines and index-based policies on both synthetic data and real-world data from food rescue.
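For readers unfamiliar with Whittle indices, the sketch below illustrates the standard idea the talk builds on: for a single restless arm, the index of a state is the passive-action subsidy at which acting and resting become equally attractive, found here by binary search over a discounted value iteration. This is a generic illustration, not the talk's Linear- or Shapley-Whittle method; the transition matrices and reward vector are hypothetical.

```python
import numpy as np

def whittle_index(P_passive, P_active, r, state, gamma=0.9,
                  lo=-1.0, hi=1.0, iters=50):
    """Binary-search the passive subsidy at which `state` is indifferent
    between the active and passive actions (illustrative sketch)."""
    def q_values(lam):
        V = np.zeros(len(r))
        for _ in range(500):  # value iteration with passive subsidy lam
            Qp = r + lam + gamma * P_passive @ V  # rest, collect subsidy
            Qa = r + gamma * P_active @ V         # pull the arm
            V = np.maximum(Qp, Qa)
        return Qp[state], Qa[state]

    for _ in range(iters):
        lam = (lo + hi) / 2
        qp, qa = q_values(lam)
        if qp < qa:
            lo = lam  # subsidy too small: acting still preferred
        else:
            hi = lam
    return (lo + hi) / 2

# Hypothetical 2-state arm: state 1 yields reward, pulling moves the
# arm toward state 1, resting lets it decay toward state 0.
P_passive = np.array([[0.9, 0.1], [0.6, 0.4]])
P_active = np.array([[0.2, 0.8], [0.1, 0.9]])
r = np.array([0.0, 1.0])
print(whittle_index(P_passive, P_active, r, state=0))
```

With separable rewards, ranking arms by such per-arm indices and pulling the top-k is near-optimal; the talk's point is that this decomposition breaks down once the reward is a global, non-separable function of all arms' states.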
Bio:
Naveen Raman is a second-year PhD student in the Machine Learning Department at Carnegie Mellon University, advised by Fei Fang and funded by an NSF Fellowship. His research focuses on improving decision-making in uncertain environments, including topics such as multi-agent learning, bandit feedback, and sequential resource allocation. His work studies these problems in real-world applications including food sustainability and healthcare. He previously obtained an MPhil in Advanced Computer Science on a Churchill Scholarship, during which he worked with Mateja Jamnik on understanding explainability through concept learning models. He received his BS in Computer Science and Math from the University of Maryland, where he worked with John Dickerson on fairness in rideshare and with Jordan Boyd-Graber on entity linking algorithms. Website: https://naveenraman.com/