Online Decision Making: The Utility of Fluid Heuristics & Optimism in the Face of Uncertainty
Topics to be covered in this six-week course
An introduction to Markov Decision Processes
Dynamic Resource-Constrained Reward Collection Problems
Online Allocation Problems
Multi-Armed Bandits & Bandits with Knapsacks
Contextual Bandits & Contextual Bandits with Knapsacks
Operationalising the principle of optimism in broader Reinforcement Learning problems
References:
Monograph by Alekh Agarwal, Nan Jiang, Sham Kakade, Wen Sun
Journal article by Santiago R. Balseiro, Omar Besbes, Dana Pizarro. Operations Research (In Press)
Journal article by Santiago R. Balseiro, Haihao Lu, Vahab Mirrokni. Operations Research 71(1), pp. 101-119
Monograph by Dylan J. Foster and Alexander Rakhlin
Lecture slides
Week 2: Dynamic Resource-Constrained Reward Collection Problems: introduction, examples, fluid approximation, and the CE heuristic (slides)
Week 3: The fluid approximation and the CE heuristic in Dynamic Resource-Constrained Reward Collection Problems (slides)
Please note: These slides are used only as a teaching aid for this class. They are not meant to be reproduced.