Open-Source Lectures

Results in Multi-armed Bandit literature

Proof sketch of the converse result in the Classical Multi-armed Bandit problem

Proof sketch of the Upper Confidence Bound (UCB) algorithm

Proof sketch of the Thompson Sampling algorithm using Beta priors

Dataset for benchmarking Offline/Batch RL algorithms

Model-based Offline Reinforcement Learning algorithm

Model-based Offline Policy Optimization algorithm

Critic Regularized Regression

Fitted Value/Policy Iteration algorithm for Offline RL
