SC-652 Statistical Learning and Sequential Prediction
Instructor: Avishek Ghosh, Assistant Professor, SysCon and C-MInDS, IIT Bombay
Contact: Room 201, SysCon; Email: avishek_ghosh@iitb.ac.in
Timing: Monday/Thursday 2:00 - 3:25 pm
Classroom: LT 201, Lecture Hall Complex-2
TA: Bhavini Jeloka (Email: bhavini.jeloka@gmail.com)
Scribe Format: See here
Scribe Schedule: See here
About the course: This course covers fundamental topics in online (or sequential) learning. The course is roughly divided into 4 modules. In the first module, we look into prediction problems with experts, and analyze algorithms like Weighted Majority and Exponential Weights (Hedge). In the second module, we take up convex games played against the environment (online convex optimization), and derive strategies with provable regret guarantees. In the third module, we assume that the environment is stochastic, and study the framework of Multi-Armed Bandits. Finally, in module 4, we dive into Reinforcement Learning (RL). Here, we aim to obtain sharp convergence bounds (asymptotic, and non-asymptotic if time permits) for standard RL algorithms such as TD learning and Q-learning.
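To give a flavor of module 1, here is a minimal sketch of the Exponential Weights (Hedge) algorithm mentioned above; the loss matrix and learning rate in the usage comment are illustrative assumptions, not course material:

```python
import numpy as np

def hedge(losses, eta):
    """Exponential Weights (Hedge) over T rounds of expert advice.

    losses: (T, N) array; losses[t, i] in [0, 1] is expert i's loss at round t.
    eta:    learning rate.
    Returns the learner's cumulative expected loss.
    """
    T, N = losses.shape
    weights = np.ones(N)
    total_loss = 0.0
    for t in range(T):
        probs = weights / weights.sum()      # play experts in proportion to weight
        total_loss += probs @ losses[t]      # expected loss this round
        weights *= np.exp(-eta * losses[t])  # exponentially downweight bad experts
    return total_loss

# Illustration: expert 0 is always right, expert 1 always wrong.
# Hedge's cumulative loss stays bounded (logarithmic regret vs. the best expert).
losses = np.zeros((100, 2))
losses[:, 1] = 1.0
cumulative = hedge(losses, eta=0.5)
```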
Grading: 2 HWs (20%), 1 midterm (25%), final exam (40%), scribing (10%), and class participation (5%)
References: We won't follow any standard textbook. The following are the references for the modules (pdfs of all the following are available online):
Module 1: Cesa-Bianchi, Nicolo, and Gabor Lugosi. Prediction, learning, and games. Cambridge university press, 2006.
Module 2: Hazan, Elad. "Introduction to Online Convex Optimization." Foundations and Trends in Optimization 2.3-4 (2016): 157-325.
Module 3: Lattimore, Tor, and Csaba Szepesvari. Bandit algorithms. Cambridge University Press, 2020.
Module 4: Sutton, Richard S., and Andrew G. Barto. Reinforcement learning: An introduction. MIT press, 2018.
Other resources: Apart from these, the course material will be taken from lecture notes of Prof. Arya Mazumdar's course on Applied Information Theory (UC San Diego), Prof. Aditya Gopalan's course on Online Learning and Prediction (IISc Bangalore), Prof. Kannan Ramchandran's course on Information Theory (UC Berkeley), Prof. Peter Bartlett's course on Statistical Learning Theory (UC Berkeley) among others.
Lectures:
Lecture 1 Intro to online learning, course logistics, Warm-up: 1-bit prediction with experts
Lecture 2 Math review, Probability, Concentration Inequalities
Lecture 3 Math review contd., Convex functions, Back to the 1-bit prediction problem, Weighted Majority Algorithm
Lecture 4 Regret of Weighted Majority Algorithm, Exponential Weights
Lecture 5 Analysis of Exponential Weights, Regret Bounds
Lecture 6 Performance of deterministic algorithms, Randomization, Randomized Exponential Weights
Lecture 7 High Probability Regret, Minimax lower bound on Regret, Optimality of Exponential Weights
Lecture 8 Introduction to Online Convex Optimization, Greedy Approach, Follow The Leader (FTL), FTL for Quadratic Minimization
Lecture 9 Linear Regret of FTL for linear losses, Follow the Regularized Leader (FTRL), Regret bounds for FTRL
Lecture 10 Sub-linear Regret of FTRL with linear losses, Online Gradient Descent (OGD) for convex losses, Regret Bounds of OGD
Lecture 11 OGD for strongly convex functions, Online Mirror Descent from OGD, Dual Space view
Lecture 12 Dual Spaces, Fenchel Young inequality, Duality, Online Mirror Descent with linear loss
Lecture 13 Online Learning with partial/Bandit feedback, Sphere Gradient Estimator, FKM algorithm, Regret Bounds
Lecture 14 Introduction to Multi-Armed Bandits (MAB), Adversarial Bandits, EXP3 algorithm, Regret of EXP3
Lecture 15 Concluding the Regret Proof of EXP3, Variance control and high probability regret bound, EXP3.P algorithm
Lecture 16 Adversarial Bandits contd., Regret bound for EXP3.P algorithm
Lecture 17 EXP3.P regret, Stochastic MAB, Formulation, Pseudo Regret
Lecture 18 Algorithms, Greedy algorithm, linear regret, Epsilon first algorithm, Regret analysis
Lecture 19 Regret of Epsilon First Algorithm, Upper Confidence Bound (UCB) algorithm, Regret analysis
Lecture 20 Concluding the regret proof of UCB, Pure Exploration (Best arm identification) algorithms, PAC framework, Naive Algorithm
Lecture 21 Pure Exploration Algorithms continued, Median Elimination, Sample complexity
Lecture 22 Sample complexity of Median Elimination, Bayesian MAB, Thompson Sampling
Lecture 23 Thompson Sampling Continued, Minimax Lower Bound on Bandit Algorithms, Connection to finding a biased coin in a set of N coins
Lecture 24 Proof of the minimax lower bound, connection to hypothesis testing
Lecture 25 Introduction to Reinforcement Learning, MDPs, Policy, Value Function, Q function, Bellman Equations
Lecture 26 Planning algorithms: Value Iteration, Policy iteration, Guarantees, Learning algorithms: Temporal Difference (TD) learning, Q learning
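The stochastic-bandit portion of the schedule (Lectures 17-20) centers on the UCB algorithm; a minimal sketch on Bernoulli arms follows. The arm means, horizon, and random seed are illustrative assumptions, not from the course:

```python
import numpy as np

rng = np.random.default_rng(0)

def ucb(means, T):
    """UCB1 on Bernoulli arms with the given (hypothetical) means.

    Returns the number of times each arm was pulled over horizon T.
    """
    K = len(means)
    counts = np.zeros(K)
    sums = np.zeros(K)
    for t in range(T):
        if t < K:
            arm = t  # pull each arm once to initialize
        else:
            # empirical mean plus exploration bonus
            bonus = np.sqrt(2.0 * np.log(t + 1) / counts)
            arm = int(np.argmax(sums / counts + bonus))
        reward = rng.random() < means[arm]  # Bernoulli reward
        counts[arm] += 1
        sums[arm] += reward
    return counts

# Illustration: with a large gap between arms, UCB concentrates its pulls
# on the better arm, pulling the worse one only O(log T) times.
pull_counts = ucb(means=[0.9, 0.1], T=2000)
```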
General Guidelines for Homeworks:
Homeworks should be submitted in class. Students are encouraged to discuss the HW problems among themselves; however, each student must write up the solutions on their own. Near-identical submissions will receive no credit.
Homeworks:
HW-1 (due on 6th Feb, 2023)