Examples of machine learning problems, classification of learning models, learning from expert advice, the weighted majority and randomized weighted majority (RWM) algorithms and their analysis, and a discussion of lower bounds.
In lecture 2 we covered the Hedge algorithm and how a naive application to many problems of interest is inefficient. We discussed online routing and linear classification, and motivated mathematical optimization.
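For concreteness, here is a minimal sketch (my own illustration, not from the course materials or the book) of the multiplicative-weights update that underlies RWM and Hedge; the learning rate eta, the function name, and the toy losses are illustrative assumptions.

```python
import numpy as np

def hedge(loss_matrix, eta=0.5):
    """Hedge / randomized weighted majority sketch.

    loss_matrix[t, i] is the loss of expert i at round t (assumed in [0, 1]).
    Returns the sequence of probability distributions played over the experts.
    """
    T, n = loss_matrix.shape
    weights = np.ones(n)
    distributions = []
    for t in range(T):
        p = weights / weights.sum()               # play experts with these probabilities
        distributions.append(p)
        weights *= np.exp(-eta * loss_matrix[t])  # multiplicative update on observed losses
    return np.array(distributions)

# toy example: two experts, the first is always better
losses = np.array([[0.0, 1.0]] * 10)
print(hedge(losses)[-1])   # the distribution concentrates on the first expert
```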
Reading:
Chapter 1 in book 1
Scribe Notes:
Introduction to convex analysis and mathematical optimization, solution concepts for convex and non-convex optimization, and gradient descent and its analysis for convex optimization. We described how to model combinatorial problems, such as shortest paths in graphs, as mathematical optimization problems.
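As a small illustration of the gradient descent method analyzed in this lecture (a sketch of my own; the step size, iteration count, and quadratic objective are arbitrary assumptions):

```python
import numpy as np

def gradient_descent(grad, x0, step=0.1, iters=100):
    """Plain gradient descent: x_{t+1} = x_t - step * grad(x_t)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = x - step * grad(x)
    return x

# minimize the convex quadratic f(x) = ||x - c||^2, whose gradient is 2(x - c)
c = np.array([1.0, -2.0])
print(gradient_descent(lambda x: 2 * (x - c), x0=np.zeros(2)))   # converges to c
```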
In lecture 4 we covered the online portfolio selection problem, which motivated online convex optimization, and then online gradient descent and its analysis. We derived an algorithm for online shortest paths.
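The online gradient descent update can be sketched in a few lines. This is an illustration only (not the book's code); the 1/sqrt(t) step sizes, the Euclidean-ball feasible set, and the toy quadratic losses are my own assumptions.

```python
import numpy as np

def online_gradient_descent(gradients, x0, radius=1.0):
    """Online gradient descent with projection onto a Euclidean ball.

    gradients[t](x) is the (sub)gradient of the loss revealed at round t.
    Step sizes decaying as 1/sqrt(t) give O(sqrt(T)) regret for bounded convex losses.
    """
    x = np.asarray(x0, dtype=float)
    iterates = []
    for t, grad in enumerate(gradients, start=1):
        iterates.append(x.copy())
        x = x - (1.0 / np.sqrt(t)) * grad(x)
        norm = np.linalg.norm(x)
        if norm > radius:                      # project back onto the feasible ball
            x = x * (radius / norm)
    return iterates

# toy stream of quadratic losses f_t(x) = ||x - z_t||^2
zs = [np.array([0.5, 0.0]), np.array([0.0, 0.5])] * 50
print(online_gradient_descent([lambda x, z=z: 2 * (x - z) for z in zs], np.zeros(2))[-1])
```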
Reading:
Chapter 2,3 in book 1
Scribe Notes:
Strongly convex and smooth functions, logarithmic regret algorithms, exp-concave functions and the Online Newton Step algorithm.
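A rough sketch of the Online Newton Step update (my own simplification: the generalized projection onto the feasible set is omitted, and the parameters gamma, eps, and the toy losses are arbitrary assumptions, so this is not the full algorithm analyzed in class):

```python
import numpy as np

def online_newton_step(gradients, x0, gamma=0.5, eps=1.0):
    """Online Newton Step sketch (projection step omitted).

    A accumulates outer products of the observed gradients; preconditioning the
    gradient step by A^{-1} is what yields logarithmic regret for exp-concave losses.
    """
    x = np.asarray(x0, dtype=float)
    A = eps * np.eye(x.size)
    for grad in gradients:
        g = grad(x)
        A += np.outer(g, g)
        x = x - (1.0 / gamma) * np.linalg.solve(A, g)
    return x

# toy example with quadratic losses f_t(x) = ||x - z_t||^2
zs = [np.array([1.0, 0.0]), np.array([0.0, 1.0])] * 20
print(online_newton_step([lambda x, z=z: 2 * (x - z) for z in zs], np.zeros(2)))
```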
Hardness of non-convex optimization: information-theoretic hardness for nonsmooth functions, NP-hardness of quadratics. Convergence to stationary points of GD and SGD.
The statistical learning model: intro
Reading:
Lecture notes on optimization (1)
Chapter 4 in book 1
Chapters 1-4 in book 3
Scribe Notes:
In lecture 7 we defined statistical learning, talked about uniform convergence for finite hypothesis classes, and gave an introduction to generalization theory for infinite hypothesis classes. We discussed computational limitations of the statistical model and gave a few examples of hypothesis classes that can be learned statistically but not efficiently.
In lecture 8 we talked about the online-to-batch reduction, and proved the theorem showing how regret in online learning yields a generalization error guarantee in the statistical learning setting.
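A minimal sketch of the online-to-batch idea (an illustration under my own assumptions: the online learner is plain online gradient descent with a fixed step, and the mean-estimation example is a toy choice):

```python
import numpy as np

def online_to_batch(samples, grad_loss, x0, step=0.1):
    """Run an online learner over i.i.d. samples and return the average iterate.

    Averaging the iterates is what converts an online regret bound into a bound
    on the expected risk of the returned hypothesis.
    """
    x = np.asarray(x0, dtype=float)
    iterates = []
    for z in samples:
        iterates.append(x.copy())
        x = x - step * grad_loss(x, z)
    return np.mean(iterates, axis=0)

# toy example: mean estimation with squared loss, gradient 2 * (x - z)
rng = np.random.default_rng(0)
samples = rng.normal(loc=3.0, scale=1.0, size=200)
print(online_to_batch(samples, lambda x, z: 2 * (x - z), x0=0.0))   # close to 3.0
```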
Reading:
Chapter 9 in book 1
Book 3
Scribe Notes:
In lecture 9 we started with an example of a linear program and why this is a useful methodology in optimization. We then proceeded to define linear programming, its applications, and the duality theorem.
We continued by defining zero-sum games, general games, the von Neumann equilibrium, and the min-max theorem. We showed the equivalence between LP and solving zero-sum games, and thus the equivalence between duality and the min-max theorem.
In lecture 10 we proved the min-max theorem, and thus LP duality. We proceeded to give a particularly simple and efficient algorithm for linear programming.
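To illustrate the LP / zero-sum-game equivalence, here is the standard row-player LP solved with an off-the-shelf solver (a sketch assuming scipy is available; the matching-pennies matrix is a toy example of my choosing, and this is not the simple algorithm presented in lecture 10):

```python
import numpy as np
from scipy.optimize import linprog

def solve_zero_sum(A):
    """Optimal mixed strategy and value for the row (maximizing) player of a
    zero-sum game with payoff matrix A, via the standard LP formulation."""
    m, n = A.shape
    c = np.concatenate([np.zeros(m), [-1.0]])          # variables (x, v); minimize -v
    A_ub = np.hstack([-A.T, np.ones((n, 1))])          # v <= (A^T x)_j for every column j
    b_ub = np.zeros(n)
    A_eq = np.concatenate([np.ones(m), [0.0]]).reshape(1, -1)   # x is a distribution
    b_eq = [1.0]
    bounds = [(0, None)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:m], res.x[m]

# matching pennies: the value is 0 and the optimal strategy is uniform
x, value = solve_zero_sum(np.array([[1.0, -1.0], [-1.0, 1.0]]))
print(x, value)
```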
Scribe Notes:
In lecture 11 we defined a model for learning with partial information: the multi-armed bandit problem. We described a simple methodology for converting partial-information problems into full-information problems, and derived the simple-MAB algorithm. We then described and analyzed the EXP3 algorithm.
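A minimal sketch of the EXP3 idea: feed Hedge an importance-weighted estimate of the loss of the single arm that was pulled. This is my own illustration; the learning rate, the loss-based variant without explicit exploration, and the toy two-armed bandit are assumptions rather than the exact algorithm from class.

```python
import numpy as np

def exp3(loss_fn, n_arms, T, eta=0.1, rng=None):
    """EXP3 sketch: exponential weights over arms with unbiased loss estimates."""
    rng = rng or np.random.default_rng(0)
    weights = np.ones(n_arms)
    total = 0.0
    for t in range(T):
        p = weights / weights.sum()
        arm = rng.choice(n_arms, p=p)
        loss = loss_fn(arm)                 # only the chosen arm's loss is observed
        est = np.zeros(n_arms)
        est[arm] = loss / p[arm]            # importance-weighted (unbiased) loss estimate
        weights *= np.exp(-eta * est)
        total += loss
    return total / T

# toy bandit: arm 0 has loss 0.2, arm 1 has loss 0.8
print(exp3(lambda arm: 0.2 if arm == 0 else 0.8, n_arms=2, T=2000))
```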
In lecture 12 we defined the bandit convex optimization problem, motivated by structured problems with limited information such as online shortest paths. We defined and analyzed the FKM algorithm. For tight regret bounds in bandit linear optimization, we defined the SCRIBLE algorithm, but without analysis.
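The core of FKM is a one-point gradient estimate built from a spherical perturbation. Below is a rough sketch of that idea (my own illustration; the fixed perturbation radius delta, the fixed step size eta, the unit-ball feasible set, and the hidden quadratic loss are untuned assumptions):

```python
import numpy as np

def fkm(loss_fn, dim, T, delta=0.1, eta=0.01, rng=None):
    """FKM-style sketch: estimate the gradient from a single loss value."""
    rng = rng or np.random.default_rng(0)
    y = np.zeros(dim)
    for t in range(T):
        u = rng.normal(size=dim)
        u /= np.linalg.norm(u)            # uniform direction on the unit sphere
        x = y + delta * u                 # the point actually played
        value = loss_fn(x)                # only a single loss value is observed
        g = (dim / delta) * value * u     # one-point gradient estimate
        y = y - eta * g
        norm = np.linalg.norm(y)
        if norm > 1.0 - delta:            # stay inside a slightly shrunk unit ball
            y *= (1.0 - delta) / norm
    return y

# toy bandit problem: losses f(x) = ||x - c||^2 for a hidden c
c = np.array([0.3, -0.2])
print(fkm(lambda x: np.sum((x - c) ** 2), dim=2, T=5000))
```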
Scribe Notes:
We discussed and rigorously defined the control problem, and gave several examples. We discussed equilibria and the stability problem, and gave evidence that it is computationally intractable in general.
Scribe Notes:
NP-hardness of stabilizability; defining MDPs: Markov chains, stationary distributions, the ergodic theorem, Markov Reward Processes, Markov Decision Processes, policies, and the Bellman equation.
Scribe Notes:
We wrapped up the definitions of the learning-in-MDPs model, reviewed the Bellman equation, defined the value iteration algorithm, and proved its convergence with a discount factor.
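Value iteration for a small finite MDP fits in a few lines. This is a sketch under my own conventions (the tensor layout of P and R, the discount factor, and the toy two-state MDP are illustrative assumptions):

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, iters=200):
    """Value iteration for a finite MDP.

    P[a, s, s'] is the transition probability under action a, and R[s, a] the reward.
    The Bellman optimality operator is a gamma-contraction, so the iterates converge
    to the optimal value function.
    """
    V = np.zeros(P.shape[1])
    for _ in range(iters):
        Q = R + gamma * np.einsum('ast,t->sa', P, V)   # Q[s, a]
        V = Q.max(axis=1)
    return V, Q.argmax(axis=1)                         # optimal values and a greedy policy

# toy 2-state, 2-action MDP
P = np.array([[[0.9, 0.1], [0.2, 0.8]],    # transitions under action 0
              [[0.1, 0.9], [0.8, 0.2]]])   # transitions under action 1
R = np.array([[1.0, 0.0], [0.0, 2.0]])     # R[s, a]
print(value_iteration(P, R))
```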
Scribe Notes:
In this lecture we defined the optimal control paradigm and the fundamental LQR problem. We saw the closed-form solution, proved its correctness, and showed the existence of an optimal linear policy and a quadratic value function. We talked about infinite-horizon LQR and the DARE equation, whose solution gives the infinite-horizon controller. At the end of class we talked about controllability, the Kalman matrix, and the sufficient and necessary condition for an LDS to be controllable.
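As an illustration of the infinite-horizon solution (a sketch assuming scipy is available; the double-integrator system and identity cost matrices are toy choices of mine):

```python
import numpy as np
from scipy.linalg import solve_discrete_are

def lqr_gain(A, B, Q, R):
    """Infinite-horizon discrete-time LQR: solve the DARE for the quadratic value
    function P and read off the optimal linear policy u_t = -K x_t."""
    P = solve_discrete_are(A, B, Q, R)
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return K, P

# toy double-integrator system
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
K, P = lqr_gain(A, B, Q=np.eye(2), R=np.eye(1))
print(K)   # the closed-loop matrix A - B K is stable
```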
We discussed optimal and robust control, and why they are insufficient to cope with adversarially changing environments, non-quadratic losses, and nonstochastic noise. We introduced the online control model and regret with respect to a class of policies, as well as state-action and disturbance-action control policies. Finally, we described the Gradient Perturbation Controller, stated its regret guarantees, and went over its analysis.
In this lecture we will discuss new directions in control and RL based on the paradigm of online control, the identification of unknown systems in the online model, changing environments, bandit control, and more. We conclude with an "ask me anything" session, featuring guest researchers Xinyi Chen and Annie Marsden.