Examples of machine learning problems, classification of learning models, learning from expert advice, the weighted majority and randomized weighted majority (RWM) algorithms and their analysis, and a discussion of lower bounds.
In lecture 2 we covered the Hedge algorithm and how a naive application to many problems of interest is inefficient. We discussed online routing and linear classification, and motivated mathematical optimization.
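For concreteness, here is a minimal sketch (my own illustration, not from the course materials or the book) of the multiplicative-weights update that underlies RWM and Hedge; the learning rate eta, the function name, and the toy losses are illustrative assumptions.

```python
import numpy as np

def hedge(loss_matrix, eta=0.5):
    """Hedge / randomized weighted majority sketch.

    loss_matrix[t, i] is the loss of expert i at round t (assumed in [0, 1]).
    Returns the sequence of probability distributions played over the experts.
    """
    T, n = loss_matrix.shape
    weights = np.ones(n)
    distributions = []
    for t in range(T):
        p = weights / weights.sum()               # play experts with these probabilities
        distributions.append(p)
        weights *= np.exp(-eta * loss_matrix[t])  # multiplicative update on observed losses
    return np.array(distributions)

# toy example: two experts, the first is always better
losses = np.array([[0.0, 1.0]] * 10)
print(hedge(losses)[-1])   # the distribution concentrates on the first expert
```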
Reading:
Chapter 1 in book 1
Scribe Notes:
Introduction to convex analysis and mathematical optimization, solution concepts for convex and non-convex optimization, and gradient descent and its analysis for convex optimization. We described how to model combinatorial problems, such as shortest paths in graphs, as mathematical optimization problems.
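As a small illustration of the gradient descent method analyzed in this lecture (a sketch of my own; the step size, iteration count, and quadratic objective are arbitrary assumptions):

```python
import numpy as np

def gradient_descent(grad, x0, step=0.1, iters=100):
    """Plain gradient descent: x_{t+1} = x_t - step * grad(x_t)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = x - step * grad(x)
    return x

# minimize the convex quadratic f(x) = ||x - c||^2, whose gradient is 2(x - c)
c = np.array([1.0, -2.0])
print(gradient_descent(lambda x: 2 * (x - c), x0=np.zeros(2)))   # converges to c
```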
In lecture 4 we covered the online portfolio selection problem, which motivated online convex optimization, and then online gradient descent and its analysis. We derived an algorithm for online shortest paths.
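The online gradient descent update can be sketched in a few lines. This is an illustration only (not the book's code); the 1/sqrt(t) step sizes, the Euclidean-ball feasible set, and the toy quadratic losses are my own assumptions.

```python
import numpy as np

def online_gradient_descent(gradients, x0, radius=1.0):
    """Online gradient descent with projection onto a Euclidean ball.

    gradients[t](x) is the (sub)gradient of the loss revealed at round t.
    Step sizes decaying as 1/sqrt(t) give O(sqrt(T)) regret for bounded convex losses.
    """
    x = np.asarray(x0, dtype=float)
    iterates = []
    for t, grad in enumerate(gradients, start=1):
        iterates.append(x.copy())
        x = x - (1.0 / np.sqrt(t)) * grad(x)
        norm = np.linalg.norm(x)
        if norm > radius:                      # project back onto the feasible ball
            x = x * (radius / norm)
    return iterates

# toy stream of quadratic losses f_t(x) = ||x - z_t||^2
zs = [np.array([0.5, 0.0]), np.array([0.0, 0.5])] * 50
print(online_gradient_descent([lambda x, z=z: 2 * (x - z) for z in zs], np.zeros(2))[-1])
```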
Reading:
Chapter 2,3 in book 1
Scribe Notes:
Strongly convex and smooth functions, logarithmic regret algorithms, exp-concave functions and the Online Newton Step algorithm.
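A rough sketch of the Online Newton Step update (my own simplification: the generalized projection onto the feasible set is omitted, and the parameters gamma, eps, and the toy losses are arbitrary assumptions, so this is not the full algorithm analyzed in class):

```python
import numpy as np

def online_newton_step(gradients, x0, gamma=0.5, eps=1.0):
    """Online Newton Step sketch (projection step omitted).

    A accumulates outer products of the observed gradients; preconditioning the
    gradient step by A^{-1} is what yields logarithmic regret for exp-concave losses.
    """
    x = np.asarray(x0, dtype=float)
    A = eps * np.eye(x.size)
    for grad in gradients:
        g = grad(x)
        A += np.outer(g, g)
        x = x - (1.0 / gamma) * np.linalg.solve(A, g)
    return x

# toy example with quadratic losses f_t(x) = ||x - z_t||^2
zs = [np.array([1.0, 0.0]), np.array([0.0, 1.0])] * 20
print(online_newton_step([lambda x, z=z: 2 * (x - z) for z in zs], np.zeros(2)))
```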
Hardness of non-convex optimization: information-theoretic hardness for nonsmooth functions, NP-hardness of quadratics. Convergence to stationary points of GD and SGD.
The statistical learning model: intro
Reading:
Lecture notes on optimization (1)
Chapter 4 in book 1
Chapters 1-4 in book 3
Scribe Notes:
In lecture 7 we defined statistical learning, talked about uniform convergence for finite hypothesis classes, and gave an introduction to generalization theory for infinite hypothesis classes. We discussed computational limitations of the statistical model and gave a few examples of hypothesis classes that can be learned statistically but not efficiently.
In lecture 8 we talked about the online-to-batch reduction, and proved the theorem showing how regret in online learning yields a generalization error guarantee in the statistical learning setting.
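A minimal sketch of the online-to-batch idea (an illustration under my own assumptions: the online learner is plain online gradient descent with a fixed step, and the mean-estimation example is a toy choice):

```python
import numpy as np

def online_to_batch(samples, grad_loss, x0, step=0.1):
    """Run an online learner over i.i.d. samples and return the average iterate.

    Averaging the iterates is what converts an online regret bound into a bound
    on the expected risk of the returned hypothesis.
    """
    x = np.asarray(x0, dtype=float)
    iterates = []
    for z in samples:
        iterates.append(x.copy())
        x = x - step * grad_loss(x, z)
    return np.mean(iterates, axis=0)

# toy example: mean estimation with squared loss, gradient 2 * (x - z)
rng = np.random.default_rng(0)
samples = rng.normal(loc=3.0, scale=1.0, size=200)
print(online_to_batch(samples, lambda x, z: 2 * (x - z), x0=0.0))   # close to 3.0
```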
Reading:
Chapter 9 in book 1
Book 3
Scribe Notes:
In lecture 9 we started with an example of a linear program and why this is a useful methodology in optimization. We then proceeded to define linear programming, its applications, and the duality theorem.
We continued by defining zero-sum games, general games, the von Neumann equilibrium, and the min-max theorem. We showed the equivalence between LP and solving zero-sum games, and thus the equivalence between duality and the min-max theorem.
In lecture 10 we proved the min-max theorem, and thus LP duality. We proceeded to give a particularly simple and efficient algorithm for linear programming.
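To illustrate the LP / zero-sum-game equivalence, here is the standard row-player LP solved with an off-the-shelf solver (a sketch assuming scipy is available; the matching-pennies matrix is a toy example of my choosing, and this is not the simple algorithm presented in lecture 10):

```python
import numpy as np
from scipy.optimize import linprog

def solve_zero_sum(A):
    """Optimal mixed strategy and value for the row (maximizing) player of a
    zero-sum game with payoff matrix A, via the standard LP formulation."""
    m, n = A.shape
    c = np.concatenate([np.zeros(m), [-1.0]])          # variables (x, v); minimize -v
    A_ub = np.hstack([-A.T, np.ones((n, 1))])          # v <= (A^T x)_j for every column j
    b_ub = np.zeros(n)
    A_eq = np.concatenate([np.ones(m), [0.0]]).reshape(1, -1)   # x is a distribution
    b_eq = [1.0]
    bounds = [(0, None)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:m], res.x[m]

# matching pennies: the value is 0 and the optimal strategy is uniform
x, value = solve_zero_sum(np.array([[1.0, -1.0], [-1.0, 1.0]]))
print(x, value)
```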
Scribe Notes:
In lecture 11 we defined a model for learning with partial information: the multi-armed bandit problem. We described a simple methodology for converting partial-information problems into full-information problems, and derived the simple-MAB algorithm. We then described and analyzed the EXP3 algorithm.
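A minimal sketch of the EXP3 idea: feed Hedge an importance-weighted estimate of the loss of the single arm that was pulled. This is my own illustration; the learning rate, the loss-based variant without explicit exploration, and the toy two-armed bandit are assumptions rather than the exact algorithm from class.

```python
import numpy as np

def exp3(loss_fn, n_arms, T, eta=0.1, rng=None):
    """EXP3 sketch: exponential weights over arms with unbiased loss estimates."""
    rng = rng or np.random.default_rng(0)
    weights = np.ones(n_arms)
    total = 0.0
    for t in range(T):
        p = weights / weights.sum()
        arm = rng.choice(n_arms, p=p)
        loss = loss_fn(arm)                 # only the chosen arm's loss is observed
        est = np.zeros(n_arms)
        est[arm] = loss / p[arm]            # importance-weighted (unbiased) loss estimate
        weights *= np.exp(-eta * est)
        total += loss
    return total / T

# toy bandit: arm 0 has loss 0.2, arm 1 has loss 0.8
print(exp3(lambda arm: 0.2 if arm == 0 else 0.8, n_arms=2, T=2000))
```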
In lecture 12 we defined the bandit convex optimization problem, motivated by structured problems with limited information such as online shortest paths. We defined and analyzed the FKM algorithm. For tight regret bounds in bandit linear optimization, we defined the SCRIBLE algorithm, but without analysis.
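The core of FKM is a one-point gradient estimate built from a spherical perturbation. Below is a rough sketch of that idea (my own illustration; the fixed perturbation radius delta, the fixed step size eta, the unit-ball feasible set, and the hidden quadratic loss are untuned assumptions):

```python
import numpy as np

def fkm(loss_fn, dim, T, delta=0.1, eta=0.01, rng=None):
    """FKM-style sketch: estimate the gradient from a single loss value."""
    rng = rng or np.random.default_rng(0)
    y = np.zeros(dim)
    for t in range(T):
        u = rng.normal(size=dim)
        u /= np.linalg.norm(u)            # uniform direction on the unit sphere
        x = y + delta * u                 # the point actually played
        value = loss_fn(x)                # only a single loss value is observed
        g = (dim / delta) * value * u     # one-point gradient estimate
        y = y - eta * g
        norm = np.linalg.norm(y)
        if norm > 1.0 - delta:            # stay inside a slightly shrunk unit ball
            y *= (1.0 - delta) / norm
    return y

# toy bandit problem: losses f(x) = ||x - c||^2 for a hidden c
c = np.array([0.3, -0.2])
print(fkm(lambda x: np.sum((x - c) ** 2), dim=2, T=5000))
```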
Scribe Notes:
We discussed and rigorously defined the control problem, and gave several examples. We discussed equilibria and the stability problem, and gave evidence that it is computationally intractable in general.
Scribe Notes:
NP-hardness of stabilizability; defining MDPs: Markov chains, stationary distributions, the ergodic theorem, Markov Reward Processes, Markov Decision Processes, policies, and the Bellman equation.
Scribe Notes:
We wrapped up the definitions of the learning-in-MDPs model, reviewed the Bellman equation, defined the value iteration algorithm, and proved its convergence with a discount factor.
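Value iteration for a small finite MDP fits in a few lines. This is a sketch under my own conventions (the tensor layout of P and R, the discount factor, and the toy two-state MDP are illustrative assumptions):

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, iters=200):
    """Value iteration for a finite MDP.

    P[a, s, s'] is the transition probability under action a, and R[s, a] the reward.
    The Bellman optimality operator is a gamma-contraction, so the iterates converge
    to the optimal value function.
    """
    V = np.zeros(P.shape[1])
    for _ in range(iters):
        Q = R + gamma * np.einsum('ast,t->sa', P, V)   # Q[s, a]
        V = Q.max(axis=1)
    return V, Q.argmax(axis=1)                         # optimal values and a greedy policy

# toy 2-state, 2-action MDP
P = np.array([[[0.9, 0.1], [0.2, 0.8]],    # transitions under action 0
              [[0.1, 0.9], [0.8, 0.2]]])   # transitions under action 1
R = np.array([[1.0, 0.0], [0.0, 2.0]])     # R[s, a]
print(value_iteration(P, R))
```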
Scribe Notes:
In this lecture we defined the optimal control paradigm and the fundamental LQR problem. We saw the closed-form solution, proved its correctness, and showed the existence of an optimal linear policy and a quadratic value function. We talked about infinite-horizon LQR and the DARE equation, whose solution gives the infinite-horizon controller. At the end of class we talked about controllability, the Kalman matrix, and the sufficient and necessary condition for an LDS to be controllable.
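As an illustration of the infinite-horizon solution (a sketch assuming scipy is available; the double-integrator system and identity cost matrices are toy choices of mine):

```python
import numpy as np
from scipy.linalg import solve_discrete_are

def lqr_gain(A, B, Q, R):
    """Infinite-horizon discrete-time LQR: solve the DARE for the quadratic value
    function P and read off the optimal linear policy u_t = -K x_t."""
    P = solve_discrete_are(A, B, Q, R)
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return K, P

# toy double-integrator system
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
K, P = lqr_gain(A, B, Q=np.eye(2), R=np.eye(1))
print(K)   # the closed-loop matrix A - B K is stable
```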
We discussed optimal and robust control, and why they are insufficient to cope with adversarially changing environments, non-quadratic losses, and nonstochastic noise. We introduced the online control model and regret with respect to a class of policies, as well as state-action and disturbance-action control policies. Finally, we described the Gradient Perturbation Controller, stated its regret guarantees, and went over its analysis.
In this lecture we will discuss new directions in control and RL based on the paradigm of online control, the identification of unknown systems in the online model, changing environments, bandit control, and more. We conclude with an "ask me anything" session, featuring guest researchers Xinyi Chen and Annie Marsden.