Instructors: Pratik Jawanpuria and Pranay Sharma
TA: Moparthy Venkata Subrahmanya Sri Harsha
Time: Mondays and Thursdays, 5:30-7:00 pm
Room: LT-201
Office Hours: after every class, or by email
Course Description
This course covers advanced topics at the intersection of optimization, sampling, and generative modeling, which are foundational to modern machine learning and artificial intelligence. It develops techniques such as mirror descent, natural gradient methods, and zeroth-order optimization, which underpin scalable and efficient learning algorithms. It also covers optimal transport and sampling methods such as Langevin dynamics and Hamiltonian Monte Carlo, reflecting their growing importance in probabilistic modeling and inference.
Pre-requisites
At least one graduate-level course each in optimization and in probability and statistics.
Tentative Topics (evolving)
Mirror Descent (MD)
(Generalized) Bregman divergence; mirror descent; variants - Dual Averaging, Mirror Prox
Natural Gradient Method - relation to MD; applications to RL - natural actor-critic, TRPO, etc.
Optimal Transport
Generative AI and Inverse Problems
Introduction to diffusion models; DDPMs; DDIMs; inverse problems
References
[B17] Amir Beck. "First-order Methods in Optimization." SIAM (2017).
[B15] Sébastien Bubeck. "Convex optimization: Algorithms and complexity." Foundations and Trends in Machine Learning 8.3-4 (2015).
[O19] Francesco Orabona. "A modern introduction to online learning." arXiv preprint arXiv:1912.13213 (2019).
[M23] Kevin Murphy. "Probabilistic Machine Learning: Advanced Topics." MIT Press (2023).
[PC19] Gabriel Peyré and Marco Cuturi. "Computational Optimal Transport." Now Publishers (2019).
[LSK+25] Lai, Song, Kim, Mitsufuji, and Ermon. "The principles of diffusion models." arXiv preprint arXiv:2510.21890 (2025).
[C24] Stanley Chan. "Tutorial on diffusion models for imaging and vision." Foundations and Trends in Computer Graphics and Vision 16.4: 322-471 (2024).
Some Related Courses
To be updated.
Lecture Notes (will be posted here)
I: Mirror Descent (taught by Pranay)
Introduction to DS608: projected gradient descent; mirror descent; Bregman divergence (handwritten; a mirror descent code sketch follows this list)
Subgradients - subgradient calculus; conjugate functions; mirror interpretation of mirror descent (handwritten/scribed)
Convergence of mirror descent - improvement over subgradient method (handwritten/scribed)
Convergence of (stochastic) mirror descent contd. (handwritten/scribed)
Saddle-point problem - saddle point mirror descent (SP-MD) and saddle point mirror-prox (handwritten/scribed)
Natural Gradient Method - multiplicative weights update; applications (handwritten/scribed)
Gradient Flow and Mirror Flow (handwritten/scribed)
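As a concrete companion to this part (a minimal sketch, not from the posted notes), the Python snippet below runs mirror descent with the negative-entropy mirror map on the probability simplex, which is exactly the multiplicative weights update. The quadratic objective, step size, and iteration count are illustrative assumptions.

```python
import numpy as np

def mirror_descent_simplex(grad_f, x0, steps=100, eta=0.1):
    """Mirror descent on the probability simplex with the
    negative-entropy mirror map (exponentiated gradient /
    multiplicative weights): x_{t+1,i} is proportional to
    x_{t,i} * exp(-eta * grad_i f(x_t))."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        g = grad_f(x)
        x = x * np.exp(-eta * g)  # gradient step in the dual (mirror) space
        x /= x.sum()              # Bregman (KL) projection back onto the simplex
    return x

# Illustrative objective (an assumption, not from the notes):
# minimize f(x) = 0.5 * ||x - p||^2 over the simplex, with p a simplex point.
p = np.array([0.7, 0.2, 0.1])
x_star = mirror_descent_simplex(lambda x: x - p, x0=np.ones(3) / 3)
print(x_star)  # approaches p, the constrained minimizer
```

With the Euclidean mirror map (half the squared norm), the same template reduces to projected gradient descent.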
II: Optimal Transport (taught by Prof. Pratik Jawanpuria)
Intro to Optimal Transport (handwritten/scribed)
Computational OT - Monge problem; Kantorovich Relaxation; Wasserstein distance and its properties (handwritten/scribed)
Primal and Dual Kantorovich problems; Brenier's theorem (handwritten/scribed)
Fundamental theorem of OT; Entropic-regularized OT (handwritten/scribed; a Sinkhorn code sketch follows this list)
Wasserstein Barycenter; Unsupervised domain adaptation (handwritten/scribed)
Federated Learning using Barycenters (handwritten/scribed)
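As a companion to the entropic-regularized OT lecture above (a minimal sketch, not from the posted notes), the snippet below runs Sinkhorn's matrix-scaling iterations; the point clouds, cost matrix, and regularization strength eps are illustrative assumptions.

```python
import numpy as np

def sinkhorn(C, a, b, eps=0.1, iters=500):
    """Sinkhorn iterations for entropic-regularized OT: alternately
    rescale the rows and columns of the Gibbs kernel K = exp(-C / eps)
    until the plan matches the marginals a and b."""
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(iters):
        v = b / (K.T @ u)  # scale columns to match marginal b
        u = a / (K @ v)    # scale rows to match marginal a
    P = u[:, None] * K * v[None, :]  # entropic transport plan
    return P, np.sum(P * C)          # plan and its transport cost

# Illustrative example: uniform measures on two shifted point sets on the line.
x = np.linspace(0.0, 1.0, 5)
y = x + 0.5
C = (x[:, None] - y[None, :]) ** 2  # squared-distance cost
a = np.full(5, 0.2)
b = np.full(5, 0.2)
P, cost = sinkhorn(C, a, b)
print(cost)  # close to the unregularized squared 2-Wasserstein cost, 0.25
```

Smaller eps tracks the unregularized Kantorovich cost more closely but needs more iterations, and in floating point it eventually calls for log-domain updates for stability.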
III: Math of Diffusion Models (taught by Pranay)
Introduction to Generative Modeling - variational autoencoders; evidence lower bound (ELBO) (handwritten/scribed)
VAEs and DDPMs - ELBO derivation; DDPM forward and backward process (handwritten)
DDPMs (contd.) - ELBO derivation; conditioning trick (handwritten/scribed)
Score-matching in Diffusion Models - Langevin sampling; denoising score matching; Tweedie's formula; Annealed Langevin Dynamics for sampling (handwritten; a Langevin sampling code sketch follows this list)
Score SDEs - Brownian motion; reverse-time SDE (handwritten/scribed)
Score SDE and Probability Flow ODE - Fokker-Planck Equation (handwritten/scribed)
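To make the Langevin sampling step concrete (a minimal sketch, not from the posted notes), the snippet below runs unadjusted Langevin dynamics using the closed-form score of a 1-D Gaussian mixture in place of a learned score network; the mixture parameters, step size, and chain length are illustrative assumptions.

```python
import numpy as np

def mixture_score(x, means, sigma=0.5):
    """Exact score d/dx log p(x) of an equal-weight 1-D Gaussian
    mixture, standing in for a trained score model s_theta."""
    d = x[:, None] - np.asarray(means)[None, :]  # (n, k) deviations
    w = np.exp(-d**2 / (2 * sigma**2))
    w /= w.sum(axis=1, keepdims=True)            # posterior component weights
    return -(w * d).sum(axis=1) / sigma**2

def langevin_sample(score, n=2000, steps=500, eta=1e-2, seed=0):
    """Unadjusted Langevin dynamics:
    x_{t+1} = x_t + eta * score(x_t) + sqrt(2 * eta) * z_t,  z_t ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    x = 3.0 * rng.standard_normal(n)  # diffuse initialization
    for _ in range(steps):
        x = x + eta * score(x) + np.sqrt(2 * eta) * rng.standard_normal(n)
    return x

samples = langevin_sample(lambda x: mixture_score(x, means=[-2.0, 2.0]))
print(samples.mean(), samples.std())  # roughly the mixture's mean 0 and std ~2.06
```

Annealed Langevin dynamics, as covered in the lecture, chains this same update over a sequence of noise-smoothed scores at decreasing noise levels, which helps the chain mix between well-separated modes.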
IV: Student Presentations (45-60 min each)
Online Mirror Descent (Aviral)
Trust Region Policy Optimization (Anshu)
Generative Modeling using the Sliced Wasserstein Distance (Parth)
Fundamental Benefit of Alternating Updates in Minimax Optimization (Tejas)
Mirror Descent Maximizes Generalized Margin (Angad)
Denoising Diffusion Implicit Models (DDIMs) (Malay)
Structured Denoising Diffusion Models in Discrete State-Spaces (Aditya)
Policy Mirror Descent for Reinforcement Learning (Siva)
On Graph Matching Using Gromov-Wasserstein Distance (Niral)
Note: since it was a small class, there were not enough students to scribe all the lectures, and I was too lazy to do it myself.