Lectures

Lecture 1 (4/3/2018): Introduction & Course Overview

Lecture 2 (4/5/2018): Online Learning & Bandits

Lecture 3 (4/10/2018): Shallow Reinforcement Learning

Presented by Hoang
Slides
Topic: Exact algorithms, approximation with least-squares methods: LSTD (possibly LSPI)
Introduction to temporal difference learning (summary notes by Florian Kunz)
Least-squares temporal difference learning (Boyan, Machine Learning 2002)
Recommended: Least-squares policy iteration (Lagoudakis & Parr, JMLR 2003), Near-optimal RL in polynomial time (Kearns & Singh, Machine Learning 2002)

Lecture 4 (4/12/2018): Imitation Learning

Presented by Sara Beery, Natalie Bernat, Aadyot Bhatnagar, Eric Zhan (mentor Jialin)
Slides
An Invitation to Imitation, by Drew Bagnell
A Game-Theoretic Approach to Apprenticeship Learning, by Umar Syed and Robert Schapire. NIPS 2008.

Lecture 5 (4/17/2018): Q-Learning & SARSA

Presented by Ola Kalisz, Qingzhuo Aw Young, Eric Zhao, Navid Azizan Ruhi (mentor Hoang)
Slides
A technical note on Q-learning, by Christopher Watkins and Peter Dayan.
Excerpts from Sutton & Barto book (Chapter 6)
Deep Q-learning (Mnih et al., NIPS 2013)
(optional) Double Q-learning, Prioritized Experience Replay, Dueling Network Architecture

Lecture 6 (4/19/2018): Policy Gradient & Actor Critic

Lecture 7 (4/24/2018): Inverse RL

Presented by Bianca Yang, Ayya Alieva, Timothy Liu, Vaishnavi Shrivatava (mentor Jialin)
Slides
Maximum entropy inverse reinforcement learning (Ziebart et al., AAAI 2008)
Guided cost learning: deep inverse optimal control (Finn et al., ICML 2016)
(optional) Apprenticeship learning via inverse reinforcement learning (Abbeel & Ng, ICML 2004)

Lecture 8 (4/26/2018): DAgger (Imitation Learning)

Presented by Danny Park, Ashwin Hari, Nishanth Bhaskar, Sunash Sharma, Ayan Bandyopadhyay (mentor Jialin)
Slides
DAgger: Interactive imitation learning as online learning (Ross et al., AISTATS 2011)
AggreVate: Connecting imitation and reinforcement learning (Sun et al., ICML 2017)

Lecture 9 (5/1/2018): Learning LQRs

Presented by Andrew Taylor, Victor Dorobantu, Spencer Strumwasser, Ethan Lo, Bobo Hu (mentor Hoang)
Slides
On the sample complexity of the Linear Quadratic Regulator (Dean et al., arXiv 2017)
PAC adaptive control of linear systems (Fiechter, COLT 1997)
(optional) Least-squares temporal difference learning for LQR (Tu & Recht, arXiv 2017)

Lecture 10 (5/3/2018): Constrained Policy Improvement

Presented by Guanya Shi, Yan (Echo) Wu, David Kawashima, Steven Brotz, Andrew Wang (mentor Yisong)
Slides
Conservative policy iteration (Kakade & Langford, ICML 2002)
Constrained policy optimization (Achiam et al., ICML 2017)
(optional) Trust region policy optimization (Schulman et al., ICML 2015)

Lecture 11 (5/8/2018): Monte-Carlo Tree Search

Presented by Cathy Ma, Anant Desai, Joon Hee Lee, Nikhil Krishnan, Gahye Jeong (mentor Stephan)
Slides
Survey of MCTS (Browne et al., IEEE Transactions on Computational Intelligence and AI in Games 2012)
Deep Learning + Offline MCTS planning for Atari games (Guo et al., NIPS 2014)
(optional) Thinking fast and slow with deep learning and tree search (Anthony et al., NIPS 2017)

Lecture 12 (5/10/2018): Thompson Sampling & Extensions to RL

Presented by Charles Lien, Christopher Haack, Julia Reisler, Jonathan Willett, Rachael Morton (mentor Yisong)
Slides
Tutorial introduction to Thompson sampling (Russo et al., Foundations and Trends in Machine Learning, 2018)
- Shorter Version
Efficient Exploration through Bayesian Deep Q-Networks (Azizzadenesheli et al., arXiv 2018)
(optional) More efficient RL via posterior sampling (Osband et al., NIPS 2013)
(optional) Deep exploration via bootstrapped DQN (Osband et al., NIPS 2016)

Lecture 13 (5/15/2018): Multi-Agent RL

Lecture 14 (5/17/2018): Safe RL

Lecture 15 (5/22/2018): Generative Adversarial Imitation Learning

Presented by Michael Li, Michelle Zhao, Michelle Park (mentor Jialin)
Slides
GAN + imitation learning (Ho & Ermon, NIPS 2016)
(optional) InfoGAIL (Li et al., NIPS 2017) and Multi-agent GAIL (Song et al., ICLR 2018 workshop)

Lecture 16 (5/24/2018): Model-Based + Model-Free RL

Lecture 17 (5/29/2018): Smooth Imitation Learning

Presented by Brent Cahill, Roshan Bal, Eli Pinkus, Eshan Govil (mentor Yisong)
Slides
Smooth imitation learning (Le et al., ICML 2016)
(optional) Search-based structured prediction (Daume et al., Machine Learning 2009)
(optional) Learning Online Smooth Predictors for Realtime Camera Planning using Recurrent Decision Trees (Chen et al., CVPR 2016)

Lecture 18 (5/31/2018): Inverse Reward Design

Presented by Ben Stevens, Matt Hartley, Aiden Aceves, David Brown, Eduardo Beltrame (mentor Jialin)
Inverse reward design (Hadfield-Menell et al., NIPS 2017)
Reward design via online gradient ascent (Sorg et al., NIPS 2010)
(optional) Where do rewards come from? (Singh et al., CogSci 2009)

Lecture 19 (6/5/2018): Off-Policy RL

Lecture 20 (6/7/2018): Poster Session
