In the practice of learning from data, we make many assumptions - some fundamental to the theory of ML, some practical, and some implicit. This lecture attempts to identify some of these assumptions and ways to cope when they are violated. It covers the IID assumption, systematic generalization (touching on related ideas in causality), distributional shift, online/continual/open-set learning, and mentions some results in statistical learning theory, empirical investigations of deep network learning behaviour, and FATES (Fairness, Accountability, Transparency, Ethics, and Safety).
Slides:
References:
A lot of the recent progress on many AI tasks was enabled in part by the availability of large quantities of labeled data. Yet, humans are able to learn concepts from as little as a handful of examples. Meta-learning is a very promising framework for addressing the problem of generalizing from small amounts of data, known as few-shot learning. In meta-learning, our model is itself a learning algorithm: it takes as input a training set and outputs a classifier. For few-shot learning, it is (meta-)trained directly to produce classifiers with good generalization performance for problems with very little labeled data. In this talk, I'll present an overview of the recent research that has made exciting progress on this topic (including my own) and, if time permits, will discuss the challenges as well as research opportunities that remain.
Slides: Meta-Learning slides
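To make the meta-learning setup above concrete, here is a minimal sketch of one way to train on few-shot episodes, using a prototypical-networks-style loss. It assumes a hypothetical embedding network (encoder) and pre-sampled support/query tensors, and is an illustration rather than the specific method presented in the talk.

import torch
import torch.nn.functional as F

def prototypical_loss(encoder, support_x, support_y, query_x, query_y, n_way):
    # Embed support and query examples with a shared encoder.
    z_support = encoder(support_x)          # (n_support, d)
    z_query = encoder(query_x)              # (n_query, d)
    # One prototype per class: the mean embedding of that class's support examples.
    prototypes = torch.stack([z_support[support_y == c].mean(0) for c in range(n_way)])
    # Classify queries by (negative squared) distance to each prototype.
    logits = -torch.cdist(z_query, prototypes) ** 2
    return F.cross_entropy(logits, query_y)

Meta-training then loops over many such randomly sampled episodes, backpropagating each episode's loss into the encoder.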
In this lecture, we will discuss Generative Adversarial Networks (GANs). GANs are a recent and very popular generative model paradigm. We will discuss the GAN formalism, some theory and practical considerations.
Slides:
Reference: (* = you are responsible for this material)
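As a rough illustration of the GAN training loop discussed above, here is a minimal sketch of one training step with the standard non-saturating loss. The generator G, discriminator D (assumed to output logits), and their optimizers are hypothetical placeholders; architectures and hyperparameters are as covered in the lecture and references.

import torch
import torch.nn.functional as F

def gan_step(G, D, opt_g, opt_d, real, latent_dim):
    batch = real.size(0)
    ones, zeros = torch.ones(batch, 1), torch.zeros(batch, 1)
    # Discriminator step: push D(real) toward 1 and D(fake) toward 0.
    fake = G(torch.randn(batch, latent_dim)).detach()   # no gradient into G here
    d_loss = (F.binary_cross_entropy_with_logits(D(real), ones)
              + F.binary_cross_entropy_with_logits(D(fake), zeros))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()
    # Generator step (non-saturating loss): push D(G(z)) toward 1.
    g_loss = F.binary_cross_entropy_with_logits(D(G(torch.randn(batch, latent_dim))), ones)
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()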
In this lecture, we will finish the inference suboptimality part of the VAE lecture, have a crash course on Normalizing Flows, and see how they can be used (1) to reduce the approximation gap of VAEs by using a more flexible family of variational distributions, and (2) as a generative model by inverting the transformation of the data distribution into a prior distribution.
Slides:
Reference: (* = you are responsible for this material)
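To make the change-of-variables idea above concrete, here is a minimal sketch of a single invertible elementwise affine transformation and the resulting log-density under a standard normal base distribution. Real normalizing flows stack many richer invertible layers (e.g. coupling or autoregressive transformations); this only illustrates the log p(x) = log p(z) + log|det dz/dx| bookkeeping.

import torch

class AffineFlow(torch.nn.Module):
    # z = x * exp(log_scale) + shift: invertible, with a simple log-det Jacobian.
    def __init__(self, dim):
        super().__init__()
        self.log_scale = torch.nn.Parameter(torch.zeros(dim))
        self.shift = torch.nn.Parameter(torch.zeros(dim))

    def forward(self, x):
        z = x * self.log_scale.exp() + self.shift
        log_det = self.log_scale.sum()      # log |det dz/dx| for an elementwise map
        return z, log_det

def flow_log_prob(flow, x):
    # Change of variables: log p(x) = log p_base(f(x)) + log|det df/dx|
    z, log_det = flow(x)
    base = torch.distributions.Normal(0., 1.).log_prob(z).sum(-1)
    return base + log_det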
In this lecture, Chin-Wei will talk about a family of latent variable models known as Variational Autoencoders (VAEs). We’ll see how a deep latent Gaussian model can be seen as an autoencoder via amortized variational inference, and how such an autoencoder can be used as a generative model. At the end, we’ll take a look at variants of the VAE and different ways to improve inference.
Slides:
Reference: (* = you are responsible for this material)
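For reference, a minimal sketch of a VAE with a Gaussian approximate posterior, the reparameterization trick, and the (negative) ELBO, assuming binary inputs such as binarized MNIST; layer sizes are illustrative, not those used in the lecture.

import torch
import torch.nn.functional as F

class VAE(torch.nn.Module):
    def __init__(self, x_dim, z_dim, h_dim=256):
        super().__init__()
        self.enc = torch.nn.Sequential(torch.nn.Linear(x_dim, h_dim), torch.nn.ReLU())
        self.mu = torch.nn.Linear(h_dim, z_dim)
        self.logvar = torch.nn.Linear(h_dim, z_dim)
        self.dec = torch.nn.Sequential(torch.nn.Linear(z_dim, h_dim), torch.nn.ReLU(),
                                       torch.nn.Linear(h_dim, x_dim))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterization trick
        return self.dec(z), mu, logvar

def neg_elbo(model, x):
    logits, mu, logvar = model(x)
    recon = F.binary_cross_entropy_with_logits(logits, x, reduction='sum')
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())   # KL(q(z|x) || N(0, I))
    return (recon + kl) / x.size(0)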
In this lecture we will take a closer look at a form of neural network known as an Autoencoder. We will also begin our look at generative models with Autoregressive Models.
Slides:
Reference: (* = you are responsible for this material)
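As a point of reference for the lecture above, a minimal sketch of a fully connected autoencoder trained to reconstruct its input; the layer sizes are illustrative only.

import torch

class Autoencoder(torch.nn.Module):
    # Maps x to a low-dimensional code and back; trained to reconstruct its input.
    def __init__(self, x_dim, code_dim):
        super().__init__()
        self.encoder = torch.nn.Sequential(torch.nn.Linear(x_dim, 128), torch.nn.ReLU(),
                                           torch.nn.Linear(128, code_dim))
        self.decoder = torch.nn.Sequential(torch.nn.Linear(code_dim, 128), torch.nn.ReLU(),
                                           torch.nn.Linear(128, x_dim))

    def forward(self, x):
        return self.decoder(self.encoder(x))

# Training minimizes a reconstruction loss, e.g.:
# loss = torch.nn.functional.mse_loss(model(x), x)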
In this talk, Arian Hosseini will look at self-attention and the Transformer model. We will see how they work, dig deep into the details, review analyses of their performance, and survey their applications in language, vision, and speech.
Slides:
Reference:
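To accompany the talk above, a minimal sketch of scaled dot-product attention, the core operation inside Transformer self-attention; multi-head projections, masking conventions, and the rest of the block are as described in the lecture and references.

import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    # q, k, v: (batch, seq_len, d_k); the output is a weighted sum of the values.
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5     # similarity of queries and keys
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float('-inf'))
    weights = F.softmax(scores, dim=-1)
    return weights @ v, weights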
In this lecture prepared by Dzmitry (Dima) Bahdanau, I will discuss attention in neural networks.
Slides:
Reference: (* = you are responsible for this material)
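For concreteness, a minimal sketch of additive (Bahdanau-style) attention, where a decoder state scores each encoder state and the context vector is the resulting weighted average; the dimensions and the way it plugs into a full seq2seq model are illustrative assumptions.

import torch

class AdditiveAttention(torch.nn.Module):
    # score(s, h) = v^T tanh(W s + U h), normalized with a softmax over source positions.
    def __init__(self, dec_dim, enc_dim, attn_dim):
        super().__init__()
        self.W = torch.nn.Linear(dec_dim, attn_dim, bias=False)
        self.U = torch.nn.Linear(enc_dim, attn_dim, bias=False)
        self.v = torch.nn.Linear(attn_dim, 1, bias=False)

    def forward(self, dec_state, enc_states):
        # dec_state: (batch, dec_dim); enc_states: (batch, src_len, enc_dim)
        scores = self.v(torch.tanh(self.W(dec_state).unsqueeze(1) + self.U(enc_states)))
        weights = torch.softmax(scores, dim=1)          # attention over source positions
        context = (weights * enc_states).sum(dim=1)     # weighted sum of encoder states
        return context, weights.squeeze(-1)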
Devansh Arpit will introduce a number of normalization techniques that have become very popular in training deep neural networks.
Slides and Notes:
Reference: (* = you are responsible for this material)
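As a small illustration of the idea behind one of these techniques, here is a sketch of the training-time batch normalization computation for fully connected layers; at test time the batch statistics are replaced by running averages, and frameworks provide this as a built-in layer (e.g. torch.nn.BatchNorm1d).

import torch

def batch_norm_train(x, gamma, beta, eps=1e-5):
    # x: (batch, features). Normalize each feature over the batch, then scale and shift.
    mean = x.mean(dim=0)
    var = x.var(dim=0, unbiased=False)
    x_hat = (x - mean) / torch.sqrt(var + eps)
    return gamma * x_hat + beta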
In this lecture, we will discuss both popular and practical first-order optimization methods and - if time permits - some approximate second-order methods and their interpretation.
Slides:
Reference: (* = you are responsible for this material)
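As a reference point for the first-order methods above, a minimal sketch of SGD with classical momentum written as an explicit parameter update (in practice one would use the built-in torch.optim optimizers).

import torch

@torch.no_grad()
def sgd_momentum_step(params, velocities, lr=0.1, momentum=0.9):
    # Classical momentum: v <- momentum * v + grad ; param <- param - lr * v
    for p, v in zip(params, velocities):
        if p.grad is None:
            continue
        v.mul_(momentum).add_(p.grad)
        p.sub_(lr * v)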
In these lectures, we will have a rather detailed discussion of regularization methods and their interpretation.
Slides:
Reference: (* = you are responsible for this material)
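To make two of the most common regularizers concrete, a minimal sketch of an explicit L2 (weight decay) penalty and of inverted dropout; these illustrate the mechanisms, while the usual library implementations (weight_decay in torch.optim, torch.nn.Dropout) are preferred in practice.

import torch

def l2_penalty(model, weight_decay=1e-4):
    # Adds (weight_decay / 2) * ||theta||^2 to the loss, shrinking weights toward zero.
    return weight_decay * sum(p.pow(2).sum() for p in model.parameters()) / 2

def dropout(x, p=0.5, training=True):
    # Inverted dropout: randomly zero units during training and rescale so the
    # expected activation is unchanged; acts as the identity at evaluation time.
    if not training or p == 0:
        return x
    mask = (torch.rand_like(x) > p).float()
    return x * mask / (1 - p)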
In this lecture we introduce Recurrent Neural Networks and related models.
Lecture 08 RNNs (slides derived from Hugo Larochelle)
Reference: (* = you are responsible for this material)
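As a companion to the lecture above, a minimal sketch of a simple (Elman) recurrent network unrolled over time; gated variants such as LSTMs and GRUs replace the tanh update with more elaborate cells.

import torch

class SimpleRNN(torch.nn.Module):
    # h_t = tanh(W_x x_t + W_h h_{t-1} + b), applied step by step along the sequence.
    def __init__(self, input_dim, hidden_dim):
        super().__init__()
        self.hidden_dim = hidden_dim
        self.W_x = torch.nn.Linear(input_dim, hidden_dim)
        self.W_h = torch.nn.Linear(hidden_dim, hidden_dim, bias=False)

    def forward(self, x):
        # x: (batch, seq_len, input_dim)
        h = torch.zeros(x.size(0), self.hidden_dim, device=x.device)
        outputs = []
        for t in range(x.size(1)):
            h = torch.tanh(self.W_x(x[:, t]) + self.W_h(h))
            outputs.append(h)
        return torch.stack(outputs, dim=1), h   # all hidden states, final state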
Today we conclude our discussion of convolutional neural networks.
Lecture 05 CNNs II (Slides are from Hiroshi Kuwajima’s Memo on Backpropagation in Convolutional Neural Networks.)
Reference: (* = you are responsible for this material)
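Related to the backpropagation-through-convolutions material above, a small sanity check that the analytic gradients autograd computes for a convolution match numerical finite differences (torch.autograd.gradcheck requires double precision); the layer sizes here are arbitrary.

import torch

conv = torch.nn.Conv2d(1, 2, kernel_size=3).double()
x = torch.randn(1, 1, 8, 8, dtype=torch.double, requires_grad=True)
ok = torch.autograd.gradcheck(lambda inp: conv(inp).sum(), (x,))
print(ok)   # True if analytic and numerical gradients agree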
In these lectures we will have a PyTorch Tutorial and a question answering session.
The tutorial will cover
In this lecture we finish up our discussion of training neural networks and we introduce Convolutional Neural Networks.
Lecture 04 CNNs I (some slides are modified from Hugo Larochelle’s course notes)
Reference: (* = you are responsible for all of this material)
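For orientation, a small convolutional network in the spirit of those introduced above, built from stacked convolution, nonlinearity, and pooling layers; the specific sizes are illustrative, chosen for 28x28 grayscale inputs, and not the exact architecture from the lecture.

import torch

model = torch.nn.Sequential(
    torch.nn.Conv2d(1, 16, kernel_size=5, padding=2),    # 1x28x28 -> 16x28x28
    torch.nn.ReLU(),
    torch.nn.MaxPool2d(2),                                # -> 16x14x14
    torch.nn.Conv2d(16, 32, kernel_size=5, padding=2),    # -> 32x14x14
    torch.nn.ReLU(),
    torch.nn.MaxPool2d(2),                                # -> 32x7x7
    torch.nn.Flatten(),
    torch.nn.Linear(32 * 7 * 7, 10),                      # class logits
)
logits = model(torch.randn(8, 1, 28, 28))                 # (8, 10)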
In this lecture we continue with our introduction to neural networks. Specifically, we will discuss how to train neural networks, i.e. the Backpropagation Algorithm.
Lecture 03 training NNs (slides modified from Hugo Larochelle’s course notes)
Reference: (you are responsible for all of this material)
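To complement the lecture above, a minimal sketch of backpropagation worked out by hand for a one-hidden-layer network with sigmoid hidden units and squared error; the notation is generic rather than the exact one used on the slides.

import numpy as np

def mlp_forward_backward(x, y, W1, b1, W2, b2):
    # Forward pass
    a1 = x @ W1 + b1
    h = 1.0 / (1.0 + np.exp(-a1))          # sigmoid hidden activations
    y_hat = h @ W2 + b2
    loss = 0.5 * np.sum((y_hat - y) ** 2)
    # Backward pass: apply the chain rule from the output back to the parameters.
    d_yhat = y_hat - y                      # dL/dy_hat
    dW2 = h.T @ d_yhat
    db2 = d_yhat.sum(axis=0)
    d_h = d_yhat @ W2.T
    d_a1 = d_h * h * (1 - h)                # sigmoid derivative
    dW1 = x.T @ d_a1
    db1 = d_a1.sum(axis=0)
    return loss, (dW1, db1, dW2, db2)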
In this lecture we finish our overview of Machine Learning and begin our detailed introduction to Neural Networks.
Lecture 02 artificial neurons (slides from Hugo Larochelle’s course notes)
Reference: (you are responsible for all of this material)
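As a small illustration of the artificial neuron introduced above: a weighted sum of the inputs plus a bias (the pre-activation), passed through a nonlinear activation function, here the logistic sigmoid.

import numpy as np

def artificial_neuron(x, w, b):
    pre_activation = np.dot(w, x) + b             # weighted sum of inputs plus bias
    return 1.0 / (1.0 + np.exp(-pre_activation))  # sigmoid activation

# Example with three inputs (numbers are arbitrary):
print(artificial_neuron(np.array([1.0, 0.5, -1.0]), np.array([0.2, -0.4, 0.1]), 0.05))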
The first class is January 7th, 2019. We discuss the plan for the course and the pedagogical method chosen. We also briefly review some foundational material, covering linear algebra, calculus, and the basics of machine learning.
Lecture 01 slides (slides built on Hugo Larochelle’s slides)
Reference: