The Deep Learning Summer School (2016 edition) features these invited speakers:

Machine Learning
We provide a general introduction to machine learning, aimed at putting all participants on the same page in terms of definitions and basic background. After a brief overview of different machine learning problems, we discuss linear regression, its objective function and closed-form solution. We discuss the bias-variance tradeoff and the issue of overfitting (and the proper use of cross-validation to measure performance objectively). We discuss the probabilistic view of the sum-squared error as maximizing likelihood under specific assumptions on the data generation process, and present L2 and L1 regularization methods as priors from a Bayesian perspective. We briefly discuss Bayesian methodology for learning. Finally, we present logistic regression, the cross-entropy optimization criterion and its solution through first- and second-order methods.
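As a rough illustration of the closed-form solution with an L2 prior, here is a minimal NumPy sketch (not drawn from the lecture materials; the toy data and regularization strength are invented for the example):

```python
import numpy as np

# Ridge (L2-regularized) linear regression solved in closed form via the
# regularized normal equations; data and lambda are toy choices.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1))
y = 2.0 * X[:, 0] + 1.0 + 0.1 * rng.normal(size=100)   # y = 2x + 1 + noise

Xb = np.hstack([X, np.ones((100, 1))])                 # append a bias column
lam = 1e-3                                             # L2 penalty strength
w = np.linalg.solve(Xb.T @ Xb + lam * np.eye(2), Xb.T @ y)
print(w)  # close to [2.0, 1.0]
```

From the Bayesian view mentioned above, `lam` plays the role of the precision of a zero-mean Gaussian prior on the weights.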
The slides for this lecture are available here. The video of this lecture is available here. The video of Doina's talk on "Advanced Topics in RL" is available here. 

Neural Networks
In this lecture, I will cover the basic concepts behind feedforward neural networks. The talk will be split into 2 parts. In the first part, I'll cover forward propagation and backpropagation in neural networks. Specifically, I'll discuss the parameterization of feedforward nets, the most common types of units, the capacity of neural networks and how to compute the gradients of the training loss for classification with neural networks. In the second part, I'll discuss the final components necessary to train neural networks by gradient descent and then discuss the more recent ideas that are now commonly used for training deep neural networks. I will thus present different variants of gradient descent algorithms, dropout, batch normalization and unsupervised pretraining.
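The forward propagation and backpropagation steps described above can be sketched for a one-hidden-layer classifier; the shapes and data below are toy choices for illustration, not the lecture's code:

```python
import numpy as np

# Forward and backward propagation in a small classifier, with a
# finite-difference check of one gradient entry.
rng = np.random.default_rng(1)
X = rng.normal(size=(8, 4))                        # 8 examples, 4 features
t = rng.integers(0, 3, size=8)                     # 3 classes

W1 = 0.1 * rng.normal(size=(4, 5)); b1 = np.zeros(5)
W2 = 0.1 * rng.normal(size=(5, 3)); b2 = np.zeros(3)

def forward(W1, b1, W2, b2):
    h = np.tanh(X @ W1 + b1)                       # hidden units
    z = h @ W2 + b2                                # class scores
    z = z - z.max(axis=1, keepdims=True)           # numerical stability
    p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    loss = -np.log(p[np.arange(len(t)), t]).mean() # cross-entropy
    return loss, h, p

loss, h, p = forward(W1, b1, W2, b2)

# Backpropagation: apply the chain rule layer by layer.
dz = p.copy(); dz[np.arange(len(t)), t] -= 1.0; dz /= len(t)
dW2 = h.T @ dz
dh = dz @ W2.T
da = dh * (1.0 - h ** 2)                           # tanh'(a) = 1 - tanh(a)^2
dW1 = X.T @ da

# Compare one analytic entry with a finite-difference estimate.
eps = 1e-5
W1p = W1.copy(); W1p[0, 0] += eps
num = (forward(W1p, b1, W2, b2)[0] - loss) / eps
print(abs(num - dW1[0, 0]) < 1e-4)                 # True
```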
The slides for this lecture are available here. The video of this lecture is available here.  
Introduction to Theano (Theano I & Practical Session)
The slides for this lecture are available here. The video of this lecture is available here. 
Convolutional Neural Networks and Computer Vision
This talk will review Convolutional Neural Network models and the tremendous impact they have made on Computer Vision problems in the last few years.
The slides for this lecture are available here. The video of this lecture is available here. 

Learning to See
It is an exciting time for computer vision. With the success of new computational architectures for visual processing, such as deep neural networks (e.g., convNets) and access to image databases with millions of labeled examples (e.g., ImageNet, Places), the state of the art in computer vision is advancing rapidly. Computer vision is now present in many commercial products, such as digital cameras, web applications and security applications.
The performance achieved by convNets is remarkable and constitutes the state of the art on many recognition tasks. But why do they work so well? What is the nature of the internal representation learned by the network? I will show that the internal representation can be interpretable. In particular, object detectors emerge in a scene classification task. Then, I will show that an ambient audio signal can be used as a supervisory signal for learning visual representations. We do this by taking advantage of the fact that vision and hearing often tell us about similar structures in the world, such as when we see an object and simultaneously hear it make a sound. We train a convNet to predict ambient sound from video frames, and we show that, through this process, the model learns a visual representation that conveys significant information about objects and scenes.
The video of this lecture is available here. 
Introduction to Torch (Torch I & Practical Session)
Torch is an open platform for scientific computing in the Lua language, with a focus on machine learning, in particular deep learning. Torch is distinguished from other array libraries by having first-class support for GPU computation, and a clear, interactive and imperative style. Further, through the "NN" library, Torch has broad support for building and training neural networks by composing primitive blocks or layers together in compute graphs. Torch, although benefitting from extensive industry support, is a community-owned and community-developed ecosystem.
All neural net libraries, including Torch NN, rely on automatic differentiation (AD) to manage the computation of gradients of complex compositions of functions. I will also present some general background on AD, which is the fundamental abstraction of gradient-based optimization, and demonstrate Twitter's flexible implementation of AD in the library torch-autograd. The slides for this lecture are available here. The video of this lecture is available here.
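As a language-neutral illustration of the reverse-mode AD idea (a Python toy sketch of the concept, not torch-autograd's actual implementation):

```python
# A minimal reverse-mode automatic differentiation sketch: each node stores
# its value and, for each parent, the local derivative of the node with
# respect to that parent. backward() pushes gradients along those edges,
# accumulating contributions from every path (inefficient on shared
# subgraphs, but correct).
class Var:
    def __init__(self, value, parents=()):
        self.value = value
        self.grad = 0.0
        self.parents = parents          # pairs of (parent, local derivative)

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    def backward(self, seed=1.0):
        self.grad += seed
        for parent, local in self.parents:
            parent.backward(seed * local)

x, y = Var(3.0), Var(2.0)
z = x * y + x                           # z = xy + x
z.backward()
print(x.grad, y.grad)                   # 3.0 3.0  (dz/dx = y+1, dz/dy = x)
```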

Recurrent Neural Networks
This lecture will cover recurrent neural networks, the key ingredient in the deep learning toolbox for handling sequential computation and modelling sequences. It will start by explaining how gradients can be computed (by considering the time-unfolded graph) and how different architectures can be designed to summarize a sequence, generate a sequence by ancestral sampling in a fully-observed directed model, or learn to map a vector to a sequence, a sequence to a sequence (of the same or different length), or a sequence to a vector. The issue of long-term dependencies, why it arises, and what has been proposed to alleviate it will be a core subject of the discussion in this lecture. This includes changes in the architecture and initialization, as well as how to properly characterize the architecture in terms of recurrent or feedforward depth and its ability to create shortcuts or fast propagation of gradients in the unfolded graph. Open questions regarding the limitations of training by maximum likelihood (teacher forcing) and ideas towards making learning online (not requiring backprop through time) will also be discussed.
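A minimal sketch of computing gradients on the time-unfolded graph, illustrating why repeated multiplication by the shared recurrent weights strains long-term dependencies (toy NumPy code, not from the lecture):

```python
import numpy as np

# A vanilla RNN unfolded over a length-5 sequence, then backprop through time.
# Weights are shared across steps; the repeated multiplication by Wh^T (and
# tanh') in the backward pass is the mechanism behind vanishing/exploding
# gradients. All shapes and values are toy choices.
rng = np.random.default_rng(2)
Wx = 0.5 * rng.normal(size=(3, 4))     # input-to-hidden
Wh = 0.5 * rng.normal(size=(4, 4))     # hidden-to-hidden, shared across time
xs = rng.normal(size=(5, 3))           # the input sequence

hs = [np.zeros(4)]                     # forward: unfold the recurrence
for x in xs:
    hs.append(np.tanh(x @ Wx + hs[-1] @ Wh))

dh = np.ones(4)                        # pretend dLoss/dh_final = 1
norms = []
for t in reversed(range(len(xs))):     # backward through the unfolded graph
    da = dh * (1 - hs[t + 1] ** 2)     # through tanh
    dh = da @ Wh.T                     # to the previous hidden state
    norms.append(float(np.linalg.norm(dh)))

print(norms)  # gradient magnitude typically shrinks or grows step by step
```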
The slides for this lecture are available here. The video of this lecture is available here. 

Reasoning, Attention and Memory
The machine learning community has had great success in the last decades at solving basic prediction tasks such as text classification, image annotation and speech recognition. However, solutions to deeper reasoning tasks have remained elusive. A key component towards achieving deeper reasoning is the use of long-term dependencies as well as short-term context during inference. Until recently, most existing machine learning models have lacked an easy way to read and write to part of a (potentially very large) long-term memory component, and to combine this seamlessly with inference. To combine memory with reasoning, a model must learn how to access it, i.e. to perform *attention* over its memory.
Within the last year or so, however, there has been some notable progress in this area. Models developing notions of attention have shown positive results on a number of real-world tasks such as machine translation and image captioning. There has also been a surge in building models of computation which explore differing forms of explicit storage. Towards that end, I’ll shed some light on a set of models that fall into this category. In particular, I’ll discuss Memory Networks and their application to a wide variety of tasks, such as question answering based on simulated stories, cloze-style question answering, and dialog modeling. I’ll also talk about their subsequently proposed variants, including End2End Memory Networks and Key-Value Memory Networks, as well as Neural Turing Machines and Stack-Augmented Recurrent Neural Networks. Throughout the talk I’ll discuss the advantages and disadvantages of each of these models and their variants. I will conclude with a discussion of what is still lacking among these models and potential open problems.
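The attention-over-memory operation these models share can be sketched as a single soft read (an illustrative toy with one-hot keys so the result is deterministic, not any particular model's implementation):

```python
import numpy as np

# One "hop" of soft attention over memory: score each slot against the
# query, softmax the scores, and return a weighted sum of the stored values.
def attention_read(query, keys, values):
    scores = keys @ query                   # match score per memory slot
    weights = np.exp(scores - scores.max()) # stable softmax
    weights /= weights.sum()
    return weights @ values, weights

rng = np.random.default_rng(3)
memory_keys = np.eye(6, 8)                  # 6 slots with one-hot keys
memory_vals = rng.normal(size=(6, 8))
query = 5.0 * memory_keys[2]                # strongly matches slot 2

read, weights = attention_read(query, memory_keys, memory_vals)
print(weights.argmax())                     # 2: the read focuses on slot 2
```

Because the read is a differentiable weighted sum, the access pattern itself can be learned by gradient descent, which is the point made above.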
The slides for this lecture are available here. The video of this lecture is available here.  
Large Scale Deep Learning with TensorFlow
The last few years have seen deep learning make significant advances in fields as diverse as speech recognition, image understanding, natural language understanding, translation, robotics, and healthcare. In this talk I'll describe some of the machine learning research done by the Google Brain team (often in collaboration with others at Google). As part of our research, we have built two systems, DistBelief and TensorFlow, for training large-scale deep learning models on large datasets. I'll describe some of the distributed systems techniques we use to scale training of such models beyond single devices, as well as some of the design decisions and implementation of the TensorFlow system, which was open sourced in November 2015.
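The gradient-averaging idea behind synchronous data-parallel training can be sketched in plain NumPy (an illustrative toy, not DistBelief's or TensorFlow's actual mechanism):

```python
import numpy as np

# Synchronous data parallelism in miniature: each "worker" computes a
# gradient on its own data shard and the parameters take the averaged update.
rng = np.random.default_rng(4)
true_w = np.array([1.0, -2.0, 0.5])
data = rng.normal(size=(32, 3))
targets = data @ true_w                         # noiseless linear targets

def shard_gradient(w, X, y):
    # Gradient of mean squared error on one worker's shard.
    return 2.0 * X.T @ (X @ w - y) / len(y)

w = np.zeros(3)
for step in range(300):
    shards = np.array_split(np.arange(32), 4)   # 4 equal-size workers
    grads = [shard_gradient(w, data[s], targets[s]) for s in shards]
    w -= 0.05 * np.mean(grads, axis=0)          # averaged, synchronous update

print(np.round(w, 2))  # close to [1.0, -2.0, 0.5]
```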
The slides for this lecture are available here. Video that accompanies slide 218 is here. The video of this lecture is available here. 

Deep Natural Language Understanding
In this lecture, I start with a claim that natural language understanding can largely be approached as building a better language model and explain three widely adopted approaches to language modelling: n-gram language modelling, feedforward neural language modelling and recurrent language modelling. As I develop from the traditional n-gram language model toward the recurrent language model, I discuss the concepts of data sparsity and generalization via continuous space representations. I then continue on to the recent development of a novel paradigm in machine translation based on recurrent language modelling, often called neural machine translation. The lecture concludes with three new opportunities in natural language processing/understanding made possible by the introduction of continuous space representations in deep neural networks.
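The data-sparsity problem of count-based models can be seen in a toy bigram model with add-one smoothing (an illustrative sketch, not lecture code):

```python
from collections import Counter

# A count-based bigram model with add-one (Laplace) smoothing. On any real
# corpus most bigrams are never observed: the data-sparsity problem that
# continuous-space neural language models address.
corpus = "the cat sat on the mat the dog sat on the rug".split()
vocab = sorted(set(corpus))
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)

def p_next(prev, word):
    # P(word | prev) with Laplace smoothing over the vocabulary.
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + len(vocab))

print(p_next("the", "cat"))   # 2/11: a seen bigram
print(p_next("cat", "dog"))   # 1/8: unseen, but smoothing keeps it nonzero
```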
The slides for this lecture are available here. The video of this lecture is available here. 
Beyond Seq2Seq with Augmented RNNs
Sequence-to-sequence models in their most basic form, following an encoder-decoder paradigm, compressively encode source sequence representations into a single vector representation and decode this representation into a target sequence. This lecture will discuss the problems with this compressive approach, some solutions involving attention and external differentiable memory, and issues faced by these extensions. Motivating examples from the field of natural language understanding will be provided throughout.
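The per-step attention read that relaxes the single-vector bottleneck can be sketched as follows (toy NumPy with hypothetical shapes, not the lecture's code):

```python
import numpy as np

# One decoding step with attention: rather than relying on a single
# compressed source vector, the decoder scores every encoder state against
# its current state and forms a fresh weighted context vector each step.
rng = np.random.default_rng(5)
enc_states = rng.normal(size=(7, 6))        # one state per source token
dec_state = rng.normal(size=6)              # current decoder state

scores = enc_states @ dec_state             # dot-product alignment scores
weights = np.exp(scores - scores.max())     # stable softmax
weights /= weights.sum()
context = weights @ enc_states              # per-step summary of the source

# The decoder would predict its next token from [dec_state; context].
print(context.shape)
```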
The slides for this lecture are available here. The video of this lecture is available here.  
GPU programming for Deep Learning
The slides for this lecture are available here. The video of this lecture is available here.

Introduction to Reinforcement Learning
The slides for this lecture are available here. The video of this talk is available here. The video of Joelle's talk on "Advanced Topics in RL" is available here.

Deep Reinforcement Learning (Pieter Abbeel, UC Berkeley)
The slides for this lecture are available here. The video of this talk is available here.

Learning Deep Generative Models
In this tutorial I will discuss the mathematical basics of many popular deep generative models, including Restricted Boltzmann Machines (RBMs), Deep Boltzmann Machines (DBMs), Helmholtz Machines, Variational Autoencoders (VAEs) and Importance Weighted Autoencoders (IWAEs). I will further demonstrate that these models are capable of extracting meaningful representations from high-dimensional data, with applications in visual object recognition, information retrieval, and natural language processing.
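The reparameterization trick at the heart of VAE training, together with the closed-form Gaussian KL term of the variational objective, can be sketched as follows (illustrative toy values, not tutorial code):

```python
import numpy as np

# The reparameterization trick: sample z = mu + sigma * eps with
# eps ~ N(0, I), so gradients can flow through mu and log_var; plus the
# closed-form KL term for a Gaussian posterior against a standard-normal
# prior.
rng = np.random.default_rng(6)
mu = np.array([0.5, -0.3])                  # encoder mean (toy values)
log_var = np.array([-1.0, 0.2])             # encoder log-variance (toy values)

eps = rng.normal(size=mu.shape)
z = mu + np.exp(0.5 * log_var) * eps        # differentiable sample

# KL( N(mu, sigma^2) || N(0, I) ), summed over latent dimensions.
kl = 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)
print(z.shape, kl > 0)
```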
The slides for this lecture are available here. The video of this talk is available here.

Building Machines that Imagine and Reason: Principles and Applications of Deep Generative Models
Deep generative models provide a solution to the problem of unsupervised learning, in which a machine learning system is required to discover the structure hidden within unlabelled data streams. Because they are generative, such models can form a rich imagery of the world in which they are used: an imagination that can be harnessed to explore variations in data, to reason about the structure and behaviour of the world, and, ultimately, for decision-making. This tutorial looks at how we can build machine learning systems with a capacity for imagination using deep generative models, the types of probabilistic reasoning that they make possible, and the ways in which they can be used for decision-making and acting.
Deep generative models have widespread applications including those in density estimation, image denoising and inpainting, data compression, scene understanding, representation learning, 3D scene construction, semi-supervised classification, and hierarchical control, amongst many others. After exploring these applications, we'll sketch a landscape of generative models, drawing out three groups of models: fully-observed models, transformation models, and latent variable models. Different models require different principles for inference, and we'll explore the options available. Different combinations of model and inference give rise to different algorithms, including autoregressive distribution estimators, variational autoencoders, and generative adversarial networks. Although we will emphasise deep generative models, and the latent-variable class in particular, the intention of the tutorial is to explore the general principles, tools and tricks that can be used throughout machine learning. These reusable topics include Bayesian deep learning, variational approximations, memoryless and amortised inference, and stochastic gradient estimation. We'll end by highlighting the topics that were not discussed, and imagine the future of generative models.
The slides for this lecture are available here. The video of this talk is available here.

Beyond inspiration: Five lessons from biology on building intelligent machines
The only known systems that exhibit truly intelligent, autonomous behavior are biological. If we wish to build machines that are capable of such behavior, then it makes sense to learn as much as we can about how these systems work. Inspiration is a good starting point, but real progress will require gaining a more solid understanding of the principles of information processing at work in nervous systems. Here I will focus on five areas of investigation that I believe will be especially fruitful: 1) the study of perception and cognition in tiny nervous systems such as wasps and jumping spiders, 2) developing good computational models of nonlinear signal integration in dendritic trees, 3) the use of sparse, overcomplete representations of sensory input, 4) understanding the computational role of feedback in neural systems, and 5) the use of active sensing systems for acquiring information about the world.
The slides for this lecture are available here. The video of this talk is available here.

Theoretical neuroscience and deep learning theory
Both neuroscience and machine learning are experiencing a renaissance in which fundamental technological changes are driving qualitatively new phases of conceptual progress. In neuroscience, new methods for probing and perturbing multineuronal dynamics during behavior have led to the ability to create complex neural network models of the emergence of behavior from the brain. In machine learning, new methods and computing infrastructure for training neural networks have led to the creation of deep neural networks capable of solving complex computational problems. The advances in each of these fields are laying the groundwork for a new alliance between neuroscience and machine learning. A key dividend of this alliance would be the genesis of new unified theories underlying neural learning dynamics, expressive power, generalization capability, and interpretability of both biological and artificial networks. Ideally such theories could yield both scientific insight into the operation of biological and artificial neural networks, as well as engineering design principles for the creation of new artificial neural networks. Here we outline a roadmap for this new alliance, and discuss several vignettes from the beginnings of such an alliance, including how neural network learning dynamics can model infant semantic learning, how dynamically critical weight initializations can lead to rapid training, and how the expressive power of deep neural networks can have its origins in the theory of chaos. We also speculate on how several elements of neurobiological reality, as yet not extensively employed by neural network practitioners, could aid in the design of future artificial neural networks. Such elements include structured neural network architectures motivated by the canonical cortical microcircuit, nested neural loops with a diversity of time scales, and complex synapses with rich internal dynamics.
The slides for this lecture are available here. The video of this talk is available here.

