Invited Speakers

Yoshua Bengio (Université de Montréal)

Capturing Dependencies Implicitly [Slides]

This talk will follow up on GAN ideas and cover a few advances around capturing dependencies implicitly. First, we discuss how a simple trick (separately shuffling each dimension or variable across a minibatch) can produce samples from the factorized distribution associated with a given joint distribution, and how this can be used, with a GAN-like discriminator, to estimate the dependence between variables or their mutual information, or to optimize their joint entropy. This leads to non-linear independent component analysis and to a neural estimator of mutual information that makes no explicit assumption about the form of the data distribution. Second, we consider how deep generative models that implicitly capture a distribution via a deep net transform a high-entropy distribution into one that approximates the data distribution, or vice versa (for encoders or inference). Instead of doing this as the application of independently parametrized layers, we study how to do it recurrently with shared parameters, thus connecting with the MCMC generative process of undirected graphical models. The variational walkback algorithm exploits a variational bound to train such a deep generative process, with the advantage over classical MCMC approaches that it is trained to converge in a small number of steps (like GANs) and that, by design, it seeks to destroy spurious modes during training.
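The shuffling trick described above can be sketched in a few lines of numpy (the function name and toy data are mine): independently permuting each column of a minibatch preserves every per-dimension marginal but destroys the dependencies between dimensions, so the shuffled rows behave like samples from the product-of-marginals distribution.

```python
import numpy as np

def shuffle_factorize(batch, seed=None):
    """Independently permute each column (dimension) of a minibatch.

    The result has the same per-dimension marginals as `batch`, but
    dependencies between dimensions are destroyed, so rows behave like
    samples from the factorized (product-of-marginals) distribution.
    """
    rng = np.random.default_rng(seed)
    out = batch.copy()
    n = out.shape[0]
    for j in range(out.shape[1]):
        out[:, j] = out[rng.permutation(n), j]  # permute column j only
    return out

# Toy check: two perfectly correlated dimensions become (nearly) independent.
rng = np.random.default_rng(0)
x = rng.normal(size=(10000, 1))
batch = np.hstack([x, x])                 # two identical columns, correlation ~ 1
shuffled = shuffle_factorize(batch, seed=1)
print(np.corrcoef(batch.T)[0, 1])         # ~ 1.0
print(np.corrcoef(shuffled.T)[0, 1])      # ~ 0.0
```

A GAN-like discriminator trained to tell `batch` from `shuffled` is then, in effect, estimating the dependence between the dimensions.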

Kamalika Chaudhuri (University of California, San Diego)

Learning with Adversarial Divergences for Generative Modeling [Slides]

Generative adversarial networks (GANs) are a new class of methods for generative modeling that has achieved a great deal of empirical success. While there are many variants of GANs, how these variants relate to each other and to traditional statistical estimation methods, such as maximum likelihood and the method of moments, is not well understood.

This talk introduces a new framework for learning in GANs -- adversarial divergences -- and shows that many existing GANs fit into it. Some interesting properties of learning in this framework are presented. In particular, we show that linear f-GANs form an interesting combination of maximum likelihood estimation and the method of moments.
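A sketch of the common structure (in my notation; details are in the paper with Liu and Bousquet): an adversarial divergence between a target distribution μ and a model ν is a supremum over a class of critic functions,

```latex
\tau(\mu \,\|\, \nu) \;=\; \sup_{f \in \mathcal{F}} \; \mathbb{E}_{(x, y) \sim \mu \otimes \nu} \big[ f(x, y) \big],
```

and the generator is trained to minimize \(\tau(p_{\mathrm{data}} \,\|\, p_\theta)\). Different choices of the critic class \(\mathcal{F}\) then recover different GAN variants.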

Based on joint work with Shuang Liu and Olivier Bousquet.

Arthur Gretton (University College London)

Better gradient regularisation for MMD GANs [Slides]

Generative adversarial networks (GANs) rely on an adaptive critic during training to teach a generator network to improve its samples towards a reference dataset. I will describe how an integral probability metric, the maximum mean discrepancy (MMD), may be used as a GAN critic. The focus will be on gradient regularisation, which is essential in training good MMD critic features. With this regularisation, the MMD is used to obtain current state-of-the-art performance on challenging image generation tasks, including 160 × 160 CelebA and 64 × 64 ImageNet. In addition to network training, I'll discuss issues in benchmarking GAN performance.
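To make the critic statistic concrete, here is a minimal numpy sketch of a (biased) estimator of the squared MMD with a Gaussian kernel on raw inputs (function and variable names are mine; in an MMD GAN the kernel is applied to learned critic features rather than to the data directly):

```python
import numpy as np

def mmd2_biased(X, Y, bandwidth=1.0):
    """Biased estimator of squared MMD between samples X and Y under a
    Gaussian RBF kernel k(a, b) = exp(-||a - b||^2 / (2 * bandwidth^2))."""
    def k(A, B):
        # Pairwise squared Euclidean distances via the expansion
        # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
        d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
        return np.exp(-d2 / (2 * bandwidth**2))
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
near = mmd2_biased(X, rng.normal(size=(500, 2)))            # same distribution: small
far = mmd2_biased(X, rng.normal(3.0, 1.0, size=(500, 2)))   # shifted distribution: large
print(near, far)
```

The generator is trained to make the MMD between its samples and the data small, while the critic features (absent from this sketch) are trained, with gradient regularisation, to make it large.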

Pushmeet Kohli (DeepMind)

Interpretable and Semantics-Aware Generative Models [Slides]

Honglak Lee (University of Michigan, Google)

Learning hierarchical generative models with structured representations [Slides]

Percy Liang (Stanford)

Editing is Easier than Generation [Slides]

We study generative models based on editing, which capture local transformations between data points rather than global properties. We show that such models are easy to learn and, together with the training set, provide a full generative model in the spirit of kernel density estimation. Edit-based models also have additional advantages, such as interpretability and the ability to extrapolate beyond the training distribution. We show the effectiveness of these models in learning image transformations, language modeling, and text style transfer.

Jürgen Schmidhuber (IDSIA)

Unsupervised Minimax [Slides]

There is a type of unsupervised learning based on the principles of gradient descent/ascent in a minimax game, where one neural network minimizes an objective function maximized by another, in order to model the statistics of given data. The technique was introduced in 1991-1996 [pm1, pm2] and was called Predictability Minimization (PM).

In PM, two networks fight each other, to achieve a "holy grail" of unsupervised learning, namely, an ideal, disentangled, factorial code of the data, where the code components are statistically independent of each other. That is, the probability of a given data pattern is simply the product of the probabilities of its code components. The competition between the networks is the sole training criterion, and is sufficient on its own to train the networks. (The codes of the data are easily decoded.)
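The PM game can be summarized as follows (a sketch in my notation, following [pm1]): an encoder produces code units \(c_i(x)\), and each predictor \(P_i\) tries to reconstruct \(c_i\) from the other units, with the predictors trained to minimize, and the encoder trained to maximize, the total prediction error

```latex
V \;=\; \sum_i \mathbb{E}_{x}\Big[ \big( c_i(x) - P_i\big(c_{\neq i}(x)\big) \big)^2 \Big].
```

At a factorial code, each \(c_i\) is statistically independent of the others, so no predictor can do better than the unconditional mean of \(c_i\), and the prediction errors cannot be reduced further.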

Subsequent work since 1997 has shown that the minimax principle is also relevant for the more general case of intrinsically motivated, unsupervised, reinforcement learning agents that actively shape their observation streams through their own actions, exploring the world by inventing and generating their own experiments, to discover novel, previously unknown regularities in the data generated by the experiments [int1, int2, int3].

I’ll discuss pros and cons of these approaches, and relate them to more recent work on unsupervised networks that fight each other, and on artificial curiosity.



[pm1] J. Schmidhuber. Learning factorial codes by predictability minimization. Neural Computation, 4(6):863-879, 1992. Based on TR CU-CS-565-91, Univ. Colorado at Boulder, 1991.
[pm2] J. Schmidhuber, M. Eldracher, B. Foltin. Semilinear predictability minimization produces well-known feature detectors. Neural Computation, 8(4):773-786, 1996.
[int1] J. Schmidhuber. What's interesting? TR IDSIA-35-97, IDSIA, July 1997. (Co-evolution of unsupervised RL adversaries in a zero sum game for exploration. See also [int3].)
[int2] J. Schmidhuber. Artificial Curiosity Based on Discovering Novel Algorithmic Predictability Through Coevolution. In P. Angeline, Z. Michalewicz, M. Schoenauer, X. Yao, Z. Zalzala, eds., Congress on Evolutionary Computation, p. 1612-1618, IEEE Press, Piscataway, NJ, 1999. Based on [int1].
[int3] J. Schmidhuber. Exploring the Predictable. In Ghosh, S. Tsutsui, eds., Advances in Evolutionary Computing, p. 579-612, Springer, 2002. Based on [int1].
More on Predictability Minimization (PM): http://people.idsia.ch/~juergen/ica.html
More on artificial curiosity: http://people.idsia.ch/~juergen/interest.html and http://people.idsia.ch/~juergen/creativity.html

Eric Xing (CMU, Petuum Inc.)

A Unified View of Deep Generative Models [Slides]