Invited speakers

Subhransu Maji

Modeling visual tasks and their relations

Abstract: Despite recent successes, a number of important applications remain beyond the scope of current AI systems due to limited training data. This is a fundamental challenge because real-world data is dynamic and heavy-tailed, and supervision can be hard to acquire. I argue that a principled framework for reasoning about tasks can enable modular and data-efficient solutions. Towards this goal, I’ll describe our framework for embedding computer vision tasks into a vector space that allows us to learn and reason about their properties. Our approach represents a task as the Fisher information of the parameters of a generic “probe network”. We show that the distance between these vectors correlates with natural metrics over tasks such as domain or label similarity. It is also predictive of transfer, i.e. how much training a deep network on one task benefits another, and can be used for model recommendation. On a portfolio of hundreds of vision tasks, the network recommended by our approach outperforms the current gold standard of fine-tuning an ImageNet pre-trained network, especially when training data is limited. The method generalizes to other domains such as natural language. I’ll conclude with some challenges and directions for future work.
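
The embedding idea can be illustrated with a toy numpy sketch (not the speaker's implementation): approximate each task by the diagonal of the empirical Fisher information, i.e. the mean squared per-example gradient of the probe network's parameters, and compare tasks by cosine distance between these vectors. All array shapes and the synthetic "gradients" below are illustrative assumptions.

```python
import numpy as np

def diagonal_fisher(per_example_grads):
    """Diagonal Fisher approximation: mean squared per-example gradient.

    per_example_grads: (n_examples, n_params) gradients of the log-likelihood
    w.r.t. the probe network's parameters on one task's data.
    """
    g = np.asarray(per_example_grads, dtype=float)
    return (g ** 2).mean(axis=0)

def task_distance(fisher_a, fisher_b):
    """Cosine distance between two Fisher embeddings (0 = identical)."""
    a, b = np.asarray(fisher_a), np.asarray(fisher_b)
    cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return 1.0 - cos

rng = np.random.default_rng(0)
grads_t1 = rng.normal(size=(100, 8))                         # task 1
grads_t2 = grads_t1 + rng.normal(scale=0.1, size=(100, 8))   # a very similar task
scale = np.array([3.0] * 4 + [0.3] * 4)                      # different parameter usage
grads_t3 = rng.normal(size=(100, 8)) * scale                 # a dissimilar task

f1, f2, f3 = (diagonal_fisher(g) for g in (grads_t1, grads_t2, grads_t3))
d_close = task_distance(f1, f2)
d_far = task_distance(f1, f3)
print(d_close < d_far)  # True: similar tasks embed nearby
```

In this sketch the third task stresses a different subset of probe parameters, so its Fisher vector points in a different direction and its distance to task 1 is larger.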

Talk slides: TBA

Zeynep Akata

Towards Recognizing Unseen Categories in Unseen Domains

Abstract: Current deep visual recognition systems suffer from severe performance degradation when they encounter new images from classes and scenarios unseen during training. Hence, the core challenge of Zero-Shot Learning (ZSL) is to cope with the semantic shift, whereas the main challenge of Domain Adaptation and Domain Generalization (DG) is the domain shift. While historically ZSL and DG have been tackled in isolation, this work pursues the ambitious goal of solving them jointly, i.e. recognizing unseen visual concepts in unseen domains. We present CuMix (Curriculum Mixup for recognizing unseen categories in unseen domains), a holistic algorithm to tackle ZSL, DG and ZSL+DG. The key idea of CuMix is to simulate the test-time domain and semantic shift using images and features from unseen domains and categories, generated by mixing up the multiple source domains and categories available during training. Moreover, a curriculum-based mixing policy is devised to generate increasingly complex training samples. Results on standard ZSL and DG datasets and on ZSL+DG using the DomainNet benchmark demonstrate the effectiveness of our approach.
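
The core mixing step can be sketched in a few lines of numpy (a toy illustration, not the CuMix implementation; the schedule `curriculum_alpha` and all shapes are assumptions): two samples, possibly from different source domains and categories, are convexly combined together with their one-hot labels, and a curriculum makes the mixing more aggressive as training proceeds.

```python
import numpy as np

rng = np.random.default_rng(0)

def one_hot(label, n_classes):
    v = np.zeros(n_classes)
    v[label] = 1.0
    return v

def curriculum_alpha(step, total_steps, alpha_max=2.0):
    """Hypothetical curriculum: the Beta parameter grows over training,
    so mixed samples become increasingly complex."""
    return 1e-3 + (alpha_max - 1e-3) * step / total_steps

def cumix_sample(x_a, y_a, x_b, y_b, alpha):
    """Mix two samples (possibly from different source domains and
    categories) and their one-hot labels with a Beta-distributed weight."""
    lam = rng.beta(alpha, alpha)
    return lam * x_a + (1 - lam) * x_b, lam * y_a + (1 - lam) * y_b

# Two samples from (hypothetically) different domains and classes.
x_photo, y_photo = rng.normal(size=16), one_hot(0, 5)
x_sketch, y_sketch = rng.normal(size=16), one_hot(3, 5)

x_mix, y_mix = cumix_sample(x_photo, y_photo, x_sketch, y_sketch,
                            curriculum_alpha(step=500, total_steps=1000))
print(round(y_mix.sum(), 6))  # mixed label is still a distribution: 1.0
```

Early in training `alpha` is tiny, so the Beta draw is near 0 or 1 and samples are barely mixed; late in training the draws concentrate around 0.5, simulating stronger domain and semantic shift.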

Talk slides: TBA

Stefano Soatto

The Reachability of Learning Tasks

Abstract: I introduce the notion of reachability of learning tasks, which measures how easy it is to fine-tune a model for a task after having (pre)trained on another. First, I formalize a learning task as the "information" contained in the training set which, after training, is transferred to the parameters of the trained model, specifically a deep neural network. This requires defining and computing information for a fixed dataset (zero entropy) and a fixed set of model parameters (infinite mutual information from the input to the output), which can fortunately be done thanks to ideas from Fisher, Shannon and Kolmogorov. Then, the (asymmetric) distance between learning tasks can be quantified by the additional amount of information necessary to learn a second task after having learned the first. Unfortunately, two tasks being "nearby" does not mean that it is easy to fine-tune one to learn the other: I will show examples of tasks that are "close" by any reasonable measure, and yet one cannot fine-tune between them: pre-training on one not only does not help learn the other, but is detrimental. This phenomenon relates to critical learning periods observed in a variety of learning systems, natural and artificial. It motivates the introduction of a "dynamic" component of the distance between tasks, which serves to define and characterize task reachability. Critical periods show that regularization in deep learning does not follow the classical template: its usefulness is exhausted in the initial transient, pointing to the importance of the transient and the learning dynamics, which remain a wide-open area of investigation.

Talk slides: TBA

Sanja Fidler

A.I. Data Factory for A.I.

Abstract: TBA

Talk slides: TBA

Dengxin Dai

Domain Adaptation via Self-Training and Data Simulation

Abstract: I will talk about two important strategies for domain adaptation: self-training and data simulation. Although tremendous progress has been made in object recognition, the typical recipe for supervised learning -- creating large-scale datasets with accurate human annotations -- is hardly scalable. I will describe how self-training and semi-synthetic data generation can be used for domain adaptation within a curriculum learning framework. Since the ability to robustly cope with "bad" weather and lighting conditions is absolutely essential for applications like self-driving cars, I will focus on domain adaptation from the clear-weather, daytime condition to fog and to nighttime. In this talk, I will also cover our ECCV’20 work on supervision transfer from videos to binaural sounds for sound-based semantic perception and depth perception. The proposed method is able to predict the semantic masks of sound-making objects and the rough depth maps of street-view scenes from sounds alone. Finally, I will present our ECCV’20 work E2GAN, an efficient and effective neural architecture search (NAS) algorithm for GAN architectures for data generation. E2GAN is able to discover highly competitive architectures with generally better image generation results at a considerably reduced computational cost of 7 GPU hours.
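
The curriculum self-training idea can be sketched as follows (a toy numpy illustration under assumed data and a stand-in nearest-centroid classifier, not the speaker's method): train on labeled "clear weather" data, then move through increasingly adverse unlabeled stages, each time keeping only confident pseudo-labels and retraining.

```python
import numpy as np

class NearestCentroid:
    """Minimal stand-in classifier so the adaptation loop below is runnable."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        return self
    def predict_proba(self, X):
        # Softmax over negative distances to class centroids.
        d = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        e = np.exp(-d)
        return e / e.sum(axis=1, keepdims=True)

def curriculum_self_training(model, X, y, stages, threshold=0.8):
    """Adapt through unlabeled stages ordered easy -> hard (e.g. clear ->
    light fog -> dense fog), accumulating confident pseudo-labels."""
    model.fit(X, y)
    for X_stage in stages:
        probs = model.predict_proba(X_stage)
        keep = probs.max(axis=1) >= threshold  # confident predictions only
        X = np.concatenate([X, X_stage[keep]])
        y = np.concatenate([y, probs.argmax(axis=1)[keep]])
        model.fit(X, y)  # retrain with pseudo-labeled data included
    return model

rng = np.random.default_rng(1)
# Labeled "clear weather" source data: two well-separated classes.
X_src = np.concatenate([rng.normal([0.0, 0.0], 0.3, (50, 2)),
                        rng.normal([4.0, 0.0], 0.3, (50, 2))])
y_src = np.array([0] * 50 + [1] * 50)
# Each unlabeled "fog" stage shifts the distribution a bit further.
stages = [X_src + [shift, 0.0] for shift in (0.7, 1.4)]

model = curriculum_self_training(NearestCentroid(), X_src, y_src, stages)
```

The easy-to-hard ordering matters: pseudo-labels on the lightly shifted stage are reliable, and retraining on them moves the model close enough to label the harder stage, which a direct jump might not allow.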

Talk slides: TBA

Peter Koniusz

Few-shot Learning: From Domain Adaptation to Action Recognition to Noisy Gradients

Abstract: In this talk, I focus on Few-shot Learning (FSL) from three different perspectives: image classification with experiments on domain shift in FSL, action recognition with a focus on the permutation invariance of the attention mechanism, and finally meta-learning and the notion of noisy gradients in the low-sample regime. After a short introduction to domain shift in FSL, I move to Few-Shot Action Recognition (FSAR) and the notion of distribution shift in the temporal attentive regions produced by the attention mechanism. I explain the proposed network, the attention mechanisms and the self-supervision tasks. I illustrate how aligning the attention coefficients computed on permuted temporal blocks with the same-way-permuted attention coefficients of the original blocks helps train an attention mechanism robust to the distribution shift of attentive locations. Finally, I discuss gradient preconditioning and modulation in FSL. To this end, I detail our ModGrad pipeline, including the generative modulator network, explain why our modulator is able to perform adaptive low-pass filtering on the gradient and how it can filter out the noise, and conclude with some theoretical observations, simulations and experimental results.
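
The alignment idea for the attention mechanism can be sketched with a toy numpy example (an illustration under assumed shapes, not the speaker's network): attention over permuted temporal blocks should equal the same-way-permuted attention over the original blocks, and the squared difference between the two gives a self-supervised alignment loss. A plain content-based softmax is already permutation-equivariant, so the hypothetical positional term `pos` is what makes the loss informative.

```python
import numpy as np

rng = np.random.default_rng(0)

def attention(blocks, w, pos):
    """Toy temporal attention: scores depend on block content and a
    (hypothetical) positional term, so it is not permutation-equivariant."""
    scores = blocks @ w + pos
    e = np.exp(scores - scores.max())
    return e / e.sum()

def alignment_loss(blocks, w, pos, perm):
    """Penalize disagreement between attention over permuted blocks and the
    same-way-permuted attention over the original blocks."""
    a = attention(blocks, w, pos)             # coefficients, original order
    a_perm = attention(blocks[perm], w, pos)  # coefficients, permuted blocks
    return float(((a_perm - a[perm]) ** 2).mean())

blocks = rng.normal(size=(6, 4))            # 6 temporal blocks, 4-dim features
w = rng.normal(size=4)
perm = np.array([5, 0, 3, 1, 4, 2])         # a fixed non-trivial permutation

loss_with_pos = alignment_loss(blocks, w, rng.normal(size=6), perm)
loss_no_pos = alignment_loss(blocks, w, np.zeros(6), perm)
```

With `pos = 0` the loss vanishes (up to floating-point summation order), while a position-dependent mechanism incurs a positive loss; minimizing it during training pushes the mechanism toward robustness to permuted attentive locations.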

Talk slides: TBA