Workshop Talk Abstracts

Peter Battaglia

Object-oriented intelligence

Yoshua Bengio

From deep learning of disentangled representations to higher-level cognition

One of the main challenges for AI remains unsupervised learning, at which humans are much better than machines, and which we link to another challenge: bringing deep learning to higher-level cognition. We review recent work in deep generative models and propose research directions towards the learning of high-level abstractions. This follows the ambitious objective of disentangling the underlying causal factors explaining the observed data. We argue that in order to efficiently capture these factors, a learning agent can acquire information by acting in the world, moving our research from traditional deep generative models of given datasets to autonomous learning or unsupervised reinforcement learning. We propose two priors which could be used by an agent acting in its environment to help discover such high-level disentangled representations of abstract concepts. The first is based on the discovery of independently controllable factors, i.e., jointly learning policies and representations such that each policy can independently control one aspect of the world (a factor of interest) computed by the representation, while keeping the other, uncontrolled aspects mostly untouched. This idea naturally brings to the fore the notions of objects (which are controllable), agents (which control objects) and self (the part of the world that only I can control). The second prior is called the consciousness prior and is based on the observation that our conscious thoughts are low-dimensional objects with strong predictive or explanatory power (or great utility for planning). A conscious thought thus selects a few abstract factors (using the attention mechanism which brings these variables to consciousness) and combines them to make a useful statement or prediction. In addition, the concepts brought to consciousness often correspond to words or short phrases, and the thought itself can be transformed (in a lossy way) into a brief linguistic expression, like a sentence. Natural language could thus be used as an additional hint about the abstract representations and disentangled factors which humans have discovered to explain their world. A conscious thought also corresponds to the kind of small nugget of knowledge (like a fact or a rule) which has been the main building block of classical symbolic AI. This raises the interesting possibility of addressing some of the objectives of classical symbolic AI focused on higher-level cognition using the deep learning machinery, augmented with the architectural elements necessary to implement conscious thinking about disentangled causal factors.
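For a concrete handle on the first prior, the following is a minimal sketch of an independently-controllable-factors objective, assuming a toy encoder and a selectivity-style reward. The module names, dimensions, and the exact reward are illustrative assumptions, not the talk's actual formulation.

```python
# Minimal sketch of "independently controllable factors": jointly learn an
# encoder and K policies, rewarding policy k when its actions change factor k
# of the representation while leaving the other factors mostly untouched.
# All names and the toy selectivity reward below are illustrative assumptions.
import torch
import torch.nn as nn

K = 4          # number of factors / policies (assumed)
OBS_DIM = 16   # toy observation size (assumed)

encoder = nn.Sequential(nn.Linear(OBS_DIM, 32), nn.ReLU(), nn.Linear(32, K))

def selectivity(s, s_next, k):
    """Reward for policy k: change in factor k relative to total change."""
    h, h_next = encoder(s), encoder(s_next)
    delta = (h_next - h).abs()
    return delta[..., k] / (delta.sum(-1) + 1e-8)

# Training would maximize selectivity(s, s_next, k) over transitions generated
# by policy k, updating both the encoder and the policies jointly.
```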

Matthew Botvinick

Meta-reinforcement learning in brains and machines

Alison Gopnik

Life history and learning: Extended human childhood as a way to resolve explore/exploit trade-offs and improve hypothesis search

Tom Griffiths

Revealing human inductive biases and metacognitive processes with rational models

Marc Howard

Scale-invariant temporal memory in AI

For many years, cognitive scientists have hypothesized that memory is roughly time-scale-invariant, exhibiting similar properties from time scales of a few hundred milliseconds up to hours and days. Recent evidence in neuroscience confirms key predictions of this view. Events trigger reliable sequences of neural firing in a variety of brain regions. The identity of the currently firing neurons can be used to reconstruct the time at which the triggering event took place; the accuracy of this temporal reconstruction decreases with the passage of time, as predicted by cognitive science. However, many influential models in AI violate scale-invariance, limiting their potential utility. For instance, TD learning allows fast learning of exponentially-discounted future rewards, which exhibits a scale fixed by the base of the exponent. In this talk I explore recent work that attempts to develop scale-invariant alternatives for computing expected future outcomes. We discuss methods that exploit a scale-invariant temporal memory to rapidly estimate a trajectory of future events, and the potential utility of these approaches in AI applications.
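As a concrete illustration of the contrast drawn above, the sketch below compares the single timescale implied by an exponential discount with a bank of leaky integrators whose geometrically spaced time constants give an approximately scale-invariant (power-law) memory trace. The constants and array shapes are illustrative assumptions, not the speaker's exact model.

```python
# Sketch: exponential discounting has one fixed timescale (set by gamma),
# whereas a bank of leaky integrators with geometrically spaced time
# constants yields an approximately power-law, scale-invariant memory trace.
# The specific constants below are illustrative assumptions.
import numpy as np

gamma = 0.9                                # TD discount: scale ~ -1/ln(gamma) steps
taus = np.geomspace(1.0, 1000.0, num=30)   # log-spaced time constants (assumed)

def integrate(stimulus, taus, dt=1.0):
    """Run leaky integrators (dF/dt = -F/tau + input) over a stimulus train."""
    F = np.zeros(len(taus))
    history = []
    for s in stimulus:
        F += dt * (-F / taus + s)
        history.append(F.copy())
    return np.array(history)               # shape: time x taus

stimulus = np.zeros(500)
stimulus[0] = 1.0                          # single event at t = 0
F = integrate(stimulus, taus)
# Summing across log-spaced integrators gives a slowly decaying, roughly
# power-law trace, unlike the single exponential gamma**t.
print(F.sum(axis=1)[[1, 10, 100, 400]])
```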

Robert Jacobs

People infer object shape in a 3D, object-centered coordinate system

Within Cognitive Science, there have been fierce debates about whether people’s visual object representations are 2D versus 3D, viewer-centered versus object-centered, holistic versus part-based, and so on. Here we present an approach that treats visual object perception as a statistical inference problem in which one infers a 3D, object-centered shape representation from one or more 2D images [Erdogan & Jacobs (2017). Psychological Review, in press]. The talk is divided into three parts. In Part I, we show that our approach accounts for viewpoint dependency, the finding that people’s object recognition often depends on the viewpoint from which an object is viewed. Our approach explains this dependency because different views of an object contain different information about its shape, and thus our approach infers different shape representations from different views. In Part II, we report the results of an experiment in which people rated the shape similarity among different objects. We found that a system based on our approach provides a better account of people’s ratings than several view-based, structural description, and feature-based systems, including deep convolutional neural networks (CNNs) such as AlexNet and GoogLeNet. In Part III, we argue that existing CNNs perform poorly because they are trained on ImageNet, a data set consisting of labeled static images. We outline a roadmap for developing better data sets. The first step is to develop data sets with richer visual cues to 3D structure, including motion parallax and binocular disparity cues. If done properly, individual frames in these data sets will not need labels, thanks to the temporal coherence of structure across frames. The second step is to develop multisensory (e.g., visual-auditory) data sets. The addition of information from other sensory modalities will provide new opportunities for unsupervised learning, thereby reducing the need to fully label visual images. The third step is to develop data sets in which perceptual stimuli depend on an agent’s actions and goals. Once such data sets are available, topics common in Cognitive Science, such as task-dependent perceptual predictions or task-dependent visual attention and working memory, will become significantly more important in Artificial Intelligence. [This is joint work with Goker Erdogan, Cogitai Inc.]
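To make the inference framing concrete, here is a toy analysis-by-synthesis sketch: score a candidate shape by how well its rendering matches the observed image, and explore shapes with Metropolis-Hastings. The `render` stub, the Gaussian likelihood and prior, and all dimensions are hypothetical stand-ins, not the model from Erdogan & Jacobs (2017).

```python
# Toy analysis-by-synthesis sketch: infer shape parameters by comparing a
# rendering of a candidate shape to the observed 2D image, exploring the
# posterior with Metropolis-Hastings. `render` is a stand-in for a real
# graphics engine; all names and distributions are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def render(shape_params, viewpoint):
    """Hypothetical renderer: maps shape + viewpoint to a tiny image."""
    return np.tanh(np.outer(shape_params, viewpoint))

def log_posterior(shape_params, image, viewpoint, sigma=0.1):
    recon = render(shape_params, viewpoint)
    log_lik = -0.5 * np.sum((image - recon) ** 2) / sigma**2
    log_prior = -0.5 * np.sum(shape_params ** 2)    # simple Gaussian prior
    return log_lik + log_prior

# Simulate one noisy observed image, then run Metropolis-Hastings.
true_shape = rng.normal(size=5)
viewpoint = rng.normal(size=5)
observed = render(true_shape, viewpoint) + 0.1 * rng.normal(size=(5, 5))

shape = np.zeros(5)
lp = log_posterior(shape, observed, viewpoint)
for _ in range(2000):
    proposal = shape + 0.1 * rng.normal(size=5)
    lp_prop = log_posterior(proposal, observed, viewpoint)
    if np.log(rng.random()) < lp_prop - lp:    # accept/reject step
        shape, lp = proposal, lp_prop
print("inferred shape:", np.round(shape, 2))
```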

Brenden Lake

Cognitive AI

Gary Marcus

Representational primitives, in minds and machines

Michael Mozer

Access consciousness and the construction of actionable representations

Aude Oliva

Mapping the spatio-temporal dynamics of cognition in the human brain

Every perceptual and cognitive function in humans is realized by neural population responses evolving over time and space in multiple brain regions. In this talk, I will describe a brain mapping approach that combines magnetoencephalography (MEG), functional MRI (fMRI) and Convolutional Neural Networks (CNNs) to yield a spatially and temporally integrated characterization of neural representations during perception and memory tasks. Determining the duration and sequencing of cognitive processes at the scale of the whole human brain provides insight into which computational strategies may work best for performing complex tasks.
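One common way to fuse MEG and fMRI (and to bring CNNs into the comparison) is representational similarity analysis. The sketch below shows the basic computation under assumed array shapes and a correlation-distance dissimilarity, as an illustration of the general approach rather than the talk's exact pipeline.

```python
# Minimal sketch of MEG-fMRI fusion via representational similarity analysis
# (RSA): build a representational dissimilarity matrix (RDM) over stimulus
# conditions for each MEG timepoint and for one fMRI region, then correlate
# them to trace when that region's representation emerges over time.
# Array shapes and random data below are illustrative assumptions.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_cond, n_sensors, n_times, n_voxels = 20, 30, 50, 100

meg = rng.normal(size=(n_cond, n_sensors, n_times))   # conditions x sensors x time
fmri_region = rng.normal(size=(n_cond, n_voxels))     # conditions x voxels (one ROI)

def rdm(patterns):
    """Correlation-distance RDM over condition patterns (upper triangle)."""
    return pdist(patterns, metric="correlation")

fmri_rdm = rdm(fmri_region)
fusion = [spearmanr(rdm(meg[:, :, t]), fmri_rdm)[0] for t in range(n_times)]
# fusion[t] traces how similar the MEG representational geometry at time t
# is to this region's fMRI geometry; a CNN layer's RDM can be compared to
# either modality in exactly the same way.
```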

Angela Yu

Computational modeling of human face processing