Invited Talks

ALL INVITED TALKS WILL BE DELIVERED REMOTELY

1. S. P. Arun, Indian Institute of Science, Bangalore, India


Talk: How does the brain crack CAPTCHAs?

(watch on YouTube)


Abstract: It was famously remarked in the 1980s that a major question for AI is "What is the letter A?". Surprisingly, even today, the simple act of recognizing text is so challenging for computers that we continue to use distorted letter CAPTCHAs to validate a user as human. So how does the brain crack CAPTCHAs? In the monkey inferior temporal cortex, an area critical for recognition, we show that single neurons encode distorted letter strings according to highly systematic rules that enable perfect distorted letter decoding. Remarkably, the same rules were present in neural networks trained for text recognition. I will describe this and some related findings elucidating object recognition at the behavioral, neuronal and computational levels.
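
As a loose illustration of the decoding claim above, the sketch below trains a generic linear read-out on simulated population responses to letters; the neuron counts, noise model, and classifier are hypothetical stand-ins, not the study's recordings or analysis pipeline.

# Hedged sketch: a systematic population code permits linear decoding of letter identity.
# The simulated responses below are placeholders, not data from the study.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_neurons, n_letters, n_trials = 50, 26, 40

# Each letter evokes a distinct mean response pattern (the "systematic rule"),
# plus trial-to-trial noise standing in for distortion and measurement variability.
prototypes = rng.normal(size=(n_letters, n_neurons))
X = np.vstack([
    prototypes[k] + 0.3 * rng.normal(size=(n_trials, n_neurons))
    for k in range(n_letters)
])
y = np.repeat(np.arange(n_letters), n_trials)

# A simple linear read-out suffices when the population code is systematic.
decoder = LogisticRegression(max_iter=1000)
accuracy = cross_val_score(decoder, X, y, cv=5).mean()
print(f"cross-validated letter decoding accuracy: {accuracy:.2f}")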



2. Grace Lindsay, University College London, UK


Talk: Comparing visual representations learned through supervised, unsupervised, and reinforcement learning

(watch on YouTube)


Abstract: Artificial neural networks trained in isolation on object recognition have been used as models of the primate ventral visual stream. Recently, similar networks have been explored as models of mouse visual processing, with less success. Unsupervised training methods have been shown to yield better matches to mouse visual data; however, reinforcement learning has traditionally been left unexplored. I compare the representations learned by eight different convolutional neural networks, each with identical ResNet architectures and trained on the same family of egocentric images, but embedded within different learning systems. Specifically, the representations are trained to guide action in a compound reinforcement learning task in a virtual rodent; to predict one or a combination of three task-related targets with supervision; or using one of three different unsupervised objectives. Using representational similarity analysis, I find that the network trained with reinforcement learning differs most from the other networks. Through further analysis using metrics inspired by the neuroscience literature, I find that the model trained with reinforcement learning has a sparse and high-dimensional representation wherein individual images are represented with very different patterns of neural activity. Further exploration suggests these representations may arise in order to guide long-term behavior and goal-seeking in the RL agent. When compared to mouse visual data from the Allen Brain Observatory, the reinforcement learning model performs on par with or better than the other models. These results provide insights into how the properties of neural representations are influenced by objective functions.
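
A minimal sketch of the representational similarity analysis (RSA) mentioned above: two networks are compared by correlating their representational dissimilarity matrices (RDMs) rather than their raw activations. The random feature matrices, image count, and distance metric are assumptions for illustration, not the talk's actual models, stimuli, or analysis settings.

# Hedged RSA sketch; replace the random arrays with real model activations.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_images, n_units = 100, 512

# Activations of two networks (images x units), e.g. an RL-trained and a supervised network.
acts_rl = rng.normal(size=(n_images, n_units))
acts_sup = rng.normal(size=(n_images, n_units))

# Representational dissimilarity matrices: pairwise distances between image
# representations within each network (condensed form, one value per image pair).
rdm_rl = pdist(acts_rl, metric="correlation")
rdm_sup = pdist(acts_sup, metric="correlation")

# Networks are compared at the level of their RDMs, which abstracts away
# from any particular alignment of individual units.
rho, _ = spearmanr(rdm_rl, rdm_sup)
print(f"RSA similarity (Spearman rho between RDMs): {rho:.3f}")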



3. Elisabetta Chicca, University of Groningen, Netherlands


Talk: Finding the Gap: Neuromorphic Motion Vision in Cluttered Environments

(watch on YouTube)



4. Heiko Neumann, Ulm University, Germany


Talk: Canonical neural circuit computations for perceptual disambiguation and contextual decision-making

(watch on YouTube)


Abstract: Sensory stimuli are often locally ambiguous, and their interpretation depends on the spatio-temporal scene context. An initial feedforward sweep of input processing builds perceptual base representations of elemental features. To steer behavioural tasks, perceptual mechanisms bind such features across distal locations and multiple feature dimensions to build neural representations of prototypical perceptual items with disambiguated scene attributes.


Neural mechanisms for such perceptual binding operations utilize different canonical principles of brain computation. We motivate how several such principles, namely, feedforward/feedback recurrent interaction, activity normalization for gain control, cooperative-competitive dynamic response interaction, hierarchical processing and uncertainty reduction, and adaptation through habituation, enable neural systems to process sensory input and adapt to changing environmental scene conditions. In computational experiments, we demonstrate model mechanisms of visual form and motion processing and multi-sensory integration that utilize these principles in a unified framework.
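
One of the canonical principles named above, activity normalization for gain control, can be illustrated with a small divisive-normalization sketch; the exponent and semisaturation constant below are illustrative assumptions, not parameters taken from the speaker's models.

# Hedged sketch of divisive normalization for gain control.
import numpy as np

def divisive_normalization(drive, sigma=1.0, n=2.0):
    """Normalize each unit's drive by the pooled activity of the population."""
    drive = np.asarray(drive, dtype=float)
    powered = drive ** n
    return powered / (sigma ** n + powered.sum())

raw = np.array([1.0, 2.0, 8.0, 2.0, 1.0])
print(divisive_normalization(raw))
# The strongly driven unit is compressed relative to the rest, keeping the
# population response within a bounded range as input strength grows.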



5. Tim Kietzmann, University of Osnabrück, Germany


Talk: Recurrence as a key ingredient for understanding and mirroring robust human object recognition

(watch on YouTube)



6. Leyla Isik, Johns Hopkins University, USA


Talk: The neural computations underlying human social vision and insights for machine vision

(watch on YouTube)


Abstract: Humans perceive the world in rich social detail. We effortlessly recognize not only objects and faces in our environment, but also other people’s social interactions. The ability to perceive others’ social interactions is critical for social trait judgement and ultimately guides how humans act in the social world. We recently identified a region that selectively represents others’ social interactions in the posterior superior temporal sulcus (pSTS) using controlled experiments with simple stimuli. However, it is unclear how social interactions are processed in the real world, where they co-vary with many other sensory and social features. In this talk I will discuss new work using naturalistic video paradigms and novel machine learning analyses to understand how humans process social interactions in natural settings. We find that social interactions guide behavioral judgements and are selectively processed in the brain, even after controlling for the effects of other visual and social information. Finally, I will discuss the computational implications of humans’ social interaction selectivity and how we can develop artificial systems that share this core human ability.



7. Michael Beyeler, University of California, Santa Barbara, USA


Talk: Towards a Smart Bionic Eye: The Emerging Role of Computer Vision and AI for Artificial Vision

(watch on YouTube)


Abstract: Electronic visual prostheses (“bionic eyes”) aim to restore vision to individuals living with incurable blindness. Similar to cochlear implants, these devices stimulate surviving retinal or visual cortical cells to evoke neural responses that are interpreted by the brain as visual percepts. However, the vision provided by current devices differs substantially from normal sight. Rather than aiming to one day restore natural vision, a Smart Bionic Eye could provide computer vision (CV)-powered visual augmentations to support scene understanding in specific real-world tasks that are known to diminish the quality of life of bionic eye recipients. The challenge, however, is less about dreaming up new CV strategies and more about identifying the design principles and visual cues that are best suited to augment the visual scene in a way that supports behavioral performance.


In this talk, I will highlight recent work in our lab to understand how visual prostheses interact with the human visual system to shape perception and discuss challenges and limitations of CV-powered visual augmentation strategies for the blind.



8. Bruno Olshausen, University of California, Berkeley, USA


Talk: Robust and efficient probabilistic inference in sparse coding networks

(watch on YouTube, presentation slides)



9. Jeff Krichmar, University of California, Irvine, USA


Talk: Motion processing in the primate dorsal visual stream

(watch on YouTube)


Abstract: The nervous system is under tight energy constraints and must represent information efficiently. This is particularly relevant in the dorsal part of the medial superior temporal area (MSTd), where neurons encode complex motion patterns in order to support a variety of behaviors. A sparse decomposition model based on a dimensionality reduction principle known as Nonnegative Matrix Factorization (NMF) was previously shown to account for a wide range of MSTd visual response properties. This model resulted in sparse and parts-based representations that could be regarded as basis flow fields, a linear superposition of which accurately reconstructed the input stimuli. This model provided evidence that the seemingly complex response properties of MSTd may be a by-product of MSTd neurons performing dimensionality reduction on their input. In the current study, we propose a Spiking Neural Network (SNN) model of MSTd based on evolved spike-timing-dependent plasticity and homeostatic synaptic scaling (STDP-H) learning rules. We demonstrate that the SNN model learns a compressed and efficient representation of the input patterns, resulting in MSTd-like receptive fields. This SNN model suggests that STDP-H observed in the nervous system may be performing a function similar to that of NMF, efficiently encoding complex motion patterns.
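
To make the NMF idea above concrete, here is a minimal sketch in which nonnegative input patterns are factorized into a small set of basis fields whose weighted superposition reconstructs the stimuli; the random data, dimensions, and solver settings are placeholder assumptions, not the study's stimuli, inputs, or model parameters.

# Hedged NMF sketch: V ~ W @ H, with H as candidate basis flow fields and
# W as the sparse weights with which each stimulus combines them.
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
n_stimuli, n_inputs, n_basis = 200, 400, 16  # illustrative sizes only

# Nonnegative input activity (stimuli x input units), standing in for responses
# to flow fields such as expansions, rotations, translations, and their mixtures.
V = rng.random((n_stimuli, n_inputs))

model = NMF(n_components=n_basis, init="nndsvda", max_iter=500, random_state=0)
W = model.fit_transform(V)   # per-stimulus weights on each basis field
H = model.components_        # basis fields (MSTd-like receptive-field analogues)

relative_error = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(f"relative reconstruction error: {relative_error:.3f}")

In this framing, the rows of H play the role of the basis flow fields described in the abstract, and the sparsity of W corresponds to the sparse, parts-based code that the SNN model is proposed to learn through STDP-H.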