Aim and Scope
As researchers working at the intersection of biological and machine vision, we have noticed increasing interest in both communities in understanding and improving on each other’s insights. Recent advances in machine learning (especially deep learning) have led to unprecedented improvements in computer vision, and deep learning algorithms now rival humans at some narrowly defined tasks such as object recognition (e.g., the ImageNet Large Scale Visual Recognition Challenge). In spite of these advances, the existence of adversarial images (some of which have perturbations imperceptible to humans) and rather poor generalizability across datasets expose flaws in these networks. The human visual system, on the other hand, remains highly efficient at solving real-world tasks and handles a wide range of visual tasks. We believe that the time is ripe for extended discussions and interactions between researchers from both fields in order to steer future research in more fruitful directions. This workshop will compare human vision to state-of-the-art machine perception methods, with specific emphasis on deep learning models and architectures.
Our workshop will address many important questions. These include: 1) What are the representational differences between human and machine perception? 2) What makes human vision so effective? and 3) What can we learn from human vision research? Addressing these questions is less difficult than previously thought, thanks to technological advances in both computational science and neuroscience. We can now measure human behavior precisely and collect huge amounts of neurophysiological data using EEG and fMRI. This places us in a unique position to compare state-of-the-art computer vision models with human behavioral and neural data, a comparison that was impossible a few years ago. However, this advantage also comes with its own set of problems: Which task or metric should be used for comparison? What are the representational similarities? How different are the computations in a biological visual system from those in an artificial vision system? How does human vision achieve invariance?
We think this workshop is a great opportunity for researchers working on human and/or machine perception to come together and discuss plausible solutions to some of the aforementioned problems.
Topics of interest include:
- architectures for processing visual information in the human brain and in computer vision (e.g. feedforward vs. feedback, shallow vs. deep networks, residual, recurrent, etc.)
- limitations of existing computer vision/deep learning systems compared to human vision
- learning rules employed in computer vision and by the brain (e.g. unsupervised/semi-supervised learning, Hebb rule, STDP)
- representations/features in humans and computer vision
- tasks/metrics to compare human and computer vision (e.g. eye fixation, reaction time, rapid categorization, visual search)
- new benchmarks (e.g. datasets)
- generalizability of machine representation to other tasks
- new techniques to measure and analyze human psychophysics and neural signals
- the problem of invariant learning
- conducting large-scale behavioral and physiological experiments (e.g., fMRI, cell recording)