Neural Models of Real-World Perception (COMP/PSYC/PHIL 7514/8514)

Spring 2012 [Please scroll down for the list of external speakers]


Instructor: Bonny Banerjee, Ph.D.


Contact Information:

Office: 208B Engineering Science Bldg

Phone: 901-678-4498

E-mail: BBnerjee@memphis.edu

Office Hours: By appointment


When: Wednesdays 2:20-5:20pm


Where: FIT Bldg. Room 405


Note: In Spring 2012, this course is being offered as a Cognitive Science Seminar. There will be a regular lecture from 2:20 to 4:00 pm, followed by a talk from an invited speaker on a topic relevant to the course from 4:00 to 5:20 pm. Everyone, including faculty, is welcome to attend the talks.


Course Description:

In this course, we will consider the human perceptual system as nature's design by evolution. The question we ask is: why did the design end up like this? We will try to answer that question by examining the requirements, from a computational perspective, that the design has to satisfy in order to survive in this world. Students will learn how the human perceptual (visual, auditory) system is designed to achieve generality and efficiency. We will not discuss models whose main purpose is to mimic different parts of the brain.


Required Text:

Readings from research papers and different book chapters (see reading list below).


Topics and tentative schedule (15 weeks):

Preliminaries

Week 1 (01/18/12). Course aims and agenda; Requirements of a real world perception system

Week 2 (01/25/12). Gabor filters [Movellan, 2008; http://matlabserver.cs.rug.nl/edgedetectionweb/web/edgedetection_params.html] (see the illustrative code sketch after the schedule)

Week 3 (02/01/12). Multilayered Perceptron and Backpropagation Algorithm [Wikipedia article]

Week 4 (02/08/12). Statistical regularities of real world perceptual data [Hyvärinen, Hurri & Hoyer, 2009]; (Project proposals due)

Deep learning in hierarchical neural networks

Week 5 (02/15/12). Sparse autoencoder [Coates et al., 2011; Le et al., 2011; Ng et al., 2011]

Week 6 (02/22/12). Hierarchical temporal memory [George, 2008; Numenta white paper, 2011]

Week 7 (02/29/12). HMAX model [Riesenhuber & Poggio, 1999; Serre et al., 2005; http://riesenhuberlab.neuro.georgetown.edu/hmax.html]

Week 8 (03/07/12). Spring break

Week 9 (03/14/12). Convolutional neural networks [LeCun & Bengio, 1995]

Week 10 (03/21/12). Neocognitron [Fukushima, 1980; 1988] (Preliminary presentations by students)

Week 11 (03/28/12). Deep belief networks [Hinton, 2007a; 2007b; A practical guide to training restricted Boltzmann machines]

Advanced topics

Week 12 (04/04/12). Oscillator networks with application to scene segmentation [Terman & Wang, 1995; Wang & Brown, 1999]

Week 13 (04/11/12). Saliency and attention [Itti & Koch, 2001; Quiles et al, 2011]

Week 14 (04/18/12). Multisensory integration [Ngiam et al, 2011; Wessnitzer & Webb, 2006; http://en.wikipedia.org/wiki/Multimodal_integration#Principles_of_multisensory_integration]

Week 15 (04/25/12). Final presentations by students
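
Illustrative code sketch (Week 2, Gabor filters). To make the Week 2 topic concrete, here is a minimal NumPy sketch of the standard 2-D Gabor kernel (a Gaussian envelope multiplying a cosine carrier), along the lines of the formulation in the Movellan tutorial. The function name and default parameter values are illustrative choices, not taken from any particular implementation.

    import numpy as np

    def gabor_kernel(size=31, wavelength=8.0, theta=0.0, sigma=4.0, gamma=0.5, psi=0.0):
        """Return a size x size Gabor kernel: Gaussian envelope times a cosine carrier."""
        half = size // 2
        y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
        # Rotate coordinates so the carrier runs along orientation theta.
        x_t = x * np.cos(theta) + y * np.sin(theta)
        y_t = -x * np.sin(theta) + y * np.cos(theta)
        envelope = np.exp(-(x_t**2 + (gamma * y_t)**2) / (2.0 * sigma**2))
        carrier = np.cos(2.0 * np.pi * x_t / wavelength + psi)
        return envelope * carrier

    # Convolving an image with a bank of such kernels at several orientations and
    # wavelengths yields the oriented, band-pass responses discussed in class.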


Reading list:

[Movellan, 2008] J. R. Movellan, Tutorial on Gabor Filters.

[Hyvärinen, Hurri & Hoyer, 2009] Hyvärinen, A., Hurri, J. and Hoyer, P.O. (2009) Natural Image Statistics: A Probabilistic Approach to Early Computational Vision, Springer.

[Ng et al., 2011] http://www.stanford.edu/class/cs294a/sparseAutoencoder_2011new.pdf

[Fukushima, 1980] K. Fukushima (1980) Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position, Biological Cybernetics, Springer.

[Fukushima, 1988] K. Fukushima (1988) Neocognitron: A hierarchical neural network capable of visual pattern recognition, Neural networks, Elsevier.

[LeCun & Bengio, 1995] Y. LeCun and Y. Bengio. Convolutional networks for images, speech, and time-series. In M. A. Arbib, editor, The Handbook of Brain Theory and Neural Networks. MIT Press, 1995.

[Riesenhuber & Poggio, 1999] Riesenhuber, M. & Poggio, T. (1999). Hierarchical models of object recognition in cortex. Nature Neuroscience 2: 1019-1025.

[Serre et al, 2005] Serre, Kouh, Cadieu, Knoblich, Kreiman, and Poggio (2005) A theory of object recognition: Computations and circuits in the feedforward path of the ventral stream in primate visual cortex.

[Hinton, 2007a] Hinton, G. E. (2007) Learning multiple layers of representation. Trends in Cognitive Sciences, Vol. 11, pp 428-434.

[Hinton, 2007b] Hinton, G. E. (2007) To recognize shapes, first learn to generate images In P. Cisek, T. Drew and J. Kalaska (Eds.) Computational Neuroscience: Theoretical Insights into Brain Function. Elsevier.

[George, 2008] George, D. (2008) How the brain might work: A hierarchical and temporal model for learning and recognition, PhD thesis, Stanford University.

[Numenta white paper, 2011] Hierarchical Temporal Memory including HTM Cortical Learning Algorithms, Numenta white paper.

[Terman & Wang, 1995] Terman D. and Wang D.L. (1995). Global competition and local cooperation in a network of neural oscillators. Physica D, vol. 81, 148-176.

[Wang & Brown, 1999] Wang D.L. and Brown G.J. (1999). Separation of speech from interfering sounds based on oscillatory correlation. IEEE Transactions on Neural Networks, vol. 10, 684-697.

[Itti & Koch, 2001] Itti, L. and Koch, C. (2001) Computational modeling of visual attention. Nature Reviews Neuroscience, 2(3). pp. 194-203.

[Quiles et al, 2011] Quiles M.G., Wang D.L., Zhao L., Romero R.A.F., and Huang D.-S. (2011): Selecting salient objects in real scenes: An oscillatory correlation model. Neural Networks, vol. 24, pp. 54-64.

[Wessnitzer & Webb, 2006] Wessnitzer, J. and Webb, B. (2006) Multimodal sensory integration in insects - towards insect brain control architectures. Bioinspiration and Biomimetics 1:63-75.

[Ngiam et al, 2011] Ngiam, J., Khosla, A., Kim, M., Nam, J., Lee, H., and Ng, A. Y. (2011) Multimodal deep learning, In Proceedings of the Twenty-Eighth International Conference on Machine Learning.

[Le et al, 2011] Le, Q. V., Zou, W., Yeung, S. and Ng, A. Y. (2011) Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis, In Computer Vision and Pattern Recognition, 2011.

[Coates et al, 2011] Adam Coates, Honglak Lee and Andrew Ng. (2011) An analysis of single-layer networks in unsupervised feature learning, In AISTATS 14.


Evaluation and Final Grades:

This course requires a high level of creative, research-oriented work from each student. Grading will be based on the following components: presentations (20%), project (60%), and project report (20%). Students will develop a comprehensive project on a topic of their choice, including theoretical foundations, implementation, and testing on publicly available data or in a simulated environment. The instructor will suggest projects in class that may be pursued individually or in groups. Interesting project ideas from students are very welcome.

If you have ever thought about how perception works and want to explore your ideas seriously, this is the course for you!



External Speakers:


Week 1 (01/18/12). Dr. Adriane Seiffert, Vanderbilt University

Title: How we know where things go

Abstract: Every day, we encounter moving objects, while walking amongst other people, driving in traffic, or even watching sporting events. To respond appropriately in these dynamic environments, we use a cognitive ability to simultaneously maintain multiple representations of specific objects as they change location, an ability called object tracking. In the PAC lab, we have discovered several aspects of how people track objects, including the attentional demands, eye-movement strategies, the relevant frames of reference, and interactions with motion perception. From these complementary experiments, we have concluded that people discretely update a spatial representation of tracked objects using motion information gleaned from attentional and visual scrutiny. As such, this work describes how perception, attention, and memory interact by combining information over space and time to make sense of our dynamic world.


Week 2 (01/25/12). Dr. Shaum Bhagat, University of Memphis

Title: On the role of top-down processing in audition

Abstract: The role of top-down processing is a topic of considerable research interest in disciplines as diverse as cognitive psychology and neurophysiology. Scientists have considered the role of selective attention in hearing for decades, but the ability to explore this topic from both physiological and psychophysical perspectives is a recent development. Measurement of otoacoustic emissions provides a means to noninvasively examine the function of outer hair cells and the olivocochlear bundle during both passive listening tasks and focused attention tasks. This paradigm provides a novel methodology for exploring corticofugal mechanisms that can potentially impact auditory perception.


Week 3 (02/01/12). Dr. Michael Cannito, University of Memphis

Title: Speech Intelligibility and Acoustic Cues to Word Recognition in Parkinson’s Disease

Abstract: This presentation will discuss voice and speech impairments associated with idiopathic Parkinson’s Disease (PD) before and after intensive voice treatment. Data will be considered from eight speakers with PD whose perceived speech intelligibility was evaluated and shown to improve significantly post intervention. Acoustic analyses were then employed to identify speech signal changes relevant to word recognition for intelligibility that may account for the perceived improvement in PD speech. Implications for speech perception will be discussed.


Week 4 (02/08/12). Dr. Charles Blaha, University of Memphis

Title: Synaptic plasticity and learning


Week 5 (02/15/12). Anne Warlaumont, University of Memphis

Title: Neural networks for studying infant vocalizations

Abstract: There are several ways in which artificial neural networks are applied to the study of infant vocalizations. One way is to use neural networks to cluster and classify infant sounds based on their vocalization acoustics. As new, large-scale databases of infant sounds become available, automated analysis approaches such as these are increasingly important. The second way is to use neural networks in computational models that aim to account for the mechanisms underlying vocal motor and perceptual development. Along these lines, neural networks can be used to drive voice synthesizers, to perceive the model's own as well as external acoustic input, and to form sensorimotor connections. The influence of external input (such as a caregiver's speech) and of reinforcement for the model's own productions can be studied within such neural-network-based computational models.
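
As a concrete illustration of the first use described above (clustering vocalizations by their acoustics with a neural network), here is a minimal self-organizing map sketch in NumPy. The feature representation, map size, and learning schedule are placeholder assumptions made for illustration, not details of the speaker's work.

    import numpy as np

    def train_som(features, rows=4, cols=4, epochs=20, lr0=0.5, radius0=2.0, seed=0):
        """Cluster acoustic feature vectors (one row per vocalization) on a rows x cols map."""
        rng = np.random.default_rng(seed)
        n, d = features.shape
        weights = rng.normal(size=(rows * cols, d))
        grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
        for epoch in range(epochs):
            lr = lr0 * (1.0 - epoch / epochs)
            radius = max(radius0 * (1.0 - epoch / epochs), 0.5)
            for x in features[rng.permutation(n)]:
                winner = np.argmin(np.linalg.norm(weights - x, axis=1))
                dist = np.linalg.norm(grid - grid[winner], axis=1)
                h = np.exp(-dist**2 / (2.0 * radius**2))        # neighborhood function
                weights += lr * h[:, None] * (x - weights)       # pull units toward x
        return weights

    # features could be per-utterance acoustic summaries (duration, pitch statistics,
    # spectral moments, etc.); after training, each vocalization is assigned to its
    # best-matching map unit, giving a data-driven clustering of the sounds.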


Week 6 (02/22/12). Dr. Stanislav Zakharenko, University of Tennessee/St. Jude Children's Research Hospital

Title: Towards understanding the molecular mechanisms of auditory memory

Abstract: Learning and memory are achieved by storing specific sensory experiences. In the primary sensory cortices, sensory information is analyzed and memory traces of learned sensory experiences are stored. With regard to auditory memories, specific memory traces of behaviorally significant sounds are acquired and retained in the primary auditory cortex. Cortical map plasticity, which is believed to be a substrate of auditory memory, is characterized by facilitation of responses to behaviorally important tones at the expense of those at other frequencies. The cellular and molecular substrates of cortical map plasticity are mostly unknown. In this presentation, Dr. Zakharenko will review recent data implicating the plasticity of synaptic connections between neurons in the auditory cortex as a key cellular substrate underlying auditory memory.


Week 7 (02/29/12). Dr. Rene Marois, Vanderbilt University

Title: Deciphering the roots of the capacity limits of attention and awareness

Abstract: The human brain is heralded for its staggering complexity and processing capacity: its hundred billion (10^11) neurons and several hundred trillion synaptic connections can process and exchange prodigious amounts of information over a distributed neural network in a matter of milliseconds. Yet, for all our neurocomputational sophistication and processing power, we show severe limitations in the amount of information that we can consciously perceive and attend to, and we can hardly perform two tasks at once.

In this talk I will describe research performed in my laboratory that aims at understanding the cognitive and neurobiological underpinnings of these limitations. The upshot of this research is that a prefrontal-parietal network may be at the origin of the temporal limitations on what we can attend to for perceiving and for acting, and that the ultimate cause of this limitation may be the inability of neural ensembles in this network to encode multiple pieces of task-relevant information at the same time. Furthermore, I will describe recent behavioral research suggesting that attention and awareness are subject to different capacity constraints, with attention acting as a limited resource that modulates the probability of conscious perception.

Taken together, these studies highlight the mechanistic origins of our severe attentional limits in consciously perceiving and appropriately responding to sensory events in the world.


Week 8 (03/07/12). Spring break


Week 9 (03/14/12). Dr. Alexander Maier, Vanderbilt University

Title: Does activity in the primary visual cortex support perceptual experience?

Abstract: Primary visual cortex (V1) is one of the best studied structures of the primate brain and many of its functional properties are well understood, but whether its neural activity contributes to our conscious experience is a matter of long-standing debate. Using visual illusions such as perceptual suppression, in which a salient visual pattern escapes perception entirely, experimenters can ask whether neural responses encode an observer's perceptual interpretation or instead faithfully represent the physical structure of a stimulus. Single neuron recordings in macaque monkeys as well as neuroimaging studies in humans have successfully applied this paradigm to determine the extent to which activity in primary visual cortex reflects perceptual experience. However, while neurophysiological data from monkeys suggest that V1 neurons represent retinal input regardless of a subject's perceptual state, human neuroimaging (fMRI) studies consistently demonstrate the presence of a strong perceptual signal.

To understand the basis of this discrepancy, and to compare V1 signals directly during perceptual suppression, we conducted fMRI experiments and neuronal recordings in monkeys that were trained to indicate their perception during this visual illusion. Under conditions in which a stimulus was present but rendered perceptually invisible, we found a sharp divergence between single neuron responses and the hemodynamic fMRI responses in V1, resolving previously discrepant results. Yet, we also found a direct correlate of the perception-related fMRI response in the so-called local field potential (LFP), which might signify the presence of an unknown neural mechanism that correlates with the subject’s perceptual state. We have started to investigate the cellular basis of this signal by converting LFP into a measure of current flow across cellular membranes for the entire laminar structure of the cortical sheet. Using this technique, we have found a strong functional division between the upper and lower layers of V1, suggesting selective modulation by intrinsic connections, other cortical areas and subcortical structures, respectively.
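
The step described above, converting the LFP into a measure of current flow across cellular membranes along the cortical depth, is conventionally done with a current source density (CSD) analysis based on the second spatial derivative of the LFP across electrode contacts. A minimal sketch is below; the contact spacing and conductivity values are placeholders, and it is meant only to illustrate the computation, not the lab's actual pipeline.

    import numpy as np

    def csd_second_derivative(lfp, spacing_mm=0.1, conductivity=0.4):
        """Estimate CSD from an (n_channels x n_samples) laminar LFP array.

        Uses the standard finite-difference approximation
            CSD(z) ~ -sigma * (phi(z + h) - 2*phi(z) + phi(z - h)) / h**2,
        so the output has two fewer channels than the input.
        """
        lfp = np.asarray(lfp, dtype=float)
        second_diff = lfp[2:, :] - 2.0 * lfp[1:-1, :] + lfp[:-2, :]
        return -conductivity * second_diff / (spacing_mm ** 2)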

These findings, taken together, suggest that the perceptual outcome of visual stimulation is defined by a dynamic network spanning multiple cortical areas, including V1. Future work will determine the neural dynamics of inter-areal interactions underlying conscious perceptual experience.


Week 10 (03/21/12). Dr. Bonny Banerjee, University of Memphis

Title: Seamless integration of perception and cognition in the same architecture

Abstract: In Artificial Intelligence and other fields, a distinction is often made between perception and cognition, construing them as employing different representations and processes. In this view, a major hurdle is to determine the appropriate level of abstraction for representations and processes so that low-level perception and high-level cognition are seamlessly integrated in the same computational/cognitive architecture. For example, if the internal representation is first-order logic, the computational complexity of integrating perception and cognition in the same architecture is enormous [Banerjee & Chandrasekaran, JAIR 2010].

I hypothesize that the brain implements a small set of computational principles that are executed at multiple stages of processing. The goal of this implementation is to accurately and efficiently predict the real world. I will present a hierarchical neural network architecture which learns an internal model from the real world at multiple levels of abstraction by explaining salient stimuli in the environment. I will argue that such an architecture holds the promise of accomplishing perceptual tasks (e.g., face recognition, speech recognition) and cognitive tasks (e.g., medical diagnosis, solving crime mysteries) efficiently using the same representations and processes. Results from ongoing research on the hierarchical neural network architecture will be presented.
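
For readers new to hierarchical feature learning, the sketch below shows generic greedy layer-wise representation learning (a tied-weight autoencoder stacked layer by layer), in the spirit of the deep-learning models covered in Weeks 5-11. It is a didactic illustration only and is not the architecture described in this talk.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def train_autoencoder_layer(data, n_hidden, epochs=50, lr=0.1, seed=0):
        """Train one tied-weight autoencoder layer by gradient descent on reconstruction error."""
        rng = np.random.default_rng(seed)
        n, d = data.shape
        W = rng.normal(scale=0.1, size=(d, n_hidden))
        for _ in range(epochs):
            h = sigmoid(data @ W)      # encode
            recon = h @ W.T            # decode with tied weights
            err = recon - data
            # Gradient of 0.5 * ||recon - data||^2 with respect to the tied weights W.
            dh = (err @ W) * h * (1.0 - h)
            grad = data.T @ dh + err.T @ h
            W -= lr * grad / n
        return W

    def stack_layers(data, layer_sizes):
        """Greedy layer-wise training: each layer learns to explain the codes of the layer below."""
        weights, codes = [], data
        for n_hidden in layer_sizes:
            W = train_autoencoder_layer(codes, n_hidden)
            weights.append(W)
            codes = sigmoid(codes @ W)
        return weights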


Week 11 (03/28/12). Dr. Thomas Palmeri, Vanderbilt University

Title: Predicting the neural and behavioral dynamics of perceptual decisions

Abstract: How do humans and non-human primates make perceptual decisions about an object's category, identity, or importance? In our work, we formally contrast competing hypotheses about perceptual decision-making mechanisms using computational models that are tested on how well or how poorly they predict behavioral and neural dynamics. Our starting point is a well-known class of models that assume that perceptual decisions are made by a noisy accumulation of perceptual evidence to a response boundary. Our efforts have focused on developing models of the perceptual evidence that drives this accumulation process and testing whether and how these mechanisms are instantiated in the brain. After introducing the general framework and briefly reviewing past work, I will focus on recent projects that associate perceptual evidence with one class of neurons and the accumulation process with another class of neurons recorded from monkeys trained to make perceptual decisions by a saccadic eye movement. I will highlight novel approaches we have taken to relate cognitive-level explanations and neural-level explanations, using neural data to constrain cognitive theories and using cognitive theories to explain neural dynamics.
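
To make the core assumption concrete, the sketch below simulates a single trial of noisy evidence accumulation to a response boundary, the class of model this abstract refers to. The parameter values (drift rate, noise, bound) are arbitrary illustrations, not estimates from this work.

    import numpy as np

    def accumulate_to_bound(drift=1.0, noise=1.0, bound=1.0, dt=0.001, max_t=5.0, seed=None):
        """Simulate one trial: evidence drifts noisily until it hits +bound or -bound.

        Returns (response time in seconds, choice of +1 or -1), or (None, None) on timeout.
        """
        rng = np.random.default_rng(seed)
        x, t = 0.0, 0.0
        while t < max_t:
            x += drift * dt + noise * np.sqrt(dt) * rng.standard_normal()
            t += dt
            if abs(x) >= bound:
                return t, int(np.sign(x))
        return None, None

    # Running many such trials yields predicted choice probabilities and response-time
    # distributions that can be compared with behavior and with accumulator-like neural activity.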


Week 12 (04/04/12). Dr. Jeffrey Schall, Vanderbilt University

Title: From salience to saccades: Multiple-alternative gated stochastic accumulator model of visual search

Abstract: We describe a stochastic accumulator model demonstrating that visual search performance can be understood as a gated feed-forward cascade from a salience map to multiple competing accumulators. The model quantitatively accounts for behavior and predicts neural dynamics of macaque monkeys performing visual search for a target stimulus among different numbers of distractors. The evidence accumulated in the model is equated with the spike trains recorded from visually-responsive neurons in the frontal eye field thought to encode stimulus salience. Accumulated variability in the firing rates of these neurons explains choice probabilities and the distributions of correct and error response times with search arrays of different set sizes if the accumulators are mutually inhibitory. The dynamics of the stochastic accumulators quantitatively predict the activity of presaccadic movement neurons that initiate eye movements if gating inhibition prevents accumulation before sufficient evidence about stimulus salience has emerged. Adjustments in the level of gating inhibition can control tradeoffs in speed and accuracy that optimize visual search performance.
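
A minimal sketch of this model class is given below: one accumulator per stimulus, driven by a salience-like input, competing through mutual inhibition, and accumulating only input that exceeds a gating level. The function name, update rule, and all numerical values are illustrative assumptions, not the published model.

    import numpy as np

    def gated_race(inputs, gate=0.2, beta=0.2, leak=0.0, threshold=1.0,
                   noise=0.1, dt=0.001, max_t=2.0, seed=None):
        """Race among mutually inhibitory, gated accumulators.

        inputs    : mean salience-like evidence for each alternative
        gate      : input below this level does not drive accumulation
        beta      : strength of mutual (lateral) inhibition
        Returns (response time, index of the winning accumulator) or (None, None).
        """
        rng = np.random.default_rng(seed)
        inputs = np.asarray(inputs, dtype=float)
        a = np.zeros_like(inputs)
        t = 0.0
        while t < max_t:
            drive = np.maximum(inputs - gate, 0.0)       # gating inhibition on the input
            inhibition = beta * (a.sum() - a)            # inhibition from the other units
            a += dt * (drive - leak * a - inhibition) + noise * np.sqrt(dt) * rng.standard_normal(a.shape)
            a = np.maximum(a, 0.0)                       # activity stays non-negative
            t += dt
            if a.max() >= threshold:
                return t, int(a.argmax())
        return None, None

    # Example: a target of salience 1.5 among three distractors of salience 0.8.
    # rt, choice = gated_race(np.array([1.5, 0.8, 0.8, 0.8]), seed=1)

Raising the gate delays accumulation until the salience evidence is strong, trading speed for accuracy in the way the abstract describes.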


Week 13 (04/11/12). Dr. Troy Hackett, Vanderbilt University

Title: Primate Auditory Cortex: Principles of Organization and Future Directions

Abstract: As an animal model of central auditory processing, nonhuman primates play an important role in bridging the findings from research conducted in humans with those derived from other species. Like all animal models, the nonhuman primate is unavoidably incomplete as a model system for understanding human audition. The expanded auditory capabilities of humans appear to make use of extensive adaptations and elaborations in the brain – most of which are waiting to be discovered. In addition, compared to the subcortical auditory pathway, which appears to be more highly conserved across species, the organization of auditory areas in cortex appears to vary so widely that the establishment of homologous areas has been limited to only one or two primary fields. Yet, amidst this diversity, a number of shared anatomical and physiological features have been identified. These ‘principles of organization’ are not only improving comparisons between model species, but their extension to studies of the human brain is also moving us closer to establishing a working model of human auditory cortex that can be tested and refined. This will provide an improved foundation for functional imaging and electrophysiological studies now and into the future.


Week 14 (04/18/12). Dr. Melloni Cook, University of Memphis

Title: Pten (phosphatase and tensin homolog) as a candidate gene in schizophrenia-related behavior in mice.

Abstract: Schizophrenia, often a debilitating mental illness, is characterized by a number of behavioral and cognitive symptoms (Oertel-Knochel, Bittner, Knochel, Prvulovic & Hampel, 2011). Although there is consensus that brain/neurotransmitter dysfunction (particularly of the dopamine and glutamate systems, among others) is associated with this disorder, the etiology of schizophrenia is poorly understood. Identifying the genetic components regulating schizophrenia-related behaviors, particularly novel ones, may aid in our understanding of the disorder and the mechanisms involved. Animal models have been useful in identifying genes involved in such disorders. While we cannot measure schizophrenia in animals, we can assess endophenotypes, simpler traits that make up the more complex disorder. Acoustic startle and prepulse inhibition of the acoustic startle response (PPI) represent endophenotypes of schizophrenia. PPI is used to assess sensorimotor gating. The startle reflex is generally measured in response to a loud acoustic stimulus. When a prepulse stimulus precedes the acoustic stimulus, it should inhibit the startle response to the acoustic stimulus (Crawley & Paylor, 1997; Swerdlow & Geyer, 1998). This phenotype (PPI) is of interest because deficits in PPI have been associated not only with schizophrenia but with other neuropsychiatric diseases as well (Joober, Zarate, Rouleau, Skamene & Boksa, 2002). Furthermore, pharmacological agents (antipsychotics) used to treat schizophrenia in humans have been shown to alter acoustic startle and PPI responses in animals (Swerdlow & Geyer, 1998), supporting PPI as a valid trait related to schizophrenia. Some even consider PPI an early vulnerability marker for the disorder (Ziermans et al., 2012). We have made use of a powerful mouse genetic resource, the BXD recombinant inbred panel, to identify genetic components regulating PPI. Using Quantitative Trait Locus (QTL) analysis, we have identified a potential candidate gene, Pten (phosphatase and tensin homolog), which has not previously been identified relative to PPI or schizophrenia-related traits. The potential role of this gene in other disorders that have likewise been associated with cognitive deficits will also be discussed, along with implications and future plans to molecularly confirm this candidate gene. If we can identify biological markers that increase susceptibility to schizophrenia, at-risk individuals can be identified early rather than suffering from symptoms for years before a diagnosis is made.

References:

Crawley JN, & Paylor R (1997) A proposed test battery and constellations of specific behavioral paradigms to investigate the behavioral phenotypes of transgenic and knockout mice. Hormones and Behavior 31(3):197–211

Joober, R, Zarate, J-M, Rouleau, G-A, Skamene, E & Boksa, P (2002) Provisional Mapping of Quantitative Trait Loci Modulating the Acoustic Startle Response and Prepulse Inhibition of Acoustic Startle. Neuropsychopharmacology 27: 765-781.

Oertel-Knochel, V, Bittner, RA, Knochel, C, Prvulovic, D, & Hampel, H (2011) Discovery and development of integrative biological markers for schizophrenia. Progress in Neurobiology 95: 686-702.

Swerdlow NR, & Geyer MA (1998) Using an animal model of deficient sensorimotor gating to study the pathophysiology and new treatments of schizophrenia. Schizophrenia Bulletin 24(2):285–301

Ziermans, TB, Schothorst, PF, Sprong, M, Magnee, MJCM, van Engeland, H, & Kemner, C (2012). Reduced prepulse inhibition as an early vulnerability marker of psychosis prodrome in adolescence. Schizophrenia Research 134: 10-15.
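
As a worked example of the PPI measure described in the abstract above: prepulse inhibition is conventionally expressed as the percentage by which a prepulse reduces startle amplitude relative to pulse-alone trials. The sketch and numbers below are purely illustrative.

    def percent_ppi(startle_pulse_alone, startle_prepulse_plus_pulse):
        """Percent prepulse inhibition: how much the prepulse reduces the startle response."""
        return 100.0 * (1.0 - startle_prepulse_plus_pulse / startle_pulse_alone)

    # Hypothetical mean startle amplitudes (arbitrary units):
    # pulse alone = 250, prepulse + pulse = 150
    # percent_ppi(250.0, 150.0)  ->  40.0 (i.e., 40% PPI)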


Week 15 (04/25/12). Dr. Vivien Casagrande, Vanderbilt University

Title: New insights into visual thalamic function: the pulvinar

Abstract: The thalamus and cortex work together in perceptual and motor processing, but exactly how this is accomplished remains unclear. Especially mysterious is the role of the pulvinar, the largest thalamic nucleus in primates. In primates the majority of the pulvinar is involved in visual processing and is connected to all visual cortical areas. What is not appreciated is that the lateral pulvinar is also connected reciprocally to the primary visual cortex (V1). V1 receives its driving input from the eyes via the lateral geniculate nucleus (LGN) of the thalamus. The pulvinar, unlike the LGN, receives its visual input from V1 and sends output back to cortex, including back to V1. There has been speculation that the lateral pulvinar could provide driving input to higher-order visual areas, but little is known about the role of its feedback input to V1. Here we report that the lateral pulvinar profoundly influences the activity of cells in the supragranular layers of V1 that provide output to extrastriate areas involved in higher-level visual processing. Reversibly inactivating the lateral pulvinar almost completely eliminates the responses of V1 supragranular output cells. This remarkable result suggests that the pulvinar can have a surprisingly powerful impact on information outflow from V1 to higher-order visual cortical areas. We speculate that this pulvino-V1 mechanism plays an important role in the control of bottom-up salience from within a competitive selective-attention network.


Note: Students enrolled in the course will be presenting their projects on 4/25/12 between 2:20-4:00 pm. Everyone is welcome to attend.