Neural Models of Real-World Perception (EECE 7905/8905)

Spring 2014


Instructor: Bonny Banerjee, Ph.D.


Contact Information:

Office: 208B Engineering Science Bldg

Phone: 901-678-4498

E-mail: BBnerjee@memphis.edu

Office Hours: By appointment


When: TR 11:20 am - 12:45 pm


Where: Engineering Science Bldg. Room 218


Course Description:

In this course, we will consider the human perceptual system as a design produced by evolution. The question we ask is: why did the design end up like this? We will try to answer that question by examining the requirements, from a computational perspective, that the design must satisfy in order to survive in this world. Students will learn how the human perceptual (visual, auditory) system is designed for generality and efficiency. We will not discuss models that merely mimic different parts of the brain.

A major part of the course will be dedicated to deep learning models. Feature extraction is essential to many real-world tasks. Features learned from data have been shown to be superior to handcrafted features in many tasks (e.g., recognition, denoising) across applications involving visual, audio, textual and multimodal data. Unlike deep models, shallow architectures, such as neural networks with one hidden layer and support vector machines, have been argued to be inefficient at representing the complex functions involved in perception. Deep learning models have led to state-of-the-art results in object recognition, action recognition, speech and speaker recognition, and multimodal recognition. This course will cover a variety of deep learning models that learn the norm, or invariances, from unlabeled data as feature hierarchies, along with the principles behind their designs.
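
As a concrete illustration of learning features from unlabeled data, here is a minimal sketch in Python of a one-hidden-layer autoencoder with an L1 sparsity penalty, of the kind covered in Week 5 below. It is not part of the course materials; the synthetic data, names and hyperparameters are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Synthetic stand-in for unlabeled data: 1000 random 8x8 "patches" flattened to 64-dim vectors.
X = rng.standard_normal((1000, 64))
X -= X.mean(axis=0)                                # zero-mean the patches
N, d = X.shape

n_hidden, lam, lr = 25, 1e-3, 0.5                  # hidden units, sparsity weight, learning rate
W1 = 0.01 * rng.standard_normal((d, n_hidden))     # encoder weights
b1 = np.zeros(n_hidden)
W2 = 0.01 * rng.standard_normal((n_hidden, d))     # decoder weights
b2 = np.zeros(d)

for epoch in range(200):
    H = sigmoid(X @ W1 + b1)                       # hidden activations = learned features
    R = H @ W2 + b2                                # reconstruction of the input
    # Objective: mean squared reconstruction error + L1 penalty on the (non-negative) activations
    loss = 0.5 * np.mean(np.sum((R - X) ** 2, axis=1)) + lam * np.mean(np.sum(H, axis=1))
    # Backpropagation, plain batch gradient descent
    dR = (R - X) / N
    dW2, db2 = H.T @ dR, dR.sum(axis=0)
    dH = dR @ W2.T + lam / N                       # gradient of the sparsity term
    dZ1 = dH * H * (1.0 - H)                       # sigmoid derivative
    dW1, db1 = X.T @ dZ1, dZ1.sum(axis=0)
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

# Columns of W1 are the learned features; trained on natural image patches (rather than the
# random data used here), such features often come to resemble oriented, Gabor-like filters.
print(f"final loss: {loss:.4f}")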


Required Text:

Readings from class notes, research papers and book chapters (see reading list below).


Topics and tentative schedule (16 weeks):

Preliminaries

Week 1 (1/16). Course aims and agenda; What is the goal/purpose of the brain?; Requirements of a real-world perception system [Banerjee, 2013]

Week 2 (1/21-1/23). Gabor filters [Movellan, 2008; http://matlabserver.cs.rug.nl/edgedetectionweb/web/edgedetection_params.html]; Gaussian Derivative/Difference-Of-Offset-Gaussians Model [Young et al., 2001]; The dictionary learning problem [Mairal et al., 2010]

Week 3 (1/28-1/30). Multilayer perceptron and the backpropagation algorithm [Wikipedia article]

Week 4 (2/4-2/6). Energy minimization in a recurrent neural network: discrete and continuous cases [Hopfield, 1982; Banerjee, 2003; Banerjee, 2004]; (Project proposals due)
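
For reference, standard textbook forms of two objects central to Weeks 2 and 4 are reproduced below; the assigned readings may use different parameterizations.

% 2-D Gabor filter (Week 2): a sinusoidal carrier modulated by a Gaussian envelope
g(x,y) = \exp\!\left(-\frac{x'^2 + \gamma^2 y'^2}{2\sigma^2}\right)\cos\!\left(2\pi\frac{x'}{\lambda} + \psi\right),
\qquad x' = x\cos\theta + y\sin\theta, \quad y' = -x\sin\theta + y\cos\theta

% Hopfield network energy (Week 4): the discrete asynchronous update dynamics never increase E
E = -\frac{1}{2}\sum_{i \neq j} w_{ij}\, s_i s_j + \sum_i \theta_i s_i

Here θ is the filter orientation, λ the wavelength, ψ the phase, σ the envelope width and γ the aspect ratio; in the energy function, s_i are the unit states, w_ij the symmetric weights and θ_i the thresholds.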

Deep learning in hierarchical neural networks

Week 5 (2/11-2/13). Sparse autoencoder [Coates et al., 2011; Le et al., 2011; Ng et al., 2011]

Week 6 (2/18-2/20). Hierarchical temporal memory [George, 2008; Numenta white paper, 2011]

Week 7 (2/25-2/27). HMAX model [Riesenhuber & Poggio, 1999; Serre et al., 2005; http://riesenhuberlab.neuro.georgetown.edu/hmax.html]

Week 8 (3/4-3/6). Convolutional neural networks [LeCun & Bengio, 1995] (Midterm exam)

Week 9 (3/11-3/13). Spring break

Week 10 (3/18-3/20). Neocognitron [Fukushima, 1980; 1988] (Preliminary project presentations)

Week 11 (3/25-3/27). Deep belief networks [Hinton, 2007a; 2007b; A practical guide to training restricted Boltzmann machines]

Week 12 (4/1-4/3). Predictive coding [Rao & Ballard, 1999; Friston, 2005; Banerjee & Dutta, 2014]

Advanced topics

Week 13 (4/8-4/10). Oscillator networks with application to scene segmentation [Terman & Wang, 1995; Wang & Brown, 1999]

Week 14 (4/15-4/17). Saliency and attention [Itti & Koch, 2001; Quiles et al, 2011]

Week 15 (4/22-4/24). Multisensory integration [Ngiam et al, 2011; Wessnitzer & Webb, 2006; http://en.wikipedia.org/wiki/Multimodal_integration#Principles_of_multisensory_integration]

Week 16 (4/29). Final project presentations; Final project reports due

5/8/14. Final exam, 8:00-10:00 am


Reading list:

[Banerjee, 2003] B. Banerjee. (2003) A self-organizing auto-associative network for the generalized physical design of microstrip patches, IEEE Trans. Antennas & Propagation, 51(6):1301-1306.

[Banerjee, 2004] B. Banerjee. (2004) Recognition of partially occluded shapes using a neural optimization network, Machine Graphics & Vision, Institute of Computer Science of the Polish Academy of Sciences, 13(1/2):3-23.

[Banerjee, 2013] B. Banerjee. (2013) How can the blind men see the elephant?, AAAI Fall Symposium on How Should Intelligence be Abstracted in AI Research, November 15-17, 2013, Arlington, VA.

[Banerjee & Dutta, 2014] B. Banerjee and J. K. Dutta. (2014) SELP: A general-purpose framework for learning the norms from saliencies in spatiotemporal data. Neurocomputing: Special Issue on Brain Inspired Models of Cognitive Memory, Elsevier.

[Coates et al., 2011] A. Coates, H. Lee and A. Y. Ng (2011) An analysis of single-layer networks in unsupervised feature learning, In AISTATS 14.

[Friston, 2005] K. Friston (2005) A theory of cortical responses. Philosophical Transactions of the Royal Society B: Biological Sciences, 360(1456):815-836.

[Fukushima, 1980] K. Fukushima (1980) Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position, Biological Cybernetics, 36(4):193-202.

[Fukushima, 1988] K. Fukushima (1988) Neocognitron: A hierarchical neural network capable of visual pattern recognition, Neural Networks, 1(2):119-130.

[George, 2008] D. George (2008) How the brain might work: A hierarchical and temporal model for learning and recognition, PhD thesis, Stanford University.

[Hinton, 2007a] G. E. Hinton (2007) Learning multiple layers of representation. Trends in Cognitive Sciences, 11:428-434.

[Hinton, 2007b] G. E. Hinton (2007) To recognize shapes, first learn to generate images. In P. Cisek, T. Drew and J. Kalaska (Eds.), Computational Neuroscience: Theoretical Insights into Brain Function. Elsevier.

[Hopfield, 1982] J. J. Hopfield (1982) Neural networks and physical systems with emergent collective computational abilities, PNAS 79:2554-2558.

[Itti & Koch, 2001] L. Itti and C. Koch (2001) Computational modeling of visual attention. Nature Reviews Neuroscience, 2(3):194-203.

[Le et al., 2011] Q. V. Le, W. Zou, S. Yeung and A. Y. Ng (2011) Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis, In Computer Vision and Pattern Recognition.

[LeCun & Bengio, 1995] Y. LeCun and Y. Bengio (1995) Convolutional networks for images, speech, and time-series. In M. A. Arbib, editor, The Handbook of Brain Theory and Neural Networks. MIT Press.

[Mairal et al., 2010] J. Mairal, F. Bach, J. Ponce and G. Sapiro (2010) Online learning for matrix factorization and sparse coding. Journal of Machine Learning Research, 11:19-60.

[Movellan, 2008] J. R. Movellan (2008) Tutorial on Gabor Filters.

[Ng et al., 2011] A. Y. Ng (2011) Sparse autoencoder, CS294A lecture notes, Stanford University. http://www.stanford.edu/class/cs294a/sparseAutoencoder_2011new.pdf

[Ngiam et al., 2011] J. Ngiam, A. Khosla, M. Kim, J. Nam, H. Lee and A. Y. Ng (2011) Multimodal deep learning, Twenty-Eighth International Conference on Machine Learning.

[Numenta white paper, 2011] Hierarchical Temporal Memory including HTM Cortical Learning Algorithms, Numenta white paper.

[Quiles et al., 2011] M. G. Quiles, D. L. Wang, L. Zhao, R. A. F. Romero and D.-S. Huang (2011) Selecting salient objects in real scenes: An oscillatory correlation model. Neural Networks, 24:54-64.

[Rao & Ballard, 1999] R. P. N. Rao and D. H. Ballard (1999) Predictive coding in the visual cortex: A functional interpretation of some extra-classical receptive-field effects. Nature Neuroscience, 2(1):79-87.

[Riesenhuber & Poggio, 1999] M. Riesenhuber and T. Poggio (1999) Hierarchical models of object recognition in cortex. Nature Neuroscience, 2:1019-1025.

[Serre et al., 2005] T. Serre, M. Kouh, C. Cadieu, U. Knoblich, G. Kreiman and T. Poggio (2005) A theory of object recognition: Computations and circuits in the feedforward path of the ventral stream in primate visual cortex. AI Memo 2005-036, MIT.

[Terman & Wang, 1995] D. Terman and D. L. Wang (1995). Global competition and local cooperation in a network of neural oscillators. Physica D, 81:148-176.

[Wang & Brown, 1999] D. L. Wang and G. J. Brown (1999). Separation of speech from interfering sounds based on oscillatory correlation. IEEE Trans. Neural Networks, 10:684-697.

[Wessnitzer & Webb, 2006] J. Wessnitzer and B. Webb (2006) Multimodal sensory integration in insects - towards insect brain control architectures. Bioinspiration and Biomimetics, 1:63-75.

[Young et al., 2001] R. A. Young, R. M. Lesperance and W. W. Meyer (2001) The Gaussian Derivative model for spatial-temporal vision: I. Cortical model. Spatial Vision, 14(3):261–319.


Evaluation and Final Grades:

Final grades will be assigned based on class participation (10%), homework (25%), a project (25%), and midterm and final exams (20% + 20%). Class participation will include journal-style peer review of fellow students' projects. There will be about eight homework assignments, which will include deriving pseudo-code from assigned papers as well as some programming assignments. The project will include a proposal for a solution to an open problem, a software implementation of the solution, preliminary and final project presentations, and a final report. The two exams will test theoretical knowledge. The 7905 and 8905 sections will be graded separately; in each exam, students enrolled in 8905 will answer one additional question.


Certain computational models of the brain produce surprisingly good results when applied to real-world perception problems (e.g., object recognition). In this course, we will see why and how!