Statistical Learning with Applications (550.437, 2010 Fall)
Clark 302A, 6-7678
Office Hour: Tues 3-4, Wed 3-4.
Peng Liu (office hour & homework & website)
Whitehead Hall 211E
Office Hour: Tues 1-3

Qi Wang (homework)
Whitehead Hall 214
    * Learn enough about the theory of statistical learning to comfortably read research articles in the "computational" sub-fields of biology, neuroscience, vision and linguistics.
    * Gain experience in the practice of statistical learning by applying the basic principles to a real, difficult, high-dimensional pattern recognition problem, such as molecular cancer diagnosis or face detection.
Chapters of a draft monograph will be distributed on a regular basis. These chapters cover nearly all of the basic material on statistical learning, as well as some of the applications. Other applications, notably to molecular cancer diagnosis, modeling gene regulatory networks and language modeling, will be presented in class. An excellent and comprehensive reference is the book Pattern Recognition and Machine Learning by Christopher Bishop, published by Springer in 2006. Other useful references include:
    1. Cover, T. and Thomas, J., Elements of Information Theory, John Wiley, 1991. (Great book from which some course material was taken.)
    2. Duda, R. O., Hart, P. E. and Stork, D. G., Pattern Classification, John Wiley, New York, 2001. (Updated version of classic text of 1973.)
    3. Devroye, L., Györfi, L. and Lugosi, G., A Probabilistic Theory of Pattern Recognition, Springer-Verlag, 1996. (Theory of inductive learning; abstract and technical.)
    4. Hastie, T., Tibshirani, R. and Friedman, J., The Elements of Statistical Learning, Springer-Verlag, 2001. (Generally excellent text.)
    * Project: This is the most important component. Towards the beginning of the term, students will be given access to a large amount of annotated data from one of the featured application areas, such as molecular disease diagnosis or scene interpretation. The objective is to conceive, "train" and evaluate a novel method for extracting an appropriate annotation from unlabeled data. This will involve programming, but any language is acceptable. Each student is expected to write a report describing his or her experiments and conclusions; attaching computer code is optional and relatively unimportant. These reports are due towards the end of the semester. The project will be described in detail in a separate handout.
    * Final Exam: This consists of a few problems, generally easier than those assigned for homework.
    * Homework Problems: The monograph contains a number of exercises and others will be added. Every 2-3 weeks there will be an assignment consisting of several of these. The important thing is to try the problems, not to get everything right, and you may work together if you wish. Solutions will be provided.
    * Class Participation: There are regular, although unscheduled, discussions about the pros and cons of various learning strategies, performance criteria, etc., and I would like to see everybody take part in these.
Project 40%; Exam 20%; Homework 20%; Class Participation 20%.
The course website will contain course notes, URLs for data, homework assignments and solutions, announcements, etc.