Fall 2014

Location: Veihmeyer Hall 116
Time: Tues & Thurs 10:30am-11:50am
Units: 4

Instructor: Yong Jae Lee
Email: yjlee@cs (email subject should begin with "[ECS 289H]")
Office: Kemper 3055
Office hours: By appointment


  • Final project paper due 12/12, 11:59 pm.  Please email your report in pdf format to the instructor.  See here for more details.
  • No class on 12/9 and 12/11.
  • Make-up class on 12/14, 10am-12:30pm in Kemper 1131.  

Course Overview

This graduate seminar course will survey papers in a broad range of topics in computer vision, including object recognition, activity recognition, and scene understanding.  The course goals will be to understand and analyze state-of-the-art techniques, and to identify interesting open questions and future directions.  It should be of relevance to students interested in computer vision, machine learning, and/or computer graphics.


The main requirement is interest in computer vision.  Programming will be required for the final project.  Any previous experience in computer vision, image processing, machine learning, and/or computer graphics is a plus.  Please talk to me if you are unsure whether the course is a good match for your background.


The bulk of the course will consist of student paper presentations.  Students will be responsible for writing paper reviews each week, participating in discussions, presenting once or twice in class, and completing a final project.


The final grade will be determined by:
  • Paper reviews (25%)
  • Class participation (25%)
  • Paper presentation (25%)
  • Final project (25%)

Important Dates

  • 10/31: Final project proposal due
  • 12/14: Final project presentations
  • 12/12: Final project paper due

Detailed course requirements and grading are here.

 Date  Papers  Presenters
 10/2  Introduction   Yong Jae Lee [pdf]

 10/7  Research Overview  Yong Jae Lee [pdf]

 10/9  Local Features and Matching

 Object Recognition from Local Scale-Invariant Features. D. Lowe. ICCV 1999. [code]
 Video Google: A Text Retrieval Approach to Object Matching in Videos. J. Sivic and A. Zisserman. ICCV 2003. [project page]
 Fanyi Xiao [pdf]

 Chuan Wang [pdf]

 10/14  Image Classification

 Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories. S. Lazebnik, C. Schmid, and J. Ponce. CVPR 2006. [code]
Object Bank: A High-Level Image Representation for Scene Classification & Semantic Feature Sparsification. L-J. Li, H. Su, E. Xing, and L. Fei-Fei. NIPS 2010. [project page]
 Ahsan Abdullah [pdf]

 Vivek Dubey [pdf]

 10/16  Object Detection (Part 1)

 Histograms of Oriented Gradients for Human Detection. N. Dalal and B. Triggs. CVPR 2005. [code]
 Rencheng Tan [pdf]

 LuPeng Xing & Yuduo Wu [pdf]

 10/21  Object Detection (Part 2)

 A Discriminatively Trained, Multiscale, Deformable Part Model. P. Felzenszwalb, D. McAllester, and D. Ramanan. CVPR 2008. [code]
 Diagnosing Error in Object Detectors. D. Hoiem, Y. Chodpathumwan, and Q. Dai. ECCV 2012. [project page]
 Jay Gokhale [pdf]

 Yuduo Wu [pdf]

 10/23  Unsupervised/Weakly-Supervised Visual Discovery

 Unsupervised Discovery of Mid-Level Discriminative Patches. S. Singh, A. Gupta, and A. A. Efros. ECCV 2012. [project page]
 Style-aware Mid-level Representation for Discovering Visual Connections in Space and Time. Y. J. Lee, A. A. Efros, and M. Hebert. ICCV 2013. [project page]
 Chuan Wang [pdf]

 Yong Jae Lee [pdf]

 10/28  Segmentation

 Learning a Classification Model for Segmentation. X. Ren and J. Malik. ICCV 2003.
 Combining Top-down and Bottom-up Segmentation. E. Borenstein, E. Sharon, and S. Ullman. CVPR Workshop 2004.
 Antonia Creswell [pdf]

 Collin McCarthy [pdf]

 10/30  Final project proposal due on 10/31


 Visual Recognition with Humans in the Loop. S. Branson, C. Wah, B. Babenko, F. Schroff, P. Welinder, P. Perona and S. Belongie. ECCV 2010. [project page]
 Large-Scale Live Active Learning: Training Object Detectors with Crawled Data and Crowds. S. Vijayanarasimhan and K. Grauman. CVPR 2011. [project page]
 Thomas Provan [pdf]

 Yangzihao Wang [pdf]

 11/4  Activity Recognition

 Learning Realistic Human Actions from Movies. I. Laptev, M. Marszałek, C. Schmid, and B. Rozenfeld. CVPR 2008. [project page]
 Detecting Actions, Poses, and Objects with Relational Phraselets. C. Desai and D. Ramanan. ECCV 2012. 
 Kerry Seitz [pdf]

 Antonia Creswell [pdf]

 11/6  Human Pose

 Poselets: Body Part Detectors Trained Using 3D Human Pose Annotations. L. Bourdev and J. Malik. ICCV 2009. [project page]
 Real-Time Human Pose Recognition in Parts from a Single Depth Image. J. Shotton, A. Fitzgibbon, M. Cook, T. Sharp, M. Finocchio, R. Moore, A. Kipman, and A. Blake. CVPR 2011. [project page]
 Sugeerth Murugesan [pdf]

 Ahsan Abdullah [pdf]

 11/11  Veterans Day (no class)

 11/13  Attributes

 Describing Objects by Their Attributes. A. Farhadi, I. Endres, D. Hoiem, and D. Forsyth. CVPR 2009. [project page]
 Attribute and Simile Classifiers for Face Verification. N. Kumar, A. Berg, P. Belhumeur, and S. Nayar. ICCV 2009. [project page]
 Rencheng Tan & Thomas Provan [pdf]

 LuPeng Xing [pdf]

 11/18  Image Search and Mining

 Total Recall: Automatic Query Expansion with a Generative Feature Model for Object Retrieval. O. Chum, J. Philbin, J. Sivic, M. Isard, and A. Zisserman. CVPR 2007. [data]
 Image Webs: Computing and Exploiting Connectivity in Image Collections.  K. Heath, N. Gelfand, M. Ovsjanikov, M. Aanjaneya, and L. Guibas.  CVPR 2010.
 Yangzihao Wang [pdf]

Sugeerth Murugesan [pdf]

 11/20  Language and Images

 Every Picture Tells a Story: Generating Sentences for Images. A. Farhadi, M. Hejrati, A. Sadeghi, P. Young, C. Rashtchian, J. Hockenmaier, and D. Forsyth. ECCV 2010. [data]
 Baby Talk: Understanding and Generating Simple Image Descriptions. G. Kulkarni, V. Premraj, S. Dhar, S. Li, Y. Choi, A. Berg, and T. Berg. CVPR 2011.
 Kerry Seitz [pdf]

 Saheel Godhane

 11/25  Big Data

 Scene Completion using Millions of Photographs. J. Hays and A. A. Efros. SIGGRAPH 2007. [project page]
 Unbiased Look at Dataset Bias. A. Torralba and A. A. Efros. CVPR 2011. [project page]
 Harika Sabbella [pdf]

 Vivek Dubey & Harika Sabbella [pdf]

 11/27  Thanksgiving (no class)

 12/2  First-Person Vision

 Social Interactions: A First-Person Perspective. A. Fathi, J. Hodgins, and J. Rehg. CVPR 2012. [data]
 Discovering Important People and Objects for Egocentric Video Summarization. Y. J. Lee, J. Ghosh, and K. Grauman. CVPR 2012. [project page]
 Jason Lin [pdf]

 Saheel Godhane [pdf]

 12/4  Deep Convolutional Neural Networks

 ImageNet Classification with Deep Convolutional Neural Networks. A. Krizhevsky, I. Sutskever, and G. Hinton. NIPS 2012.
Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. R. Girshick, J. Donahue, T. Darrell, and J. Malik. CVPR 2014. [code]
 Jay Gokhale [pdf]

 Collin McCarthy [pdf]

 12/9  No class. Work on final project.

 12/11  No class. Work on final project.

 12/14  Final Project Presentations  



This course has been inspired by the following courses:
