ACCV 2016 Tutorial on Deep Learning in Computer Vision


9:00-10:30: Introduction to deep models (Wanli Ouyang)
10:30-11:00: Tea break
11:00-12:00: What makes deep learning work? (Hongsheng Li)

14:00-15:00: Interpreting neural semantics in deep learning and its applications (Hongsheng Li)
15:00-15:30: Tea break
15:30-16:30: Structured deep learning (Wanli Ouyang)
16:30-17:00: Open questions and future work (Hongsheng Li)


Deep learning has been a major breakthrough in artificial intelligence and has achieved remarkable success on grand challenges in many fields, including computer vision, speech recognition, and natural language processing. Its success is driven by the large-scale training data and massively parallel computing power that have emerged in recent years, as well as by advances in model design and training strategies. The most important breakthrough of deep learning in computer vision came in 2012, when Hinton’s group won the ImageNet object recognition challenge with a deep convolutional neural network, beating conventional computer vision techniques by a large margin.

In this tutorial, we will introduce deep learning and its applications in computer vision. The tutorial starts with a historical overview of deep learning and an introduction to several classical deep models. Through concrete examples in image classification, face recognition, object detection, human pose estimation, object tracking, and video understanding, we will explain why deep learning works in computer vision and how to design effective deep models and learning strategies. We will also introduce the structured deep learning methods developed in recent years and explain the semantic meaning of the learned neural responses. Finally, we will discuss some open questions related to deep learning.


Xiaogang Wang received his bachelor’s degree in Electrical Engineering and Information Science from the Special Class of Gifted Young at the University of Science and Technology of China in 2001, his M.Phil. degree in Information Engineering from the Chinese University of Hong Kong in 2004, and his PhD degree in Computer Science from the Massachusetts Institute of Technology in 2009. He has been an associate professor in the Department of Electronic Engineering at the Chinese University of Hong Kong since August 2009. He received the Outstanding Young Researcher in Automatic Human Behavior Analysis Award in 2011, the Hong Kong RGC Early Career Award in 2012, and the Young Research Award of the Chinese University of Hong Kong. He is an associate editor of the Image and Vision Computing journal. He was an area chair of ICCV 2011, ECCV 2014, and ACCV 2014. His research interests include computer vision, deep learning, crowd video surveillance, object detection, and face recognition.

Wanli Ouyang received his PhD degree from the Department of Electronic Engineering, the Chinese University of Hong Kong, where he is now a Research Assistant Professor. His research interests include image processing, computer vision, and pattern recognition. His team participated in the ImageNet Challenge 2014/2015 for object detection, where its deep-learning-based approaches ranked first in the video object detection task in 2015 and second in the still-image object detection task in both 2014 and 2015.

Hongsheng Li received his bachelor’s degree in automation from East China University of Science and Technology, and his master’s and doctorate degrees in computer science from Lehigh University, USA, in 2006, 2010, and 2012, respectively. He is a Research Assistant Professor in the Department of Electronic Engineering at the Chinese University of Hong Kong. He was previously an associate professor in the School of Electronic Engineering at the University of Electronic Science and Technology of China. He has published multiple papers in premier conferences and journals in computer vision and medical imaging, including IEEE T-PAMI, IEEE T-MI, ICCV, CVPR, MICCAI, and IPMI. His research interests include computer vision, medical image analysis, and machine learning.