Convolutional neural networks: Reading Guide
Deep learning is behind state-of-the-art results on many problems across different fields. In computer vision, Convolutional Neural Networks (CNNs), one family of deep learning models, have been applied to a range of image and video challenges. In this article, I point out what to read to get on the right track and start applying CNNs to your own vision problems.
For Arabic speakers, this video should be helpful.
Prerequisites
- Deep learning is a branch of machine learning, so you should be familiar with that field first.
- The easiest start in machine learning is probably Andrew Ng's Coursera course.
- You need to cover at least up to week 5 (Neural Networks: Learning).
- However, it is worth finishing the whole series.
- You need to know Neural Networks (NNs), specifically the backpropagation algorithm.
- You can learn it through the course by Geoffrey Hinton himself!
- It is a good idea to write (or at least read) code for the backpropagation algorithm.
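As a starting point for writing your own backpropagation code, here is a minimal numpy sketch (my own toy example, not taken from either course): a network with one hidden layer of sigmoid units and a halved squared-error loss, trained on XOR. The function names, the network size, and the learning rate are illustrative assumptions. The backpropagated gradients are checked against central finite differences before training.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(params, X):
    """One hidden layer: X (n, d) -> a1 (n, h) -> a2 (n, 1), sigmoid units."""
    W1, b1, W2, b2 = params
    a1 = sigmoid(X @ W1 + b1)
    a2 = sigmoid(a1 @ W2 + b2)
    return a1, a2

def loss(params, X, y):
    """Half squared error, averaged over the n training examples."""
    _, a2 = forward(params, X)
    return 0.5 * np.mean(np.sum((a2 - y) ** 2, axis=1))

def backprop(params, X, y):
    """Gradients of loss() w.r.t. [W1, b1, W2, b2], computed with the chain rule."""
    W1, b1, W2, b2 = params
    n = X.shape[0]
    a1, a2 = forward(params, X)
    # Output-layer error: dL/dz2 = (a2 - y) * sigmoid'(z2) / n, with sigmoid'(z2) = a2 * (1 - a2)
    delta2 = (a2 - y) * a2 * (1 - a2) / n
    # Hidden-layer error: push delta2 back through W2, then multiply by sigmoid'(z1)
    delta1 = (delta2 @ W2.T) * a1 * (1 - a1)
    return [X.T @ delta1, delta1.sum(axis=0), a1.T @ delta2, delta2.sum(axis=0)]

def numerical_grad(params, X, y, eps=1e-6):
    """Central finite differences; slow, used only to check backprop()."""
    grads = []
    for p in params:
        g = np.zeros_like(p)
        for idx in np.ndindex(p.shape):
            old = p[idx]
            p[idx] = old + eps
            loss_plus = loss(params, X, y)
            p[idx] = old - eps
            loss_minus = loss(params, X, y)
            p[idx] = old  # restore the original weight
            g[idx] = (loss_plus - loss_minus) / (2 * eps)
        grads.append(g)
    return grads

# Toy problem: XOR with 4 hidden units
rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)
params = [rng.normal(size=(2, 4)), np.zeros(4), rng.normal(size=(4, 1)), np.zeros(1)]

# Gradient check: analytic and numerical gradients should agree closely
for g_bp, g_num in zip(backprop(params, X, y), numerical_grad(params, X, y)):
    assert np.allclose(g_bp, g_num, atol=1e-6)

# Plain batch gradient descent
initial_loss = loss(params, X, y)
lr = 1.0
for _ in range(5000):
    for p, g in zip(params, backprop(params, X, y)):
        p -= lr * g
final_loss = loss(params, X, y)
print(initial_loss, final_loss)
```

The finite-difference check is a common way to catch bugs in hand-written backprop. Once the gradients agree you can drop it, because it needs two extra forward passes per weight.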
Convolutional neural networks (CNN)
Basic Reading
- Yann LeCun and Yoshua Bengio introduced the CNN. You should read their work.
- Paper: Gradient-Based Learning Applied to Document Recognition - 1998
- Mainly pages 5-9 and the related figures.
- Presentation summarizing it.
- Introduction to Machine Learning CMU - 10701 Deep Learning
- Describes the layers and the number of connections and parameters in each.
- The Deep Learning book (Yoshua Bengio et al.) has a nice introduction highlighting some interesting points.
- Stanford offers a course on CNNs, with assignments.
- Specifically, Lecture 7: Convolutional Neural Networks.
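Counting connections and parameters per layer is a good way to check your understanding of the 1998 paper. The helpers below are a hypothetical sketch of my own. They assume square inputs and kernels, weight sharing with one bias per feature map, full connectivity between input and output maps, and, for LeNet-5-style subsampling layers, one trainable coefficient plus one bias per map.

```python
def conv_output_size(in_size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution (square input and kernel assumed)."""
    return (in_size + 2 * padding - kernel) // stride + 1

def conv_layer_counts(in_maps, out_maps, in_size, kernel, stride=1, padding=0):
    """(output size, trainable parameters, connections) of a conv layer
    where every output map sees every input map."""
    out_size = conv_output_size(in_size, kernel, stride, padding)
    weights_per_unit = in_maps * kernel * kernel + 1  # receptive field + bias
    # Weights are shared across positions, so parameters do not depend on out_size
    params = out_maps * weights_per_unit
    # Connections are counted once per output unit
    connections = out_maps * out_size * out_size * weights_per_unit
    return out_size, params, connections

def subsampling_layer_counts(maps, in_size, window=2):
    """LeNet-5-style subsampling: sum over a window, times a coefficient, plus a bias."""
    out_size = in_size // window
    params = maps * 2  # one coefficient and one bias per map
    connections = maps * out_size * out_size * (window * window + 1)
    return out_size, params, connections

# LeNet-5: 32x32 input, C1 = 6 maps with 5x5 kernels, S2 = 2x2 subsampling
print(conv_layer_counts(1, 6, 32, 5))      # (28, 156, 122304)
print(subsampling_layer_counts(6, 28))     # (14, 12, 5880)
# C5: 120 maps with 5x5 kernels over the 16 maps of size 5x5 coming from S4
print(conv_layer_counts(16, 120, 5, 5))    # (1, 48120, 48120)
```

Layer C3 in LeNet-5 connects each output map to only a subset of the S2 maps, so the full-connectivity formula above would overcount its parameters. Use the connection table from the paper for that layer.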
Articles
- Ilya Sutskever - Brief Overview of Deep Learning
- A very important document: it lists the practical concerns to keep in mind when experimenting with CNNs.
- Yann LeCun on His Quest to Unleash Deep Learning
- Discusses a wide range of thoughts around CNNs.
Running CNN
- A wiki lists the popular frameworks.
- I am using Caffe, which is written in C++ and has some interfaces (e.g., Python and MATLAB).
- Networks are defined in a text file (prototxt), not in code.
- They provide many examples, including popular networks (AlexNet, GoogLeNet, ...).
- The documentation is not perfect. Workaround: read these examples and the issues on GitHub.
CNN in Images
There are several important papers that apply CNNs to images. Below are a few of them, grouped by problem.
Image Classification
- Krizhevsky et al. 2012: ImageNet Classification with Deep Convolutional Neural Networks
Object Recognition
- Girshick et al. 2014: Rich feature hierarchies for accurate object detection and semantic segmentation
CNN in Videos
The real challenge in videos is that the temporal dimension of the data must also be taken into account. One naive way is to ignore it, at the cost of losing motion information. To avoid that, several methods have been proposed to make use of both spatial and temporal data. The approach of Tomas Pfister et al. is the simplest of them: instead of feeding one image with 3 channels, feed k frames as a single image with 3k channels.
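The "k frames as one 3k-channel image" idea only changes the input layout. Here is a minimal numpy sketch of that layout. It assumes the video is stored as an array of shape (T, H, W, 3), and the function name is hypothetical. The network that consumes the result, and the choice of k, are outside this sketch. With k = 1 it falls back to the naive per-frame approach.

```python
import numpy as np

def stack_frames(video, start, k):
    """Turn k consecutive RGB frames into one (H, W, 3k) input.

    video: array of shape (T, H, W, 3), frames in temporal order (assumed layout).
    Channel c of the result is color (c % 3) of frame start + (c // 3).
    """
    clip = video[start:start + k]                 # (k, H, W, 3)
    if clip.shape[0] != k:
        raise ValueError("not enough frames left in the video")
    _, H, W, C = clip.shape
    # Move the time axis next to the color axis, then merge the two axes
    return clip.transpose(1, 2, 0, 3).reshape(H, W, k * C)

# Usage: a random 16-frame, 64x64 "video" and a 5-frame window
video = np.random.default_rng(0).random((16, 64, 64, 3))
x = stack_frames(video, start=0, k=5)
print(x.shape)  # (64, 64, 15)
```

The first convolution layer then simply has 3k input channels instead of 3. All later layers stay unchanged.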
Video Classification (Action Recognition)
- Shuiwang Ji et al. 2010: 3D Convolutional Neural Networks for Human Action Recognition
- Karen Simonyan and Andrew Zisserman 2014: Two-Stream Convolutional Networks for Action Recognition in Videos
- Karpathy et al. 2014: Large-scale Video Classification with Convolutional Neural Networks
Pose Estimation
- Tomas Pfister et al. 2014: Deep Convolutional Neural Networks for Efficient Pose Estimation in Gesture Videos
Useful Papers
- Matthew Zeiler and Rob Fergus 2014: Visualizing and Understanding Convolutional Networks
Other Materials
- LeCun et al. 2010: Convolutional Networks and Applications in Vision
Libraries
- Caffe: Deep learning is the reason behind the big jump in performance on several problems. Caffe is a deep learning framework developed with cleanliness, readability, and speed in mind.
Last update: Feb 10, 2015.