Convolutional neural networks: Reading Guide

Deep learning is behind state-of-the-art results on several problems across different fields. In computer vision, Convolutional Neural Networks (CNNs), one family of deep learning models, are applied to image and video challenges. In this article, I point out what to read to get on the right track and start applying CNNs to your vision problems.

For Arabic speakers, this video should be helpful.

Prerequisites

Deep learning is a branch of machine learning, so you should be familiar with that field.
- Probably the easiest start in machine learning is Andrew Ng's Coursera course.
  - You need to cover at least up to week 5 (Neural Networks: Learning).
  - However, it is worth finishing the whole series.
You need to know neural networks (NN), specifically the backpropagation algorithm.
- You can learn it through the course by Geoffrey Hinton himself!
- It is a good idea to write/read code for the backpropagation algorithm.
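
As a starting point for that exercise, here is a minimal sketch of backpropagation for a tiny network with one sigmoid hidden layer and one sigmoid output, trained on a squared error. It is not taken from any of the courses above; the function names (`forward`, `backward`, `sgd_step`) and the choice of loss and activation are assumptions made for illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, W1, b1, W2, b2):
    """Forward pass: input -> sigmoid hidden layer -> one sigmoid output."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    y = sigmoid(sum(w * hi for w, hi in zip(W2, h)) + b2)
    return h, y


def loss(y, t):
    """Squared error for a single example."""
    return 0.5 * (y - t) ** 2


def backward(x, t, W1, b1, W2, b2):
    """Backpropagation: apply the chain rule from the output back to the input."""
    h, y = forward(x, W1, b1, W2, b2)
    delta_out = (y - t) * y * (1.0 - y)  # dLoss/d(pre-activation) at the output
    grad_W2 = [delta_out * hi for hi in h]
    grad_b2 = delta_out
    # Propagate the output error back through W2 and the hidden sigmoid.
    delta_hid = [delta_out * w * hi * (1.0 - hi) for w, hi in zip(W2, h)]
    grad_W1 = [[d * xi for xi in x] for d in delta_hid]
    grad_b1 = delta_hid
    return grad_W1, grad_b1, grad_W2, grad_b2


def sgd_step(x, t, W1, b1, W2, b2, lr=0.5):
    """One gradient step. Lists are updated in place; the new b2 is returned."""
    gW1, gb1, gW2, gb2 = backward(x, t, W1, b1, W2, b2)
    for j in range(len(W1)):
        for i in range(len(x)):
            W1[j][i] -= lr * gW1[j][i]
        b1[j] -= lr * gb1[j]
        W2[j] -= lr * gW2[j]
    return b2 - lr * gb2


if __name__ == "__main__":
    # Hypothetical demo: a few thousand passes over XOR with 3 hidden units.
    random.seed(0)
    W1 = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(3)]
    b1 = [0.0] * 3
    W2 = [random.uniform(-1, 1) for _ in range(3)]
    b2 = 0.0
    data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0),
            ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]
    for epoch in range(5000):
        for x, t in data:
            b2 = sgd_step(x, t, W1, b1, W2, b2)
    print(sum(loss(forward(x, W1, b1, W2, b2)[1], t) for x, t in data))
```

A useful habit when writing this yourself is to compare each analytic gradient with a finite-difference estimate of the loss before trusting the training loop.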

Convolutional neural networks (CNN)

Basic Reading

Yann LeCun, with co-authors including Yoshua Bengio, introduced the CNN. You should read their work.
- Paper: Gradient-Based Learning Applied to Document Recognition - 1998
  - Mainly pages 5-9 plus the related figures.
  - Presentation summarizing it.
- Introduction to Machine Learning CMU - 10701 Deep Learning
  - Describes the layers and counts their elements (connections/parameters).
The Deep Learning book (Yoshua Bengio et al.) has a nice introduction highlighting some interesting points.
Stanford offers a course on CNNs, with some assignments.
- Specifically Lecture 7 - Convolutional Neural Networks
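
Counting the connections and parameters of a layer, as the readings above do, is easy to check by hand. The sketch below assumes no padding, a bias per unit, and every output map connected to every input map (which is not true for every layer of LeNet-5); the helper names `conv_layer_counts` and `fc_layer_params` are hypothetical.

```python
def conv_layer_counts(in_size, kernel, in_maps, out_maps, stride=1):
    """Return (output size, trainable parameters, connections) of a conv layer.

    Assumes square inputs and kernels, no padding, and full connectivity
    between input and output feature maps.
    """
    out_size = (in_size - kernel) // stride + 1
    weights_per_unit = kernel * kernel * in_maps + 1  # +1 for the bias
    params = out_maps * weights_per_unit              # shared across positions
    connections = params * out_size * out_size        # one copy per position
    return out_size, params, connections


def fc_layer_params(n_in, n_out):
    """Trainable parameters of a fully connected layer with biases."""
    return n_out * (n_in + 1)


# First convolutional layer of LeNet-5: 32x32 input, six 5x5 feature maps.
print(conv_layer_counts(32, 5, 1, 6))
# Fully connected layer from 120 to 84 units.
print(fc_layer_params(120, 84))
```

The gap between the parameter count and the connection count is the effect of weight sharing: the same small kernel is reused at every spatial position.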

Articles

Ilya Sutskever - Brief Overview of Deep Learning
- A very important document. It lists the concerns to take care of when experimenting with CNNs.
Yann LeCun on His Quest to Unleash Deep Learning
- Discusses a wide range of ideas around CNNs.

Running CNN

A wiki page lists the popular frameworks.
I am using Caffe, which is written in C++ and has interfaces for other languages.
- Networks are defined in a text file, not in code.
- It ships with many examples, including popular networks (AlexNet, GoogLeNet, ...).
- The documentation is not perfect. Workaround: read these examples and the issues on GitHub.

CNN in Images

Several important papers apply CNNs to images. Below are a few of them, grouped by problem.

Image Classification

Krizhevsky et al 2012: ImageNet Classification with Deep Convolutional Neural Networks

Object Recognition

Girshick et al 2014: Rich feature hierarchies for accurate object detection and semantic segmentation

CNN in Videos

The real challenge in videos is to also take the temporal dimension of the data into account. One naive approach is to ignore it, at the cost of losing the motion information. To avoid that, several methods have been proposed that use both spatial and temporal data. The approach of Tomas Pfister et al. is the easiest of them: instead of feeding 1 image of 3 channels, feed k images as one image of 3k channels.
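
The channel-stacking idea can be sketched in a few lines. This is only an illustration of the input layout described above, not the authors' code; the function name `stack_frames` and the nested-list frame format (`frame[y][x]` is a list of 3 channel values) are assumptions.

```python
def stack_frames(frames):
    """Merge k frames of shape H x W x 3 into one image of shape H x W x 3k.

    The channels of consecutive frames are concatenated at each pixel, so a
    2D convolution over the result sees all k frames at once.
    """
    height, width = len(frames[0]), len(frames[0][0])
    return [
        [
            [value for frame in frames for value in frame[y][x]]
            for x in range(width)
        ]
        for y in range(height)
    ]


# Hypothetical example: k = 4 tiny 2x2 RGB frames, frame f filled with value f.
k = 4
frames = [[[[f, f, f] for _ in range(2)] for _ in range(2)] for f in range(k)]
stacked = stack_frames(frames)
print(len(stacked), len(stacked[0]), len(stacked[0][0]))  # height, width, 3k
```

With an array library the same operation is a concatenation along the channel axis; the only change to the network is that the first convolutional layer expects 3k input channels instead of 3.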

Video Classification (Action Recognition)

Shuiwang Ji 2010 - 3D Convolutional Neural Networks for Human Action Recognition
Karen Simonyan and Andrew Zisserman 2014 - Two-Stream Convolutional Networks for Action Recognition in Videos
Karpathy 2014: Large-scale Video Classification with Convolutional Neural Networks

Pose Estimation

Tomas Pfister 2014. Deep Convolutional Neural Networks for Efficient Pose Estimation in Gesture Videos

Useful Papers

Matthew Zeiler and Rob Fergus 2014 - Visualizing and Understanding Convolutional Networks

Other Materials

LeCun et al. 2010: Convolutional Networks and Applications in Vision

Libraries

Caffe: Deep learning is the reason behind the strong push in performance on several problems. Caffe is a deep learning framework developed with cleanliness, readability, and speed in mind.

Last update: Feb 10, 2015.
