AlexNet
ImageNet Classification with Deep Convolutional Neural Networks by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton, 2012
5 convolutional layers and 3 fully-connected layers
Actual input image size is 227 × 227 × 3 (the paper and the picture above say 224 × 224, which is a known mistake)
Has around 60M parameters
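As a sanity check on the ~60M figure, the parameters can be tallied layer by layer with the standard formulas: (k·k·C_in + 1)·C_out for a conv layer and (n_in + 1)·n_out for a fully-connected one. A minimal sketch (it ignores the two-GPU grouping in the original paper, which changes a few conv counts slightly):

```python
def conv_params(k, c_in, c_out):
    # k*k*c_in weights per filter, plus one bias per filter
    return (k * k * c_in + 1) * c_out

def fc_params(n_in, n_out):
    # dense weight matrix plus one bias per output unit
    return (n_in + 1) * n_out

# AlexNet layers, ignoring the cross-GPU grouping of the original paper
layers = [
    conv_params(11, 3, 96),        # conv1
    conv_params(5, 96, 256),       # conv2
    conv_params(3, 256, 384),      # conv3
    conv_params(3, 384, 384),      # conv4
    conv_params(3, 384, 256),      # conv5
    fc_params(6 * 6 * 256, 4096),  # fc6 (input flattened from 6x6x256)
    fc_params(4096, 4096),         # fc7
    fc_params(4096, 1000),         # fc8 (1000 ImageNet classes)
]
total = sum(layers)
print(f"{total:,}")  # roughly 62 million
```

Note how fc6 alone contributes well over half of all parameters; the conv layers are comparatively cheap.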
VGG
Very similar in design to LeNet, except with many more Conv and Pool layers (all convolutions are 3×3, all pools 2×2)
VGG16 has 16 weight layers (13 conv + 3 fully-connected) and VGG19 has 19 (16 conv + 3 fully-connected)
What is the major problem with VGG? It is very expensive to train and evaluate, since it has many layers and roughly 138M parameters
Why is VGG so special? VGG features seem to work better than those of other, nominally superior architectures for a lot of tasks (e.g. style transfer). Nobody knows exactly why!
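The same layer-by-layer tally makes VGG16's cost concrete, and shows that the three fully-connected layers hold most of the ~138M parameters. A sketch using the standard VGG16 configuration:

```python
def conv_params(c_in, c_out, k=3):
    # 3x3 conv weights plus one bias per output channel
    return (k * k * c_in + 1) * c_out

def fc_params(n_in, n_out):
    return (n_in + 1) * n_out

# VGG16 conv stack: output channels of the 13 conv layers in order
# (maxpools between blocks do not add parameters)
widths = [64, 64, 128, 128, 256, 256, 256, 512, 512, 512, 512, 512, 512]
conv_total, c_in = 0, 3
for c_out in widths:
    conv_total += conv_params(c_in, c_out)
    c_in = c_out

# after 5 maxpools, 224 -> 7, so fc6 sees a 7*7*512 input
fc_total = (fc_params(7 * 7 * 512, 4096)
            + fc_params(4096, 4096)
            + fc_params(4096, 1000))

total = conv_total + fc_total
print(f"conv: {conv_total:,}  fc: {fc_total:,}  total: {total:,}")
```

The fully-connected layers account for roughly 124M of the 138M parameters, which is exactly the fat that ResNet later trims away.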
ResNet
Deep Residual Learning for Image Recognition by Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, 2015
Motivation: "Deeper is better": in theory, more depth means more expressiveness and better generalization. In practice, though, plain very deep networks degrade after a certain point, and even their training error gets worse, so this is an optimization problem, not just overfitting
We get around this problem by using a shortcut branch: each block learns a residual F(x) and outputs F(x) + x
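The shortcut idea can be sketched in a few lines. Below is a toy fully-connected residual block in NumPy standing in for the 3×3 conv pairs of the real network (all names are illustrative):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    # residual branch F(x): two linear layers with a ReLU in between
    f = relu(x @ w1) @ w2
    # shortcut branch: add the input back unchanged, then the final ReLU
    return relu(f + x)

# if the residual branch learns all-zero weights, the block reduces to
# the identity (for non-negative inputs), so extra depth cannot hurt
x = np.array([[1.0, 2.0, 3.0]])
w_zero = np.zeros((3, 3))
out = residual_block(x, w_zero, w_zero)  # equals x
```

This is the key intuition: a residual block only has to learn a *correction* to the identity, which is much easier to optimize than learning a full mapping from scratch.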
Interesting fact: the identity shortcuts add zero new parameters. ResNet with 34 layers requires only 18% of the operations (FLOPs) of a VGG with 19 layers
ResNet Architecture --> two kinds of residual blocks: the ID block, where the shortcut is a plain identity, and the Conv block, where a 1×1 conv on the shortcut matches the changed dimensions
Improves gradient flow: the shortcut gives gradients a direct path back to earlier layers
Gets better performance than VGG-{x} with significantly fewer parameters
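The difference between the two block types above shows up only on the shortcut path. A toy sketch of the Conv-block case, where a projection matrix (the 1×1 conv of the real network) reshapes the shortcut to match the residual branch (all names are illustrative):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def conv_block(x, w1, w2, w_short):
    # residual branch changes the feature dimension (here 4 -> 8)
    f = relu(x @ w1) @ w2
    # shortcut branch: project x with w_short so the shapes match
    return relu(f + x @ w_short)

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 4))
w1 = rng.standard_normal((4, 8)) * 0.1
w2 = rng.standard_normal((8, 8)) * 0.1
w_short = rng.standard_normal((4, 8)) * 0.1
out = conv_block(x, w1, w2, w_short)  # shape (2, 8)
```

The ID block is the cheap default; the Conv block is only used at the handful of points where the spatial size or channel count changes, which is why the shortcuts add so few parameters overall.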