AlexNet
ImageNet Classification with Deep Convolutional Neural Networks by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton, 2012
5 convolutional layers and 3 fully-connected layers
Actual input image size is 227 × 227 × 3 (the paper and the picture above say 224 × 224, which is a known mistake)
Has around 60M parameters
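As a sanity check on the ~60M figure, the parameters can be tallied layer by layer with the standard formulas: (k·k·C_in + 1)·C_out for a conv layer and (n_in + 1)·n_out for a fully-connected one. A minimal sketch (it ignores the two-GPU grouping in the original paper, which changes a few conv counts slightly):

```python
def conv_params(k, c_in, c_out):
    # k*k*c_in weights per filter, plus one bias per filter
    return (k * k * c_in + 1) * c_out

def fc_params(n_in, n_out):
    # dense weight matrix plus one bias per output unit
    return (n_in + 1) * n_out

# AlexNet layers, ignoring the cross-GPU grouping of the original paper
layers = [
    conv_params(11, 3, 96),        # conv1
    conv_params(5, 96, 256),       # conv2
    conv_params(3, 256, 384),      # conv3
    conv_params(3, 384, 384),      # conv4
    conv_params(3, 384, 256),      # conv5
    fc_params(6 * 6 * 256, 4096),  # fc6 (input flattened from 6x6x256)
    fc_params(4096, 4096),         # fc7
    fc_params(4096, 1000),         # fc8 (1000 ImageNet classes)
]
total = sum(layers)
print(f"{total:,}")  # roughly 62 million
```

Note how fc6 alone contributes well over half of all parameters; the conv layers are comparatively cheap.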
VGG
Very similar in design to LeNet, except with many more Conv and Pool layers (all convolutions are 3×3, all pools 2×2)
VGG16 has 16 weight layers (13 conv + 3 fully-connected) and VGG19 has 19 (16 conv + 3 fully-connected)
What is the major problem with VGG? It is very expensive to train and evaluate, since it has many layers and roughly 138M parameters
Why is VGG so special? VGG features seem to work better than those of other, nominally superior architectures for a lot of tasks (e.g. style transfer). Nobody knows exactly why!
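The same layer-by-layer tally makes VGG16's cost concrete, and shows that the three fully-connected layers hold most of the ~138M parameters. A sketch using the standard VGG16 configuration:

```python
def conv_params(c_in, c_out, k=3):
    # 3x3 conv weights plus one bias per output channel
    return (k * k * c_in + 1) * c_out

def fc_params(n_in, n_out):
    return (n_in + 1) * n_out

# VGG16 conv stack: output channels of the 13 conv layers in order
# (maxpools between blocks do not add parameters)
widths = [64, 64, 128, 128, 256, 256, 256, 512, 512, 512, 512, 512, 512]
conv_total, c_in = 0, 3
for c_out in widths:
    conv_total += conv_params(c_in, c_out)
    c_in = c_out

# after 5 maxpools, 224 -> 7, so fc6 sees a 7*7*512 input
fc_total = (fc_params(7 * 7 * 512, 4096)
            + fc_params(4096, 4096)
            + fc_params(4096, 1000))

total = conv_total + fc_total
print(f"conv: {conv_total:,}  fc: {fc_total:,}  total: {total:,}")
```

The fully-connected layers account for roughly 124M of the 138M parameters, which is exactly the fat that ResNet later trims away.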
ResNet
Deep Residual Learning for Image Recognition by Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, 2015
Motivation: "Deeper is better": in theory, more depth means more expressiveness and better generalization. In practice, though, plain very deep networks degrade after a certain point, and even their training error gets worse, so this is an optimization problem, not just overfitting
We get around this problem by using a shortcut branch: each block learns a residual F(x) and outputs F(x) + x
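The shortcut idea can be sketched in a few lines. Below is a toy fully-connected residual block in NumPy standing in for the 3×3 conv pairs of the real network (all names are illustrative):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    # residual branch F(x): two linear layers with a ReLU in between
    f = relu(x @ w1) @ w2
    # shortcut branch: add the input back unchanged, then the final ReLU
    return relu(f + x)

# if the residual branch learns all-zero weights, the block reduces to
# the identity (for non-negative inputs), so extra depth cannot hurt
x = np.array([[1.0, 2.0, 3.0]])
w_zero = np.zeros((3, 3))
out = residual_block(x, w_zero, w_zero)  # equals x
```

This is the key intuition: a residual block only has to learn a *correction* to the identity, which is much easier to optimize than learning a full mapping from scratch.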
Interesting fact: the identity shortcuts add zero new parameters. ResNet with 34 layers requires only 18% of the operations (FLOPs) of a VGG with 19 layers
ResNet Architecture --> two kinds of residual blocks: the ID block, where the shortcut is a plain identity, and the Conv block, where a 1×1 conv on the shortcut matches the changed dimensions
Improves gradient flow: the shortcut gives gradients a direct path back to earlier layers
Gets better performance than VGG-{x} with significantly fewer parameters
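The difference between the two block types above shows up only on the shortcut path. A toy sketch of the Conv-block case, where a projection matrix (the 1×1 conv of the real network) reshapes the shortcut to match the residual branch (all names are illustrative):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def conv_block(x, w1, w2, w_short):
    # residual branch changes the feature dimension (here 4 -> 8)
    f = relu(x @ w1) @ w2
    # shortcut branch: project x with w_short so the shapes match
    return relu(f + x @ w_short)

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 4))
w1 = rng.standard_normal((4, 8)) * 0.1
w2 = rng.standard_normal((8, 8)) * 0.1
w_short = rng.standard_normal((4, 8)) * 0.1
out = conv_block(x, w1, w2, w_short)  # shape (2, 8)
```

The ID block is the cheap default; the Conv block is only used at the handful of points where the spatial size or channel count changes, which is why the shortcuts add so few parameters overall.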