[TBD] Understanding Convolutional Neural Networks (CNN) for multimedia

Introduction

Layman's explanation

Technical explanation

Jargon

Receptive fields

Loss function

Useful link: https://medium.com/data-science-group-iitr/loss-functions-and-optimization-algorithms-demystified-bb92daff331c

Stride

Stride is the number of pixels by which the kernel moves at each step as it is passed over the image. In other words, it is the step size of the kernel within a channel.
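
To make the effect of stride concrete, the sketch below counts how many positions a kernel can occupy along one axis. The helper name output_size and its padding parameter are assumptions made for this example; it uses the commonly quoted formula floor((n + 2p - k) / s) + 1.

```python
def output_size(input_size, kernel_size, stride=1, padding=0):
    """Number of kernel positions along one axis (hypothetical helper)."""
    if stride < 1:
        raise ValueError("stride must be a positive integer")
    padded = input_size + 2 * padding
    if kernel_size > padded:
        raise ValueError("kernel does not fit in the padded input")
    # The kernel starts at offset 0 and advances `stride` pixels per step.
    return (padded - kernel_size) // stride + 1


for s in (1, 2, 3):
    print(f"stride={s}: 7-wide input, 3-wide kernel -> {output_size(7, 3, s)} positions")
```

A larger stride means fewer kernel positions, so the resulting feature map is smaller.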

Components

Before the individual components are explained, the overall CNN architecture is shown below.

Filters

Feature map

Convolutional Layer

The most important building block of a CNN is the convolutional layer. Neurons in the first convolutional layer are not connected to every single pixel in the input image, but only to the pixels in their receptive fields. The picture below shows a rectangular receptive field.

This architecture allows the network to concentrate on low-level features in the first hidden layer, then assemble them into higher-level features in the next hidden layer, and so on.

A convolutional layer simultaneously applies multiple filters to its inputs, making it capable of detecting multiple features anywhere in its inputs.
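
A minimal NumPy sketch of this idea follows. It is not how a deep learning framework implements convolution; it is a plain loop version meant only to show each output value depending on a small receptive field, and several filters producing several feature maps. The function name conv2d, the toy image, and the two edge filters are assumptions chosen for illustration.

```python
import numpy as np


def conv2d(image, filters, stride=1):
    """Slide each filter over the image (no padding, no kernel flipping).

    image:   (H, W) array
    filters: (F, kH, kW) array, one kernel per output feature map
    returns: (F, out_H, out_W) array of feature maps
    """
    n_filters, k_h, k_w = filters.shape
    out_h = (image.shape[0] - k_h) // stride + 1
    out_w = (image.shape[1] - k_w) // stride + 1
    feature_maps = np.zeros((n_filters, out_h, out_w))
    for f in range(n_filters):
        for i in range(out_h):
            for j in range(out_w):
                r, c = i * stride, j * stride
                # The receptive field of output neuron (i, j)
                patch = image[r:r + k_h, c:c + k_w]
                feature_maps[f, i, j] = np.sum(patch * filters[f])
    return feature_maps


# Toy image: dark left half, bright right half (a vertical edge)
image = np.zeros((6, 6))
image[:, 3:] = 1.0

vertical_edge = np.array([[-1.0, 0.0, 1.0]] * 3)
horizontal_edge = vertical_edge.T

maps = conv2d(image, np.stack([vertical_edge, horizontal_edge]))
print(maps.shape)   # one feature map per filter
print(maps[0])      # responds where the vertical edge is
print(maps[1])      # stays at zero: there is no horizontal edge
```

Each filter is applied at every position, which is why the same feature can be detected anywhere in the input.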

Pooling layer

The goal of a pooling layer is to subsample (i.e., shrink) the input image in order to reduce the computational load, the memory usage, and the number of parameters (thereby limiting the risk of overfitting).
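
The sketch below shows max pooling, one common kind of pooling, on a single feature map. The function name max_pool2d and the 2x2 window with stride 2 are assumptions for this example. Note that pooling itself has no trainable weights; the saving in parameters comes from the smaller input handed to the layers that follow.

```python
import numpy as np


def max_pool2d(feature_map, size=2, stride=2):
    """Max pooling over a 2-D feature map (no padding)."""
    out_h = (feature_map.shape[0] - size) // stride + 1
    out_w = (feature_map.shape[1] - size) // stride + 1
    pooled = np.empty((out_h, out_w), dtype=feature_map.dtype)
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            # Keep only the largest value in each window
            pooled[i, j] = feature_map[r:r + size, c:c + size].max()
    return pooled


feature_map = np.arange(16).reshape(4, 4)
print(feature_map)
print(max_pool2d(feature_map))   # 4x4 input shrinks to 2x2
```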

Flattening

Fully Connected Layer

In the fully connected (FC) layer, the feature-map matrix is flattened into a vector and fed into a fully connected layer, as in an ordinary neural network.
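
As a rough sketch, flattening is a reshape and a fully connected layer is a matrix-vector product plus a bias. The shapes used here (8 feature maps of 5x5, 10 output classes), the random weights, and the helper name fully_connected are all assumptions made for illustration.

```python
import numpy as np


def fully_connected(x, weights, bias):
    """Dense layer: every output is connected to every input value."""
    return weights @ x + bias


rng = np.random.default_rng(0)

# Assumed input: 8 pooled feature maps of 5x5 each
feature_maps = rng.standard_normal((8, 5, 5))

# Flattening: turn the (8, 5, 5) stack into one vector of 200 values
flat = feature_maps.reshape(-1)

# Assumed layer size: 10 outputs, so the weight matrix is 10 x 200
weights = rng.standard_normal((10, flat.size)) * 0.01
bias = np.zeros(10)

scores = fully_connected(flat, weights, bias)
print(flat.shape, scores.shape)
```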

CNN flow

Provide the input image to the convolutional layer.
Choose the parameters, then apply the filters with the chosen stride, and padding if required. Perform the convolution on the image and apply the ReLU activation to the resulting matrix.
Perform pooling to reduce the dimensionality.
Add as many convolutional layers as needed.
Flatten the output and feed it into a fully connected layer (FC layer).
Output the class using an activation function (logistic regression with a cost function) to classify the image.
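
The steps above can be sketched as a single forward pass in NumPy. This is a simplified illustration under several assumptions: a single-channel 8x8 input, one convolution block only (stacking further blocks would need multi-channel filters), random untrained weights, and softmax as the output activation for three hypothetical classes. All function names are made up for this example.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv_relu(image, filters, stride=1):
    """Convolve a (H, W) image with (F, k, k) filters, then apply ReLU."""
    k = filters.shape[-1]
    windows = sliding_window_view(image, (k, k))[::stride, ::stride]
    maps = np.einsum("ijkl,fkl->fij", windows, filters)
    return np.maximum(maps, 0.0)


def max_pool(maps, size=2):
    """Non-overlapping max pooling on (F, H, W) feature maps."""
    windows = sliding_window_view(maps, (size, size), axis=(1, 2))
    return windows[:, ::size, ::size].max(axis=(-2, -1))


def softmax(scores):
    """Turn class scores into probabilities that sum to 1."""
    shifted = np.exp(scores - scores.max())
    return shifted / shifted.sum()


def forward(image, filters, fc_weights, fc_bias):
    maps = conv_relu(image, filters)      # convolution + ReLU
    pooled = max_pool(maps)               # pooling
    flat = pooled.reshape(-1)             # flatten
    return softmax(fc_weights @ flat + fc_bias)   # FC layer + output


rng = np.random.default_rng(42)
image = rng.random((8, 8))
filters = rng.standard_normal((4, 3, 3))
# 8x8 image, 3x3 kernel -> 6x6 maps; 2x2 pooling -> 3x3; 4 maps -> 36 values
fc_weights = rng.standard_normal((3, 36)) * 0.1
fc_bias = np.zeros(3)

probs = forward(image, filters, fc_weights, fc_bias)
print(probs, "predicted class:", probs.argmax())
```

Because the weights are random, the predicted class is meaningless here; training would adjust the filters and FC weights by minimising a loss function.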

Challenges

The convolutional layers require a huge amount of RAM, especially during training, because the reverse pass of backpropagation requires all the intermediate values computed during the forward pass.
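
A back-of-the-envelope calculation gives a feel for the numbers. The layer dimensions below (200 feature maps of 150x100, 32-bit floats, a batch of 100 images) are hypothetical values picked for the arithmetic, and the helper name activation_bytes is an assumption.

```python
def activation_bytes(n_maps, height, width, batch_size=1, bytes_per_value=4):
    """Memory needed to keep one layer's outputs for the backward pass."""
    return n_maps * height * width * batch_size * bytes_per_value


# Hypothetical layer: 200 feature maps of 150x100, stored as 32-bit floats
per_image = activation_bytes(200, 150, 100)
per_batch = activation_bytes(200, 150, 100, batch_size=100)

print(f"one image:    {per_image / 1e6:.0f} MB")
print(f"batch of 100: {per_batch / 1e9:.1f} GB")
```

This counts the outputs of a single layer only; during training, every layer's outputs must be kept until the backward pass has used them.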

Tools

Online calculator for Sigmoid function

Refer: https://keisan.casio.com/exec/system/15157249643325
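
The same values can be computed locally. A minimal sketch, assuming the standard logistic sigmoid 1 / (1 + exp(-x)); the function name sigmoid is our own, and the code is not hardened against overflow warnings for very large negative inputs.

```python
import numpy as np


def sigmoid(x):
    """Logistic sigmoid, 1 / (1 + exp(-x)), applied element-wise."""
    x = np.asarray(x, dtype=float)
    return 1.0 / (1.0 + np.exp(-x))


print(sigmoid(0.0))                 # the midpoint of the curve
print(sigmoid([-2.0, 0.0, 2.0]))    # values always lie between 0 and 1
```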

Python library

NumPy is the core library for scientific computing in Python. It provides a high-performance multidimensional array object and tools for working with these arrays.

Useful link: https://github.com/krdpk17/mlbr/blob/master/CS231n-python-numpy-tutorial.ipynb

https://github.com/machinelearningblr/machinelearningblr.github.io/blob/master/tutorials/CS231n-Materials/CS231n-python-numpy-tutorial.ipynb

Tips

Difference between the various random generators (random, uniform, rand) in NumPy

Refer: https://stackoverflow.com/questions/30762832/difference-between-functions-generating-random-numbers-in-numpy
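
In short, all three draw from a uniform distribution and differ mainly in how they are called. A small sketch using the legacy np.random functions named above (the seed and the shapes are arbitrary choices for this example):

```python
import numpy as np

np.random.seed(0)   # fixed seed so that repeated runs give the same numbers

# random: uniform on [0, 1); the shape is passed as a single tuple
a = np.random.random((2, 3))

# rand: uniform on [0, 1); the shape is passed as separate arguments
b = np.random.rand(2, 3)

# uniform: uniform on [low, high); the bounds are configurable
c = np.random.uniform(-1.0, 1.0, (2, 3))

print(a.shape, b.shape, c.shape)
print(c.min() >= -1.0, c.max() < 1.0)
```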

Reference

https://towardsdatascience.com/activation-functions-neural-networks-1cbd9f8d91d6

https://images.app.goo.gl/zJkMvF4wmDN2fVbk6

https://images.app.goo.gl/fDq7PHQuZdbfRgdY8

https://images.app.goo.gl/YFgHsU8MkTqCoEek9

https://images.app.goo.gl/ijJkorwQFY8PWkJF6

https://images.app.goo.gl/Lyeae8TQKDU68HgGA

https://www.quora.com/What-does-stride-mean-in-the-context-of-convolutional-neural-networks

https://images.app.goo.gl/dfs9bihG2wcRN9Xt5

https://medium.com/@RaghavPrabhu/understanding-of-convolutional-neural-network-cnn-deep-learning-99760835f148
