Kbeznak Parmatonic
The Guru of Machine Learning
A sneaky peek into the ocean
Most people who hear about machine learning for the first time assume it is a very complicated technology that cannot be understood easily. But machine learning is nothing more than building an equation by trial and error across multiple layers. One thing that still remains unclear is how the stored information can be interpreted. You can get a good idea of how information is stored in deep networks through the videos I have shared at the end of this post.
The neuron was the inception of Neural Networks: an artificial neuron performs computation according to its activation function, and the network describes the way these artificial neurons are connected to each other.
Let's go through each important aspect of neural networks briefly to understand its role in learning.
The neural network has four components, as shown in the image below: inputs, activation points, connection weights, and activation functions.
Insight into a deep network
An activation point is always associated with an activation function; this function decides whether the activation point gets enabled or not based on the input from the previous layer. Normally we choose activation functions like the Sigmoid (Logistic) function or ReLU, which have the characteristics shown below.
The Sigmoid function is bounded between 0 and 1, so its output can be interpreted as the probability of a decision being true or false. However, it does not give a crisp answer as to whether the result is 0 or 1.
ReLU was introduced to overcome this issue of the sigmoid function not providing a clear distinction between yes and no. ReLU outputs 0 if the input is less than or equal to zero, and passes positive values through unchanged. The advantages of ReLU over Sigmoid will be covered once we have gone through the rest of the Neural Network.
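As a minimal sketch (using NumPy, with function names chosen just for illustration), the two activations discussed above can be written as follows.

```python
# A minimal sketch of the two activation functions discussed above.
import numpy as np

def sigmoid(z):
    # Squashes any real value into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    # Returns 0 for values <= 0 and passes positive values through.
    return np.maximum(0.0, z)

print(sigmoid(np.array([-2.0, 0.0, 2.0])))  # ~[0.12, 0.5, 0.88]
print(relu(np.array([-2.0, 0.0, 2.0])))     # [0. 0. 2.]
```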
Let's consider sigmoid as the activation function for the network shown above to explain the flow of information. All 'x' are input nodes connected to activation nodes 'a' with weights 'w'. The connection to each activation point can be represented by a mathematical expression; for the first hidden unit,

a^{(2)}_1 = g(w^{(1)}_{10} x_0 + w^{(1)}_{11} x_1 + w^{(1)}_{12} x_2 + w^{(1)}_{13} x_3)

and similarly for the rest of the activation points. a^{(2)}_0 and x_0 are bias values, which are normally set to 1; this helps to represent these equations in matrix form. The output is computed in the same way, but the input to the output layer is the hidden layer:
h_w(x) = g(w^{(2)}_{10} a^{(2)}_0 + w^{(2)}_{11} a^{(2)}_1 + w^{(2)}_{12} a^{(2)}_2 + w^{(2)}_{13} a^{(2)}_3)
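As a concrete illustration, here is a minimal NumPy sketch of this forward pass for a small network with three inputs, one hidden layer of three units and a single output; the shapes, variable names and random weights are my own assumptions, not from the post.

```python
# A minimal sketch of the forward pass described above, assuming one hidden
# layer with sigmoid activations throughout.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, W2):
    x = np.concatenate(([1.0], x))        # prepend bias x_0 = 1
    a2 = sigmoid(W1 @ x)                  # hidden-layer activations
    a2 = np.concatenate(([1.0], a2))      # prepend bias a^(2)_0 = 1
    return sigmoid(W2 @ a2)               # h_w(x), the network output

# Example: 3 inputs, 3 hidden units, 1 output (weights chosen at random).
rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 4))  # hidden weights, including the bias column
W2 = rng.normal(size=(1, 4))  # output weights, including the bias column
print(forward(np.array([0.5, -1.2, 0.3]), W1, W2))
```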
A Neural Network has two important flows of information: Forward propagation, which passes the input data through the network towards the output, and Backward propagation, which moves from the output layer towards the input layer to correct the errors in the weights according to the difference between the expected and actual output.
As mentioned earlier, backward propagation corrects the errors in the weights, which were initially chosen arbitrarily. But how do we get the error value? We compute a cost function, which is nothing but the average difference between the expected and actual output. In back propagation, by correcting the errors in the weights we are trying to reduce this cost function.
So, there is a direct relationship between the cost function and the errors.
The error in the last layer would be
δ^{(l)} = a^{(l)} − y
where l represents the last layer and y represents the expected output. We then move back to the next lower layer, using the error calculated in the current layer to compute its error. As we are going back layer by layer, it is called back propagation.
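Below is a minimal sketch of one back-propagation step for the small network sketched earlier, using the δ^{(l)} = a^{(l)} − y error from above; the learning rate and variable names are illustrative assumptions.

```python
# A minimal sketch of one back-propagation step for the one-hidden-layer
# network above (sigmoid activations throughout).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(x, y, W1, W2, lr=0.1):
    # Forward pass, keeping intermediate activations for the backward pass.
    x = np.concatenate(([1.0], x))
    a2 = np.concatenate(([1.0], sigmoid(W1 @ x)))
    a3 = sigmoid(W2 @ a2)

    # Error in the last layer, as in the formula above: delta = a - y.
    delta3 = a3 - y
    # Propagate the error back to the hidden layer (skipping the bias unit).
    delta2 = (W2.T @ delta3)[1:] * a2[1:] * (1.0 - a2[1:])

    # Correct the weights by moving against the gradient.
    W2 -= lr * np.outer(delta3, a2)
    W1 -= lr * np.outer(delta2, x)
    return W1, W2
```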
How do we know whether the Neural Network has attained a stable model?
The cost function is our way out: the derivative of the cost function is nothing but the slope, or gradient, which shows how the error grows. When the derivative of the cost function becomes zero, the tangent is parallel to the axis, which signifies that it has reached a local minimum.
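As a toy illustration of this idea, the sketch below runs plain gradient descent on a simple one-dimensional cost (chosen only for illustration) and stops once the derivative is close to zero, i.e. near a local minimum.

```python
# A minimal sketch of gradient descent: training stops near a point where
# the derivative of the cost is approximately zero.
def cost(w):
    return (w - 3.0) ** 2          # minimum at w = 3

def d_cost(w):
    return 2.0 * (w - 3.0)         # derivative of the cost

w, lr = 0.0, 0.1
for step in range(200):
    grad = d_cost(w)
    if abs(grad) < 1e-6:           # derivative ~ 0: a (local) minimum
        break
    w -= lr * grad                 # move against the slope
print(w)  # ~3.0
```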
There are tricks used to generalize the model so that it does not over-fit the data, which would otherwise lead to a failure to predict proper output in the testing phase. One of those tricks is regularization. Regularization adds a small amount of bias and helps favor the features with higher importance, providing a better overall result in the test phase.
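As one concrete and commonly used form of regularization, the sketch below adds an L2 weight penalty to a squared-error cost; L2 regularization is my choice for illustration and is not named in the post.

```python
# A minimal sketch of L2 (weight-decay) regularization added to a cost:
# large weights are penalized, which discourages over-fitting.
import numpy as np

def regularized_cost(predictions, targets, weights, lam=0.01):
    data_term = np.mean((predictions - targets) ** 2)  # fit to the data
    penalty = lam * np.sum(weights ** 2)               # discourage large weights
    return data_term + penalty
```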
Deep learning is nothing but neural networks with multiple layers and different architectures.
Neural networks don't perform very well with a small number of layers, as not enough information accumulates and the simple architecture of connected activation points is not suited for several applications. With multiple layers, and with changes to the architecture of the network, the learning gets a boost: the predictions become much more reliable and performance increases many-fold.
There are several deep learning architectures; each was designed for a particular set of applications but is not restricted to them. The prominent ones being used extensively are CNNs, RNNs and their modified flavors.
Convolutional Neural Networks (CNN)
CNNs are widely used for computer vision because a regular Neural Network cannot be scaled to take input from every pixel across three color channels; it would require a huge number of weights and a lot of computation to map each pixel. For example, for an image of dimension 720*720*3 (720 wide, 720 high and 3 color channels), every neuron in the first hidden layer would already require 1,555,200 weights, and this only gets worse as we increase the number of layers. To overcome this issue, a CNN uses an architecture consisting of convolution layers, ReLU, pooling layers and a fully connected network.
A convolution layer has one or more filters. A filter is a learnable grid of weights, such as 5*5*3, with each cell holding a weight. The filter convolves over the image provided by the previous layer to form a two-dimensional activation map. Convolution is nothing but the dot product of the filter and the image region under it. If we have more filters, they produce more activation maps, which are stacked together as shown in the image below. Convolution helps reduce the size burden caused by the regular Neural Network. A convolution layer has three major settings: the number of filters, the stride (how the filter moves over the image) and zero padding at the borders.
This is the layered version of a CNN
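To make the dot-product view of convolution concrete, here is a minimal sketch of a single-channel 2D convolution with stride 1 and no zero padding; the array sizes and the averaging filter are illustrative assumptions.

```python
# A minimal sketch of 2D convolution (single channel, stride 1, no padding);
# each output cell is the dot product of the filter and the region under it.
import numpy as np

def convolve2d(image, filt):
    fh, fw = filt.shape
    oh, ow = image.shape[0] - fh + 1, image.shape[1] - fw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            region = image[i:i + fh, j:j + fw]
            out[i, j] = np.sum(region * filt)  # dot product of filter and region
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
filt = np.ones((3, 3)) / 9.0                   # simple averaging filter
print(convolve2d(image, filt))                 # 3x3 activation map
```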
ReLU is applied to these activation maps to replace any negative values with zeros. Pooling is the next important layer; it further reduces the spatial size and the computation in the model. A pooling layer is also a kind of filter, but it simply takes the highest value in its field and passes it on (see the sketch below). This shrinks the spatial size even more, making the following connections smaller and more compact; normally there are multiple pooling layers arranged in between the convolution layers. The fully connected layer has all its neurons connected to all of the activation maps and produces the output.
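Here is a minimal sketch of max pooling with a 2*2 field and stride 2, the "take the highest value and pass it on" step described above; sizes and names are illustrative.

```python
# A minimal sketch of 2x2 max pooling with stride 2: each output cell is the
# highest value in its 2x2 field, halving the spatial size.
import numpy as np

def max_pool(activation_map, size=2):
    h, w = activation_map.shape
    out = np.zeros((h // size, w // size))
    for i in range(0, h - size + 1, size):
        for j in range(0, w - size + 1, size):
            out[i // size, j // size] = np.max(activation_map[i:i + size, j:j + size])
    return out

amap = np.array([[1., 3., 2., 0.],
                 [4., 2., 1., 5.],
                 [0., 1., 7., 2.],
                 [3., 6., 2., 4.]])
print(max_pool(amap))   # [[4. 5.] [6. 7.]]
```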
As a CNN has multiple kinds of layers, the backpropagation used earlier doesn't apply directly; each type of layer has its own backward pass. For the convolution layer, the backward pass is a convolution with the flipped filters. For the pooling layer, we have to preserve the index of the value that was passed on in the forward pass, so that the error can be propagated back to it.
Recurrent Neural Networks (RNN)
RNNs have been used in handwriting and speech recognition, where prior knowledge of the inputs is required. It can be said that RNNs are suited for applications where inputs depend on earlier inputs, and this can be visualized as below.
Backpropagation in an RNN is similar to that of a regular Neural Network, but we have to account for the weights across all the time steps. So the gradient has to be calculated for all those steps going backwards; this is called Backpropagation Through Time (BPTT).
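The sketch below shows a vanilla RNN forward pass in which the same weight matrices are reused at every time step; keeping all the hidden states is what allows BPTT to sum the weight gradients over those steps. The sizes, names and tanh activation are illustrative assumptions.

```python
# A minimal sketch of a vanilla RNN forward pass: the same weights are reused
# at every time step, which is why BPTT must sum gradients over all steps.
import numpy as np

def rnn_forward(xs, Wx, Wh, h0):
    h = h0
    hs = [h0]
    for x in xs:                       # one step per input in the sequence
        h = np.tanh(Wx @ x + Wh @ h)   # new state depends on the current input
        hs.append(h)                   # and on the previous hidden state
    return hs                          # all hidden states, kept for BPTT

rng = np.random.default_rng(0)
Wx = rng.normal(size=(4, 3)) * 0.1     # input-to-hidden weights (shared)
Wh = rng.normal(size=(4, 4)) * 0.1     # hidden-to-hidden weights (shared)
xs = [rng.normal(size=3) for _ in range(5)]   # a 5-step input sequence
states = rnn_forward(xs, Wx, Wh, np.zeros(4))
print(len(states), states[-1])
```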
Kbeznak Parmatonic prefers TensorFlow and Caffe2 for deep learning, and Keras will help you a lot in the initial stages.
Author Kbeznak Parmatonic:
Dr. Kbeznak Parmatonic was a chief scientist at NASA and well deserving of a Nobel prize for his paper on spacecraft powered by nuclear fusion, which made a remarkable impact on the space industry. Dr. Kbeznak Parmatonic then started a company in his own name and began reaching out to people for funding. In the early stages it seemed hopeless, as everyone ridiculed his ideas, but the few who believed in him helped. Now those same people regret it, as they do not hold shares in this company, which rose from nothing to 10,000 in the past 5 years.
Never lose sight of what you love
- Kbeznak Parmatonic
Videos and references:
An introduction by Andrew Ng, have a look
Deep learning for computer vision
Shallow learning and usage
An insight into the deep sea
A tensor competition from Prof. Caverlee
Tensorflow - google cloud
Caffe2 - facebook
Keras - a short tutorial
Neural Networks - Basics
Convolutional Neural Network - Basics
An introduction to RNN