It is a linear threshold unit (LTU): the inputs and output are numbers (instead of binary on/off values), and each input connection is associated with a weight (Refer: below diagram). The LTU computes a weighted sum of its inputs: z = w1·x1 + w2·x2 + ⋯ + wn·xn = wᵀ·x.
A step function (Refer: below diagram) is then applied to that sum, and the result is the output: hw(x) = step(z) = step(wᵀ·x).
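As a minimal sketch (NumPy, with made-up weights and inputs purely for illustration), a single LTU is just a dot product followed by a step function:

```python
import numpy as np

def step(z):
    # Heaviside step function: 1 if z >= 0, else 0
    return int(z >= 0)

def ltu(x, w):
    # Weighted sum z = w . x, then the step function
    return step(np.dot(w, x))

# Hypothetical weights and input, just to show the computation
w = np.array([0.5, -1.0, 0.25])
x = np.array([1.0, 0.2, 2.0])
print(ltu(x, w))  # step(0.5*1.0 - 1.0*0.2 + 0.25*2.0) = step(0.8) = 1
```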
A Perceptron is simply composed of a single layer of LTUs, with each neuron connected to all the inputs (Refer: below diagram).
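For reference, a sketch using Scikit-Learn's Perceptron class on the iris dataset (the two-feature choice and the setosa-vs-rest target are assumptions made only for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import Perceptron

# A single layer of LTUs trained on two iris features
iris = load_iris()
X = iris.data[:, (2, 3)]            # petal length, petal width
y = (iris.target == 0).astype(int)  # 1 if Iris setosa, else 0

per_clf = Perceptron()
per_clf.fit(X, y)
print(per_clf.predict([[2.0, 0.5]]))
```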
An MLP is composed of one (passthrough) input layer, one or more layers of LTUs, called hidden layers, and one final layer of LTUs called the output layer (see Figure 10-7). Every layer except the output layer includes a bias neuron and is fully connected to the next layer.
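A minimal sketch of an MLP using Scikit-Learn's MLPClassifier (the two hidden layers of 10 neurons each are an arbitrary choice, just to show the input → hidden → output structure):

```python
from sklearn.datasets import load_iris
from sklearn.neural_network import MLPClassifier

# Input and output layer sizes are inferred from the data;
# hidden_layer_sizes=(10, 10) gives two hidden layers of 10 neurons each
iris = load_iris()
mlp_clf = MLPClassifier(hidden_layer_sizes=(10, 10), max_iter=1000, random_state=42)
mlp_clf.fit(iris.data, iris.target)
print(mlp_clf.predict(iris.data[:3]))
```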
A recurrent neural network looks very much like a feedforward neural network, except it also has connections pointing backward.
When an ANN has two or more hidden layers, it is called a deep neural network (DNN).
Loss functions of neural networks (with hidden layers) are non-convex in the weights (see the Quora link in the references below).
Refer to the links below for a more detailed understanding of what causes ML training to converge to a minimum.
ANNs frequently outperform other ML techniques on very large and complex problems, for several reasons:
There is now a huge quantity of data available to train neural networks.
The tremendous increase in computing power since the 1990s now makes it possible to train large neural networks in a reasonable amount of time.
The training algorithms have been improved. For example, ReLU is now commonly used instead of the sigmoid activation, which suffered from the vanishing gradient problem (see the sketch below).
Refer to the links below to understand the conditions for reaching a global minimum.
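A small NumPy sketch (values chosen only for illustration) of why the sigmoid contributes to vanishing gradients: its derivative is at most 0.25, so repeatedly multiplying it through many layers shrinks the gradient toward zero, whereas the ReLU derivative is 1 for positive inputs:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)           # peaks at 0.25 when z = 0

def relu_grad(z):
    return float(z > 0)            # 1 for positive inputs, 0 otherwise

z = 2.0
n_layers = 10
# Gradient factor after backpropagating through 10 layers (weights ignored for simplicity)
print(sigmoid_grad(z) ** n_layers)  # ~1.6e-10: the gradient has effectively vanished
print(relu_grad(z) ** n_layers)     # 1.0: the gradient passes through unchanged
```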
The number of layers
For many problems you can start with just one or two hidden layers and it will work just fine
For more complex problems, you can gradually ramp up the number of hidden layers, until you start overfitting the training set.
The number of neurons per layer
The type of activation function to use in each layer, and the random seed value
For the hidden layers, in most cases you can use the ReLU activation function (or one of its variants).
For the output layer, the softmax activation function is generally a good choice for classification tasks (when the classes are mutually exclusive). For regression tasks, you can simply use no activation function at all.
The weight initialization logic
Mini-batch size
Number of epochs (the sketch after this list shows where each of these hyperparameters appears in code)
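A sketch, assuming Keras and a made-up 10-class image classification problem, showing where each of these hyperparameters appears (layer count, neurons per layer, activation functions, seed, weight initialization, mini-batch size, epochs):

```python
import tensorflow as tf
from tensorflow import keras

tf.random.set_seed(42)  # seed value, so weight initialization is reproducible

model = keras.models.Sequential([
    keras.layers.Flatten(input_shape=[28, 28]),          # passthrough input layer
    # number of hidden layers and neurons per layer are both hyperparameters
    keras.layers.Dense(300, activation="relu",
                       kernel_initializer="he_normal"),  # weight initialization logic
    keras.layers.Dense(100, activation="relu",
                       kernel_initializer="he_normal"),
    keras.layers.Dense(10, activation="softmax"),        # softmax: mutually exclusive classes
])

model.compile(loss="sparse_categorical_crossentropy",
              optimizer="sgd", metrics=["accuracy"])

# Mini-batch size and number of epochs are passed to fit();
# X_train / y_train are placeholders for your training data.
# history = model.fit(X_train, y_train, batch_size=32, epochs=30)
```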
https://www.quora.com/How-can-you-prove-that-the-loss-functions-in-Deep-Neural-nets-are-non-convex
https://www.amazon.in/Hands-Machine-Learning-Scikit-Learn-TensorFlow/dp/1491962291
https://www.linkedin.com/posts/dpkumar_convergence-machinelearningmodels-datasciences-activity-6769803533836521472-rjnu
https://images.app.goo.gl/Gp6ZN6v2vgPB8f9z7
https://youtu.be/jTzJ9zjC8nU