A computer program is said to learn from experience E with respect to a task T and some performance measure P if its performance on T, as measured by P, improves with experience E (Tom Mitchell's definition)
Supervised learning
Unsupervised learning
Reinforcement learning
Recommender system
Some questions to strengthen your understanding
y=f(x)
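The supervised-learning idea of learning a mapping y = f(x) from examples can be sketched as follows; this is a minimal illustration with made-up data and a linear hypothesis h(x) = theta0 + theta1*x (function names are my own, not from the course):

```python
# Minimal sketch of supervised learning as y = f(x):
# fit a linear hypothesis h(x) = theta0 + theta1 * x to (x, y) pairs
# using the closed-form least-squares solution.

def fit_linear(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    theta1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
             sum((x - mean_x) ** 2 for x in xs)
    theta0 = mean_y - theta1 * mean_x
    return theta0, theta1

# Illustrative training data generated from y = 2x + 1
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
theta0, theta1 = fit_linear(xs, ys)   # recovers theta0 = 1, theta1 = 2
```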
Refer: https://www.youtube.com/watch?v=kHwlB_j7Hkc&list=PLLssT5z_DsK-h9vYZkQkYNWcItqhlRJLN&index=4
After each iteration, compute the slope (derivative) of the error curve at the current point
Simultaneously update all parameters
If the variable is to the left of the local minimum, the slope is negative, so the update increases the variable's value, moving it closer to the local minimum
If the variable is to the right of the local minimum, the slope is positive, so the update decreases the variable's value, moving it closer to the local minimum
If the variable is exactly at the local minimum, the slope is zero, so ideally the value does not change
A small learning rate will still converge to a local minimum, but it will take many iterations
A very high learning rate may fail to converge, or may even diverge
There is no need to decrease the learning rate over time, since gradient descent automatically takes smaller steps as the slope shrinks near the minimum
There are many local minima
Depending on the initial point, gradient descent can converge to different local minima
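The notes above can be sketched with gradient descent on a simple one-dimensional curve, f(x) = (x - 3)^2, whose minimum is at x = 3 (the function and step counts below are illustrative):

```python
# Gradient descent on f(x) = (x - 3)^2, minimum at x = 3.
# Left of the minimum the slope is negative, so x increases;
# right of it the slope is positive, so x decreases;
# a too-high learning rate overshoots and diverges.

def gradient_descent(x0, lr, steps):
    x = x0
    for _ in range(steps):
        grad = 2 * (x - 3)   # derivative of (x - 3)^2
        x = x - lr * grad    # move against the slope
    return x

left     = gradient_descent(x0=0.0,  lr=0.1, steps=200)  # starts left, rises toward 3
right    = gradient_descent(x0=10.0, lr=0.1, steps=200)  # starts right, falls toward 3
diverged = gradient_descent(x0=0.0,  lr=1.1, steps=5)    # learning rate too high: moves away
```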
Multivariate linear regression
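A hedged sketch of multivariate linear regression trained with batch gradient descent, using the vectorized hypothesis h(x) = X @ theta with a leading column of ones for the intercept (the data, learning rate, and iteration count below are made up):

```python
# Multivariate linear regression via batch gradient descent, vectorized
# with NumPy. All parameters are updated simultaneously each iteration.
import numpy as np

def fit(X, y, lr=0.1, iters=2000):
    m, n = X.shape
    Xb = np.hstack([np.ones((m, 1)), X])     # add intercept column
    theta = np.zeros(n + 1)
    for _ in range(iters):
        grad = Xb.T @ (Xb @ theta - y) / m   # gradient of the squared-error cost
        theta -= lr * grad                   # simultaneous update of all parameters
    return theta

# Illustrative data generated from y = 1 + 2*x1 + 3*x2
X = np.array([[1.0, 2.0], [2.0, 0.0], [3.0, 1.0], [4.0, 3.0]])
y = 1.0 + 2.0 * X[:, 0] + 3.0 * X[:, 1]
theta = fit(X, y)                            # approaches [1, 2, 3]
```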
Linear regression is not suitable for classification problems
Thresholding a linear-regression fit may happen to predict correctly by luck
But adding a new training example can shift the fitted line and make the predictions wrong
For decision problems the hypothesis value should lie in the range 0 to 1; logistic regression ensures this
With many input variables, the hypothesis function needs many (possibly non-linear) feature terms, or it will not fit the data well
The machine needs to convert raw pixel values into a recognition of the object they represent
Andrew Ng sessions
https://www.gnu.org/software/octave/
https://www.quora.com/Why-does-Andrew-Ng’s-Machine-Learning-course-use-Octave-instead-of-R