mlclass
http://www.ml-class.org/course/auth/welcome
http://www.reddit.com/r/mlclass/
http://cs229.stanford.edu/materials.html
http://openclassroom.stanford.edu/MainFolder/CoursePage.php?course=MachineLearning
http://openclassroom.stanford.edu/MainFolder/CoursePage.php?course=ufldl
http://see.stanford.edu/see/courseinfo.aspx?coll=348ca38a-3a6d-4052-937d-cb017338d7b1
http://171.64.93.201/ClassX/system/users/web/pg/view_subject.php?subject=CS229_FALL_2011_2012
http://www.cs.cmu.edu/~tom/10701_sp11/lectures.shtml
more features -> risk of overfitting the training set
Regularization: add lambda*sum(theta_j^2) to the cost -> penalizes large theta
introducing regularization -> shrinks the theta values toward zero
large lambda -> underfitting
regularized linear / logistic regression cost is still convex
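A minimal Octave sketch of such a regularized cost, assuming the standard linear-regression form from the lectures (the function name regCost is my own):

    % Regularized linear regression cost (sketch)
    % X: m x (n+1) design matrix with a leading column of ones
    % y: m x 1 targets; theta: (n+1) x 1; lambda: regularization strength
    function J = regCost(theta, X, y, lambda)
      m = length(y);
      h = X * theta;                                  % hypothesis h_theta(x)
      J = (1/(2*m)) * sum((h - y).^2) ...             % squared-error term
          + (lambda/(2*m)) * sum(theta(2:end).^2);    % penalty skips theta_0
    end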
Matrix multiplication X(m x n) * Y(n x q) = Z(m x q):
The number of columns of X must agree with the number of rows of Y.
(X*Y)' = Y'*X' (transposing a product reverses the order); note X*Y != Y*X in general
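Quick Octave check of the dimension rule and the transpose identity (throwaway random matrices, my own example):

    X = rand(3, 4);          % m=3, n=4
    Y = rand(4, 2);          % n=4, q=2
    Z = X * Y;               % size(Z) -> 3 2, i.e. m x q
    norm((X*Y)' - Y'*X')     % ~0: transposing a product reverses the order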
Neural Networks
Theta(j) = matrix of weights mapping layer j to layer j+1
If the network has s(j) units in layer j and s(j+1) units in layer j+1,
then Theta(j) has dimension s(j+1) x [ s(j) + 1 ]  (the +1 is for the bias unit)
Suppose you have a neural network with one hidden layer, m input features, and k nodes in the hidden layer. Theta(1)_{1,0}, Theta(1)_{1,1}, ..., Theta(1)_{1,m} are the weights connecting inputs 0 through m to the first hidden node. Think of Theta(1)_1 as the vector of input weights for that node.
Theta(1)_{2,0}, Theta(1)_{2,1}, ..., Theta(1)_{2,m} are the weights for the inputs coming into the second hidden node, i.e. the vector Theta(1)_2.
This continues through Theta(1)_{k,0}, ..., Theta(1)_{k,m}, the vector Theta(1)_k.
Collectively, you can think of Theta(1) as a k x (m+1) matrix of weights connecting all of the inputs (including input 0, which is always 1) to all of the hidden nodes.
Theta(2) holds the weights from the hidden (2nd) layer to each neuron in the 3rd layer; Theta(1)_{2,x} is the xth weight leading into the second neuron of the hidden layer.
Theta isn't a vector but a matrix. Each row of this matrix contains the input weights needed to compute one node of the next layer.
So the row Theta(1)_1 is what is needed to compute a(2)_1:
a(2)_1 = g( Theta(1)_{1,0}*a(1)_0 + Theta(1)_{1,1}*a(1)_1 + ... + Theta(1)_{1,m}*a(1)_m ),  where g is the sigmoid
In other words: each row of Theta contains the (transposed) theta vector of the corresponding classifier (node) in the next layer.
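A minimal Octave sketch of this per-layer computation, assuming the sigmoid activation g; the names x and Theta1 are my own:

    g  = @(z) 1 ./ (1 + exp(-z));   % sigmoid activation
    a1 = [1; x];                    % prepend the bias unit a(1)_0 = 1 to input x (m x 1)
    z2 = Theta1 * a1;               % Theta1 is k x (m+1), so z2 is k x 1
    a2 = g(z2);                     % activations of the k hidden nodes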
Exercise #3
M=5000 training examples; each example has N=400 features (20x20 pixels)
each example is a row in the X matrix (X is 5000 x 400)
y - vector of 5000 elements; each element is a label in {1,2,...,9,10}, where 10 stands for the digit 0
There are 10 classes, so we train 10 separate one-vs-all logistic regression classifiers
Theta is a matrix of size 10 x 401 (number of classes x [number of features + 1 bias term])
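A sketch of one-vs-all prediction with these dimensions (variable names all_theta and Xb are assumptions, not necessarily the exercise's):

    g  = @(z) 1 ./ (1 + exp(-z));      % sigmoid
    Xb = [ones(size(X,1),1) X];        % add the bias column -> 5000 x 401
    probs = g(Xb * all_theta');        % all_theta is 10 x 401 -> probs is 5000 x 10
    [maxp, pred] = max(probs, [], 2);  % pred = most probable class (1..10) per example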
Backpropagation in NN
......
http://dudarev.com/wiki/ml-class-logistic-regression.html
http://swizec.com/blog/first-steps-with-octave-and-machine-learning/swizec/2865
http://swizec.com/blog/i-suck-at-implementing-neural-networks-in-octave/swizec/2929
http://swizec.com/blog/i-think-i-finally-understand-what-a-neural-network-is/swizec/2891
https://github.com/gafiatulin/ml-class
https://github.com/SaveTheRbtz/ml-class
https://github.com/merwan/ml-class
https://github.com/peterwilliams97/ml_class
https://github.com/gkokaisel/MachineLearning/tree/master/SVM/mlclass-ex6
MODEL SELECTION
Which polynomial degree to choose for the model?
Split the data: Training 60%, Cross validation 20%, Test 20%
Training error (cost function) decreases as the polynomial degree increases
Cross validation error (cost function, i.e. average squared error) as a function of polynomial degree:
it is roughly U-shaped (like a parabola)
Cross Validation (CV) error is high for both small and large polynomial degree
For small polynomial degree we underfit: this is high BIAS; training error ~= CV error (both high)
For large polynomial degree we overfit (training error is small) while the CV error is BIG: this is high VARIANCE (see the sketch below)
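A sketch of picking the degree by CV error; polyFeatures, trainLinearReg, and cost are hypothetical helper names, not defined here:

    for d = 1:10
      Xp     = polyFeatures(Xtrain, d);                  % expand features to degree d
      theta  = trainLinearReg(Xp, ytrain, lambda);       % fit on the training split only
      Jcv(d) = cost(theta, polyFeatures(Xval, d), yval); % evaluate on the CV split
    end
    [bestJ, bestDegree] = min(Jcv);   % pick the degree with the lowest CV error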
LEARNING CURVES: plot J(train) and J(cross-validation) versus training set size
If the learning algorithm suffers from high BIAS (high error, underfitting), getting more data cannot help
If the learning algorithm suffers from high VARIANCE (overfitting), getting more data can help
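A sketch of computing a learning curve, reusing the hypothetical trainLinearReg/cost helpers from above:

    m = size(Xtrain, 1);
    for i = 1:m
      theta     = trainLinearReg(Xtrain(1:i,:), ytrain(1:i), lambda); % fit on first i examples
      Jtrain(i) = cost(theta, Xtrain(1:i,:), ytrain(1:i));            % error on those i examples
      Jcv(i)    = cost(theta, Xval, yval);                            % error on the full CV set
    end
    plot(1:m, Jtrain, 1:m, Jcv);   % the gap between the two curves shows bias vs variance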
Blogs to read:
http://mechanistician.blogspot.com/2009_05_01_archive.html
http://carlos.bueno.org/2011/10/fair-coin.html - getting a fair coin from an unfair coin
http://www-formal.stanford.edu/jmc/modality.html
Octave
http://en.wikipedia.org/wiki/GNU_Octave
http://www.floss4science.com/resources-for-learning-gnu-octave/
http://www.gnu.org/software/octave/doc/interpreter/
http://www.outsch.org/2011/01/29/qtoctave-0-10-1-for-windows/
http://codebright.wordpress.com/2011/10/07/linear-algebra-review-and-numpy/