Solving Linear SVM in the 'primal' formulation - visualisation
Solving the SVM in the primal with stochastic gradient descent has gained popularity because of the huge speed gains it brings to large-scale classification problems.
Attached is a simple code tutorial showing how several such methods work on a toy problem.
Simple problem definition:
Given a training set x with labels y in {-1,+1},
learn a hyperplane w* that separates the categories in y by the largest possible gap.
In the primal form of the SVM, the aim is to find the solution w* of:

    minimize (1/2)*||w||^2   s.t.  y(i)*(w'*x(i) + b) >= 1,   i = 1...m
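For concreteness, here is a minimal sketch of a toy setup and the quantity the demo reports as "energy". The language (Python/NumPy), the data generator, and the soft-margin constant C are assumptions made for illustration; since the printed energies and sub-100% accuracies below indicate imperfectly separable data, the demo most likely minimizes the unconstrained soft-margin form (1/2)*||w||^2 + C*sum(max(0, 1 - y(i)*(w'*x(i) + b))).

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy 2-D problem: two Gaussian blobs with labels in {-1, +1}
    # (sample size, means, and spread are assumptions, not the demo's values)
    m = 200
    X = np.vstack([rng.normal(-1.5, 1.0, (m // 2, 2)),
                   rng.normal(+1.5, 1.0, (m // 2, 2))])
    y = np.hstack([-np.ones(m // 2), np.ones(m // 2)])

    def primal_energy(w, b, X, y, C=1.0):
        # Soft-margin primal objective: (1/2)*||w||^2 + C * sum of hinge losses
        hinge = np.maximum(0.0, 1.0 - y * (X @ w + b))
        return 0.5 * (w @ w) + C * hinge.sum()

    def accuracy(w, b, X, y):
        # Fraction of points on the correct side of the hyperplane
        return np.mean(np.sign(X @ w + b) == y)

The sketches of the individual methods below reuse X, y, rng, and primal_energy from this block.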
This formulation is especially useful for large-scale problems, e.g.:
[1] P. Felzenszwalb, R. Girshick, D. McAllester, D. Ramanan,
    "Object Detection with Discriminatively Trained Part Based Models",
    IEEE Transactions on Pattern Analysis and Machine Intelligence, 2010.
[2] S. Shalev-Shwartz, Y. Singer, N. Srebro,
    "Pegasos: Primal Estimated sub-GrAdient SOlver for SVM",
    ICML, 2007.
Here we compare 4 methods (an illustrative sketch of each follows the list):
1) Brute force
2) A naive approximate gradient descent
3) Stochastic subgradient descent [1]
4) PEGASOS [2]
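A minimal sketch of the brute-force method: exhaustively evaluate the energy on a grid of candidate w vectors and keep the best. The "W evaluated 1681 times" line below matches a 41 x 41 grid (41^2 = 1681), but the grid range and the fixed bias b = 0 here are assumptions.

    # Brute force: evaluate the energy at every point of a 41 x 41 grid
    # (41 * 41 = 1681 evaluations; grid range and fixed bias are assumptions)
    grid = np.linspace(-5.0, 5.0, 41)
    best_energy, w_bf = np.inf, None
    for w1 in grid:
        for w2 in grid:
            w = np.array([w1, w2])
            e = primal_energy(w, 0.0, X, y)
            if e < best_energy:
                best_energy, w_bf = e, w

This only works because the toy problem is 2-D; the cost grows exponentially with the dimension of w, which is what motivates the gradient-based methods.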
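One plausible reading of the "naive approximate gradient descent" (the summary does not spell it out) is batch gradient descent with finite-difference gradients of the energy; the sketch below assumes that interpretation, and the step size, perturbation, and stopping tolerance are all hypothetical.

    # Naive approximate gradient descent: central finite differences on the
    # energy, then a plain gradient step (all constants are assumptions)
    def approx_grad_descent(X, y, eta=1e-3, eps=1e-4, tol=1e-3, max_iter=500):
        w, b = np.zeros(X.shape[1]), 0.0
        for it in range(max_iter):
            g = np.zeros_like(w)
            for j in range(len(w)):
                d = np.zeros_like(w)
                d[j] = eps
                g[j] = (primal_energy(w + d, b, X, y) -
                        primal_energy(w - d, b, X, y)) / (2 * eps)
            gb = (primal_energy(w, b + eps, X, y) -
                  primal_energy(w, b - eps, X, y)) / (2 * eps)
            w, b = w - eta * g, b - eta * gb
            if np.sqrt(g @ g + gb * gb) < tol:  # stop when the gradient is tiny
                break
        return w, b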
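For the stochastic subgradient method, note that the hinge term contributes a subgradient of -C*y(i)*x(i) exactly when example i violates the margin (y(i)*(w'*x(i) + b) < 1) and zero otherwise, while the regularizer always contributes w. A sketch, with an assumed decaying step-size schedule and epoch count:

    # Stochastic subgradient descent on the soft-margin energy
    # (epochs, step-size schedule, and C are assumptions)
    def sgd_svm(X, y, C=1.0, epochs=50, eta0=0.1):
        w, b, t = np.zeros(X.shape[1]), 0.0, 0
        for _ in range(epochs):
            for i in rng.permutation(len(y)):
                t += 1
                eta = eta0 / (1.0 + eta0 * t)    # decaying step size
                if y[i] * (X[i] @ w + b) < 1:    # margin violated: hinge active
                    w = (1 - eta) * w + eta * C * y[i] * X[i]
                    b += eta * C * y[i]
                else:                            # only the regularizer acts
                    w = (1 - eta) * w
        return w, b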
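Pegasos [2] is a stochastic subgradient method with the specific step size eta_t = 1/(lambda*t) on the objective (lambda/2)*||w||^2 + (1/m)*sum(hinge), followed by a projection onto the ball of radius 1/sqrt(lambda). A sketch of the paper's basic variant, which has no bias term; the values of lambda and T are assumptions:

    # Pegasos: step size 1/(lambda*t), then projection onto the ball of
    # radius 1/sqrt(lambda) (lambda and T are assumptions)
    def pegasos(X, y, lam=0.01, T=10000):
        w = np.zeros(X.shape[1])
        for t in range(1, T + 1):
            i = rng.integers(len(y))             # sample one example
            eta = 1.0 / (lam * t)
            if y[i] * (X[i] @ w) < 1:            # margin violated
                w = (1 - eta * lam) * w + eta * y[i] * X[i]
            else:
                w = (1 - eta * lam) * w
            # projection step from the original paper
            norm = np.linalg.norm(w)
            if norm > 0:
                w *= min(1.0, 1.0 / (np.sqrt(lam) * norm))
        return w

The fixed 1/(lambda*t) schedule is the point of the method: it removes the learning rate as a tuning parameter and yields convergence guarantees that do not depend on the training-set size.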
Expected command line output:
--Brute force method:
W evaluated 1681 times; final energy = 3967.14
Brute force Accuracy on Training set = 93.5000 %
--Approximate gradient method:
W converged in 23 iterations; final energy = 5081.04
Approx grad desc Accuracy on Training set = 91.5000 %
--Stochastic subgradient method:
W converged in 18 iterations; final energy = 4425.02
Stochastic subgradient Accuracy on Training set = 92.5000 %
--Pegasos method:
W converged in 98 iterations.
Pegasos Accuracy on Training set = 93.5000 %