Solving Linear SVM in the 'primal' formulation - visualisation

Solving the SVM in the primal with stochastic gradient descent has gained popularity because of the large speed gains it offers on large-scale classification problems.

Attached is a simple code tutorial showing how the methods below work on a toy problem.

Simple problem definition:

Given a training set x with labels y in {-1, +1}, learn a hyperplane w* that separates the two classes by the largest possible margin.

Considering the SVM in its primal form, the aim is to find the solution w* of:

minimize (1/2)||w||^2   s.t.   y(i)*(w'*x(i) + b) >= 1,   i = 1...m
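
Gradient-based solvers such as Pegasos [2] work on an unconstrained, soft-margin version of this objective: a regularised hinge-loss energy. As a minimal sketch (illustrative Python/NumPy, not the attached tutorial code; X, y, w, b and lam are placeholder names), that energy can be evaluated as:

    import numpy as np

    def svm_energy(w, b, X, y, lam=1.0):
        # E(w, b) = 0.5 * lam * ||w||^2 + sum_i max(0, 1 - y_i * (w . x_i + b))
        # X: (m, d) array of training points, y: (m,) array of labels in {-1, +1}
        margins = y * (X @ w + b)
        return 0.5 * lam * np.dot(w, w) + np.maximum(0.0, 1.0 - margins).sum()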

This formulation is especially useful for large-scale problems, e.g.:

[1] P. Felzenszwalb, R. Girshick, D. McAllester, D. Ramanan,
"Object Detection with Discriminatively Trained Part-Based Models",
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2010

[2] S. Shalev-Shwartz, Y. Singer, N. Srebro,
"Pegasos: Primal Estimated sub-GrAdient SOlver for SVM",
ICML, 2007

Here we compare four methods (the two stochastic updates, 3 and 4, are sketched in code after the list):

1) Brute force

2) A naive approximate gradient descent

3) Stochastic subgradient descent [1]

4) PEGASOS [2]
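
To make methods 3 and 4 concrete, here is a rough Python/NumPy sketch of the two update rules. It is a minimal illustration of the standard subgradient and Pegasos updates, not the attached code; lam, eta, n_iters and the omission of the bias term are simplifying assumptions of this sketch.

    import numpy as np

    def subgradient_descent(X, y, lam=1.0, eta=1e-3, n_iters=100):
        # Batch subgradient descent on 0.5*lam*||w||^2 + sum_i max(0, 1 - y_i * w.x_i)
        # (the bias term is omitted to keep the sketch short).
        m, d = X.shape
        w = np.zeros(d)
        for _ in range(n_iters):
            violators = y * (X @ w) < 1                # points inside the margin
            w -= eta * (lam * w - X[violators].T @ y[violators])
        return w

    def pegasos(X, y, lam=1.0, n_iters=100, seed=0):
        # Pegasos [2]: single-example sub-gradient steps with step size 1/(lam*t),
        # followed by the optional projection onto the ball of radius 1/sqrt(lam).
        rng = np.random.default_rng(seed)
        m, d = X.shape
        w = np.zeros(d)
        for t in range(1, n_iters + 1):
            i = rng.integers(m)
            eta = 1.0 / (lam * t)
            if y[i] * (X[i] @ w) < 1:                  # margin violated by example i
                w = (1.0 - eta * lam) * w + eta * y[i] * X[i]
            else:
                w = (1.0 - eta * lam) * w
            norm = np.linalg.norm(w)
            if norm > 0:
                w *= min(1.0, 1.0 / (np.sqrt(lam) * norm))
        return w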

Expected command line output:

--Brute force method:

W evaluated 1681 times; final energy = 3967.14

Brute force Accuracy on Training set = 93.5000 %

--Approximate gradient method:

W converged in 23 iterations; final energy = 5081.04

Approx grad desc Accuracy on Training set = 91.5000 %

--Stochastic subgradient method:

W converged in 18 iterations; final energy = 4425.02

Stochastic subgradient Accuracy on Training set = 92.5000 %

--Pegasos method:

W converged in 98 iterations.

Pegasos Accuracy on Training set = 93.5000 %
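
The training-set accuracies above are simply the fraction of points classified on the correct side of the learned hyperplane. Assuming a learned pair (w, b) and the placeholder names from the sketches above, this is a one-liner:

    accuracy = 100.0 * np.mean(np.sign(X @ w + b) == y)   # percent correctly classified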