Classification

Let's try to create an SVM and use it for classification. Hopefully the basics of neural networks are clear to you.

In this problem, let's use a standard data set, iris (bundled with R and also with scikit-learn, which we use here). It contains three species of flowers: Iris setosa, versicolor, and virginica. The iris data set gives the measurements in centimetres of the variables sepal length and width and petal length and width, respectively, for 50 flowers from each of 3 species of iris. The data contains 150 rows altogether.

We will use the four measurements (sepal length and width, petal length and width) to predict the species of a flower.

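First load the data set and see what the returned object contains:

>>> from sklearn.datasets import load_iris
>>> iris = load_iris()
>>> dir(iris)

Now check the feature names, the class names, and the class labels:

>>> print(iris.feature_names)
>>> print(iris.target_names)
>>> print(iris.target)

To do the SVM analysis, let us take the feature matrix and the labels from the data set iris:

>>> X = iris.data
>>> y = iris.target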
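As a quick check of the description above (150 rows, 4 measurements, 50 flowers per species), the shape of the data and the class counts can be inspected. This is a minimal sketch; the variable name counts is introduced here for illustration only.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target

# One row per flower, one column per measurement (in cm)
print(X.shape)

# Number of samples for each class id, paired with the species name
counts = np.bincount(y)
for name, count in zip(iris.target_names, counts):
    print(name, count)
```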

It is trivial to train a classifier once the data has this format. A support vector machine (SVM), for instance, with a linear kernel:

>>> from sklearn.svm import LinearSVC
>>> LinearSVC()
LinearSVC(C=1.0, class_weight=None, dual=True, fit_intercept=True,
          intercept_scaling=1, loss='l2', multi_class='ovr', penalty='l2',
          random_state=None, tol=0.0001, verbose=0)
>>> clf = LinearSVC()

clf is a statistical model that has hyperparameters that control the learning algorithm. Those hyperparameters can be supplied by the user in the constructor of the model.
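For example, hyperparameters can be passed as keyword arguments to the constructor and read back with get_params(). The values below (C=0.5, max_iter=10000) are arbitrary choices for illustration, not recommendations from this page.

```python
from sklearn.svm import LinearSVC

# C controls the regularization strength, max_iter the solver's iteration limit.
# Both values are illustrative assumptions.
clf = LinearSVC(C=0.5, max_iter=10000)

# get_params() returns the hyperparameters as a dictionary
params = clf.get_params()
print(params["C"], params["max_iter"])
```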

By default the real model parameters are not initialized. The model parameters will be automatically tuned from the data by calling the fit() method:

>>> clf = clf.fit(X, y)
>>> clf.coef_
array([[ 0.18423926,  0.45122784, -0.80794282, -0.45071669],
       [ 0.04497896, -0.87929491,  0.40550806, -0.94073588],
       [-0.85087141, -0.98678139,  1.38086854,  1.8653876 ]])
>>> clf.intercept_
array([ 0.10956135,  1.67443781, -1.70981491])
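The sketch below illustrates how these learned parameters are used. coef_ only exists after fit(), it has one row per class and one column per feature, and the per-class scores of the linear model are X times coef_ transposed plus intercept_. The exact numbers depend on the scikit-learn version and solver settings, so only shapes and consistency are checked here. max_iter=10000 is an assumption made to give the solver room to converge.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import LinearSVC

iris = load_iris()
X, y = iris.data, iris.target

clf = LinearSVC(max_iter=10000, random_state=0)
fitted_before = hasattr(clf, "coef_")   # model parameters do not exist yet
clf.fit(X, y)
fitted_after = hasattr(clf, "coef_")    # created by fit()

print(clf.coef_.shape)       # one row per class, one column per feature
print(clf.intercept_.shape)  # one intercept per class

# One-vs-rest scores computed by hand from the learned parameters
scores = X @ clf.coef_.T + clf.intercept_
same_scores = np.allclose(scores, clf.decision_function(X))

# The predicted class is the one with the highest score
same_labels = np.array_equal(scores.argmax(axis=1), clf.predict(X))
print(same_scores, same_labels)
```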

Once the model is trained, it can be used to predict the most likely outcome on unseen data.

Let's try with iris.data, using the last sample (this sample was part of the training data, so it is only a quick sanity check):

>>> iris.data
array([[ 5.1,  3.5,  1.4,  0.2],
       [ 4.9,  3. ,  1.4,  0.2],
       [ 4.7,  3.2,  1.3,  0.2],
       ...
       [ 6.5,  3. ,  5.2,  2. ],
       [ 6.2,  3.4,  5.4,  2.3],
       [ 5.9,  3. ,  5.1,  1.8]])
>>> X_new = [[ 5.9, 3. , 5.1, 1.8]]
>>> clf.predict(X_new)
array([2])
>>> iris.target_names
array(['setosa', 'versicolor', 'virginica'], dtype='|S10')

The result is 2, which is the id of the 3rd iris class, namely 'virginica'.
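To see how the model behaves on data it has not been trained on, one option is to hold out part of the data set before fitting. The following is a minimal sketch, not part of the original walkthrough. The 70/30 split, the stratification, and random_state=0 are assumptions chosen for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

iris = load_iris()

# Keep 30% of the flowers aside; stratify so each species stays equally represented
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, stratify=iris.target, random_state=0
)

clf = LinearSVC(max_iter=10000, random_state=0).fit(X_train, y_train)

# Fraction of held-out flowers whose species is predicted correctly
accuracy = clf.score(X_test, y_test)
print(accuracy)

# Translate predicted class ids into species names
predicted_names = iris.target_names[clf.predict(X_test)]
print(predicted_names[:5])
```

Indexing iris.target_names with the predicted ids is the same lookup done by hand above, where id 2 corresponds to 'virginica'.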
