2.2 Iris Classification

# dataset
from sklearn import datasets
# plot
import seaborn as sns
sns.set_style("darkgrid")
import matplotlib.pyplot as plt
# data manipulation
import pandas as pd
import numpy as np
# To display images in the Jupyter notebook cell
%matplotlib inline

Question

In the last section, we learned about the perceptron. How can we use it to classify data? To answer this question, let's first look at an example: a data set that contains two species of Iris. Our questions are:

  • How can we tell the two species apart?
  • Given a new Iris, how can we predict which species it belongs to?

Data Set

The data set we will use is modified from the Iris dataset, which can be found on the UCI Machine Learning Repository. It consists of 50 samples from each of two species of Iris (Iris setosa and Iris versicolor). Two features were measured from each sample: the length of the sepals and the length of the petals, in centimeters.

Let's take a closer look at the dataset.

# read the csv file
iris_dataset = pd.read_csv("iris_data.csv")
# show the first 5 rows
iris_dataset.head()

The columns in this dataset are:

  • sepal length (cm)
  • petal length (cm)
  • target_class (the species)
    • 1: setosa
    • -1: versicolor
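
If `iris_data.csv` is not at hand, a comparable table can probably be rebuilt from the Iris data bundled with scikit-learn. The sketch below is an assumption about how such a file could be produced, not the actual procedure behind the file used here. It keeps the setosa and versicolor rows, keeps the two length columns, and maps the species to the 1 / -1 labels from the column list. The names `two_species` and `rebuilt` are illustrative.

```python
from sklearn import datasets

# load_iris(as_frame=True) returns the data as pandas objects
# (available in scikit-learn 0.23 and later).
iris = datasets.load_iris(as_frame=True)
frame = iris.frame  # four feature columns plus an integer "target" column

# In scikit-learn's copy, target 0 is setosa and target 1 is versicolor.
two_species = frame[frame["target"].isin([0, 1])]

rebuilt = two_species[["sepal length (cm)", "petal length (cm)"]].copy()
# Assumed label convention from the column list: setosa -> 1, versicolor -> -1.
rebuilt["target_class"] = two_species["target"].map({0: 1, 1: -1})

print(rebuilt.head())
print(rebuilt["target_class"].value_counts())
```

The rebuilt frame may differ from the tutorial's file in row order or rounding, so treat it as a stand-in for experimenting rather than an exact copy.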


Classification

The following graph shows the measured sepal length and petal length of each Iris.

sns.lmplot(x="sepal length (cm)", y="petal length (cm)", data=iris_dataset, fit_reg=False, hue="target_class")

You can probably tell the two species apart at a glance. Recall our second question: which category does a new Iris belong to? Suppose the new Iris has a sepal length of 5.5 cm and a petal length of 3.2 cm. We add this new point to the plot.

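# DataFrame.append was removed in pandas 2.0, so the new row is added with pd.concat
new_iris = pd.DataFrame([{"sepal length (cm)": 5.5, "petal length (cm)": 3.2, "target_class": "unknown"}])
new_iris_dataset = pd.concat([iris_dataset, new_iris], ignore_index=True)

sns.lmplot(x="sepal length (cm)", y="petal length (cm)", data=new_iris_dataset, fit_reg=False, hue="target_class")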

The unknown dot is the new Iris. It clearly falls within the versicolor group (target_class = -1).


Linear Classifier

This task is easy for us: we can separate the two groups by eye. A computer, however, has no eyes. How can it split the two groups?

Notice that the data lies in a two-dimensional plane, so if we want to separate the two classes, we can draw a line between them, as in the following plot.

x = np.linspace(4, 8, 18)
a = 0.1
b = 2
y = a * x + b
sns.lmplot(x="sepal length (cm)", y="petal length (cm)", data=iris_dataset, fit_reg=False, hue="target_class")
plt.plot(x, y, '-r')

All the points above the line belong to the versicolor category (-1), and all the points below the line belong to the setosa category (1). When a new Iris arrives, the computer can predict its category by checking whether the point lies above or below the line.
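
The above-or-below check can be sketched in a few lines. This is only an illustration that reuses the slope 0.1 and intercept 2 from the plot; the function name `side_of_line` is hypothetical.

```python
def side_of_line(sepal_length, petal_length, a=0.1, b=2.0):
    """Report whether a point lies above or below the line y = a*x + b."""
    # Height of the line at this sepal length
    boundary = a * sepal_length + b
    return "above" if petal_length > boundary else "below"


print(side_of_line(5.5, 3.2))  # above: 3.2 is greater than 0.1*5.5 + 2 = 2.55
print(side_of_line(5.1, 1.4))  # below: 1.4 is less than 0.1*5.1 + 2 = 2.51
```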

Therefore, if we can teach the computer to find such a line, we can solve our problem. Let's do it.

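A line in a two-dimensional plane can be represented as follows:

$$y = ax + b$$

Let,

$$x_1 = x, \quad x_2 = y, \quad w_0 = b, \quad w_1 = a, \quad w_2 = -1$$

Then it becomes:

$$w_0 + \sum_{i=1}^{n} w_i x_i = 0$$

let n = 2:

$$w_0 + w_1 x_1 + w_2 x_2 = 0$$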
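
As a quick numerical check of this rewriting, the sketch below assumes the mapping w0 = b, w1 = a, w2 = -1 for the example line with a = 0.1 and b = 2. It confirms that every point on the line gives a weighted sum of zero, and that points on either side of the line give sums of opposite sign.

```python
import numpy as np

a, b = 0.1, 2.0            # slope and intercept of the example line
w0, w1, w2 = b, a, -1.0    # assumed mapping from (a, b) to the weights

x1 = np.linspace(4, 8, 18)
x2 = a * x1 + b            # points that lie exactly on the line

on_line = w0 + w1 * x1 + w2 * x2
print(np.allclose(on_line, 0.0))

# Points off the line: the sign of the weighted sum tells the side.
above = w0 + w1 * 5.5 + w2 * 3.2   # point above the line, negative sum
below = w0 + w1 * 5.1 + w2 * 1.4   # point below the line, positive sum
print(above < 0, below > 0)
```

Because w2 is negative, a larger petal length pushes the sum down, which is why points above the line come out negative.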

Implement a perceptron

To implement a perceptron, we manually choose suitable values for the parameters $w_0$, $w_1$, and $w_2$.

Let,

$$w_0 = 2, \quad w_1 = 0.1, \quad w_2 = -1$$

Then the line is:

$$2 + 0.1 x_1 - x_2 = 0$$

The perceptron acts like this:

x_1 = np.linspace(4, 8, 18)
w_0 = 2
w_1 = 0.1
w_2 = -1
# solve w_0 + w_1*x_1 + w_2*x_2 = 0 for x_2 to draw the boundary
x_2 = -(w_0 + w_1 * x_1) / w_2
sns.lmplot(x="sepal length (cm)", y="petal length (cm)", data=iris_dataset, fit_reg=False, hue="target_class")
plt.plot(x_1, x_2, '-r')

Here, we define a function perceptron that takes x1 and x2 as inputs.

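def perceptron(x1, x2):
    w0 = 2
    w1 = 0.1
    w2 = -1

    value = w0 + w1 * x1 + w2 * x2

    # value > 0 means the point lies below the line, on the setosa side
    if value > 0:
        return 1
    else:
        return -1

Now, let's use it to predict the class of a new Iris with a sepal length of 5.1 cm and a petal length of 1.4 cm.

perceptron(5.1, 1.4)
1


The output is right: class 1 is setosa. We have built a perceptron!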
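
To classify several flowers at once, the same rule can be written with NumPy. This is a sketch rather than part of the original notebook. It assumes the labels from the column list (1 for setosa, -1 for versicolor), so a positive weighted sum, meaning a point below the line, maps to 1. The name `perceptron_batch` and the sample values are illustrative.

```python
import numpy as np


def perceptron_batch(samples, weights=(2.0, 0.1, -1.0)):
    """Apply the hand-tuned perceptron to rows of (sepal length, petal length)."""
    samples = np.asarray(samples, dtype=float)
    w0, w1, w2 = weights
    # Weighted sum for every row at once
    values = w0 + w1 * samples[:, 0] + w2 * samples[:, 1]
    # Positive sum -> 1 (assumed setosa), otherwise -> -1 (assumed versicolor)
    return np.where(values > 0, 1, -1)


samples = [(5.1, 1.4), (5.5, 3.2), (7.0, 4.7)]
predictions = perceptron_batch(samples)
print(predictions)  # 1 for the first flower, -1 for the other two
```

Passing the weights as an argument keeps the decision rule separate from the parameter values, which makes it easy to try other lines.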

In this section, we learned how to use a perceptron to classify the Iris data set, and we implemented a simple perceptron.

So far, our perceptron is not smart at all, because we chose the parameters of the linear classifier by hand. Later in the course, one of the key things to understand is how to teach a perceptron to learn the appropriate parameters from the data automatically.

Reference