Logistic Regression on the Iris Dataset
The Iris flower dataset is a classic resource in machine learning and statistics. It includes 150 samples, 50 from each of three species: Iris setosa, Iris versicolor, and Iris virginica. Each sample records sepal length, sepal width, petal length, and petal width. Researchers often use this dataset to test and demonstrate classification and clustering algorithms. It was introduced by Ronald Fisher in his 1936 paper, "The use of multiple measurements in taxonomic problems".
Here's a more detailed breakdown:
Classes:
The dataset contains three classes, representing the three iris species:
Iris setosa
Iris versicolor
Iris virginica
Features:
The dataset includes four features for each flower:
Sepal length (in centimeters)
Sepal width (in centimeters)
Petal length (in centimeters)
Petal width (in centimeters)
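Since the dataset ships with scikit-learn, the classes and features listed above can be verified directly; a minimal sketch using `load_iris`:

```python
from sklearn.datasets import load_iris
import numpy as np

iris = load_iris()
print(iris.feature_names)        # the four measurements, in centimeters
print(iris.target_names)         # ['setosa' 'versicolor' 'virginica']
print(np.bincount(iris.target))  # [50 50 50] -- 50 samples per class
print(iris.data.shape)           # (150, 4)
```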
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix, ConfusionMatrixDisplay
import matplotlib.pyplot as plt
import numpy as np
# Load a sample dataset (e.g., the Iris dataset)
iris = load_iris()
X, y = iris.data, iris.target
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
print(f'X_train.shape is {X_train.shape}')
print(f'y_train.shape is {y_train.shape}')
print(f'X_test.shape is {X_test.shape}')
print(f'y_test.shape is {y_test.shape}')
# Train a classifier (e.g., Logistic Regression)
# The liblinear solver handles this three-class problem one-vs-rest
model = LogisticRegression(random_state=42, solver='liblinear')
model.fit(X_train, y_train)
# Make predictions on the test set
y_pred = model.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=iris.target_names)
TP = np.diag(cm) # True Positives for each class are on the diagonal
FP = cm.sum(axis=0) - TP # False Positives are the sum of the column minus the TP
FN = cm.sum(axis=1) - TP # False Negatives are the sum of the row minus the TP
TN = cm.sum() - (TP + FP + FN) # True Negatives for each class are all other correct predictions
print("True Positives per class:", TP)
print("True Negatives per class:", TN)
print("False Positives per class:", FP)
print("False Negatives per class:", FN)
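These counts are enough to recover each class's precision (TP / (TP + FP)) and recall (TP / (TP + FN)) by hand; a self-contained sketch repeating the pipeline above:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

model = LogisticRegression(random_state=42, solver='liblinear').fit(X_train, y_train)
cm = confusion_matrix(y_test, model.predict(X_test))

TP = np.diag(cm)
FP = cm.sum(axis=0) - TP
FN = cm.sum(axis=1) - TP

precision = TP / (TP + FP)  # fraction of predictions for a class that were correct
recall = TP / (TP + FN)     # fraction of true members of a class that were found
print("precision:", np.round(precision, 2))
print("recall:   ", np.round(recall, 2))
```

These values should line up with the per-class rows of the classification report generated next.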
# Generate the classification report
report = classification_report(y_test, y_pred, target_names=iris.target_names)
print('classification report')
print('=====================')
print(report)
classification report
=====================
              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        19
  versicolor       1.00      0.92      0.96        13
   virginica       0.93      1.00      0.96        13

    accuracy                           0.98        45
   macro avg       0.98      0.97      0.97        45
weighted avg       0.98      0.98      0.98        45
disp.plot()
plt.show()
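In the report above, macro avg is the unweighted mean of the per-class scores, while weighted avg weights each class by its support (the number of true samples of that class). A sketch reproducing both averages for the f1-score with `precision_recall_fscore_support`:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_fscore_support
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
y_pred = LogisticRegression(random_state=42, solver='liblinear').fit(X_train, y_train).predict(X_test)

p, r, f1, support = precision_recall_fscore_support(y_test, y_pred)

macro_f1 = f1.mean()                           # every class counts equally
weighted_f1 = np.average(f1, weights=support)  # classes weighted by support
print("macro avg f1:   ", round(float(macro_f1), 2))
print("weighted avg f1:", round(float(weighted_f1), 2))
```

Because the three classes have similar support here (19, 13, 13), the two averages come out close; on imbalanced data they can differ substantially.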