Logistic Regression on the Iris Dataset
The Iris flower dataset is a classic resource in machine learning and statistics. It includes 150 samples, 50 from each of three species: Iris setosa, Iris versicolor, and Iris virginica. Each sample records sepal length, sepal width, petal length, and petal width. Researchers often use this dataset to test and demonstrate classification and clustering algorithms. It was introduced by Ronald Fisher in his 1936 paper, "The use of multiple measurements in taxonomic problems".
Here's a more detailed breakdown:
Classes:
The dataset contains three classes, representing the three iris species:
Iris setosa
Iris versicolor
Iris virginica
Features:
The dataset includes four features for each flower:
Sepal length (in centimeters)
Sepal width (in centimeters)
Petal length (in centimeters)
Petal width (in centimeters)
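Since the dataset ships with scikit-learn, the classes and features listed above can be verified directly; a minimal sketch using `load_iris`:

```python
from sklearn.datasets import load_iris
import numpy as np

iris = load_iris()
print(iris.feature_names)        # the four measurements, in centimeters
print(iris.target_names)         # ['setosa' 'versicolor' 'virginica']
print(np.bincount(iris.target))  # [50 50 50] -- 50 samples per class
print(iris.data.shape)           # (150, 4)
```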
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix, ConfusionMatrixDisplay
import matplotlib.pyplot as plt
import numpy as np
# Load a sample dataset (e.g., the Iris dataset)
iris = load_iris()
X, y = iris.data, iris.target
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
print(f'X_train.shape is {X_train.shape}')
print(f'y_train.shape is {y_train.shape}')
print(f'X_test.shape is {X_test.shape}')
print(f'y_test.shape is {y_test.shape}')
# Train a classifier (e.g., Logistic Regression)
# The liblinear solver handles this three-class problem one-vs-rest
model = LogisticRegression(random_state=42, solver='liblinear')
model.fit(X_train, y_train)
# Make predictions on the test set
y_pred = model.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=iris.target_names)
TP = np.diag(cm) # True Positives for each class are on the diagonal
FP = cm.sum(axis=0) - TP # False Positives are the sum of the column minus the TP
FN = cm.sum(axis=1) - TP # False Negatives are the sum of the row minus the TP
TN = cm.sum() - (TP + FP + FN) # True Negatives for each class are all other correct predictions
print("True Positives per class:", TP)
print("True Negatives per class:", TN)
print("False Positives per class:", FP)
print("False Negatives per class:", FN)
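These counts are enough to recover each class's precision (TP / (TP + FP)) and recall (TP / (TP + FN)) by hand; a self-contained sketch repeating the pipeline above:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

model = LogisticRegression(random_state=42, solver='liblinear').fit(X_train, y_train)
cm = confusion_matrix(y_test, model.predict(X_test))

TP = np.diag(cm)
FP = cm.sum(axis=0) - TP
FN = cm.sum(axis=1) - TP

precision = TP / (TP + FP)  # fraction of predictions for a class that were correct
recall = TP / (TP + FN)     # fraction of true members of a class that were found
print("precision:", np.round(precision, 2))
print("recall:   ", np.round(recall, 2))
```

These values should line up with the per-class rows of the classification report generated next.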
# Generate the classification report
report = classification_report(y_test, y_pred, target_names=iris.target_names)
print('classification report')
print('=====================')
print(report)
classification report
=====================
              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        19
  versicolor       1.00      0.92      0.96        13
   virginica       0.93      1.00      0.96        13

    accuracy                           0.98        45
   macro avg       0.98      0.97      0.97        45
weighted avg       0.98      0.98      0.98        45
disp.plot()
plt.show()
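In the report above, macro avg is the unweighted mean of the per-class scores, while weighted avg weights each class by its support (the number of true samples of that class). A sketch reproducing both averages for the f1-score with `precision_recall_fscore_support`:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_fscore_support
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
y_pred = LogisticRegression(random_state=42, solver='liblinear').fit(X_train, y_train).predict(X_test)

p, r, f1, support = precision_recall_fscore_support(y_test, y_pred)

macro_f1 = f1.mean()                           # every class counts equally
weighted_f1 = np.average(f1, weights=support)  # classes weighted by support
print("macro avg f1:   ", round(float(macro_f1), 2))
print("weighted avg f1:", round(float(weighted_f1), 2))
```

Because the three classes have similar support here (19, 13, 13), the two averages come out close; on imbalanced data they can differ substantially.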