Alpana A. Borse

Welcome to Foundation of Data Science Laboratory

Assignment no. 4. Practical Assignments on Statistical / Algorithmic Data Modeling

Objective:

To develop skills in statistical data modeling, hypothesis testing, classification and regression algorithms, and model evaluation techniques through hands-on exercises with the scikit-learn library.

4.2: Basics of Classification and Regression Algorithms

1. Classification Algorithm (Logistic Regression):

o Implement a logistic regression model using scikit-learn to classify the Iris dataset.

2. Regression Algorithm (Linear Regression):

o Implement a linear regression model using scikit-learn to predict house prices.

The following Python program demonstrates how to implement a logistic regression model using scikit-learn to classify the well-known Iris dataset. An explanation of the program and details of the expected output follow it.

Program:

# Import necessary libraries
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score

# Step 1: Load the Iris dataset
iris = load_iris()
X = iris.data    # Features (4 measurements per flower)
y = iris.target  # Labels (0, 1, or 2)

# Step 2: Split the data into training (70%) and testing (30%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Step 3: Create and train the logistic regression model
# max_iter is raised from the default of 100 to give the solver room to converge
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)

# Step 4: Make predictions on the test data
y_pred = model.predict(X_test)

# Step 5: Evaluate the model's performance
print("Confusion Matrix:")
print(confusion_matrix(y_test, y_pred))
print("\nClassification Report:")
print(classification_report(y_test, y_pred))
print(f"Accuracy Score: {accuracy_score(y_test, y_pred):.2f}")

Sample output:

Confusion Matrix:
[[16  0  0]
 [ 0 13  1]
 [ 0  1 14]]

Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        16
           1       0.93      0.93      0.93        14
           2       0.93      0.93      0.93        15

    accuracy                           0.96        45
   macro avg       0.95      0.95      0.95        45
weighted avg       0.96      0.96      0.96        45

Accuracy Score: 0.96

Confusion Matrix: Each row corresponds to the actual class, and each column corresponds to the predicted class. The diagonal values indicate correct predictions, and off-diagonal values indicate misclassifications.
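Classification Report: Provides precision, recall, and F1-score for each class, along with the support (the number of test samples that truly belong to that class).
Accuracy Score: 0.96 indicates that the model correctly classified 43 of the 45 test samples (about 96%).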
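
To make the link between the confusion matrix and the report concrete, the short sketch below recomputes precision, recall, F1-score, and accuracy by hand from the matrix shown in the output. It uses only NumPy; the variable names are illustrative and are not part of the original program.

```python
import numpy as np

# Confusion matrix copied from the output shown earlier
# (rows = actual class, columns = predicted class).
cm = np.array([[16, 0, 0],
               [0, 13, 1],
               [0, 1, 14]])

true_positives = np.diag(cm)                  # correct predictions per class
precision = true_positives / cm.sum(axis=0)   # column sums: samples predicted as each class
recall = true_positives / cm.sum(axis=1)      # row sums: samples that truly belong to each class
f1 = 2 * precision * recall / (precision + recall)
accuracy = true_positives.sum() / cm.sum()    # all correct predictions / all samples

print("precision:", np.round(precision, 2))
print("recall:   ", np.round(recall, 2))
print("f1-score: ", np.round(f1, 2))
print(f"accuracy:  {accuracy:.2f}")
```

The rounded values agree with the classification report: 1.00 for class 0, 0.93 for classes 1 and 2, and an overall accuracy of 0.96.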

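This output shows that the logistic regression model performs well on the Iris dataset: every sample of class 0 (setosa) is classified correctly, and the only two errors are confusions between class 1 (versicolor) and class 2 (virginica).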

Explanation:

Import Libraries:

scikit-learn is used for loading datasets, creating and training the model, and evaluating the results.

Load the Iris Dataset:

load_iris() loads the dataset, which consists of 150 samples described by 4 features (sepal length, sepal width, petal length, petal width) and labeled with one of 3 classes (Iris-setosa, Iris-versicolor, and Iris-virginica).
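
Before training, it can help to look at what load_iris() actually returns. The following is a minimal sketch that prints the shape, the feature names, the class names, and the number of samples per class; the use of np.bincount is my own addition for counting labels.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()

print("data shape:   ", iris.data.shape)          # (samples, features)
print("feature names:", iris.feature_names)
print("class names:  ", list(iris.target_names))  # index in this list = label value
print("samples/class:", np.bincount(iris.target))
```

The label values 0, 1, and 2 in iris.target are indices into target_names, which is why the classification report lists the classes as 0, 1, and 2.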

Split the Dataset:

train_test_split splits the data into training (70%) and testing (30%) sets. Setting random_state=42 fixes the shuffle, so the same split is produced on every run.
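
The sketch below checks the sizes of the two sets and shows one optional variation that the program above does not use: passing stratify=y, which asks train_test_split to keep the class proportions the same in both sets. The names ending in _s are hypothetical and exist only for this illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Same split as in the program: 70% training, 30% testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
print("train/test shapes:", X_train.shape, X_test.shape)
print("test class counts:", np.bincount(y_test, minlength=3))

# Optional variation (an assumption, not part of the original program):
# stratify=y preserves the 1:1:1 class ratio in both sets.
X_train_s, X_test_s, y_train_s, y_test_s = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)
print("stratified test class counts:", np.bincount(y_test_s, minlength=3))
```

Without stratification the test set can contain unequal numbers of each class, which is why the support column in a classification report is not necessarily the same for every class.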

Create and Train the Model:

LogisticRegression() is used to create the model.
The max_iter parameter is raised to 200 (the default is 100) to give the solver enough iterations to converge during training.
fit() trains the model using the training data.
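
After fit() returns, the trained model exposes what it learned through attributes whose names end in an underscore. As a rough sketch, the code below prints the class labels, the shape of the coefficient matrix, and how many solver iterations were actually used, which is one way to see whether max_iter=200 was enough.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)

print("classes:", model.classes_)
# One row of coefficients per class, one column per feature.
print("coefficient matrix shape:", model.coef_.shape)
# If the iterations used reach max_iter, the solver stopped before converging.
print("iterations used:", model.n_iter_, "out of max_iter =", model.max_iter)
```

If scikit-learn prints a ConvergenceWarning during fit(), raising max_iter further or scaling the features are the usual remedies.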

Make Predictions:

predict() makes predictions on the testing set.
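
predict() returns only the most likely class. When the confidence of a prediction matters, predict_proba() returns one probability per class instead. The sketch below, which looks at the first five test samples only to keep the output short, shows that the predicted label is simply the class with the highest probability.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
model = LogisticRegression(max_iter=200).fit(X_train, y_train)

sample = X_test[:5]                      # first five test samples
labels = model.predict(sample)           # hard class labels
proba = model.predict_proba(sample)      # one column of probabilities per class

print("predicted labels:", labels)
print("class probabilities:")
print(np.round(proba, 3))

# The predicted label is the class whose probability is largest.
labels_from_proba = model.classes_[proba.argmax(axis=1)]
print("labels from argmax:", labels_from_proba)
```

Each row of the probability array sums to 1, and its columns follow the order of model.classes_.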

Evaluate the Model:

confusion_matrix shows how well the model performs by displaying the counts of correct and incorrect predictions.
classification_report provides detailed metrics such as precision, recall, and F1-score.
accuracy_score shows the overall accuracy of the model.
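
All three metrics above are computed on a single 45-sample test set, so the result depends on which samples happen to land in it. One common complement, which is not part of the original program, is k-fold cross-validation. The following is a minimal sketch using cross_val_score with 5 folds; the fold count is an arbitrary choice for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# Train and evaluate the model 5 times, each time holding out a different fifth of the data.
scores = cross_val_score(LogisticRegression(max_iter=200), X, y, cv=5)

print("accuracy per fold:", scores.round(2))
print(f"mean accuracy: {scores.mean():.2f} (std {scores.std():.2f})")
```

The mean gives a steadier estimate of accuracy than a single split, and the standard deviation indicates how much that estimate varies from fold to fold.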
