Welcome to Foundation of Data Science Laboratory
Assignment no. 4. Practical Assignments on Statistical / Algorithmic Data Modeling
Objective:
To develop skills in statistical data modeling, hypothesis testing, classification and regression
algorithms, model evaluation techniques, and hands-on exercises with the scikit-learn library.
4.2: Basics of Classification and Regression Algorithms
1. Classification Algorithm (Logistic Regression):
o Implement a logistic regression model using scikit-learn to classify the Iris dataset.
2. Regression Algorithm (Linear Regression):
o Implement a linear regression model using scikit-learn to predict house prices.
The following Python program implements a logistic regression model with scikit-learn to classify the well-known Iris dataset. An explanation of the program and of its output follows the listing.
Program:
# Import necessary libraries
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
# Step 1: Load the Iris dataset
iris = load_iris()
X = iris.data # Features
y = iris.target # Labels
# Step 2: Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Step 3: Create and train the logistic regression model
model = LogisticRegression(max_iter=200) # Increased max_iter to ensure convergence
model.fit(X_train, y_train)
# Step 4: Make predictions on the test data
y_pred = model.predict(X_test)
# Step 5: Evaluate the model's performance
print("Confusion Matrix:")
print(confusion_matrix(y_test, y_pred))
print("\nClassification Report:")
print(classification_report(y_test, y_pred))
print(f"Accuracy Score: {accuracy_score(y_test, y_pred):.2f}")
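Sample output (illustrative; the exact counts depend on the train/test split and the scikit-learn version):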
Confusion Matrix:
[[16  0  0]
 [ 0 13  1]
 [ 0  1 14]]

Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        16
           1       0.93      0.93      0.93        14
           2       0.93      0.93      0.93        15

    accuracy                           0.96        45
   macro avg       0.95      0.95      0.95        45
weighted avg       0.96      0.96      0.96        45

Accuracy Score: 0.96
Confusion Matrix: Each row corresponds to the actual class, and each column corresponds to the predicted class. The diagonal values indicate correct predictions, and off-diagonal values indicate misclassifications.
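Classification Report: Reports precision, recall, and F1-score for each class, along with support (the number of test samples that actually belong to that class).
Accuracy Score: 0.96 means the model classified 43 of the 45 test samples correctly (about 96%).
In this output, both misclassifications are confusions between classes 1 and 2 (versicolor and virginica); class 0 (setosa) is classified without error.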
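The figures in the classification report can be recomputed by hand from the confusion matrix. The sketch below is only an illustration: it copies the matrix from the sample output above into a NumPy array (the variable names are chosen here, not taken from the program) and derives accuracy, precision, recall, and F1-score from the diagonal, column sums, and row sums.

```python
import numpy as np

# Confusion matrix copied from the sample output above
# (rows = actual class, columns = predicted class).
cm = np.array([
    [16, 0, 0],
    [0, 13, 1],
    [0, 1, 14],
])

correct = np.diag(cm)                  # correct predictions per class
accuracy = correct.sum() / cm.sum()    # 43 correct out of 45 samples
precision = correct / cm.sum(axis=0)   # column sums = samples predicted as each class
recall = correct / cm.sum(axis=1)      # row sums = samples actually in each class
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy : {accuracy:.2f}")
print("precision:", np.round(precision, 2))
print("recall   :", np.round(recall, 2))
print("f1-score :", np.round(f1, 2))
```

The row sums (16, 14, 15) are the "support" column of the report, which is why they add up to the 45 test samples.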
Explanation:
Import Libraries:
scikit-learn is used for loading datasets, creating and training the model, and evaluating the results.
Load the Iris Dataset:
load_iris() loads the dataset, which consists of 4 features (sepal length, sepal width, petal length, petal width) and 3 classes (Iris-setosa, Iris-versicolor, and Iris-virginica).
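As a quick check of the description above, the object returned by load_iris() can be inspected directly. This is a small optional sketch, not part of the assignment program; note that scikit-learn stores the class names without the "Iris-" prefix.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()

print(iris.data.shape)            # (number of samples, number of features)
print(iris.feature_names)         # names of the 4 measurement columns
print(iris.target_names)          # names of the 3 classes
print(np.bincount(iris.target))   # number of samples in each class
```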
Split the Dataset:
train_test_split is used to split the data into training (70%) and testing (30%) sets.
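The effect of the split can be verified by printing the shapes of the resulting arrays. The sketch below repeats the split from the program and, as an optional variation that is not in the original program, adds stratify=y, which asks train_test_split to keep the class proportions the same in both subsets.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Same split as in the program: 30% of the 150 samples are held out for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
print(X_train.shape, X_test.shape)

# Optional variation (an assumption, not in the original program):
# stratify=y keeps the three classes equally represented in the test set.
_, _, _, y_test_strat = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)
print("test class counts (plain)     :", np.bincount(y_test))
print("test class counts (stratified):", np.bincount(y_test_strat))
```

Without stratification the class counts in the test set depend on the random shuffle, so the support values in the classification report can differ from run to run when random_state is changed.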
Create and Train the Model:
LogisticRegression() is used to create the model.
The max_iter parameter is raised to 200 (the scikit-learn default is 100) to give the solver more iterations to converge during training.
fit() trains the model using the training data.
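After fit() has run, the learned parameters are available as attributes of the model. The following sketch (an optional addition, using the same split as the program) prints them: one row of coefficients per class and one column per feature, plus the number of iterations the solver actually used.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)

print("classes     :", model.classes_)          # class labels seen during training
print("coef shape  :", model.coef_.shape)       # (n_classes, n_features)
print("intercepts  :", model.intercept_.shape)  # one intercept per class
print("iterations  :", model.n_iter_)           # iterations used, at most max_iter
```

If the reported iteration count equals max_iter, the solver stopped at the limit and scikit-learn normally issues a ConvergenceWarning; in that case max_iter can be increased or the features scaled.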
Make Predictions:
predict() makes predictions on the testing set.
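predict() returns only the most likely class for each sample. The sketch below, an optional illustration rather than part of the assignment program, uses predict_proba() to show the class probabilities behind those predictions and maps the numeric labels back to species names.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42
)
model = LogisticRegression(max_iter=200).fit(X_train, y_train)

y_pred = model.predict(X_test)        # one class label per test sample
proba = model.predict_proba(X_test)   # one probability per class per sample

# The predicted label is the class with the highest probability.
same = np.array_equal(y_pred, model.classes_[proba.argmax(axis=1)])
print("predict() matches argmax of predict_proba():", same)

# Show the first five predictions with their species names.
for label, p in zip(y_pred[:5], proba[:5]):
    print(iris.target_names[label], np.round(p, 3))
```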
Evaluate the Model:
confusion_matrix shows how well the model performs by displaying the counts of correct and incorrect predictions.
classification_report provides detailed metrics such as precision, recall, and F1-score.
accuracy_score shows the overall accuracy of the model.
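Task 2 of this assignment (linear regression for house prices) follows the same load, split, fit, predict, evaluate pattern. The sketch below is only a starting point under stated assumptions: it does not use a real housing dataset but generates synthetic data, and the feature names (area, bedrooms) and the price formula are hypothetical values invented for illustration. Because the target is continuous, the evaluation uses mean squared error and R² instead of a confusion matrix.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic, hypothetical housing data (NOT a real dataset).
rng = np.random.default_rng(42)
n_samples = 200
area = rng.uniform(50, 250, n_samples)       # floor area in square metres
bedrooms = rng.integers(1, 6, n_samples)     # 1 to 5 bedrooms
noise = rng.normal(0, 10_000, n_samples)     # random variation in price
price = 50_000 + 1_200 * area + 15_000 * bedrooms + noise

X = np.column_stack([area, bedrooms])        # feature matrix
y = price                                    # continuous target

# Same 70/30 split as in the classification program.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

reg = LinearRegression()
reg.fit(X_train, y_train)
y_pred = reg.predict(X_test)

mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print("Coefficients (area, bedrooms):", np.round(reg.coef_, 1))
print("Intercept:", round(float(reg.intercept_), 1))
print(f"Mean Squared Error: {mse:.1f}")
print(f"R^2 Score: {r2:.3f}")
```

To work with real data, the synthetic arrays can be replaced by a housing dataset of your choice; the rest of the steps stay the same.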