Welcome to Foundation of Data Science Laboratory
Assignment No. 4.3: Model Evaluation Techniques
To develop skills in statistical data modeling, hypothesis testing, classification and regression
algorithms, and model evaluation techniques through hands-on exercises with the scikit-learn library.
1. Performance Metrics for Classification:
o Evaluate a classification model using confusion matrix, precision, recall, and F1-score.
2. Performance Metrics for Regression:
o Evaluate a regression model using mean squared error, mean absolute error, and R-squared.
1. Classification Model Evaluation:
We'll use the following metrics:
Confusion Matrix
Precision
Recall
F1-Score
Code for Classification Model Evaluation:
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, classification_report
# Generate a classification dataset
X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Initialize and train a classifier (Random Forest in this case)
clf = RandomForestClassifier(random_state=42)  # fixed seed so the results below are reproducible
clf.fit(X_train, y_train)
# Predict on the test data
y_pred = clf.predict(X_test)
# Confusion Matrix
cm = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:")
print(cm)
# Classification Report (includes precision, recall, F1-score)
report = classification_report(y_test, y_pred)
print("\nClassification Report:")
print(report)
Output for Classification Model:
Confusion Matrix:
[[130 15]
[ 12 143]]
Classification Report:
              precision    recall  f1-score   support

           0       0.92      0.90      0.91       145
           1       0.91      0.92      0.91       155

    accuracy                           0.91       300
   macro avg       0.91      0.91      0.91       300
weighted avg       0.91      0.91      0.91       300
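As an optional check, the confusion matrix printed above can also be visualized. The sketch below continues from the classification script (it reuses cm and clf) and uses scikit-learn's ConfusionMatrixDisplay together with matplotlib; matplotlib is an extra dependency not otherwise required by this assignment.
# Optional: visualize the confusion matrix computed above (continues from the classification script)
import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=clf.classes_)
disp.plot()
plt.show()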
2. Regression Model Evaluation:
We'll use the following metrics:
Mean Squared Error (MSE)
Mean Absolute Error (MAE)
R-squared (R²)
Code for Regression Model Evaluation:
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
# Generate a regression dataset
X, y = make_regression(n_samples=1000, n_features=20, noise=0.1, random_state=42)
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Initialize and train a regressor (Random Forest in this case)
reg = RandomForestRegressor(random_state=42)  # fixed seed so the results below are reproducible
reg.fit(X_train, y_train)
# Predict on the test data
y_pred = reg.predict(X_test)
# Mean Squared Error
mse = mean_squared_error(y_test, y_pred)
print("Mean Squared Error:", mse)
# Mean Absolute Error
mae = mean_absolute_error(y_test, y_pred)
print("Mean Absolute Error:", mae)
# R-squared
r2 = r2_score(y_test, y_pred)
print("R-squared:", r2)
Output for Regression Model:
Mean Squared Error: 278.543
Mean Absolute Error: 13.031
R-squared: 0.991
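Because MSE is expressed in squared units of the target, it is common to also report the root mean squared error (RMSE), which is on the same scale as the target. A minimal continuation of the regression script above (it reuses mse):
# Optional: root mean squared error, on the same scale as the target
import numpy as np
rmse = np.sqrt(mse)
print("Root Mean Squared Error:", rmse)
For the MSE reported above (about 278.5), this gives an RMSE of roughly 16.7.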
For Classification:
Confusion Matrix: Displays the true positives, true negatives, false positives, and false negatives.
Precision: The fraction of predicted positives that are truly positive, i.e. the accuracy of the positive predictions.
Recall: The fraction of actual positives that the classifier correctly identifies.
F1-Score: The harmonic mean of precision and recall (a worked example using the confusion matrix above follows this list).
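As a quick sanity check, these quantities can be recomputed by hand from the confusion matrix printed above. The sketch below treats class 1 as the positive class (TP = 143, FP = 15, FN = 12) and reproduces the 0.91 precision, 0.92 recall, and 0.91 F1-score reported for class 1.
# Recompute precision, recall, and F1 for class 1 from the printed confusion matrix
tp, fp, fn = 143, 15, 12                              # class 1 taken as the positive class
precision = tp / (tp + fp)                            # fraction of predicted positives that are truly positive
recall = tp / (tp + fn)                               # fraction of actual positives that were found
f1 = 2 * precision * recall / (precision + recall)    # harmonic mean of precision and recall
print(f"Precision: {precision:.2f}, Recall: {recall:.2f}, F1: {f1:.2f}")
# Expected output: Precision: 0.91, Recall: 0.92, F1: 0.91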
For Regression:
Mean Squared Error (MSE): The average of the squared differences between the predicted and actual values.
Mean Absolute Error (MAE): The average of the absolute differences between the predicted and actual values.
R-squared (R²): The proportion of the variance in the dependent variable that is explained by the independent variables; values closer to 1 indicate a better fit (a small worked example follows this list).
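These definitions can be verified on a tiny illustrative example. The values below are made up for demonstration (they are not taken from the regression output above); the manual NumPy computations should agree with the scikit-learn functions.
# Verify MSE, MAE, and R-squared on a small illustrative example (values are hypothetical)
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])
# Manual computations from the definitions
mse_manual = np.mean((y_true - y_pred) ** 2)           # average squared error
mae_manual = np.mean(np.abs(y_true - y_pred))          # average absolute error
r2_manual = 1 - np.sum((y_true - y_pred) ** 2) / np.sum((y_true - y_true.mean()) ** 2)
print(mse_manual, mean_squared_error(y_true, y_pred))  # 0.375 from both
print(mae_manual, mean_absolute_error(y_true, y_pred)) # 0.5 from both
print(r2_manual, r2_score(y_true, y_pred))             # about 0.949 from both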