Welcome to Foundation of Data Science Laboratory
Assignment No. 4.3: Model Evaluation Techniques
To develop skills in statistical data modeling, hypothesis testing, classification and regression
algorithms, and model evaluation techniques through hands-on exercises with the scikit-learn library.
1. Performance Metrics for Classification:
o Evaluate a classification model using confusion matrix, precision, recall, and F1-score.
2. Performance Metrics for Regression:
o Evaluate a regression model using mean squared error, mean absolute error, and R-squared.
1. Classification Model Evaluation:
We'll use the following metrics:
Confusion Matrix
Precision
Recall
F1-Score
Code for Classification Model Evaluation:
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, classification_report
# Generate a classification dataset
X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Initialize and train a classifier (Random Forest in this case)
clf = RandomForestClassifier(random_state=42)  # fixed seed so the results below are reproducible
clf.fit(X_train, y_train)
# Predict on the test data
y_pred = clf.predict(X_test)
# Confusion Matrix
cm = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:")
print(cm)
# Classification Report (includes precision, recall, F1-score)
report = classification_report(y_test, y_pred)
print("\nClassification Report:")
print(report)
Output for Classification Model:
Confusion Matrix:
[[130 15]
[ 12 143]]
Classification Report:
              precision    recall  f1-score   support

           0       0.92      0.90      0.91       145
           1       0.91      0.92      0.91       155

    accuracy                           0.91       300
   macro avg       0.91      0.91      0.91       300
weighted avg       0.91      0.91      0.91       300
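As an optional check, the confusion matrix printed above can also be visualized. The sketch below continues from the classification script (it reuses cm and clf) and uses scikit-learn's ConfusionMatrixDisplay together with matplotlib; matplotlib is an extra dependency not otherwise required by this assignment.
# Optional: visualize the confusion matrix computed above (continues from the classification script)
import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=clf.classes_)
disp.plot()
plt.show()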
2. Regression Model Evaluation:
We'll use the following metrics:
Mean Squared Error (MSE)
Mean Absolute Error (MAE)
R-squared (R²)
Code for Regression Model Evaluation:
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
# Generate a regression dataset
X, y = make_regression(n_samples=1000, n_features=20, noise=0.1, random_state=42)
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Initialize and train a regressor (Random Forest in this case)
reg = RandomForestRegressor(random_state=42)  # fixed seed so the results below are reproducible
reg.fit(X_train, y_train)
# Predict on the test data
y_pred = reg.predict(X_test)
# Mean Squared Error
mse = mean_squared_error(y_test, y_pred)
print("Mean Squared Error:", mse)
# Mean Absolute Error
mae = mean_absolute_error(y_test, y_pred)
print("Mean Absolute Error:", mae)
# R-squared
r2 = r2_score(y_test, y_pred)
print("R-squared:", r2)
Output for Regression Model:
Mean Squared Error: 278.543
Mean Absolute Error: 13.031
R-squared: 0.991
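Because MSE is expressed in squared units of the target, it is common to also report the root mean squared error (RMSE), which is on the same scale as the target. A minimal continuation of the regression script above (it reuses mse):
# Optional: root mean squared error, on the same scale as the target
import numpy as np
rmse = np.sqrt(mse)
print("Root Mean Squared Error:", rmse)
For the MSE reported above (about 278.5), this gives an RMSE of roughly 16.7.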
For Classification:
Confusion Matrix: Displays the true positives, true negatives, false positives, and false negatives.
Precision: The fraction of predicted positives that are truly positive, i.e. the accuracy of the positive predictions.
Recall: The fraction of actual positives that the classifier correctly identifies.
F1-Score: The harmonic mean of precision and recall (a worked example using the confusion matrix above follows this list).
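As a quick sanity check, these quantities can be recomputed by hand from the confusion matrix printed above. The sketch below treats class 1 as the positive class (TP = 143, FP = 15, FN = 12) and reproduces the 0.91 precision, 0.92 recall, and 0.91 F1-score reported for class 1.
# Recompute precision, recall, and F1 for class 1 from the printed confusion matrix
tp, fp, fn = 143, 15, 12                              # class 1 taken as the positive class
precision = tp / (tp + fp)                            # fraction of predicted positives that are truly positive
recall = tp / (tp + fn)                               # fraction of actual positives that were found
f1 = 2 * precision * recall / (precision + recall)    # harmonic mean of precision and recall
print(f"Precision: {precision:.2f}, Recall: {recall:.2f}, F1: {f1:.2f}")
# Expected output: Precision: 0.91, Recall: 0.92, F1: 0.91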
For Regression:
Mean Squared Error (MSE): The average of the squared differences between the predicted and actual values.
Mean Absolute Error (MAE): The average of the absolute differences between the predicted and actual values.
R-squared (R²): The proportion of the variance in the dependent variable that is explained by the independent variables; values closer to 1 indicate a better fit (a small worked example follows this list).
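These definitions can be verified on a tiny illustrative example. The values below are made up for demonstration (they are not taken from the regression output above); the manual NumPy computations should agree with the scikit-learn functions.
# Verify MSE, MAE, and R-squared on a small illustrative example (values are hypothetical)
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])
# Manual computations from the definitions
mse_manual = np.mean((y_true - y_pred) ** 2)           # average squared error
mae_manual = np.mean(np.abs(y_true - y_pred))          # average absolute error
r2_manual = 1 - np.sum((y_true - y_pred) ** 2) / np.sum((y_true - y_true.mean()) ** 2)
print(mse_manual, mean_squared_error(y_true, y_pred))  # 0.375 from both
print(mae_manual, mean_absolute_error(y_true, y_pred)) # 0.5 from both
print(r2_manual, r2_score(y_true, y_pred))             # about 0.949 from both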