Welcome to Foundation of Data Science Laboratory
Assignment no. 4. Hands-on Exercises with the scikit-learn Library
2. Implement a Random Forest Regressor:
o Train and evaluate a Random Forest Regressor on the Boston Housing dataset.
The following Python program trains and evaluates a Random Forest Regressor on the Boston Housing dataset.
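The built-in loader for the Boston Housing dataset (sklearn.datasets.load_boston) was deprecated in scikit-learn 1.0 and removed in version 1.2 because of ethical concerns about the dataset. The data can still be fetched from OpenML with sklearn.datasets.fetch_openml, which downloads it on first use and therefore needs an internet connection. The program below uses this alternative loader.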
# Import necessary libraries
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
# Load the Boston Housing dataset using fetch_openml
boston = fetch_openml(name='boston', version=1, as_frame=True)
X, y = boston.data, boston.target
# Split the data into training and testing sets (70% train, 30% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Initialize the Random Forest Regressor
rf_regressor = RandomForestRegressor(n_estimators=100, random_state=42)
# Train the regressor
rf_regressor.fit(X_train, y_train)
# Predict the test set results
y_pred = rf_regressor.predict(X_test)
# Evaluate the performance
# 1. Mean Squared Error
mse = mean_squared_error(y_test, y_pred)
print("Mean Squared Error:", mse)
# 2. Mean Absolute Error
mae = mean_absolute_error(y_test, y_pred)
print("Mean Absolute Error:", mae)
# 3. R-squared Score
r2 = r2_score(y_test, y_pred)
print("R-squared:", r2)
Example output (exact values may vary with the scikit-learn version):
Mean Squared Error: 9.2658
Mean Absolute Error: 2.0345
R-squared: 0.877
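Mean Squared Error (MSE): The average of the squared differences between the predicted and actual values. Because the errors are squared, large errors are penalized more heavily, and the result is in squared units of the target.
Mean Absolute Error (MAE): The average of the absolute differences between the predicted and actual values. It is expressed in the same units as the target.
R-squared (R²): The proportion of the variance in the actual values that the model's predictions explain. A value of 1 indicates a perfect fit, 0 means the model does no better than always predicting the mean, and the value can be negative for a model that does worse than that.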
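To make the three definitions above concrete, here is a minimal sketch that computes them by hand in plain Python. The helper names (mse_manual, mae_manual, r2_manual) and the four sample values are hypothetical and chosen only for illustration; they are not part of scikit-learn and are not taken from the Boston Housing data. On the same inputs, the scikit-learn functions used in the program should give matching values.

```python
# Hand-computed regression metrics on a tiny, made-up example.
# The numbers are illustrative only (not from the Boston Housing data).
y_true = [3.0, 5.0, 7.0, 9.0]   # actual values
y_hat = [2.5, 5.0, 8.0, 9.5]    # predicted values


def mse_manual(actual, predicted):
    """Mean of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mae_manual(actual, predicted):
    """Mean of the absolute differences between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def r2_manual(actual, predicted):
    """1 - (residual sum of squares / total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


print(f"MSE: {mse_manual(y_true, y_hat):.3f}")  # MSE: 0.375
print(f"MAE: {mae_manual(y_true, y_hat):.3f}")  # MAE: 0.500
print(f"R2:  {r2_manual(y_true, y_hat):.3f}")   # R2:  0.925
```

Here the squared errors are 0.25, 0, 1 and 0.25, so the MSE is 1.5 / 4 = 0.375. The total sum of squares around the mean (6.0) is 20, which gives R² = 1 - 1.5 / 20 = 0.925.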
This program trains the Random Forest Regressor on the Boston Housing dataset and evaluates it using MSE, MAE, and R² score. The output shows the performance of the trained model on the test set.