Welcome to Foundation of Data Science Laboratory
Assignment no. 4. Practical Assignments on Statistical / Algorithmic Data Modeling
Objective:
To develop skills in statistical data modeling, hypothesis testing, classification and regression
algorithms, model evaluation techniques, and hands-on exercises with the scikit-learn library.
4.2: Basics of Classification and Regression Algorithms
2. Regression Algorithm (Linear Regression):
o Implement a linear regression model using scikit-learn to predict house prices.
The following Python program demonstrates how to implement a linear regression model with scikit-learn to predict house prices. Each step is explained after the program, along with the output.
Program:
# Import necessary libraries
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
# Step 1: Load the California housing dataset
# (Note: `load_boston` was removed from scikit-learn in version 1.2, so `fetch_california_housing` is used instead.)
california = fetch_california_housing(as_frame=True)
X = california.data # Features (8 numeric columns)
y = california.target # Target (median house value, in units of $100,000)
# Step 2: Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Step 3: Create and train the linear regression model
model = LinearRegression()
model.fit(X_train, y_train)
# Step 4: Make predictions on the test data
y_pred = model.predict(X_test)
# Step 5: Evaluate the model's performance
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"Mean Squared Error (MSE): {mse:.2f}")
print(f"R-squared (R2): {r2:.2f}")
# Print the coefficients and intercept
print("\nModel Coefficients:")
print(model.coef_)
print(f"Intercept: {model.intercept_:.2f}")
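Output (note: the listing below shows 13 coefficients, which matches the 13-feature Boston housing dataset rather than the 8-feature California dataset loaded by the program; running the program as written prints 8 coefficients and different metric values):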
Mean Squared Error (MSE): 22.32
R-squared (R2): 0.71
Model Coefficients:
[-0.11052089 0.04906629 0.01966263 2.82362073 -17.4751191 3.75782294
-0.00147614 -1.38063734 0.28925472 -0.01115892 -0.93705208 0.00931239
-0.55641727]
Intercept: 36.45
Explanation:
Import Libraries:
scikit-learn provides the dataset loader, the train/test split helper, the linear regression model, and the evaluation metrics. pandas must also be installed, because as_frame=True returns the data as a pandas DataFrame.
Load the California Housing Dataset:
fetch_california_housing(as_frame=True) loads the dataset, which consists of eight features such as median income, house age, and average number of rooms, and the target variable (median house value).
Split the Dataset:
train_test_split splits the dataset into 70% training and 30% testing data (test_size=0.3); random_state=42 makes the split reproducible.
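As a quick check of the 70/30 split, the sketch below runs train_test_split on ten placeholder samples instead of the housing data. The toy arrays and the names X_toy, y_toy, X_tr, and X_te are assumptions made for illustration only.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (hypothetical, not the housing dataset): 10 samples, 2 features
X_toy = np.arange(20).reshape(10, 2)
y_toy = np.arange(10)

# Same settings as the main program: 30% held out, fixed seed
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.3, random_state=42
)

# Shapes show how many samples landed in each part
print(X_tr.shape, X_te.shape)
```

With 10 samples and test_size=0.3, 7 samples go to the training set and 3 to the testing set, and every sample appears in exactly one of the two.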
Create and Train the Model:
LinearRegression() creates a linear regression model.
fit() trains the model using the training data.
Make Predictions:
predict() predicts house prices for the testing set.
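To see what fit() and predict() do in isolation, the sketch below trains on a tiny made-up dataset where the target follows y = 2x + 1 exactly. The data and the names toy_model, slope, and prediction are hypothetical and not part of the assignment program.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data (hypothetical): one feature, target follows y = 2x + 1 exactly
X_toy = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
y_toy = 2.0 * X_toy[:, 0] + 1.0

# fit() estimates one coefficient per feature plus an intercept
toy_model = LinearRegression()
toy_model.fit(X_toy, y_toy)

slope = toy_model.coef_[0]
intercept = toy_model.intercept_

# predict() applies the learned line to an unseen input
prediction = toy_model.predict(np.array([[5.0]]))[0]

print(f"slope={slope:.2f}, intercept={intercept:.2f}, prediction={prediction:.2f}")
```

Because the toy data is noise-free, the model recovers the slope 2 and intercept 1 and predicts 11 for x = 5. On real housing data the fit is only approximate, which is why the evaluation step is needed.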
Evaluate the Model:
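mean_squared_error computes the average of the squared differences between the predicted and actual prices; lower values mean better predictions.
r2_score measures the proportion of the variance in the target that the model explains; 1.0 indicates a perfect fit.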
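Both metrics can be worked out by hand from their standard definitions, which may make the printed numbers easier to interpret. The sketch below uses plain Python and four made-up values; the names y_true, y_hat, mse_manual, and r2_manual are hypothetical.

```python
# Toy values (hypothetical), chosen so the arithmetic is easy to follow
y_true = [3.0, 5.0, 7.0, 9.0]
y_hat = [2.5, 5.0, 7.5, 9.0]

n = len(y_true)

# Residual sum of squares: squared gaps between actual and predicted values
ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_hat))

# MSE is the residual sum of squares averaged over the samples
mse_manual = ss_res / n

# Total sum of squares: squared gaps between actual values and their mean
mean_true = sum(y_true) / n
ss_tot = sum((t - mean_true) ** 2 for t in y_true)

# R-squared: share of the total variation that the predictions account for
r2_manual = 1.0 - ss_res / ss_tot

print(f"MSE: {mse_manual:.3f}")  # MSE: 0.125
print(f"R2: {r2_manual:.3f}")    # R2: 0.975
```

Here the squared errors are 0.25, 0, 0.25, and 0, so MSE = 0.5 / 4 = 0.125, and R2 = 1 - 0.5 / 20 = 0.975.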