Software Automation
Scikit - Linear Regression

Objective

By the end of this lab, you will be able to:

Apply linear regression using scikit-learn.
Interpret predictions made by a trained regression model.

Part 1: Visualising the Data

Before performing any regression, it’s important to understand the data visually.

Step 1: Import Libraries

import numpy as np

import matplotlib.pyplot as plt

Step 2: Define the Data

x = np.array([2, 4, 6, 8, 10, 12, 14, 16])

y = np.array([1, 3, 5, 7, 9, 11, 13, 15])

Step 3: Plot the Data

plt.figure(figsize=(8, 6))

plt.scatter(x, y, color='blue', label='Data Points')

plt.title('Scatter Plot of Linear Data')

plt.xlabel('x')

plt.ylabel('y')

plt.grid(True)

plt.legend()

plt.show()

Discussion:
The points fall on a straight line: each time x increases by 2, y increases by 2 as well. The next step is to model this relationship using linear regression.
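One way to back up the visual impression with a number is the Pearson correlation coefficient, which is close to 1 when the points lie near a rising straight line. The following is a minimal sketch using NumPy's np.corrcoef on the same data; the variable name r is only illustrative.

```python
import numpy as np

x = np.array([2, 4, 6, 8, 10, 12, 14, 16])
y = np.array([1, 3, 5, 7, 9, 11, 13, 15])

# np.corrcoef returns a 2x2 correlation matrix; the off-diagonal
# entry is the correlation between x and y.
r = np.corrcoef(x, y)[0, 1]
print(f"Correlation between x and y: {r:.3f}")
```

A value near 1 supports fitting a straight line. A value near 0 would suggest that a linear model is a poor choice.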

Part 2: Scikit-Learn

Scikit-learn is a popular Python library used for machine learning. It includes tools for training models like linear regression.

Step 1: Install scikit-learn

Run this only once in your environment.

pip install scikit-learn
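To confirm the installation worked, the short sketch below imports the package and prints its version. The version number you see depends on your environment.

```python
import sklearn

# If this import fails with ModuleNotFoundError, the install did not
# reach the Python environment you are running.
print("scikit-learn version:", sklearn.__version__)
```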

Part 3: Training a Linear Regression Model

Step 1: Import Libraries

We are importing the LinearRegression model from scikit-learn.

import numpy as np

import matplotlib.pyplot as plt

from sklearn.linear_model import LinearRegression

Step 2: Define the Data

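We are using the same data as above. scikit-learn expects the inputs as a 2-D array with one row per sample, so x is reshaped into a single column.

x = np.array([2, 4, 6, 8, 10, 12, 14, 16]).reshape(-1, 1)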

y = np.array([1, 3, 5, 7, 9, 11, 13, 15])
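scikit-learn estimators expect the features as a 2-D array of shape (n_samples, n_features), while the target y can stay 1-D. The sketch below shows the difference between a flat array and a single-column array; the names x_flat and X are illustrative and not part of the lab code.

```python
import numpy as np

x_flat = np.array([2, 4, 6, 8, 10, 12, 14, 16])

# -1 lets NumPy infer the number of rows; 1 fixes a single column.
X = x_flat.reshape(-1, 1)

print(x_flat.shape)  # (8,)
print(X.shape)       # (8, 1)
```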

Step 3: Train the Model

model = LinearRegression()

model.fit(x, y)
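After fitting, the model stores the line it found in the attributes coef_ (the slope, one entry per feature) and intercept_. The sketch below repeats the data so it can run on its own, with x shaped as a single column. Because this data follows y = x - 1 exactly, the slope should come out very close to 1 and the intercept very close to -1.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([2, 4, 6, 8, 10, 12, 14, 16]).reshape(-1, 1)
y = np.array([1, 3, 5, 7, 9, 11, 13, 15])

model = LinearRegression()
model.fit(x, y)

# coef_ holds one slope per input feature; there is only one here.
slope = model.coef_[0]
intercept = model.intercept_
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
```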

Part 4: Making Predictions

Use your model to predict y for a new x. As with training, predict expects a 2-D array, which is why x_new is written with double brackets.

x_new = np.array([[4]])

y_pred = model.predict(x_new)

print(f"Predicted y for x = {x_new[0, 0]} is: {y_pred[0]}")
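A prediction from a linear model is the fitted line evaluated at the new x, that is y = slope * x + intercept. The sketch below refits the model so that it runs on its own, then computes the prediction by hand and compares it with model.predict. The name y_manual is illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([2, 4, 6, 8, 10, 12, 14, 16]).reshape(-1, 1)
y = np.array([1, 3, 5, 7, 9, 11, 13, 15])
model = LinearRegression().fit(x, y)

x_new = np.array([[4]])
y_pred = model.predict(x_new)

# Evaluate the fitted line directly at x = 4.
y_manual = model.coef_[0] * x_new[0, 0] + model.intercept_

print("model.predict:", y_pred[0])
print("by hand:      ", y_manual)
print("match:", np.isclose(y_pred[0], y_manual))
```

Since x = 4 is one of the training points and the data is perfectly linear, both values should agree with the known y of 3 up to floating-point rounding.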

Part 5: Lab Exercises

Try a new data set

x = np.array([[1], [3], [5], [7], [9], [11], [13], [15]])

y = np.array([2, 5, 7, 10, 12, 14, 17, 20])

Predict Multiple Values
- Modify your program to predict 3 different x-values (e.g., x = 4, 10, 16) and plot all of them with a different marker.
- Which predictions seem most reliable?
- Are any outside the range of your data (extrapolation)?
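As a possible starting point for this exercise, the sketch below predicts several x-values in one call and flags any that fall outside the range of the training data. It assumes the new data set from the start of Part 5; the names x_query, y_query and outside are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([[1], [3], [5], [7], [9], [11], [13], [15]])
y = np.array([2, 5, 7, 10, 12, 14, 17, 20])
model = LinearRegression().fit(x, y)

# One row per value to predict.
x_query = np.array([[4], [10], [16]])
y_query = model.predict(x_query)

# A query is an extrapolation if it lies outside the training range.
outside = (x_query[:, 0] < x.min()) | (x_query[:, 0] > x.max())

for xq, yq, out in zip(x_query[:, 0], y_query, outside):
    note = "extrapolation" if out else "within data range"
    print(f"x = {xq}: predicted y = {yq:.2f} ({note})")
```

To plot the predictions, plt.scatter(x_query[:, 0], y_query, marker='x') draws them with a different marker from the training points.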
Change the Data
- Edit the y values slightly to make the data less perfectly linear.
- E.g. y = np.array([2, 5, 6, 9, 13, 15, 16, 19])
- Re-run the model and prediction.
- Has the prediction changed much?
- Does the predicted point still feel "correct" based on the new trend?
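One way to put a number on "less perfectly linear" is the R² score returned by model.score, which is 1.0 for a perfect fit and drops as the points scatter around the line. The sketch below compares the two y arrays from this exercise; the names y_original, y_edited and results are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([[1], [3], [5], [7], [9], [11], [13], [15]])
y_original = np.array([2, 5, 7, 10, 12, 14, 17, 20])
y_edited = np.array([2, 5, 6, 9, 13, 15, 16, 19])

x_new = np.array([[4]])

results = {}
for name, y in [("original", y_original), ("edited", y_edited)]:
    model = LinearRegression().fit(x, y)
    r2 = model.score(x, y)          # R^2 measured on the training data
    pred = model.predict(x_new)[0]  # prediction at x = 4
    results[name] = (r2, pred)
    print(f"{name}: R^2 = {r2:.4f}, prediction at x = 4 is {pred:.2f}")
```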
Outliers
- Add an outlier to your data, like this:

x = np.append(x, [[100]]).reshape(-1, 1)

y = np.append(y, [250])

- How does the predicted value change?
- What does this tell you about the effect of extreme values on linear regression?
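To see the effect directly, it can help to fit one model without the outlier and one with it, then compare the slopes and the predictions side by side. The sketch below does this for the Part 5 data set; the names clean, with_outlier, x_out and y_out are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([[1], [3], [5], [7], [9], [11], [13], [15]])
y = np.array([2, 5, 7, 10, 12, 14, 17, 20])

# Same data with one extreme point appended.
x_out = np.append(x, [[100]]).reshape(-1, 1)
y_out = np.append(y, [250])

clean = LinearRegression().fit(x, y)
with_outlier = LinearRegression().fit(x_out, y_out)

slope_clean = clean.coef_[0]
slope_out = with_outlier.coef_[0]

x_new = np.array([[4]])
pred_clean = clean.predict(x_new)[0]
pred_out = with_outlier.predict(x_new)[0]

print(f"slope without outlier: {slope_clean:.3f}")
print(f"slope with outlier:    {slope_out:.3f}")
print(f"prediction at x = 4 without outlier: {pred_clean:.2f}")
print(f"prediction at x = 4 with outlier:    {pred_out:.2f}")
```

Ordinary least squares minimises squared errors, so a single point far from the rest can pull the whole line toward itself.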
