Describe types of algorithms associated with ML, including:
Linear regression
Linear regression is one of the simplest and most widely used techniques in machine learning. Think of it as finding the "line of best fit" through a set of data points.
In everyday terms: Linear regression helps us understand how one thing changes when another thing changes. For example, how house prices increase as the house size increases.
Example: Imagine you're trying to predict how many ice creams will sell based on the temperature. When it's hotter, more ice creams are sold. Linear regression helps us quantify that relationship.
Question: In your own words, what do you think linear regression might be used for in the real world? Try to think of at least two examples.
Linear regression works by drawing a straight line through the data points so that the line is as close as possible to all of them.
This line can be described with a simple equation:
y = mx + b
Where:
y is what we're trying to predict (the dependent variable)
x is our input data (the independent variable)
m is the slope of the line (how much y changes when x changes)
b is where the line crosses the y-axis (the y-intercept)
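The equation above can be sketched as a small prediction function. The slope and intercept values here are made up for illustration, not fitted from real data:

```python
def predict(x, m, b):
    """Predict y from input x using the straight line y = mx + b."""
    return m * x + b

# Hypothetical ice cream model: 3 extra sales per degree Celsius,
# plus a baseline of 10 sales regardless of temperature.
m, b = 3.0, 10.0
print(predict(30, m, b))  # 3 * 30 + 10 = 100.0
```

Changing m and b changes the line; the task of linear regression is to find the particular m and b that best match the data.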
Example: A real estate agent in Sydney might use linear regression to estimate house prices. If we plot house size (x-axis) against price (y-axis), we might find that each additional square metre adds about $5,000 to the price.
Question: If the equation for house prices is Price = $5,000 × Size + $50,000, how much would a 100 square metre house cost? What about a 150 square metre house?
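Once you have worked out your answers by hand, you can check them with a few lines of Python. This uses the price equation from the example above:

```python
def house_price(size_sqm):
    """Price model from the example: Price = $5,000 x Size + $50,000."""
    return 5000 * size_sqm + 50000

print(house_price(100))  # 550000
print(house_price(150))  # 800000
```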
To find the line of best fit, we need to find values for m (slope) and b (y-intercept) that make our line as close as possible to all data points.
One way to measure "closeness" is to calculate the vertical distance from each point to the line, square these distances (to make them all positive), and then find the average. This is called the Mean Squared Error (MSE).
Our goal: Find the values of m and b that give us the smallest possible MSE.
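The MSE calculation described above can be written directly in code. The data points here are made up so that a perfect fit exists:

```python
def mse(m, b, xs, ys):
    """Mean Squared Error: average squared vertical distance
    from each point (x, y) to the line y = mx + b."""
    errors = [(y - (m * x + b)) ** 2 for x, y in zip(xs, ys)]
    return sum(errors) / len(errors)

xs = [1, 2, 3, 4]
ys = [2, 4, 6, 8]          # these points lie exactly on y = 2x

print(mse(2, 0, xs, ys))   # perfect fit: 0.0
print(mse(1, 0, xs, ys))   # worse fit: 7.5
```

A smaller MSE means the line passes closer to the points, so the line of best fit is the one with the smallest MSE.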
Video Break (3 minutes): Let's watch this video that explains linear regression visually:
Question: After watching the video, can you explain in your own words what the "line of best fit" means?
How do we find the best values for m and b? One common method is called Gradient Descent.
Think of it like finding the bottom of a valley:
Start at a random position (random values for m and b)
Look around to see which direction leads downhill (calculate the gradient)
Take a small step in that direction
Repeat until you reach the bottom of the valley (the minimum error)
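The four steps above can be sketched in a few lines of Python. The data, learning rate, and number of steps are made up for illustration; the data is generated from the line y = 2x + 1, so we know the answer gradient descent should find:

```python
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]   # points on the line y = 2x + 1

m, b = 0.0, 0.0             # step 1: start at an arbitrary position
learning_rate = 0.05        # size of each downhill step

for _ in range(2000):       # step 4: repeat many times
    n = len(xs)
    # step 2: gradient of the MSE with respect to m and b
    grad_m = sum(-2 * x * (y - (m * x + b)) for x, y in zip(xs, ys)) / n
    grad_b = sum(-2 * (y - (m * x + b)) for x, y in zip(xs, ys)) / n
    # step 3: take a small step downhill
    m -= learning_rate * grad_m
    b -= learning_rate * grad_b

print(round(m, 2), round(b, 2))  # close to 2 and 1
```

If the learning rate is too large the steps overshoot the bottom of the valley and the error grows instead of shrinking; if it is too small, many more iterations are needed.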
Interactive Activity (10 minutes): Let's play with the Gradient Descent Game to see how this works!
Reflection Question: What happened when you changed the learning rate in the game? What about when you changed the starting point?
So far, we've just looked at how one variable affects another. But what if multiple factors affect what we're trying to predict?
Multiple linear regression extends our equation to include more variables:
y = β₀ + β₁x₁ + β₂x₂ + ... + βₙxₙ
Where:
β₀ is the y-intercept
β₁, β₂, etc. are the coefficients for each input variable
x₁, x₂, etc. are the input variables
Example: For predicting house prices, we might use:
Price = β₀ + β₁ × Size + β₂ × Bedrooms + β₃ × Distance_to_CBD
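The multiple-regression equation above is still just a weighted sum. Here is a minimal sketch of making a prediction with it; the coefficient values are hypothetical, not fitted from real Sydney housing data:

```python
def predict_price(size, bedrooms, distance_to_cbd, betas):
    """Multiple linear regression: y = b0 + b1*x1 + b2*x2 + b3*x3."""
    b0, b1, b2, b3 = betas
    return b0 + b1 * size + b2 * bedrooms + b3 * distance_to_cbd

# Illustrative coefficients only. Note the negative coefficient:
# each extra kilometre from the CBD lowers the predicted price.
betas = (50000, 5000, 20000, -3000)
print(predict_price(100, 3, 10, betas))  # 580000
```

Fitting the coefficients works the same way as before: gradient descent (or a library routine) searches for the betas that minimise the MSE, just with more numbers to adjust.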
Question: What factors might affect the price of a used car? List at least 3-4 factors and explain why they might matter.
Linear regression is powerful but has some limitations:
It assumes a linear relationship between variables
It's sensitive to outliers (extreme data points)
It doesn't work well for classification problems (yes/no outcomes)
Example: Linear regression works well for predicting house prices, but not for predicting whether a loan applicant will default (yes/no outcome).
Question: Can you think of a situation where the relationship between variables might not be linear? What would happen if you tried to use linear regression in that case?
Complete the Google Machine Learning Crash Course linear regression module.