In previous tutorials, we explored how to implement simple and advanced linear regression from scratch using `NumPy` and gradient descent. Now, we'll take a step further and bridge the gap to neural networks by demonstrating how to use `PyTorch` for similar tasks. This tutorial will show how to leverage `PyTorch`'s capabilities to simplify and enhance the process of building and training models.
Before diving into this tutorial, it is highly recommended that you read the following tutorials:
How to Implement a Simple Linear Regression from Scratch in Python Using Gradient Descent
How to Implement an Advanced Linear Regression from Scratch in Python Using Gradient Descent
These tutorials provide foundational knowledge on linear regression and gradient descent, which are essential for understanding the concepts discussed here.
Additionally, ensure you have `NumPy` and `PyTorch` installed.
Linear regression is one of the simplest and most fundamental algorithms in machine learning. It models the relationship between a dependent variable `y` and one or more independent variables `x`. In this section, we'll quickly recap the implementation of a simple linear regression model using `NumPy`.
This code is also available in the Google Colab Notebook.
First, import the necessary library and initialize your data and parameters.
Python
import numpy as np
# Input and output data
x = np.array([1., 3., 10., 13., 7.])
y = np.array([2., 6., 20., 26., 14.])
# Initialize the parameter 'a' randomly
a = np.random.rand()
# Hyperparameters
learning_rate = 0.001
iterations = 100
N = x.size
`x` is the array of input features.
`y` is the array of target values.
`a` is the parameter (`weight`) that we want to learn.
`learning_rate` controls the size of the steps we take to reach the minimum of the `loss` function.
`iterations` is the number of times the gradient descent algorithm will run.
`N` is the number of data points.
We perform gradient descent to update our parameter `a`. Gradient descent is an optimization algorithm used to minimize the `loss` function. Here’s how you implement it:
Python
for i in range(iterations):
    # Compute the predictions
    y_pred = a * x

    # Calculate the loss (Mean Squared Error)
    loss = (1/N) * np.sum((y_pred - y) ** 2)

    # Calculate the gradient
    gradient = (2/N) * np.sum(x * (y_pred - y))

    # Update the parameter 'a'
    a = a - learning_rate * gradient

    # Print the loss and parameter every 10 iterations
    if (i+1) % 10 == 0:
        print(f"Iteration {i+1}: Loss = {loss:.4f}, 'a' = {a:.4f}")

print(f"Optimal parameter 'a': {a:.4f}")
1) Predictions:
Compute the predicted values `y_pred` using the current parameter `a`.
Python
y_pred = a * x
Here, `y_pred` is calculated by multiplying the input `x` by the parameter `a`.
2) Loss Calculation:
Measure the Mean Squared Error (`MSE`) between the predicted values and the actual target values.
Python
loss = (1/N) * np.sum((y_pred - y) ** 2)
The `MSE` is computed by taking the average of the squared differences between the predicted values and the actual values.
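To make the formula concrete, here is a tiny worked example with made-up numbers (not the tutorial's dataset):

```python
import numpy as np

# Hypothetical predictions and targets, for illustration only
y_pred = np.array([2.0, 5.0])
y_true = np.array([1.0, 7.0])

# Squared differences: (2-1)^2 = 1, (5-7)^2 = 4
# MSE = (1 + 4) / 2 = 2.5
mse = np.mean((y_pred - y_true) ** 2)
print(mse)  # 2.5
```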
3) Gradient Calculation:
Determine how to adjust the parameter `a` to reduce the `loss`.
Python
gradient = (2/N) * np.sum(x * (y_pred - y))
The gradient is the partial derivative of the `loss` function with respect to the parameter `a`. It indicates the direction and rate at which the parameter should be updated to minimize the `loss`.
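If you want to convince yourself that this analytic gradient is correct, one common sanity check (not part of the original code) is to compare it against a finite-difference approximation of the loss:

```python
import numpy as np

x = np.array([1., 3., 10., 13., 7.])
y = np.array([2., 6., 20., 26., 14.])
N = x.size
a = 0.5  # any test value for the parameter

def loss(a):
    # Mean Squared Error as defined in the tutorial
    return (1/N) * np.sum((a * x - y) ** 2)

# Analytic gradient from the tutorial
analytic = (2/N) * np.sum(x * (a * x - y))

# Central finite-difference approximation of d(loss)/da
eps = 1e-6
numeric = (loss(a + eps) - loss(a - eps)) / (2 * eps)

print(analytic, numeric)  # the two values should agree closely
```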
4) Parameter Update:
Apply gradient descent to update the parameter `a`.
Python
a = a - learning_rate * gradient
The parameter `a` is updated by subtracting the product of the learning rate and the gradient from the current value of `a`.
5) Printing Progress:
Monitor training by printing the `loss` and parameter value every `10` iterations.
Python
if (i+1) % 10 == 0:
    print(f"Iteration {i+1}: Loss = {loss:.4f}, 'a' = {a:.4f}")
6) Final Output:
After completing the iterations, print the optimal parameter value.
Python
print(f"Optimal parameter 'a': {a:.4f}")
In the previous section, we demonstrated how to implement linear regression from scratch using `NumPy`. Now, let's transition to `PyTorch`, which simplifies gradient calculations and model updates. `PyTorch` provides built-in functions and modules that streamline the process, making it more efficient and less error-prone.
This code is also available in the Google Colab Notebook.
First, define your data and initialize the parameter using `PyTorch`. This step is similar to what we did with `NumPy`, but with some key differences that take advantage of `PyTorch`'s capabilities.
Python
import torch
# Input and output data
x = torch.tensor([1., 3., 10., 13., 7.])
y = torch.tensor([2., 6., 20., 26., 14.])
# Initialize the parameter 'a' with gradient tracking
a = torch.randn(1, requires_grad=True)
# Hyperparameters
learning_rate = 0.001
iterations = 100
N = x.size(0)
1) Input and Output Data:
We use `torch.tensor` to create tensors from our input and output data arrays. This is the `PyTorch` equivalent of `NumPy` arrays but with additional capabilities for automatic differentiation.
Python
x = torch.tensor([1., 3., 10., 13., 7.])
y = torch.tensor([2., 6., 20., 26., 14.])
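As a quick aside (an illustration, not part of the tutorial's pipeline), PyTorch tensors and NumPy arrays convert between each other easily:

```python
import numpy as np
import torch

np_array = np.array([1., 3., 10.])

# NumPy -> tensor; from_numpy shares the underlying memory
t = torch.from_numpy(np_array)

# Tensor -> NumPy
back = t.numpy()

print(t.dtype)  # torch.float64, inherited from the NumPy array
```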
2) Initialize Parameter:
The parameter `a` is initialized with `torch.randn(1)` to a random value drawn from a standard normal distribution. Setting `requires_grad=True` tells PyTorch to track operations on `a` so gradients can be computed automatically.
Python
a = torch.randn(1, requires_grad=True)
3) Hyperparameters:
We define the learning rate, the number of iterations for the training loop, and the number of data points.
Python
learning_rate = 0.001
iterations = 100
N = x.size(0)
Next, we train the model using `PyTorch`'s automatic differentiation feature to update the parameter `a`. The training process involves computing predictions, calculating the `loss`, performing backpropagation, and updating the parameter.
Python
for i in range(iterations):
    # Compute the predictions
    y_pred = a * x

    # Calculate the loss (Mean Squared Error)
    loss = (1/N) * torch.sum((y_pred - y) ** 2)

    # Compute gradients
    loss.backward()

    # Update the parameter 'a' using gradient descent
    with torch.no_grad():
        a -= learning_rate * a.grad
        a.grad.zero_()  # Clear the gradients for the next iteration

    # Print the loss and parameter value every 10 iterations
    if (i+1) % 10 == 0:
        print(f"Iteration {i+1}: Loss = {loss.item():.4f}, 'a' = {a.item():.4f}")

print(f"Optimal parameter 'a': {a.item():.4f}")
1) Predictions:
Compute the predicted values `y_pred` using the current parameter `a`. This is done by multiplying `a` by the input tensor `x`.
Python
y_pred = a * x
2) Loss Calculation:
Calculate the Mean Squared Error (`MSE`) between the predicted values and the actual target values.
Python
loss = (1/N) * torch.sum((y_pred - y) ** 2)
3) Backward Pass:
Perform backpropagation to compute the `gradient` of the `loss` with respect to the parameter `a`. This is done using the `backward()` method on the `loss` tensor.
Python
loss.backward()
4) Parameter Update:
Update the parameter `a` using gradient descent. This involves subtracting the product of the learning rate and the gradient from the current value of `a`. We use `torch.no_grad()` to ensure the update is not tracked for gradient computation.
Python
with torch.no_grad():
    a -= learning_rate * a.grad
    a.grad.zero_()  # Clear the gradients for the next iteration
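The call to `a.grad.zero_()` matters because PyTorch accumulates gradients across `backward()` calls rather than overwriting them. A minimal standalone illustration (not part of the training loop):

```python
import torch

a = torch.tensor([1.0], requires_grad=True)

# First backward pass: d(3a)/da = 3
(3 * a).sum().backward()
print(a.grad)  # tensor([3.])

# Second backward pass WITHOUT zeroing: gradients accumulate, 3 + 3 = 6
(3 * a).sum().backward()
print(a.grad)  # tensor([6.])

# Zero the gradient before the next pass to avoid accumulation
a.grad.zero_()
print(a.grad)  # tensor([0.])
```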
5) Printing Progress:
Print the `loss` and the parameter `a` every `10` iterations to monitor the training progress.
Python
if (i+1) % 10 == 0:
    print(f"Iteration {i+1}: Loss = {loss.item():.4f}, 'a' = {a.item():.4f}")
6) Final Output:
After completing the iterations, print the optimal parameter `a`.
Python
print(f"Optimal parameter 'a': {a.item():.4f}")
By transitioning from `NumPy` to `PyTorch`, we significantly simplify the process of implementing linear regression. `PyTorch`'s built-in functions for automatic differentiation and gradient tracking make the code more concise and easier to manage. This foundational understanding of `PyTorch` sets the stage for more complex neural network models, which we will explore in the next sections.
Now, we'll take advantage of `PyTorch`'s neural network (`nn`) module and optimization (`optim`) module to further simplify our implementation.
This code is also available in the Google Colab Notebook.
First, import the necessary modules, define your data, and initialize the model.
Python
import torch
import torch.nn as nn
import torch.optim as optim
# Input and output data
x = torch.tensor([1., 3., 10., 13., 7.]).view(-1, 1)
y = torch.tensor([2., 6., 20., 26., 14.]).view(-1, 1)
# Define the linear model
model = nn.Linear(in_features=1, out_features=1, bias=False)
# Initialize the model parameter
nn.init.uniform_(model.weight, a=0.0, b=1.0)
1) Import Modules:
Import the necessary `PyTorch` modules for building and training neural networks.
Python
import torch
import torch.nn as nn
import torch.optim as optim
2) Input and Output Data:
Define the input and output data using `torch.tensor` and reshape them to have one column.
Python
x = torch.tensor([1., 3., 10., 13., 7.]).view(-1, 1)
y = torch.tensor([2., 6., 20., 26., 14.]).view(-1, 1)
In the original dataset, `x` and `y` are one-dimensional tensors:
Python
x = torch.tensor([1., 3., 10., 13., 7.])
y = torch.tensor([2., 6., 20., 26., 14.])
Both tensors have shape (5,), meaning each contains 5 elements.
However, when working with neural network modules in `PyTorch`, it is often required to have the data in a two-dimensional shape, where the first dimension represents the batch size (number of samples) and the second dimension represents the number of features. This format is necessary because neural network modules expect input data to be in a batch format, even if the batch size is `1`.
In our case, each element in `x` represents a `single feature` (`input`), and each element in `y` represents the corresponding target value. Therefore, we need to reshape both tensors to have a second dimension, indicating that `each sample has one feature`. This is done using the `view` method:
Python
x = torch.tensor([1., 3., 10., 13., 7.]).view(-1, 1)
y = torch.tensor([2., 6., 20., 26., 14.]).view(-1, 1)
`view(-1, 1)` reshapes the tensor to have one column and as many rows as needed (`-1` tells PyTorch to infer that dimension from the tensor's total size).
After reshaping, the tensors look like this:
Python
x = tensor([[ 1.],
            [ 3.],
            [10.],
            [13.],
            [ 7.]])
y = tensor([[ 2.],
            [ 6.],
            [20.],
            [26.],
            [14.]])
Each row now represents a `single sample` with `one feature`, and this format is compatible with the `nn.Linear` module in `PyTorch`.
Compatibility with Neural Network Layers: Neural network layers in `PyTorch`, such as `nn.Linear`, expect input data to be in the shape (`batch_size`, `num_features`). By reshaping the data, we ensure it meets this requirement.
Batch Processing: Neural networks are designed to process batches of data simultaneously, which can greatly improve training efficiency. Reshaping the data to include the batch dimension allows `PyTorch` to handle multiple samples in one forward and backward pass.
Consistent Data Format: Reshaping to two-dimensional tensors makes it easier to extend the model to handle multiple features per sample. For instance, if `x` had multiple features, we would have reshaped it to (`batch_size`, `num_features`) regardless.
In summary, reshaping the tensors ensures they are in the correct format for `PyTorch`'s neural network modules, allowing us to leverage batch processing and maintain consistency in data handling.
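You can verify the shapes directly; as a side note (not required by the tutorial), `unsqueeze(1)` is an equivalent way to add the feature dimension here:

```python
import torch

x = torch.tensor([1., 3., 10., 13., 7.])
print(x.shape)      # torch.Size([5])

# Reshape to (batch_size, num_features) = (5, 1)
x_col = x.view(-1, 1)
print(x_col.shape)  # torch.Size([5, 1])

# unsqueeze(1) inserts a new dimension at position 1 with the same result
print(torch.equal(x_col, x.unsqueeze(1)))  # True
```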
3) Define Model:
Use `nn.Linear` to define a linear model with one input feature and one output feature. We set `bias=False` to exclude the bias term for simplicity.
Python
model = nn.Linear(in_features=1, out_features=1, bias=False)
Why Set `bias=False`?
Consistency with Previous Implementation: In the previous sections where we implemented linear regression using both `NumPy` and `PyTorch` without neural network modules, we did not include a `bias` term. To maintain consistency and make a direct comparison, we set `bias=False` in the `nn.Linear` module.
Simplified Model: Setting `bias=False` reduces the model to the basic form of linear regression, where the prediction is simply the input `x` multiplied by the parameter `a` (the `weight`):
Python
y_pred = a * x
Without a bias term, the model focuses solely on learning the scaling factor `a` that maps the input `x` to the output `y`.
Focus on Core Concept: By excluding the `bias` term, the tutorial emphasizes the core concept of linear regression and gradient descent. Adding a `bias` term would introduce another parameter, which, while important in general, is not necessary for understanding the fundamental training process demonstrated here.
Understanding the Bias Term
The bias term in a linear model adds an additional degree of freedom, allowing the model to better fit data that does not pass through the origin. The general form of a linear regression model with a bias term is:
Python
y_pred = a * x + b
Where:
`a` is the weight (slope).
`b` is the bias (intercept).
In many practical applications, including a bias term is crucial because it allows the model to fit a wider variety of data patterns. However, for the purposes of this tutorial, we exclude the bias to keep the model simple and focus on the fundamental aspects of training a linear regression model using `PyTorch`.
How to Include Bias if Needed
If you want to include the `bias` term, you can simply set `bias=True` (which is the default value) when defining the model:
Python
model = nn.Linear(in_features=1, out_features=1, bias=True)
This will add an additional parameter to the model, which `PyTorch` will initialize and optimize along with the `weight`. The training process remains the same, but now the model can fit data that does not pass through the origin.
Setting `bias=False` simplifies the model to align with the previous examples in the tutorial, focusing on the core learning process without the additional complexity of an `intercept` term. This choice makes it easier to understand the basics of gradient descent and parameter optimization using `PyTorch`.
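You can confirm the difference between the two variants by inspecting their parameters (a quick check, not from the original text):

```python
import torch.nn as nn

no_bias = nn.Linear(in_features=1, out_features=1, bias=False)
with_bias = nn.Linear(in_features=1, out_features=1, bias=True)

# Without bias: one learnable tensor (the weight); .bias is None
print(len(list(no_bias.parameters())))    # 1
print(no_bias.bias)                       # None

# With bias: two learnable tensors (weight and bias)
print(len(list(with_bias.parameters())))  # 2
```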
4) Initialize Parameters:
Initialize the model's `weight` parameter uniformly between `0` and `1` using `nn.init.uniform_`.
Python
nn.init.uniform_(model.weight, a=0.0, b=1.0)
Next, define the `loss` function and the `optimizer`.
Python
# Define the loss function
loss_fn = nn.MSELoss(reduction='mean')
# Define the optimizer
optimizer = optim.SGD(model.parameters(), lr=0.001)
We use Mean Squared Error (`MSE`) as our `loss` function and Stochastic Gradient Descent (`SGD`) as our `optimizer`.
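`nn.MSELoss(reduction='mean')` computes the same quantity as the manual `(1/N) * torch.sum((y_pred - y) ** 2)` used earlier. A quick check with made-up values:

```python
import torch
import torch.nn as nn

# Hypothetical predictions and targets, shaped (batch_size, 1)
y_pred = torch.tensor([[1.5], [5.0], [19.0]])
y_true = torch.tensor([[2.0], [6.0], [20.0]])
N = y_true.size(0)

# Manual MSE, as in the earlier sections
manual = (1/N) * torch.sum((y_pred - y_true) ** 2)

# Built-in MSE loss
builtin = nn.MSELoss(reduction='mean')(y_pred, y_true)

print(manual.item(), builtin.item())  # both 0.75
```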
Finally, we train the model using the `optimizer` and `loss` function.
Python
iterations = 100
for i in range(iterations):
    # Zero the gradients
    optimizer.zero_grad()

    # Compute the predictions
    y_pred = model(x)

    # Calculate the loss
    loss = loss_fn(y_pred, y)

    # Compute gradients
    loss.backward()

    # Update the parameters
    optimizer.step()

    # Print the loss every 10 iterations
    if (i+1) % 10 == 0:
        print(f"Iteration {i+1}: Loss = {loss.item():.4f}")

print(f"Optimal parameter 'a': {model.weight.item():.4f}")
Zero Gradients: Clear the gradients of the model parameters.
Predictions: Compute `y_pred` using the current model parameters.
Loss Calculation: Measure the Mean Squared Error (`MSE`) between `y_pred` and `y`.
Backward Pass: Compute the `gradients` of the `loss` with respect to the model parameters.
Parameter Update: Update the model parameters using the `optimizer`.
Printing Progress: Monitor training by printing the `loss` every 10 iterations.
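Once trained, the model can be used for predictions on new inputs; wrapping the call in `torch.no_grad()` avoids tracking gradients during inference. A usage sketch with a stand-in model and a hypothetical new input (the weight is set to `2.0`, the true slope of the tutorial's data, in place of a trained value):

```python
import torch
import torch.nn as nn

# Stand-in model with a known weight, in place of the trained one
model = nn.Linear(in_features=1, out_features=1, bias=False)
with torch.no_grad():
    model.weight.fill_(2.0)

# Predict for a new input; shape (batch_size, num_features) = (1, 1)
x_new = torch.tensor([[4.0]])
with torch.no_grad():
    y_new = model(x_new)

print(y_new)  # tensor([[8.]])
```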
By leveraging `PyTorch`'s neural network modules, we can significantly simplify the process of defining and training models. The `nn` module provides a high-level API for building neural networks, while the `optim` module offers various optimization algorithms to train the models efficiently. This approach not only reduces the amount of code but also makes it more readable and maintainable.
In this tutorial, we demonstrated how to transition from implementing a simple linear regression model using `NumPy` to using `PyTorch`. By leveraging `PyTorch`, we simplified gradient calculations and parameter updates, making the process of building and training models more efficient.
We then showed how to use `PyTorch`'s neural network modules and optimization algorithms to further simplify and enhance our linear regression implementation. This tutorial serves as a bridge to more complex neural networks, paving the way for you to explore deeper and more powerful models with `PyTorch`.