In previous tutorials, we explored how to implement simple and advanced linear regression from scratch using `NumPy` and gradient descent. Now, we'll take a step further and bridge the gap to neural networks by demonstrating how to use `PyTorch` for similar tasks. This tutorial will show how to leverage `PyTorch`'s capabilities to simplify and enhance the process of building and training models.
Before diving into this tutorial, it is highly recommended that you read the following tutorials:
How to Implement a Simple Linear Regression from Scratch in Python Using Gradient Descent
How to Implement an Advanced Linear Regression from Scratch in Python Using Gradient Descent
These tutorials provide foundational knowledge on linear regression and gradient descent, which are essential for understanding the concepts discussed here.
Additionally, ensure you have `NumPy` and `PyTorch` installed.
Linear regression is one of the simplest and most fundamental algorithms in machine learning. It models the relationship between a dependent variable `y` and one or more independent variables `x`. In this section, we'll quickly recap the implementation of a simple linear regression model using `NumPy`.
This code is also available in the Google Colab Notebook.
First, import the necessary library and initialize your data and parameters.
Python
import numpy as np
# Input and output data
x = np.array([1., 3., 10., 13., 7.])
y = np.array([2., 6., 20., 26., 14.])
# Initialize the parameter 'a' randomly
a = np.random.rand()
# Hyperparameters
learning_rate = 0.001
iterations = 100
N = x.size
`x` is the array of input features.
`y` is the array of target values.
`a` is the parameter (`weight`) that we want to learn.
`learning_rate` controls the size of the steps we take to reach the minimum of the `loss` function.
`iterations` is the number of times the gradient descent algorithm will run.
`N` is the number of data points.
We perform gradient descent to update our parameter `a`. Gradient descent is an optimization algorithm used to minimize the `loss` function. Here’s how you implement it:
Python
for i in range(iterations):
    # Compute the predictions
    y_pred = a * x

    # Calculate the loss (Mean Squared Error)
    loss = (1/N) * np.sum((y_pred - y) ** 2)

    # Calculate the gradient
    gradient = (2/N) * np.sum(x * (y_pred - y))

    # Update the parameter 'a'
    a = a - learning_rate * gradient

    # Print the loss and parameter every 10 iterations
    if (i+1) % 10 == 0:
        print(f"Iteration {i+1}: Loss = {loss:.4f}, 'a' = {a:.4f}")

print(f"Optimal parameter 'a': {a:.4f}")
1) Predictions:
Compute the predicted values `y_pred` using the current parameter `a`.
Python
y_pred = a * x
Here, `y_pred` is calculated by multiplying the input `x` by the parameter `a`.
2) Loss Calculation:
Measure the Mean Squared Error (`MSE`) between the predicted values and the actual target values.
Python
loss = (1/N) * np.sum((y_pred - y) ** 2)
The `MSE` is computed by taking the average of the squared differences between the predicted values and the actual values.
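To make the formula concrete, here is a tiny worked example with made-up numbers (not the tutorial's dataset):

```python
import numpy as np

# Hypothetical predictions and targets, for illustration only
y_pred = np.array([2.0, 5.0])
y_true = np.array([1.0, 7.0])

# Squared differences: (2-1)^2 = 1, (5-7)^2 = 4
# MSE = (1 + 4) / 2 = 2.5
mse = np.mean((y_pred - y_true) ** 2)
print(mse)  # 2.5
```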
3) Gradient Calculation:
Determine how to adjust the parameter `a` to reduce the `loss`.
Python
gradient = (2/N) * np.sum(x * (y_pred - y))
The gradient is the partial derivative of the `loss` function with respect to the parameter `a`. It indicates the direction and rate at which the parameter should be updated to minimize the `loss`.
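If you want to convince yourself that this analytic gradient is correct, one common sanity check (not part of the original code) is to compare it against a finite-difference approximation of the loss:

```python
import numpy as np

x = np.array([1., 3., 10., 13., 7.])
y = np.array([2., 6., 20., 26., 14.])
N = x.size
a = 0.5  # any test value for the parameter

def loss(a):
    # Mean Squared Error as defined in the tutorial
    return (1/N) * np.sum((a * x - y) ** 2)

# Analytic gradient from the tutorial
analytic = (2/N) * np.sum(x * (a * x - y))

# Central finite-difference approximation of d(loss)/da
eps = 1e-6
numeric = (loss(a + eps) - loss(a - eps)) / (2 * eps)

print(analytic, numeric)  # the two values should agree closely
```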
4) Parameter Update:
Apply gradient descent to update the parameter `a`.
Python
a = a - learning_rate * gradient
The parameter `a` is updated by subtracting the product of the learning rate and the gradient from the current value of `a`.
5) Printing Progress:
Monitor training by printing the `loss` and parameter value every `10` iterations.
Python
if (i+1) % 10 == 0:
    print(f"Iteration {i+1}: Loss = {loss:.4f}, 'a' = {a:.4f}")
6) Final Output:
After completing the iterations, print the optimal parameter value.
Python
print(f"Optimal parameter 'a': {a:.4f}")
In the previous section, we demonstrated how to implement linear regression from scratch using `NumPy`. Now, let's transition to `PyTorch`, which simplifies gradient calculations and model updates. `PyTorch` provides built-in functions and modules that streamline the process, making it more efficient and less error-prone.
This code is also available in the Google Colab Notebook.
First, define your data and initialize the parameter using `PyTorch`. This step is similar to what we did with `NumPy`, but with some key differences that take advantage of `PyTorch`'s capabilities.
Python
import torch
# Input and output data
x = torch.tensor([1., 3., 10., 13., 7.])
y = torch.tensor([2., 6., 20., 26., 14.])
# Initialize the parameter 'a' with gradient tracking
a = torch.randn(1, requires_grad=True)
# Hyperparameters
learning_rate = 0.001
iterations = 100
N = x.size(0)
1) Input and Output Data:
We use `torch.tensor` to create tensors from our input and output data arrays. This is the `PyTorch` equivalent of `NumPy` arrays but with additional capabilities for automatic differentiation.
Python
x = torch.tensor([1., 3., 10., 13., 7.])
y = torch.tensor([2., 6., 20., 26., 14.])
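As a quick aside (an illustration, not part of the tutorial's pipeline), PyTorch tensors and NumPy arrays convert between each other easily:

```python
import numpy as np
import torch

np_array = np.array([1., 3., 10.])

# NumPy -> tensor; from_numpy shares the underlying memory
t = torch.from_numpy(np_array)

# Tensor -> NumPy
back = t.numpy()

print(t.dtype)  # torch.float64, inherited from the NumPy array
```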
2) Initialize Parameter:
The parameter `a` is initialized with `torch.randn(1)` to a random value drawn from a standard normal distribution. Setting `requires_grad=True` tells PyTorch to track operations on `a` so gradients can be computed automatically.
Python
a = torch.randn(1, requires_grad=True)
3) Hyperparameters:
We define the learning rate, the number of iterations for the training loop, and the number of data points.
Python
learning_rate = 0.001
iterations = 100
N = x.size(0)
Next, we train the model using `PyTorch`'s automatic differentiation feature to update the parameter `a`. The training process involves computing predictions, calculating the `loss`, performing backpropagation, and updating the parameter.
Python
for i in range(iterations):
    # Compute the predictions
    y_pred = a * x

    # Calculate the loss (Mean Squared Error)
    loss = (1/N) * torch.sum((y_pred - y) ** 2)

    # Compute gradients
    loss.backward()

    # Update the parameter 'a' using gradient descent
    with torch.no_grad():
        a -= learning_rate * a.grad
        a.grad.zero_()  # Clear the gradients for the next iteration

    # Print the loss and parameter value every 10 iterations
    if (i+1) % 10 == 0:
        print(f"Iteration {i+1}: Loss = {loss.item():.4f}, 'a' = {a.item():.4f}")

print(f"Optimal parameter 'a': {a.item():.4f}")
1) Predictions:
Compute the predicted values `y_pred` using the current parameter `a`. This is done by multiplying `a` by the input tensor `x`.
Python
y_pred = a * x
2) Loss Calculation:
Calculate the Mean Squared Error (`MSE`) between the predicted values and the actual target values.
Python
loss = (1/N) * torch.sum((y_pred - y) ** 2)
3) Backward Pass:
Perform backpropagation to compute the `gradient` of the `loss` with respect to the parameter `a`. This is done using the `backward()` method on the `loss` tensor.
Python
loss.backward()
4) Parameter Update:
Update the parameter `a` using gradient descent. This involves subtracting the product of the learning rate and the gradient from the current value of `a`. We use `torch.no_grad()` to ensure the update is not tracked for gradient computation.
Python
with torch.no_grad():
    a -= learning_rate * a.grad
    a.grad.zero_()  # Clear the gradients for the next iteration
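The call to `a.grad.zero_()` matters because PyTorch accumulates gradients across `backward()` calls rather than overwriting them. A minimal standalone illustration (not part of the training loop):

```python
import torch

a = torch.tensor([1.0], requires_grad=True)

# First backward pass: d(3a)/da = 3
(3 * a).sum().backward()
print(a.grad)  # tensor([3.])

# Second backward pass WITHOUT zeroing: gradients accumulate, 3 + 3 = 6
(3 * a).sum().backward()
print(a.grad)  # tensor([6.])

# Zero the gradient before the next pass to avoid accumulation
a.grad.zero_()
print(a.grad)  # tensor([0.])
```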
5) Printing Progress:
Print the `loss` and the parameter `a` every `10` iterations to monitor the training progress.
Python
if (i+1) % 10 == 0:
    print(f"Iteration {i+1}: Loss = {loss.item():.4f}, 'a' = {a.item():.4f}")
6) Final Output:
After completing the iterations, print the optimal parameter `a`.
Python
print(f"Optimal parameter 'a': {a.item():.4f}")
By transitioning from `NumPy` to `PyTorch`, we significantly simplify the process of implementing linear regression. `PyTorch`'s built-in functions for automatic differentiation and gradient tracking make the code more concise and easier to manage. This foundational understanding of `PyTorch` sets the stage for more complex neural network models, which we will explore in the next sections.
Now, we'll take advantage of `PyTorch`'s neural network (`nn`) module and optimization (`optim`) module to further simplify our implementation.
This code is also available in the Google Colab Notebook.
First, import the necessary modules, define your data, and initialize the model.
Python
import torch
import torch.nn as nn
import torch.optim as optim
# Input and output data
x = torch.tensor([1., 3., 10., 13., 7.]).view(-1, 1)
y = torch.tensor([2., 6., 20., 26., 14.]).view(-1, 1)
# Define the linear model
model = nn.Linear(in_features=1, out_features=1, bias=False)
# Initialize the model parameter
nn.init.uniform_(model.weight, a=0.0, b=1.0)
1) Import Modules:
Import the necessary `PyTorch` modules for building and training neural networks.
Python
import torch
import torch.nn as nn
import torch.optim as optim
2) Input and Output Data:
Define the input and output data using `torch.tensor` and reshape them to have one column.
Python
x = torch.tensor([1., 3., 10., 13., 7.]).view(-1, 1)
y = torch.tensor([2., 6., 20., 26., 14.]).view(-1, 1)
In the original dataset, `x` and `y` are one-dimensional tensors:
Python
x = torch.tensor([1., 3., 10., 13., 7.])
y = torch.tensor([2., 6., 20., 26., 14.])
Both tensors have shape (5,), meaning each contains 5 elements.
However, when working with neural network modules in `PyTorch`, it is often required to have the data in a two-dimensional shape, where the first dimension represents the batch size (number of samples) and the second dimension represents the number of features. This format is necessary because neural network modules expect input data to be in a batch format, even if the batch size is `1`.
In our case, each element in `x` represents a `single feature` (`input`), and each element in `y` represents the corresponding target value. Therefore, we need to reshape both tensors to have a second dimension, indicating that `each sample has one feature`. This is done using the `view` method:
Python
x = torch.tensor([1., 3., 10., 13., 7.]).view(-1, 1)
y = torch.tensor([2., 6., 20., 26., 14.]).view(-1, 1)
`view(-1, 1)` reshapes the tensor to have one column and as many rows as needed (`-1` tells PyTorch to infer that dimension from the tensor's total size).
After reshaping, the tensors look like this:
Python
x = tensor([[ 1.],
            [ 3.],
            [10.],
            [13.],
            [ 7.]])
y = tensor([[ 2.],
            [ 6.],
            [20.],
            [26.],
            [14.]])
Each row now represents a `single sample` with `one feature`, and this format is compatible with the `nn.Linear` module in `PyTorch`.
Compatibility with Neural Network Layers: Neural network layers in `PyTorch`, such as `nn.Linear`, expect input data to be in the shape (`batch_size`, `num_features`). By reshaping the data, we ensure it meets this requirement.
Batch Processing: Neural networks are designed to process batches of data simultaneously, which can greatly improve training efficiency. Reshaping the data to include the batch dimension allows `PyTorch` to handle multiple samples in one forward and backward pass.
Consistent Data Format: Reshaping to two-dimensional tensors makes it easier to extend the model to handle multiple features per sample. For instance, if `x` had multiple features, we would have reshaped it to (`batch_size`, `num_features`) regardless.
In summary, reshaping the tensors ensures they are in the correct format for `PyTorch`'s neural network modules, allowing us to leverage batch processing and maintain consistency in data handling.
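You can verify the shapes directly; as a side note (not required by the tutorial), `unsqueeze(1)` is an equivalent way to add the feature dimension here:

```python
import torch

x = torch.tensor([1., 3., 10., 13., 7.])
print(x.shape)      # torch.Size([5])

# Reshape to (batch_size, num_features) = (5, 1)
x_col = x.view(-1, 1)
print(x_col.shape)  # torch.Size([5, 1])

# unsqueeze(1) inserts a new dimension at position 1 with the same result
print(torch.equal(x_col, x.unsqueeze(1)))  # True
```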
3) Define Model:
Use `nn.Linear` to define a linear model with one input feature and one output feature. We set `bias=False` to exclude the bias term for simplicity.
Python
model = nn.Linear(in_features=1, out_features=1, bias=False)
Why Set `bias=False`?
Consistency with Previous Implementation: In the previous sections where we implemented linear regression using both `NumPy` and `PyTorch` without neural network modules, we did not include a `bias` term. To maintain consistency and make a direct comparison, we set `bias=False` in the `nn.Linear` module.
Simplified Model: Setting `bias=False` reduces the model to the basic form of linear regression, where the prediction is simply the input `x` multiplied by the parameter `a` (the `weight`):
Python
y_pred = a * x
Without a bias term, the model focuses solely on learning the scaling factor `a` that maps the input `x` to the output `y`.
Focus on Core Concept: By excluding the `bias` term, the tutorial emphasizes the core concept of linear regression and gradient descent. Adding a `bias` term would introduce another parameter, which, while important in general, is not necessary for understanding the fundamental training process demonstrated here.
Understanding the Bias Term
The bias term in a linear model adds an additional degree of freedom, allowing the model to better fit data that does not pass through the origin. The general form of a linear regression model with a bias term is:
Python
y_pred = a * x + b
Where:
`a` is the weight (slope).
`b` is the bias (intercept).
In many practical applications, including a bias term is crucial because it allows the model to fit a wider variety of data patterns. However, for the purposes of this tutorial, we exclude the bias to keep the model simple and focus on the fundamental aspects of training a linear regression model using `PyTorch`.
How to Include Bias if Needed
If you want to include the `bias` term, you can simply set `bias=True` (which is the default value) when defining the model:
Python
model = nn.Linear(in_features=1, out_features=1, bias=True)
This will add an additional parameter to the model, which `PyTorch` will initialize and optimize along with the `weight`. The training process remains the same, but now the model can fit data that does not pass through the origin.
Setting `bias=False` simplifies the model to align with the previous examples in the tutorial, focusing on the core learning process without the additional complexity of an `intercept` term. This choice makes it easier to understand the basics of gradient descent and parameter optimization using `PyTorch`.
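You can confirm the difference between the two variants by inspecting their parameters (a quick check, not from the original text):

```python
import torch.nn as nn

no_bias = nn.Linear(in_features=1, out_features=1, bias=False)
with_bias = nn.Linear(in_features=1, out_features=1, bias=True)

# Without bias: one learnable tensor (the weight); .bias is None
print(len(list(no_bias.parameters())))    # 1
print(no_bias.bias)                       # None

# With bias: two learnable tensors (weight and bias)
print(len(list(with_bias.parameters())))  # 2
```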
4) Initialize Parameters:
Initialize the model's `weight` parameter uniformly between `0` and `1` using `nn.init.uniform_`.
Python
nn.init.uniform_(model.weight, a=0.0, b=1.0)
Next, define the `loss` function and the `optimizer`.
Python
# Define the loss function
loss_fn = nn.MSELoss(reduction='mean')
# Define the optimizer
optimizer = optim.SGD(model.parameters(), lr=0.001)
We use Mean Squared Error (`MSE`) as our `loss` function and Stochastic Gradient Descent (`SGD`) as our `optimizer`.
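`nn.MSELoss(reduction='mean')` computes the same quantity as the manual `(1/N) * torch.sum((y_pred - y) ** 2)` used earlier. A quick check with made-up values:

```python
import torch
import torch.nn as nn

# Hypothetical predictions and targets, shaped (batch_size, 1)
y_pred = torch.tensor([[1.5], [5.0], [19.0]])
y_true = torch.tensor([[2.0], [6.0], [20.0]])
N = y_true.size(0)

# Manual MSE, as in the earlier sections
manual = (1/N) * torch.sum((y_pred - y_true) ** 2)

# Built-in MSE loss
builtin = nn.MSELoss(reduction='mean')(y_pred, y_true)

print(manual.item(), builtin.item())  # both 0.75
```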
Finally, we train the model using the `optimizer` and `loss` function.
Python
iterations = 100
for i in range(iterations):
    # Zero the gradients
    optimizer.zero_grad()

    # Compute the predictions
    y_pred = model(x)

    # Calculate the loss
    loss = loss_fn(y_pred, y)

    # Compute gradients
    loss.backward()

    # Update the parameters
    optimizer.step()

    # Print the loss every 10 iterations
    if (i+1) % 10 == 0:
        print(f"Iteration {i+1}: Loss = {loss.item():.4f}")

print(f"Optimal parameter 'a': {model.weight.item():.4f}")
Zero Gradients: Clear the gradients of the model parameters.
Predictions: Compute `y_pred` using the current model parameters.
Loss Calculation: Measure the Mean Squared Error (`MSE`) between `y_pred` and `y`.
Backward Pass: Compute the `gradients` of the `loss` with respect to the model parameters.
Parameter Update: Update the model parameters using the `optimizer`.
Printing Progress: Monitor training by printing the `loss` every 10 iterations.
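Once trained, the model can be used for predictions on new inputs; wrapping the call in `torch.no_grad()` avoids tracking gradients during inference. A usage sketch with a stand-in model and a hypothetical new input (the weight is set to `2.0`, the true slope of the tutorial's data, in place of a trained value):

```python
import torch
import torch.nn as nn

# Stand-in model with a known weight, in place of the trained one
model = nn.Linear(in_features=1, out_features=1, bias=False)
with torch.no_grad():
    model.weight.fill_(2.0)

# Predict for a new input; shape (batch_size, num_features) = (1, 1)
x_new = torch.tensor([[4.0]])
with torch.no_grad():
    y_new = model(x_new)

print(y_new)  # tensor([[8.]])
```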
By leveraging `PyTorch`'s neural network modules, we can significantly simplify the process of defining and training models. The `nn` module provides a high-level API for building neural networks, while the `optim` module offers various optimization algorithms to train the models efficiently. This approach not only reduces the amount of code but also makes it more readable and maintainable.
In this tutorial, we demonstrated how to transition from implementing a simple linear regression model using `NumPy` to using `PyTorch`. By leveraging `PyTorch`, we simplified gradient calculations and parameter updates, making the process of building and training models more efficient.
We then showed how to use `PyTorch`'s neural network modules and optimization algorithms to further simplify and enhance our linear regression implementation. This tutorial serves as a bridge to more complex neural networks, paving the way for you to explore deeper and more powerful models with `PyTorch`.