This is a quick tutorial created by prompting ChatGPT in 2023, with some relatively small edits from me. It is a good starting point for those who know Python and would like to get started with PyTorch. For a more systematic tutorial, please refer to the official PyTorch tutorials: https://pytorch.org/tutorials/
PyTorch is a deep learning library that has gained immense popularity, especially for research and dynamic computation. If you already know Python, NumPy, SciPy, Scikit-Learn, and pandas, you have a solid foundation to dive into PyTorch. This brief tutorial introduces the basics of PyTorch and shows how it can be used for various tasks.
To start, you first need to install PyTorch. Installation instructions vary depending on your system and needs, but generally:
Code example
pip install torch torchvision
Tensors
At its core, PyTorch operates on tensors, which are multi-dimensional arrays similar to NumPy arrays.
Code example
import torch
# Create a 1D tensor
a = torch.tensor([1, 2, 3])
print(a)
# Create a 2x3 tensor (matrix) filled with ones
b = torch.ones(2, 3)
print(b)
NumPy and Tensor Interoperability
Given your familiarity with NumPy, you'd be pleased to know that PyTorch has excellent interoperability with NumPy:
Code example
import numpy as np
# Convert a NumPy array to a PyTorch tensor
numpy_arr = np.array([1, 2, 3])
tensor_from_numpy = torch.from_numpy(numpy_arr)
print(tensor_from_numpy)
# Convert a PyTorch tensor to a NumPy array
tensor = torch.tensor([1, 2, 3])
numpy_from_tensor = tensor.numpy()
print(numpy_from_tensor)
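One detail worth knowing: torch.from_numpy and .numpy() do not copy the data; for CPU tensors, the tensor and the array share the same underlying memory, so an in-place change to one is visible in the other. A minimal sketch (the variable names are just illustrative):
Code example
import numpy as np
import torch

numpy_arr = np.array([1, 2, 3])
shared_tensor = torch.from_numpy(numpy_arr)

# Modifying the NumPy array also changes the tensor, because they share memory
numpy_arr[0] = 100
print(shared_tensor)  # tensor([100, 2, 3])

# The reverse direction shares memory as well (for CPU tensors)
cpu_tensor = torch.tensor([1.0, 2.0, 3.0])
shared_array = cpu_tensor.numpy()
cpu_tensor[0] = -1.0
print(shared_array)  # [-1.  2.  3.]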
One of the standout features of PyTorch is its ability to compute gradients automatically. This is useful for optimizing models using gradient descent.
Code example
# Create a tensor and set requires_grad=True to track computation
x = torch.tensor(2.0, requires_grad=True)
# Define a simple operation
y = x ** 2
# Compute gradients
y.backward()
# Print the gradient
print(x.grad) # Outputs 4.0 (dy/dx = 2*x at x=2)
PyTorch's torch.nn module contains pre-defined layers, loss functions, and optimization algorithms:
Code example
import torch.nn as nn
import torch.optim as optim
# Define a simple feed-forward neural network
class SimpleNN(nn.Module):
    def __init__(self):
        super(SimpleNN, self).__init__()
        self.fc = nn.Linear(1, 1)  # 1 input neuron, 1 output neuron

    def forward(self, x):
        return self.fc(x)
# Create the model, loss function, and optimizer
model = SimpleNN()
loss_fn = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)
# Dummy data
inputs = torch.tensor([[1.0], [2.0], [3.0]])
outputs = torch.tensor([[2.0], [4.0], [6.0]])
# Training loop
for epoch in range(100):
    # Forward pass
    predictions = model(inputs)
    loss = loss_fn(predictions, outputs)
    # Backward pass and optimization
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if (epoch+1) % 10 == 0:
        print(f"Epoch [{epoch+1}/100], Loss: {loss.item():.4f}")
The code above trains a simple neural network to approximate the function y = 2x. The process involves forward propagation (predicting outputs), computing the loss, and back-propagating the error to adjust the weights.
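To see what the model learned, you can inspect the layer's parameters after training. For y = 2x the weight should end up close to 2 and the bias close to 0, although the exact values depend on the random initialization and the number of epochs. A small sketch, reusing the model defined above:
Code example
# Inspect the learned parameters (weight should be near 2, bias near 0)
print(model.fc.weight.item(), model.fc.bias.item())

# Make a prediction on a new input without tracking gradients
with torch.no_grad():
    print(model(torch.tensor([[4.0]])))  # Should be close to 8.0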
As mentioned earlier, tensors in PyTorch are multi-dimensional arrays. These are similar to NumPy arrays, but with the added capability to run on GPUs and other hardware accelerators.
Tensor Operations
Just like NumPy, PyTorch offers a wide variety of operations for tensor manipulation:
Reshaping:
Code example
a = torch.randn(2, 3) # Creates a 2x3 tensor with random values
print(a)
b = a.view(3, 2) # Reshapes the tensor to 3x2
print(b)
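Note that .view requires the tensor's memory layout to be contiguous; .reshape is a more forgiving alternative that copies the data when it has to. A small sketch of the difference:
Code example
a = torch.randn(2, 3)
b = a.t()          # Transpose -- the result is non-contiguous
# b.view(6)        # Would raise a RuntimeError
c = b.reshape(6)   # Works: reshape copies the data if necessary
print(c.shape)     # torch.Size([6])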
Element-wise Operations:
Code example
x = torch.tensor([1, 2, 3])
y = torch.tensor([4, 5, 6])
z = x + y
print(z)
Matrix Multiplication:
Code example
x = torch.randn(2, 3)
y = torch.randn(3, 2)
z = torch.mm(x, y)
print(z)
Reductions:
Code example
x = torch.tensor([[1.0, 2.0], [3.0, 4.0]])  # Use a float tensor: torch.mean requires a floating-point dtype
sum_x = torch.sum(x)
mean_x = torch.mean(x)
print(sum_x, mean_x)
Many more such operations are available. If you know NumPy, most of them will feel very familiar, just with slightly different syntax.
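As a quick taste of a few more of these NumPy-like operations (indexing, slicing, broadcasting, and concatenation), here is a small sketch:
Code example
x = torch.arange(6).reshape(2, 3)   # tensor([[0, 1, 2], [3, 4, 5]])
print(x[0, 1])                      # Index a single element
print(x[:, 1])                      # Slice a column
print(x + 10)                       # Broadcast a scalar over the tensor
print(torch.cat([x, x], dim=0))     # Concatenate along rows -> 4x3 tensor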
The autograd system in PyTorch provides automatic differentiation for all operations on tensors. It's what allows us to train neural networks without manually computing gradients.
How it works:
When you create a tensor with requires_grad=True, PyTorch starts to track all operations on it.
When you finish your computation, you call .backward(), and PyTorch will compute gradients automatically and store them in the .grad attribute of the tensor.
To prevent tracking, you can call .detach() on a tensor or wrap the computation in a with torch.no_grad(): block, as shown in the examples below.
Examples:
Simple Gradient:
Code example
x = torch.tensor(2.0, requires_grad=True)
y = x ** 2
y.backward()
print(x.grad) # Outputs 4.0 (dy/dx = 2*x at x=2)
More complex computations:
Code example
x = torch.tensor(2.0, requires_grad=True)
y = torch.tensor(3.0, requires_grad=True)
z = x * y + y ** 2
z.backward()
print(x.grad) # Outputs 3.0 (dz/dx = y at y=3)
print(y.grad) # Outputs 8.0 (dz/dy = x + 2*y at x=2, y=3)
Stopping autograd from tracking:
Code example
x = torch.tensor(2.0, requires_grad=True)
with torch.no_grad():
    y = x ** 2
print(y.requires_grad) # Outputs False
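The .detach() alternative mentioned above returns a new tensor that holds the same values but is cut off from the computation graph. A minimal sketch:
Code example
x = torch.tensor(2.0, requires_grad=True)
y = x ** 2
z = y.detach()          # Same value as y, but no gradient history
print(z.requires_grad)  # Outputs False
print(y.requires_grad)  # Outputs True -- y itself is still tracked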
One of the core principles behind autograd is the chain rule of differentiation. When you do a series of operations and call .backward(), PyTorch will use the chain rule to compute the gradients.
Each operation creates a node in a computation graph, and these nodes have references to the inputs and outputs of the operation. When you call .backward(), PyTorch traverses this graph in reverse order (from the loss, typically, to the input data) to compute gradients.
Understanding this can be especially helpful when debugging or when trying to create custom operations or layers.
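You can actually peek at this graph: every tensor produced by a tracked operation carries a grad_fn attribute pointing to the node that created it. A small sketch (the exact printed class names may vary between PyTorch versions):
Code example
x = torch.tensor(2.0, requires_grad=True)
y = x ** 2
z = y + 3

print(x.grad_fn)  # None -- x is a leaf tensor created by the user
print(y.grad_fn)  # Something like <PowBackward0 ...>
print(z.grad_fn)  # Something like <AddBackward0 ...>

# Calling backward() traverses this graph in reverse to fill in x.grad
z.backward()
print(x.grad)     # Outputs 4.0 (dz/dx = 2*x at x=2)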
In conclusion, PyTorch's tensor operations combined with its autograd system offer a flexible and powerful way to build and train complex models, without much of the manual heavy lifting.
Let's walk through a more intricate example, step by step, to illustrate the chain rule in action with PyTorch's autograd.
Given the functions:
f(u, v) = u^2 + 3uv + 5v^2
u(x, y) = xy + sin(x)
v(x, y) = y^2 - cos(y)
We want to compute the gradient of f with respect to x and y, i.e., ∂f/∂x and ∂f/∂y, at x = π and y = 1.
Using the chain rule, we can express the derivatives as:
∂f/∂x = (∂f/∂u)(∂u/∂x) + (∂f/∂v)(∂v/∂x)
∂f/∂y = (∂f/∂u)(∂u/∂y) + (∂f/∂v)(∂v/∂y)
Now, let's calculate this using PyTorch:
Code example
import torch
from math import pi
# Define x and y as tensors, with requires_grad set to True to track computations
x = torch.tensor(pi, requires_grad=True)
y = torch.tensor(1.0, requires_grad=True)
# Define u and v in terms of x and y
u = x * y + torch.sin(x)
v = y**2 - torch.cos(y)
# Define f in terms of u and v
f = u**2 + 3*u*v + 5*v**2
# Compute the gradients
f.backward()
# Results
print(f"df/dx: {x.grad.item()}")
print(f"df/dy: {y.grad.item()}")
When you run this, PyTorch uses its autograd system to automatically compute the derivatives based on the computation graph. Behind the scenes, it's applying the chain rule as we've formulated.
Note that you don't have to manually differentiate any of the functions or apply the chain rule yourself. PyTorch takes care of all the differentiation work with the .backward() call. The gradients (or derivatives) for each tensor with requires_grad=True are stored in the .grad attribute of that tensor.
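If you want to convince yourself that autograd really is applying the chain rule here, you can write out the partial derivatives by hand for this example (∂f/∂u = 2u + 3v, ∂f/∂v = 3u + 10v, ∂u/∂x = y + cos(x), ∂u/∂y = x, ∂v/∂x = 0, ∂v/∂y = 2y + sin(y)) and compare them against the values autograd produced. A sketch of that check, reusing x, y, u, and v from the code above:
Code example
# Manual chain rule, for comparison with x.grad and y.grad
with torch.no_grad():
    df_du = 2 * u + 3 * v
    df_dv = 3 * u + 10 * v
    manual_dx = df_du * (y + torch.cos(x))                   # the dv/dx term is zero
    manual_dy = df_du * x + df_dv * (2 * y + torch.sin(y))

print(torch.allclose(manual_dx, x.grad))  # True (up to floating-point tolerance)
print(torch.allclose(manual_dy, y.grad))  # True (up to floating-point tolerance)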
A big advantage of PyTorch is how easy it makes moving data and computation back and forth between the CPU and the GPU. Here are the basic steps.
First, let's check if a GPU is available on your system.
import torch
# Check if a GPU is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)
If a GPU is available, this will print cuda, otherwise, it will print cpu, which means operations will run on the CPU.
To use the GPU, you simply need to transfer the tensors to the GPU using .to(device) or .cuda():
# Create a tensor and transfer it to the GPU
x = torch.randn(3, 3)
x = x.to(device) # Or alternatively: x = x.cuda()
print(x)
# Perform operations on the GPU
y = torch.ones(3, 3, device=device)
z = x + y
print(z)
# Transfer result back to CPU if needed
z_cpu = z.to("cpu")
print(z_cpu)
The key here is that once a tensor is on the GPU, all subsequent operations on that tensor will also be executed on the GPU. You can move the tensors back to the CPU using .to('cpu').
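One consequence of this is that you cannot mix devices within a single operation: combining a GPU tensor with a CPU tensor raises a RuntimeError. A small sketch (the demonstration only runs if a GPU is actually available):
Code example
if torch.cuda.is_available():
    a = torch.randn(3, device="cuda")
    b = torch.randn(3)  # Lives on the CPU
    try:
        c = a + b       # Mixing devices is not allowed
    except RuntimeError as e:
        print("Device mismatch:", e)
    c = a + b.to("cuda")  # Fix: move b to the same device first
    print(c)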
To run a neural network on the GPU, you need to transfer both the model and the data to the GPU.
Example: Simple Neural Network on GPU
Code example
import torch
import torch.nn as nn
import torch.optim as optim
# Define a simple feed-forward neural network
class SimpleNN(nn.Module):
    def __init__(self):
        super(SimpleNN, self).__init__()
        self.fc = nn.Linear(1, 1)  # 1 input neuron, 1 output neuron

    def forward(self, x):
        return self.fc(x)
# Check device (GPU or CPU)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# Instantiate the model and move it to the GPU
model = SimpleNN().to(device)
# Create dummy data and transfer it to the GPU
inputs = torch.tensor([[1.0], [2.0], [3.0]], device=device)
outputs = torch.tensor([[2.0], [4.0], [6.0]], device=device)
# Define a loss function and an optimizer
loss_fn = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)
# Training loop (on GPU)
for epoch in range(100):
    optimizer.zero_grad()
    # Forward pass
    predictions = model(inputs)
    # Compute loss
    loss = loss_fn(predictions, outputs)
    # Backward pass and optimization
    loss.backward()
    optimizer.step()
    if (epoch+1) % 10 == 0:
        print(f"Epoch [{epoch+1}/100], Loss: {loss.item():.4f}")
Move the model to GPU: You move the model by calling model.to(device), where device is "cuda" if a GPU is available.
Move data to GPU: You move the data (input tensors) to the GPU by calling .to(device) on the tensors.
Ensure computations happen on the GPU: Once the model and data are on the GPU, all operations (forward pass, loss computation, backward pass) will run on the GPU.
Move results back to the CPU (optional): If you need to examine the results on the CPU or perform further operations that require CPU, you can move tensors back using .to("cpu").
By following these steps, you can easily accelerate your PyTorch models using the power of GPUs.

After training a model on the GPU, you might want to move the trained model and tensors (such as outputs, parameters, or any intermediate results) back to the CPU, especially if you need to use them in a CPU-only environment (e.g., for saving, visualization, or further processing).
Here’s how you can do that:
Once the training is done, you can move the entire model back to the CPU using .to("cpu"). This ensures that all model parameters are now on the CPU.
Code example
# Move the trained model back to the CPU
model = model.to("cpu")
Now the model is on the CPU and can be used for further operations without requiring a GPU.
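If you ever need to check which device a model currently lives on, inspecting one of its parameters is a simple way to do it:
Code example
# Check where the model's parameters currently live
print(next(model.parameters()).device)  # Expect: cpu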
Similarly, after training, you might want to move certain tensors (e.g., outputs or model parameters) from the GPU to the CPU:
Code example
# Example tensor on GPU (after some training or processing; this line assumes a GPU is available)
output = torch.tensor([1.0, 2.0, 3.0], device="cuda")
# Move tensor back to CPU
output_cpu = output.to("cpu")
# Now the tensor is on the CPU
print(output_cpu)
Let’s add this to the neural network training example we discussed earlier:
Code example
import torch
import torch.nn as nn
import torch.optim as optim
# Define a simple feed-forward neural network
class SimpleNN(nn.Module):
    def __init__(self):
        super(SimpleNN, self).__init__()
        self.fc = nn.Linear(1, 1)  # 1 input neuron, 1 output neuron

    def forward(self, x):
        return self.fc(x)
# Check device (GPU or CPU)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# Instantiate the model and move it to the GPU
model = SimpleNN().to(device)
# Create dummy data and transfer it to the GPU
inputs = torch.tensor([[1.0], [2.0], [3.0]], device=device)
outputs = torch.tensor([[2.0], [4.0], [6.0]], device=device)
# Define a loss function and an optimizer
loss_fn = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)
# Training loop (on GPU)
for epoch in range(100):
    optimizer.zero_grad()
    # Forward pass
    predictions = model(inputs)
    # Compute loss
    loss = loss_fn(predictions, outputs)
    # Backward pass and optimization
    loss.backward()
    optimizer.step()
    if (epoch+1) % 10 == 0:
        print(f"Epoch [{epoch+1}/100], Loss: {loss.item():.4f}")
# After training, move the trained model to CPU
model = model.to("cpu")
# If you want to move the final predictions (or any tensor) back to CPU
predictions_cpu = predictions.to("cpu")
# Now, you can print or save the model's predictions from the CPU
print(predictions_cpu)
You might also want to save the model's state after training on the GPU and moving it to the CPU:
Code example
# Save the model's state dictionary (after moving to CPU)
torch.save(model.state_dict(), "model.pth")
# Load the model state dict back
model_loaded = SimpleNN()
model_loaded.load_state_dict(torch.load("model.pth"))
model_loaded.eval() # Switch to evaluation mode
Moving the Model to CPU: Use model.to("cpu").
Moving Tensors to CPU: Use .to("cpu") on tensors (e.g., outputs, intermediate results).
Saving the Model: Use torch.save() to save the model’s state dict after moving it to CPU.
Loading the Model: Use torch.load() to load the model state dict.
This allows you to seamlessly transfer data between devices and ensures that the final model and results are ready for use in environments where only CPU resources are available.
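One related tip, mentioned here as a common workflow rather than something required by the example above: if a checkpoint was saved while the model was still on the GPU, you can pass map_location="cpu" to torch.load so the state dict can be loaded on a machine without a GPU.
Code example
# Load a (possibly GPU-saved) checkpoint directly onto the CPU
state_dict = torch.load("model.pth", map_location="cpu")
model_loaded = SimpleNN()
model_loaded.load_state_dict(state_dict)
model_loaded.eval()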