This is a quick tutorial created by prompting ChatGPT in 2023, with some relatively small edits from me. It is a good starting point for those who know Python and would like to get started with PyTorch. For a more systematic tutorial, please refer to the official PyTorch tutorials: https://pytorch.org/tutorials/
PyTorch is a deep learning library that has gained immense popularity, especially for research and dynamic computation. If you already know Python, NumPy, SciPy, Scikit-Learn, and pandas, you have a solid foundation to dive into PyTorch. This brief tutorial introduces the basics of PyTorch and shows how it can be used for various tasks.
To start, you first need to install PyTorch. Installation instructions vary depending on your system and needs, but generally:
Code example
pip install torch torchvision
Tensors
At its core, PyTorch operates on tensors, which are multi-dimensional arrays similar to NumPy arrays.
Code example
import torch
# Create a 1D tensor
a = torch.tensor([1, 2, 3])
print(a)
# Create a 2x3 tensor (matrix) filled with ones
b = torch.ones(2, 3)
print(b)
NumPy and Tensor Interoperability
Given your familiarity with NumPy, you'd be pleased to know that PyTorch has excellent interoperability with NumPy:
Code example
import numpy as np
# Convert a NumPy array to a PyTorch tensor
numpy_arr = np.array([1, 2, 3])
tensor_from_numpy = torch.from_numpy(numpy_arr)
print(tensor_from_numpy)
# Convert a PyTorch tensor to a NumPy array
tensor = torch.tensor([1, 2, 3])
numpy_from_tensor = tensor.numpy()
print(numpy_from_tensor)
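One detail worth knowing: torch.from_numpy and .numpy() do not copy the data; for CPU tensors, the tensor and the array share the same underlying memory, so an in-place change to one is visible in the other. A minimal sketch (the variable names are just illustrative):
Code example
import numpy as np
import torch

numpy_arr = np.array([1, 2, 3])
shared_tensor = torch.from_numpy(numpy_arr)

# Modifying the NumPy array also changes the tensor, because they share memory
numpy_arr[0] = 100
print(shared_tensor)  # tensor([100, 2, 3])

# The reverse direction shares memory as well (for CPU tensors)
cpu_tensor = torch.tensor([1.0, 2.0, 3.0])
shared_array = cpu_tensor.numpy()
cpu_tensor[0] = -1.0
print(shared_array)  # [-1.  2.  3.]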
One of the standout features of PyTorch is its ability to compute gradients automatically. This is useful for optimizing models using gradient descent.
Code example
# Create a tensor and set requires_grad=True to track computation
x = torch.tensor(2.0, requires_grad=True)
# Define a simple operation
y = x ** 2
# Compute gradients
y.backward()
# Print the gradient
print(x.grad) # Outputs 4.0 (dy/dx = 2*x at x=2)
PyTorch's torch.nn module contains pre-defined layers, loss functions, and optimization algorithms:
Code example
import torch.nn as nn
import torch.optim as optim
# Define a simple feed-forward neural network
class SimpleNN(nn.Module):
    def __init__(self):
        super(SimpleNN, self).__init__()
        self.fc = nn.Linear(1, 1)  # 1 input neuron, 1 output neuron

    def forward(self, x):
        return self.fc(x)
# Create the model, loss function, and optimizer
model = SimpleNN()
loss_fn = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)
# Dummy data
inputs = torch.tensor([[1.0], [2.0], [3.0]])
outputs = torch.tensor([[2.0], [4.0], [6.0]])
# Training loop
for epoch in range(100):
    # Forward pass
    predictions = model(inputs)
    loss = loss_fn(predictions, outputs)
    # Backward pass and optimization
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if (epoch+1) % 10 == 0:
        print(f"Epoch [{epoch+1}/100], Loss: {loss.item():.4f}")
The code above trains a simple neural network to approximate the function y = 2x. The process involves forward propagation (predicting outputs), computing the loss, and back-propagating the error to adjust the weights.
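To see what the model learned, you can inspect the layer's parameters after training. For y = 2x the weight should end up close to 2 and the bias close to 0, although the exact values depend on the random initialization and the number of epochs. A small sketch, reusing the model defined above:
Code example
# Inspect the learned parameters (weight should be near 2, bias near 0)
print(model.fc.weight.item(), model.fc.bias.item())

# Make a prediction on a new input without tracking gradients
with torch.no_grad():
    print(model(torch.tensor([[4.0]])))  # Should be close to 8.0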
As mentioned earlier, tensors in PyTorch are multi-dimensional arrays. These are similar to NumPy arrays, but with the added capability to run on GPUs and other hardware accelerators.
Tensor Operations
Just like NumPy, PyTorch offers a wide variety of operations for tensor manipulation:
Reshaping:
Code example
a = torch.randn(2, 3) # Creates a 2x3 tensor with random values
print(a)
b = a.view(3, 2) # Reshapes the tensor to 3x2
print(b)
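Note that .view requires the tensor's memory layout to be contiguous; .reshape is a more forgiving alternative that copies the data when it has to. A small sketch of the difference:
Code example
a = torch.randn(2, 3)
b = a.t()          # Transpose -- the result is non-contiguous
# b.view(6)        # Would raise a RuntimeError
c = b.reshape(6)   # Works: reshape copies the data if necessary
print(c.shape)     # torch.Size([6])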
Element-wise Operations:
Code example
x = torch.tensor([1, 2, 3])
y = torch.tensor([4, 5, 6])
z = x + y
print(z)
Matrix Multiplication:
Code example
x = torch.randn(2, 3)
y = torch.randn(3, 2)
z = torch.mm(x, y)
print(z)
Reductions:
Code example
x = torch.tensor([[1.0, 2.0], [3.0, 4.0]])  # Use a float tensor: torch.mean requires a floating-point dtype
sum_x = torch.sum(x)
mean_x = torch.mean(x)
print(sum_x, mean_x)
Many more such operations are available. If you know NumPy, most of them will feel very familiar, just with slightly different syntax.
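As a quick taste of a few more of these NumPy-like operations (indexing, slicing, broadcasting, and concatenation), here is a small sketch:
Code example
x = torch.arange(6).reshape(2, 3)   # tensor([[0, 1, 2], [3, 4, 5]])
print(x[0, 1])                      # Index a single element
print(x[:, 1])                      # Slice a column
print(x + 10)                       # Broadcast a scalar over the tensor
print(torch.cat([x, x], dim=0))     # Concatenate along rows -> 4x3 tensor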
The autograd system in PyTorch provides automatic differentiation for all operations on tensors. It's what allows us to train neural networks without manually computing gradients.
How it works:
When you create a tensor with requires_grad=True, PyTorch starts to track all operations on it.
When you finish your computation, you call .backward(), and PyTorch will compute gradients automatically and store them in the .grad attribute of the tensor.
To prevent tracking, you can call .detach() on a tensor or wrap the computation in a with torch.no_grad(): block, as shown in the examples below.
Examples:
Simple Gradient:
Code example
x = torch.tensor(2.0, requires_grad=True)
y = x ** 2
y.backward()
print(x.grad) # Outputs 4.0 (dy/dx = 2*x at x=2)
More complex computations:
Code example
x = torch.tensor(2.0, requires_grad=True)
y = torch.tensor(3.0, requires_grad=True)
z = x * y + y ** 2
z.backward()
print(x.grad) # Outputs 3.0 (dz/dx = y at y=3)
print(y.grad) # Outputs 8.0 (dz/dy = x + 2*y at x=2, y=3)
Stopping autograd from tracking:
Code example
x = torch.tensor(2.0, requires_grad=True)
with torch.no_grad():
    y = x ** 2
print(y.requires_grad) # Outputs False
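The .detach() alternative mentioned above returns a new tensor that holds the same values but is cut off from the computation graph. A minimal sketch:
Code example
x = torch.tensor(2.0, requires_grad=True)
y = x ** 2
z = y.detach()          # Same value as y, but no gradient history
print(z.requires_grad)  # Outputs False
print(y.requires_grad)  # Outputs True -- y itself is still tracked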
One of the core principles behind autograd is the chain rule of differentiation. When you do a series of operations and call .backward(), PyTorch will use the chain rule to compute the gradients.
Each operation creates a node in a computation graph, and these nodes have references to the inputs and outputs of the operation. When you call .backward(), PyTorch traverses this graph in reverse order (from the loss, typically, to the input data) to compute gradients.
Understanding this can be especially helpful when debugging or when trying to create custom operations or layers.
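You can actually peek at this graph: every tensor produced by a tracked operation carries a grad_fn attribute pointing to the node that created it. A small sketch (the exact printed class names may vary between PyTorch versions):
Code example
x = torch.tensor(2.0, requires_grad=True)
y = x ** 2
z = y + 3

print(x.grad_fn)  # None -- x is a leaf tensor created by the user
print(y.grad_fn)  # Something like <PowBackward0 ...>
print(z.grad_fn)  # Something like <AddBackward0 ...>

# Calling backward() traverses this graph in reverse to fill in x.grad
z.backward()
print(x.grad)     # Outputs 4.0 (dz/dx = 2*x at x=2)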
In conclusion, PyTorch's tensor operations combined with its autograd system offer a flexible and powerful way to build and train complex models, without much of the manual heavy lifting.
Let's walk through a more intricate example, step by step, to illustrate the chain rule in action with PyTorch's autograd.
Given the functions:
f(u, v) = u^2 + 3uv + 5v^2
u(x, y) = xy + sin(x)
v(x, y) = y^2 - cos(y)
We want to compute the gradient of f with respect to x and y, i.e., ∂f/∂x and ∂f/∂y, at x = π and y = 1.
Using the chain rule, we can express the derivatives as:
∂f/∂x = (∂f/∂u)(∂u/∂x) + (∂f/∂v)(∂v/∂x)
∂f/∂y = (∂f/∂u)(∂u/∂y) + (∂f/∂v)(∂v/∂y)
Now, let's calculate this using PyTorch:
Code example
import torch
from math import pi
# Define x and y as tensors, with requires_grad set to True to track computations
x = torch.tensor(pi, requires_grad=True)
y = torch.tensor(1.0, requires_grad=True)
# Define u and v in terms of x and y
u = x * y + torch.sin(x)
v = y**2 - torch.cos(y)
# Define f in terms of u and v
f = u**2 + 3*u*v + 5*v**2
# Compute the gradients
f.backward()
# Results
print(f"df/dx: {x.grad.item()}")
print(f"df/dy: {y.grad.item()}")
When you run this, PyTorch uses its autograd system to automatically compute the derivatives based on the computation graph. Behind the scenes, it's applying the chain rule as we've formulated.
Note that you don't have to manually differentiate any of the functions or apply the chain rule yourself. PyTorch takes care of all the differentiation work with the .backward() call. The gradients (or derivatives) for each tensor with requires_grad=True are stored in the .grad attribute of that tensor.
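If you want to convince yourself that autograd really is applying the chain rule here, you can write out the partial derivatives by hand for this example (∂f/∂u = 2u + 3v, ∂f/∂v = 3u + 10v, ∂u/∂x = y + cos(x), ∂u/∂y = x, ∂v/∂x = 0, ∂v/∂y = 2y + sin(y)) and compare them against the values autograd produced. A sketch of that check, reusing x, y, u, and v from the code above:
Code example
# Manual chain rule, for comparison with x.grad and y.grad
with torch.no_grad():
    df_du = 2 * u + 3 * v
    df_dv = 3 * u + 10 * v
    manual_dx = df_du * (y + torch.cos(x))                   # the dv/dx term is zero
    manual_dy = df_du * x + df_dv * (2 * y + torch.sin(y))

print(torch.allclose(manual_dx, x.grad))  # True (up to floating-point tolerance)
print(torch.allclose(manual_dy, y.grad))  # True (up to floating-point tolerance)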
A big advantage of PyTorch is how easy it makes moving data and computation back and forth between the CPU and the GPU. Here are the basic steps.
First, let's check if a GPU is available on your system.
import torch
# Check if a GPU is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)
If a GPU is available, this will print cuda, otherwise, it will print cpu, which means operations will run on the CPU.
To use the GPU, you simply need to transfer the tensors to the GPU using .to(device) or .cuda():
# Create a tensor and transfer it to the GPU
x = torch.randn(3, 3)
x = x.to(device) # Or alternatively: x = x.cuda()
print(x)
# Perform operations on the GPU
y = torch.ones(3, 3, device=device)
z = x + y
print(z)
# Transfer result back to CPU if needed
z_cpu = z.to("cpu")
print(z_cpu)
The key here is that once a tensor is on the GPU, all subsequent operations on that tensor will also be executed on the GPU. You can move the tensors back to the CPU using .to('cpu').
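One consequence of this is that you cannot mix devices within a single operation: combining a GPU tensor with a CPU tensor raises a RuntimeError. A small sketch (the demonstration only runs if a GPU is actually available):
Code example
if torch.cuda.is_available():
    a = torch.randn(3, device="cuda")
    b = torch.randn(3)  # Lives on the CPU
    try:
        c = a + b       # Mixing devices is not allowed
    except RuntimeError as e:
        print("Device mismatch:", e)
    c = a + b.to("cuda")  # Fix: move b to the same device first
    print(c)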
To run a neural network on the GPU, you need to transfer both the model and the data to the GPU.
Example: Simple Neural Network on GPU
Code example
import torch
import torch.nn as nn
import torch.optim as optim
# Define a simple feed-forward neural network
class SimpleNN(nn.Module):
    def __init__(self):
        super(SimpleNN, self).__init__()
        self.fc = nn.Linear(1, 1)  # 1 input neuron, 1 output neuron

    def forward(self, x):
        return self.fc(x)
# Check device (GPU or CPU)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# Instantiate the model and move it to the GPU
model = SimpleNN().to(device)
# Create dummy data and transfer it to the GPU
inputs = torch.tensor([[1.0], [2.0], [3.0]], device=device)
outputs = torch.tensor([[2.0], [4.0], [6.0]], device=device)
# Define a loss function and an optimizer
loss_fn = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)
# Training loop (on GPU)
for epoch in range(100):
    optimizer.zero_grad()
    # Forward pass
    predictions = model(inputs)
    # Compute loss
    loss = loss_fn(predictions, outputs)
    # Backward pass and optimization
    loss.backward()
    optimizer.step()
    if (epoch+1) % 10 == 0:
        print(f"Epoch [{epoch+1}/100], Loss: {loss.item():.4f}")
Move the model to GPU: You move the model by calling model.to(device), where device is "cuda" if a GPU is available.
Move data to GPU: You move the data (input tensors) to the GPU by calling .to(device) on the tensors.
Ensure computations happen on the GPU: Once the model and data are on the GPU, all operations (forward pass, loss computation, backward pass) will run on the GPU.
Move results back to the CPU (optional): If you need to examine the results on the CPU or perform further operations that require CPU, you can move tensors back using .to("cpu").
By following these steps, you can easily accelerate your PyTorch models using the power of GPUs.

After training a model on the GPU, you might want to move the trained model and tensors (such as outputs, parameters, or any intermediate results) back to the CPU, especially if you need to use them in a CPU-only environment (e.g., for saving, visualization, or further processing).
Here’s how you can do that:
Once the training is done, you can move the entire model back to the CPU using .to("cpu"). This ensures that all model parameters are now on the CPU.
Code example
# Move the trained model back to the CPU
model = model.to("cpu")
Now the model is on the CPU and can be used for further operations without requiring a GPU.
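If you ever need to check which device a model currently lives on, inspecting one of its parameters is a simple way to do it:
Code example
# Check where the model's parameters currently live
print(next(model.parameters()).device)  # Expect: cpu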
Similarly, after training, you might want to move certain tensors (e.g., outputs or model parameters) from the GPU to the CPU:
Code example
# Example tensor on GPU (after some training or processing; this line assumes a GPU is available)
output = torch.tensor([1.0, 2.0, 3.0], device="cuda")
# Move tensor back to CPU
output_cpu = output.to("cpu")
# Now the tensor is on the CPU
print(output_cpu)
Let’s add this to the neural network training example we discussed earlier:
Code example
import torch
import torch.nn as nn
import torch.optim as optim
# Define a simple feed-forward neural network
class SimpleNN(nn.Module):
    def __init__(self):
        super(SimpleNN, self).__init__()
        self.fc = nn.Linear(1, 1)  # 1 input neuron, 1 output neuron

    def forward(self, x):
        return self.fc(x)
# Check device (GPU or CPU)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# Instantiate the model and move it to the GPU
model = SimpleNN().to(device)
# Create dummy data and transfer it to the GPU
inputs = torch.tensor([[1.0], [2.0], [3.0]], device=device)
outputs = torch.tensor([[2.0], [4.0], [6.0]], device=device)
# Define a loss function and an optimizer
loss_fn = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)
# Training loop (on GPU)
for epoch in range(100):
    optimizer.zero_grad()
    # Forward pass
    predictions = model(inputs)
    # Compute loss
    loss = loss_fn(predictions, outputs)
    # Backward pass and optimization
    loss.backward()
    optimizer.step()
    if (epoch+1) % 10 == 0:
        print(f"Epoch [{epoch+1}/100], Loss: {loss.item():.4f}")
# After training, move the trained model to CPU
model = model.to("cpu")
# If you want to move the final predictions (or any tensor) back to CPU
predictions_cpu = predictions.to("cpu")
# Now, you can print or save the model's predictions from the CPU
print(predictions_cpu)
You might also want to save the model's state after training on the GPU and moving it to the CPU:
Code example
# Save the model's state dictionary (after moving to CPU)
torch.save(model.state_dict(), "model.pth")
# Load the model state dict back
model_loaded = SimpleNN()
model_loaded.load_state_dict(torch.load("model.pth"))
model_loaded.eval() # Switch to evaluation mode
Moving the Model to CPU: Use model.to("cpu").
Moving Tensors to CPU: Use .to("cpu") on tensors (e.g., outputs, intermediate results).
Saving the Model: Use torch.save() to save the model’s state dict after moving it to CPU.
Loading the Model: Use torch.load() to load the model state dict.
This allows you to seamlessly transfer data between devices and ensures that the final model and results are ready for use in environments where only CPU resources are available.
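One related tip, mentioned here as a common workflow rather than something required by the example above: if a checkpoint was saved while the model was still on the GPU, you can pass map_location="cpu" to torch.load so the state dict can be loaded on a machine without a GPU.
Code example
# Load a (possibly GPU-saved) checkpoint directly onto the CPU
state_dict = torch.load("model.pth", map_location="cpu")
model_loaded = SimpleNN()
model_loaded.load_state_dict(state_dict)
model_loaded.eval()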