fast.ai
October 2019 - December 2019
fast.ai course v3 (2019) - Part 1
Getting started
https://www.fast.ai/2019/01/24/course-v3/
https://forums.fast.ai/t/faq-resources-and-official-course-updates/27934
A blog post on what you need for deep learning: https://www.fast.ai/2017/11/16/what-you-need/
Seven lessons, each around 2 hours long, and you should plan to spend about 10 hours on assignments for each lesson.
Notes on how to set up the course on GCP: https://course.fast.ai/start_gcp.html
Notes on how to set up the course on Azure: https://course.fast.ai/start_azure.html
The forum for the course is here: https://forums.fast.ai/c/part1-v3
The key applications covered are:
- Computer vision (e.g. classify pet photos by breed)
- Image classification
- Image localization (segmentation and activation maps)
- Image key-points
- NLP (e.g. movie review sentiment analysis)
- Language modeling
- Document classification
- Tabular data (e.g. sales prediction)
- Categorical data
- Continuous data
- Collaborative filtering (e.g. movie recommendation)
The course uses pytorch and the fastai wrapper.
The videos can be found in a YouTube playlist.
Lesson summaries
ideas taken from https://www.gse.harvard.edu/news/uk/09/01/education-bat-seven-principles-educators
Lesson 1 - Image classification
Recognize pet breeds.
Use of transfer learning.
Set the most important hyper-parameter when training neural networks: the learning rate, using Leslie Smith’s fantastic learning rate finder method.
Features that fastai provides for allowing you to easily add labels to your images.
Lesson 2 - Data cleaning and production; SGD from scratch
Put a model in production e.g. https://course.fast.ai/deployment_render.html
Using the model to find and fix mislabeled or incorrectly-collected images.
Create a model and our own gradient descent loop.
Lesson 3 - Data blocks; Multi-label classification; Segmentation
Use the Planet dataset (https://www.kaggle.com/c/planet-understanding-the-amazon-from-space)
Use the data block API to get the data into shape (more info here).
image segmentation - process of labeling every pixel in an image with a category that shows what kind of object is portrayed by that pixel.
Use CamVid dataset
Predict face keypoints (interesting areas)
Lesson 4 - NLP; Tabular data; Collaborative filtering; Embeddings
Predict whether a movie review is positive or negative using ULMFiT. Here's a popular science article on the model
- Create (or use pretrained) language model (predict the next word of a sentence)
- Fine-tune this language model using your target corpus
- Keep the encoder of this fine-tuned language model and replace its next-word prediction head with a classifier. Then fine-tune this model for the final classification task
Cover tabular data (such as spreadsheets and database tables). Work with the fastai.tabular module to set up and train a model.
Collaborative filtering (recommendation systems).
An “embedding” is simply a computational shortcut for a particular type of matrix multiplication (a multiplication by a one-hot encoded matrix; e.g. word vectors).
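A minimal sketch of that shortcut, in plain Python so it runs without any dependencies. The matrix values and helper names are made up for illustration; the point is only that multiplying a one-hot vector by a weight matrix gives the same result as indexing the matching row.

```python
# Hypothetical 4-category embedding matrix with 3 values per category
embedding_matrix = [
    [0.1, 0.2, 0.3],   # row for category 0
    [0.4, 0.5, 0.6],   # row for category 1
    [0.7, 0.8, 0.9],   # row for category 2
    [1.0, 1.1, 1.2],   # row for category 3
]

def one_hot(index, size):
    """Return a list of zeros with a single 1.0 at position `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]

def vector_matrix_product(vector, matrix):
    """Multiply a row vector by a matrix given as a list of rows."""
    n_cols = len(matrix[0])
    return [sum(vector[r] * matrix[r][c] for r in range(len(matrix)))
            for c in range(n_cols)]

category = 2
via_matmul = vector_matrix_product(one_hot(category, len(embedding_matrix)),
                                   embedding_matrix)
via_lookup = embedding_matrix[category]   # the "computational shortcut"

print(via_matmul)
print(via_lookup)
```

The lookup does no arithmetic at all, which is why an embedding layer is much cheaper than the equivalent matrix multiplication.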
Lesson 5 - Back propagation; Accelerated SGD; Neural net from scratch
Create a simple NN from scratch.
Look inside the weights of an embedding layer, to find out what our model has learned about our categorical variables.
Although embeddings are most widely known in the context of word embeddings for NLP, they are at least as important for categorical variables in general, such as for tabular data or collaborative filtering.
Lesson 6 - Regularization; Convolutions; Data ethics
Discuss some powerful techniques for improving training and avoiding over-fitting:
- Dropout: remove activations at random during training in order to regularize the model
- Data augmentation: modify model inputs during training in order to effectively increase data size
- Batch normalization: adjust the parameterization of a model in order to make the loss surface smoother.
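Of the three techniques above, dropout is the easiest to sketch in a few lines. This is a plain-Python illustration of the "inverted dropout" variant, where surviving activations are scaled by 1/(1-p) during training; the function name and the numbers are assumptions, not code from the course.

```python
import random

def dropout(activations, p, rng):
    """Inverted dropout: zero each activation with probability p and
    scale the survivors by 1/(1-p) so the expected value is unchanged."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    keep_scale = 1.0 / (1.0 - p)
    return [0.0 if rng.random() < p else a * keep_scale for a in activations]

rng = random.Random(0)          # fixed seed so the run is repeatable
activations = [1.0] * 10000
dropped = dropout(activations, p=0.5, rng=rng)

fraction_zeroed = dropped.count(0.0) / len(dropped)
mean_after = sum(dropped) / len(dropped)
print(fraction_zeroed, mean_after)   # roughly half zeroed, mean still close to 1
```

At inference time no activations are removed, which is one reason a model needs to know whether it is in training or evaluation mode.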
Learn all about convolutions, which can be thought of as a variant of matrix multiplication with tied weights, and are the operation at the heart of modern computer vision models (and, increasingly, other types of models too).
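To make the "matrix multiplication with tied weights" idea concrete, here is a small 1D sketch in plain Python (the course works with 2D images; the signal, kernel and helper names here are illustrative assumptions). The same three kernel weights are reused in every row of the matrix.

```python
def conv1d(signal, kernel):
    """'Valid' 1D convolution as used in deep learning (no kernel flip)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def conv_matrix(kernel, signal_len):
    """Build the matrix whose rows are shifted copies of the same kernel.
    Every row reuses the same weights: that is what 'tied weights' means."""
    k = len(kernel)
    rows = []
    for i in range(signal_len - k + 1):
        row = [0.0] * signal_len
        row[i:i + k] = kernel
        rows.append(row)
    return rows

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

signal = [1.0, 2.0, 4.0, 7.0, 11.0]
kernel = [-1.0, 0.0, 1.0]          # a simple difference filter

direct = conv1d(signal, kernel)
as_matmul = matvec(conv_matrix(kernel, len(signal)), signal)
print(direct)      # [3.0, 5.0, 7.0]
print(as_matmul)   # [3.0, 5.0, 7.0]
```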
Create a class activation map, which is a heat-map that shows which parts of an image were most important in making a prediction.
Learn about some of the ways in which models can go wrong, with a particular focus on feedback loops, why they cause problems, and how to avoid them.
ways in which bias in data can lead to biased algorithms
discuss questions that data scientists can and should be asking to help ensure that their work doesn’t lead to unexpected negative outcomes
Lesson 7 - Resnets from scratch; U-net; Generative (adversarial) networks
One of the most important techniques in modern architectures is the skip connection. It is most famously used in the resnet, which is the architecture we’ve used throughout this course for image classification.
Look at the U-net architecture, which uses a different type of skip connection to greatly improve segmentation results (and results on similar tasks where the output structure is similar to the input).
Use the U-net architecture to train a super-resolution model. This is a model which can increase the resolution of a low-quality image. Our model won’t only increase resolution—it will also remove jpeg artifacts and unwanted text watermarks.
In order to make our model produce high quality results, we will need to create a custom loss function which incorporates feature loss (also known as perceptual loss), along with gram loss. These techniques can be used for many other types of image generation task, such as image colorization.
Learn about a recent loss function known as generative adversarial loss (used in generative adversarial networks, or GANs), which can improve the quality of generative models in some contexts, at the cost of speed.
Train GANs more quickly and reliably than standard approaches, by leveraging transfer learning.
Combines architectural innovations and loss function approaches that haven’t been used in this way before.
Learn how to create a recurrent neural net (RNN) from scratch. They are a simple refactoring of a regular multi-layer network.
Installing fast.ai
https://docs.fast.ai/#Installation-and-updating
$ conda create -n fastai python=3.7
$ conda activate fastai
$ conda install -c pytorch -c fastai fastai
$ python
>>> from fastai.vision import *
>>> path = untar_data(URLs.MNIST_SAMPLE)
>>> data = ImageDataBunch.from_folder(path)
>>> learn = cnn_learner(data, models.resnet18, metrics=accuracy)
>>> learn.fit(1)
Lesson 0: PyTorch intro
Here is a PyTorch tutorial:
What is Torch.NN?
Download the MNIST data:
from pathlib import Path
import requests
DATA_PATH = Path("data")
PATH = DATA_PATH / "mnist"
PATH.mkdir(parents=True, exist_ok=True)
URL = "http://deeplearning.net/data/mnist/"
FILENAME = "mnist.pkl.gz"
if not (PATH / FILENAME).exists():
    content = requests.get(URL + FILENAME).content
    (PATH / FILENAME).open("wb").write(content)
This dataset is in numpy array format, and has been stored using pickle, a python-specific format for serializing data.
import pickle
import gzip
with gzip.open((PATH / FILENAME).as_posix(), "rb") as f:
    ((x_train, y_train), (x_valid, y_valid), _) = pickle.load(f, encoding="latin-1")
Each image is 28 x 28, and is being stored as a flattened row of length 784 (=28x28). Let’s take a look at one; we need to reshape it to 2d first.
from matplotlib import pyplot
import numpy as np
pyplot.imshow(x_train[0].reshape((28, 28)), cmap="gray")
print(x_train.shape)
Convert to torch.tensor
import torch
x_train, y_train, x_valid, y_valid = map(
    torch.tensor, (x_train, y_train, x_valid, y_valid)
)
n, c = x_train.shape
x_train, x_train.shape, y_train.min(), y_train.max()
print(x_train, y_train)
print(x_train.shape)
print(y_train.min(), y_train.max())
Neural net from scratch. PyTorch provides methods to create random or zero-filled tensors, which we will use to create the weights and bias for a simple linear model. These are regular tensors, except that we tell PyTorch that they require a gradient. This causes PyTorch to record all of the operations done on the tensor, so that it can calculate the gradient during back-propagation automatically.
For the weights, we set requires_grad after the initialization, since we don’t want that step included in the gradient. (Note that a trailing _ in PyTorch signifies that the operation is performed in-place.)
Initialize the weights with Xavier initialisation (by multiplying with 1/sqrt(n)).
import math
weights = torch.randn(784, 10) / math.sqrt(784)
weights.requires_grad_()
bias = torch.zeros(10, requires_grad=True)
We can use any standard Python function (or callable object) as a model. So let’s just write a plain matrix multiplication and broadcasted addition to create a simple linear model. We also need an activation function, so we’ll write log_softmax and use it. PyTorch will even create fast GPU or vectorized CPU code for your function automatically. The @ stands for the matrix multiplication operation.
def log_softmax(x):
    return x - x.exp().sum(-1).log().unsqueeze(-1)

def model(xb):
    return log_softmax(xb @ weights + bias)
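For reference, the one-line log_softmax above matches the usual definition of softmax followed by a logarithm. Written out for a single row of scores x (the symbols are my own notation, not from the tutorial):

```latex
\log \operatorname{softmax}(x)_i
  = \log \frac{e^{x_i}}{\sum_j e^{x_j}}
  = x_i - \log \sum_j e^{x_j}
```

The second term is the same for every i in a row, which is why the code computes it once with sum(-1) and uses unsqueeze(-1) to broadcast it back over the row.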
call our function on one batch of data (in this case, 64 images). This is one forward pass. Note that our predictions won’t be any better than random at this stage, since we start with random weights.
bs = 64 # batch size
xb = x_train[0:bs] # a mini-batch from x
preds = model(xb) # predictions
preds[0], preds.shape
print(preds[0], preds.shape)
The preds tensor contains not only the tensor values, but also a gradient function. We’ll use this later to do backprop.
implement negative log-likelihood to use as the loss function
def nll(input, target):
    return -input[range(target.shape[0]), target].mean()
loss_func = nll
check our loss with our random model, so we can see if we improve after a backprop pass later.
yb = y_train[0:bs]
print(loss_func(preds, yb))
Calculate the accuracy of our model. For each prediction, if the index with the largest value matches the target value, then the prediction was correct.
def accuracy(out, yb):
    preds = torch.argmax(out, dim=1)
    return (preds == yb).float().mean()
check the accuracy of our random model, so we can see if our accuracy improves as our loss improves.
print(accuracy(preds, yb))
Run a training loop. For each iteration:
- select a mini-batch of data (of size bs)
- use the model to make predictions
- calculate the loss
- loss.backward() updates the gradients of the model, in this case, weights and bias
Use these gradients to update the weights and bias. We do this within the torch.no_grad() context manager, because we do not want these actions to be recorded for our next calculation of the gradient. You can read more about how PyTorch's Autograd records operations here
Set the gradients to zero, so that we are ready for the next loop. Otherwise, our gradients would record a running tally of all the operations that had happened (i.e. loss.backward() adds the gradients to whatever is already stored, rather than replacing them).
You can use the standard python debugger to step through PyTorch code, allowing you to check the various variable values at each step. Uncomment set_trace() below to try it out.
from IPython.core.debugger import set_trace
lr = 0.5 # learning rate
epochs = 2 # how many epochs to train for
for epoch in range(epochs):
    for i in range((n - 1) // bs + 1):
        # set_trace()
        start_i = i * bs
        end_i = start_i + bs
        xb = x_train[start_i:end_i]
        yb = y_train[start_i:end_i]
        pred = model(xb)
        loss = loss_func(pred, yb)
        loss.backward()
        with torch.no_grad():
            weights -= weights.grad * lr
            bias -= bias.grad * lr
            weights.grad.zero_()
            bias.grad.zero_()
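The expression (n - 1) // bs + 1 in the loop above is a ceiling division, so a final partial batch is not skipped. A small plain-Python sketch; the helper name is hypothetical, and n=50000 assumes the 50,000-image training split that this MNIST pickle is normally distributed with.

```python
def batch_bounds(n, bs):
    """Yield (start, end) slice bounds that cover n items in batches of bs.
    (n - 1) // bs + 1 is ceiling division, so a final partial batch is included."""
    n_batches = (n - 1) // bs + 1
    for i in range(n_batches):
        start = i * bs
        yield start, min(start + bs, n)   # slicing would clamp the end anyway

bounds = list(batch_bounds(n=50000, bs=64))
print(len(bounds))     # 782
print(bounds[-1])      # (49984, 50000): the last batch holds only 16 items
```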
That’s it: we’ve created and trained a minimal neural network (in this case, a logistic regression, since we have no hidden layers) entirely from scratch!
Let’s check the loss and accuracy and compare those to what we got earlier. We expect that the loss will have decreased and accuracy to have increased.
print(loss_func(model(xb), yb), accuracy(model(xb), yb))
Using torch.nn.functional
Refactor our code, so that it does the same thing as before, only we'll start taking advantage of PyTorch's nn classes to make it more concise and flexible. At each step from here, we should be making our code one or more of: shorter, more understandable, and/or more flexible.
Make our code shorter by replacing our hand-written activation and loss functions with those from torch.nn.functional (which is generally imported into the namespace F by convention). This module contains all the functions in the torch.nn library (whereas other parts of the library contain classes). As well as a wide range of loss and activation functions, you'll also find here some convenient functions for creating neural nets, such as pooling functions. (There are also functions for doing convolutions, linear layers, etc, but as we'll see, these are usually better handled using other parts of the library.)
PyTorch provides a single function, F.cross_entropy, that combines negative log likelihood loss and log softmax activation.
import torch.nn.functional as F
loss_func = F.cross_entropy
def model(xb):
    return xb @ weights + bias
We no longer call log_softmax in the model function. Let's confirm that our loss and accuracy are the same as before:
print(loss_func(model(xb), yb), accuracy(model(xb), yb))
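The reason the numbers should match is that cross-entropy on raw scores is the negative log-likelihood applied to log-softmax outputs. A plain-Python check of that identity on made-up scores (it uses only the math module, not PyTorch, and the function names are my own):

```python
import math

def log_softmax(row):
    """Log-softmax of one row of scores: x_i - log(sum_j exp(x_j))."""
    log_sum_exp = math.log(sum(math.exp(v) for v in row))
    return [v - log_sum_exp for v in row]

def nll(log_probs, targets):
    """Mean negative log-likelihood: pick each row's log-probability at its target index."""
    picked = [row[t] for row, t in zip(log_probs, targets)]
    return -sum(picked) / len(picked)

def cross_entropy(logits, targets):
    """Cross-entropy on raw scores, written directly from its definition."""
    total = 0.0
    for row, t in zip(logits, targets):
        probs = [math.exp(v) for v in row]
        total += -math.log(probs[t] / sum(probs))
    return total / len(targets)

logits = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]   # made-up scores: 2 samples, 3 classes
targets = [0, 2]

two_step = nll([log_softmax(row) for row in logits], targets)
one_step = cross_entropy(logits, targets)
print(two_step, one_step)   # the two values agree up to floating-point error
```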
Refactor using nn.Module
Use nn.Module and nn.Parameter for a clearer and more concise training loop. We subclass nn.Module (which itself is a class and able to keep track of state). In this case, we want to create a class that holds our weights, bias, and method for the forward step. nn.Module has a number of attributes and methods (such as .parameters() and .zero_grad()) which we will be using.
from torch import nn
class Mnist_Logistic(nn.Module):
    def __init__(self):
        super().__init__()
        self.weights = nn.Parameter(torch.randn(784, 10) / math.sqrt(784))
        self.bias = nn.Parameter(torch.zeros(10))

    def forward(self, xb):
        return xb @ self.weights + self.bias
Since we are now using an object instead of just a function, we first have to instantiate our model:
model = Mnist_Logistic()
Calculate the loss in the same way as before. Note that nn.Module objects are used as if they are functions (i.e. they are callable), but behind the scenes PyTorch will call our forward method automatically.
print(loss_func(model(xb), yb))
Previously for our training loop we had to update the values for each parameter by name, and manually zero out the grads for each parameter separately. Now we can take advantage of model.parameters() and model.zero_grad() (which are both defined by PyTorch for nn.Module) to make those steps more concise and less prone to the error of forgetting some of our parameters, particularly if we had a more complicated model:
with torch.no_grad():
    for p in model.parameters():
        p -= p.grad * lr
    model.zero_grad()
We’ll wrap our little training loop in a fit function so we can run it again later.
def fit():
    for epoch in range(epochs):
        for i in range((n - 1) // bs + 1):
            start_i = i * bs  # bs is batch size
            end_i = start_i + bs
            xb = x_train[start_i:end_i]
            yb = y_train[start_i:end_i]
            pred = model(xb)
            loss = loss_func(pred, yb)
            loss.backward()
            with torch.no_grad():
                for p in model.parameters():
                    p -= p.grad * lr
                model.zero_grad()
fit()
Double-check that our loss has gone down:
print(loss_func(model(xb), yb))
Refactor using nn.Linear
We continue to refactor our code. Instead of manually defining and initializing self.weights and self.bias, and calculating xb @ self.weights + self.bias, we will instead use the PyTorch class nn.Linear for a linear layer, which does all that for us. PyTorch has many types of predefined layers that can greatly simplify our code, and often make it faster too.
class Mnist_Logistic(nn.Module):
    def __init__(self):
        super().__init__()
        self.lin = nn.Linear(784, 10)

    def forward(self, xb):
        return self.lin(xb)
We instantiate our model and calculate the loss in the same way as before:
model = Mnist_Logistic()
print(loss_func(model(xb), yb))
We are still able to use our same fit method as before.
fit()
print(loss_func(model(xb), yb))
Refactor using optim
PyTorch also has a package with various optimization algorithms, torch.optim. We can use the step method from our optimizer to take an optimization step, instead of manually updating each parameter.
This will let us replace our previous manually coded optimization step and instead use just:
opt.step()
opt.zero_grad()
opt.zero_grad() resets the gradient to 0 and we need to call it before computing the gradient for the next minibatch.
from torch import optim
define a little function to create our model and optimizer so we can reuse it in the future
def get_model():
    model = Mnist_Logistic()
    return model, optim.SGD(model.parameters(), lr=lr)  # lr is learning rate
model, opt = get_model()
print(loss_func(model(xb), yb))
for epoch in range(epochs):
    for i in range((n - 1) // bs + 1):
        start_i = i * bs
        end_i = start_i + bs
        xb = x_train[start_i:end_i]
        yb = y_train[start_i:end_i]
        pred = model(xb)
        loss = loss_func(pred, yb)
        loss.backward()
        opt.step()
        opt.zero_grad()
print(loss_func(model(xb), yb))
Refactor using Dataset
PyTorch has an abstract Dataset class. A Dataset can be anything that has a __len__ function (called by Python’s standard len function) and a __getitem__ function as a way of indexing into it. This tutorial walks through a nice example of creating a custom FacialLandmarkDataset class as a subclass of Dataset.
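A minimal sketch of those two methods in plain Python. SquaresDataset is a hypothetical example class; it deliberately does not subclass torch.utils.data.Dataset, so it only illustrates the __len__ and __getitem__ protocol and runs without PyTorch installed.

```python
class SquaresDataset:
    """A hypothetical dataset of (x, x**2) pairs."""

    def __init__(self, size):
        self.size = size

    def __len__(self):
        # Called by Python's built-in len()
        return self.size

    def __getitem__(self, index):
        # Called by square-bracket indexing: ds[index]
        if not 0 <= index < self.size:
            raise IndexError(index)
        return index, index ** 2

ds = SquaresDataset(5)
print(len(ds))    # 5
print(ds[3])      # (3, 9)
print(list(ds))   # iteration works too, because __getitem__ raises IndexError at the end
```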
PyTorch’s TensorDataset is a Dataset wrapping tensors. By defining a length and way of indexing, this also gives us a way to iterate, index, and slice along the first dimension of a tensor. This will make it easier to access both the independent and dependent variables in the same line as we train.
from torch.utils.data import TensorDataset
Both x_train and y_train can be combined in a single TensorDataset, which will be easier to iterate over and slice.
train_ds = TensorDataset(x_train, y_train)
Previously, we had to iterate through minibatches of x and y values separately. Now, we can do these two steps together:
xb,yb = train_ds[i*bs : i*bs+bs]
model, opt = get_model()
for epoch in range(epochs):
    for i in range((n - 1) // bs + 1):
        xb, yb = train_ds[i * bs: i * bs + bs]
        pred = model(xb)
        loss = loss_func(pred, yb)
        loss.backward()
        opt.step()
        opt.zero_grad()
print(loss_func(model(xb), yb))
Refactor using DataLoader
PyTorch’s DataLoader is responsible for managing batches. You can create a DataLoader from any Dataset. DataLoader makes it easier to iterate over batches. Rather than having to use train_ds[i*bs : i*bs+bs], the DataLoader gives us each minibatch automatically.
from torch.utils.data import DataLoader
train_ds = TensorDataset(x_train, y_train)
train_dl = DataLoader(train_ds, batch_size=bs)
Now, our loop is much cleaner, as (xb, yb) are loaded automatically from the data loader:
for xb,yb in train_dl:
    pred = model(xb)
model, opt = get_model()
for epoch in range(epochs):
    for xb, yb in train_dl:
        pred = model(xb)
        loss = loss_func(pred, yb)
        loss.backward()
        opt.step()
        opt.zero_grad()
print(loss_func(model(xb), yb))
Thanks to PyTorch’s nn.Module, nn.Parameter, Dataset, and DataLoader, our training loop is now dramatically smaller and easier to understand. Let’s now try to add the basic features necessary to create effective models in practice.
Add validation
In section 1, we were just trying to get a reasonable training loop set up for use on our training data. In reality, you always should also have a validation set, in order to identify if you are overfitting.
Shuffling the training data is important to prevent correlation between batches and overfitting. On the other hand, the validation loss will be identical whether we shuffle the validation set or not. Since shuffling takes extra time, it makes no sense to shuffle the validation data.
We’ll use a batch size for the validation set that is twice as large as that for the training set. This is because the validation set does not need backpropagation and thus takes less memory (it doesn’t need to store the gradients). We take advantage of this to use a larger batch size and compute the loss more quickly.
train_ds = TensorDataset(x_train, y_train)
train_dl = DataLoader(train_ds, batch_size=bs, shuffle=True)
valid_ds = TensorDataset(x_valid, y_valid)
valid_dl = DataLoader(valid_ds, batch_size=bs * 2)
calculate and print the validation loss at the end of each epoch.
(Note that we always call model.train() before training, and model.eval() before inference, because these are used by layers such as nn.BatchNorm2d and nn.Dropout to ensure appropriate behaviour for these different phases.)
model, opt = get_model()
for epoch in range(epochs):
    model.train()
    for xb, yb in train_dl:
        pred = model(xb)
        loss = loss_func(pred, yb)
        loss.backward()
        opt.step()
        opt.zero_grad()
    model.eval()
    with torch.no_grad():
        valid_loss = sum(loss_func(model(xb), yb) for xb, yb in valid_dl)
    print(epoch, valid_loss / len(valid_dl))
Create fit() and get_data()
Since we go through a similar process twice of calculating the loss for both the training set and the validation set, let’s make that into its own function, loss_batch, which computes the loss for one batch.
We pass an optimizer in for the training set, and use it to perform backprop. For the validation set, we don’t pass an optimizer, so the method doesn’t perform backprop.
def loss_batch(model, loss_func, xb, yb, opt=None):
    loss = loss_func(model(xb), yb)
    if opt is not None:
        loss.backward()
        opt.step()
        opt.zero_grad()
    return loss.item(), len(xb)
fit runs the necessary operations to train our model and compute the training and validation losses for each epoch.
import numpy as np
def fit(epochs, model, loss_func, opt, train_dl, valid_dl):
    for epoch in range(epochs):
        model.train()
        for xb, yb in train_dl:
            loss_batch(model, loss_func, xb, yb, opt)
        model.eval()
        with torch.no_grad():
            losses, nums = zip(
                *[loss_batch(model, loss_func, xb, yb) for xb, yb in valid_dl]
            )
        val_loss = np.sum(np.multiply(losses, nums)) / np.sum(nums)
        print(epoch, val_loss)
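Note that this fit weights each batch loss by its batch size (the nums returned by loss_batch), whereas the earlier validation loop divided a sum of batch losses by the number of batches. A plain-Python sketch of why the weighting matters when the last batch is smaller; the losses and sizes are made-up numbers and the helper names are my own.

```python
def plain_mean(batch_losses):
    """Average of per-batch mean losses, ignoring batch sizes."""
    return sum(batch_losses) / len(batch_losses)

def size_weighted_mean(batch_losses, batch_sizes):
    """Average weighted by batch size: the true per-sample mean loss."""
    total = sum(loss * n for loss, n in zip(batch_losses, batch_sizes))
    return total / sum(batch_sizes)

# Made-up numbers: two full batches of 128 and a final batch of 16
batch_losses = [0.30, 0.30, 0.90]
batch_sizes = [128, 128, 16]

unweighted = plain_mean(batch_losses)
weighted = size_weighted_mean(batch_losses, batch_sizes)
print(unweighted)   # about 0.5: the small last batch counts as much as a full one
print(weighted)     # about 0.335
```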
get_data returns dataloaders for the training and validation sets.
def get_data(train_ds, valid_ds, bs):
    return (
        DataLoader(train_ds, batch_size=bs, shuffle=True),
        DataLoader(valid_ds, batch_size=bs * 2),
    )
Now our whole process of obtaining the data loaders and fitting the model can be run in 3 lines of code:
train_dl, valid_dl = get_data(train_ds, valid_ds, bs)
model, opt = get_model()
fit(epochs, model, loss_func, opt, train_dl, valid_dl)
You can use these basic 3 lines of code to train a wide variety of models. Let’s see if we can use them to train a convolutional neural network (CNN)!
Switch to CNN
We are now going to build our neural network with three convolutional layers. Because none of the functions in the previous section assume anything about the model form, we’ll be able to use them to train a CNN without any modification.
We will use PyTorch’s predefined Conv2d class as our convolutional layer. We define a CNN with 3 convolutional layers. Each convolution is followed by a ReLU. At the end, we perform an average pooling. (Note that view is PyTorch’s version of numpy’s reshape)
class Mnist_CNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1)
        self.conv2 = nn.Conv2d(16, 16, kernel_size=3, stride=2, padding=1)
        self.conv3 = nn.Conv2d(16, 10, kernel_size=3, stride=2, padding=1)

    def forward(self, xb):
        xb = xb.view(-1, 1, 28, 28)
        xb = F.relu(self.conv1(xb))
        xb = F.relu(self.conv2(xb))
        xb = F.relu(self.conv3(xb))
        xb = F.avg_pool2d(xb, 4)
        return xb.view(-1, xb.size(1))
lr = 0.1
Momentum is a variation on stochastic gradient descent that takes previous updates into account as well and generally leads to faster training.
model = Mnist_CNN()
opt = optim.SGD(model.parameters(), lr=lr, momentum=0.9)
fit(epochs, model, loss_func, opt, train_dl, valid_dl)
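A rough sketch of what the momentum argument changes, in plain Python on a one-parameter toy problem. The update form (velocity = momentum * velocity + grad, then param -= lr * velocity) is the common formulation and, as far as I know, matches SGD with momentum in PyTorch with default settings; the toy function, step count and learning rate are assumptions chosen for illustration.

```python
def sgd_momentum_step(param, grad, velocity, lr, momentum):
    """One update: v <- momentum * v + grad; param <- param - lr * v.
    Returns the new (param, velocity)."""
    velocity = momentum * velocity + grad
    return param - lr * velocity, velocity

def minimise(momentum, steps=20, lr=0.01, start=5.0):
    """Run gradient descent on f(w) = w**2 (gradient 2*w) and return the final w."""
    w, v = start, 0.0
    for _ in range(steps):
        w, v = sgd_momentum_step(w, 2 * w, v, lr, momentum)
    return w

print(minimise(momentum=0.0))   # plain SGD: still far from the minimum at 0
print(minimise(momentum=0.9))   # with momentum: nearer to 0 in this run, though it overshoots
```

With momentum=0.0 the velocity is just the current gradient, so the function reduces to plain SGD.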
nn.Sequential
torch.nn has another handy class we can use to simplify our code: Sequential. A Sequential object runs each of the modules contained within it, in a sequential manner. This is a simpler way of writing our neural network.
To take advantage of this, we need to be able to easily define a custom layer from a given function. For instance, PyTorch doesn’t have a view layer, and we need to create one for our network. Lambda will create a layer that we can then use when defining a network with Sequential.
class Lambda(nn.Module):
    def __init__(self, func):
        super().__init__()
        self.func = func

    def forward(self, x):
        return self.func(x)

def preprocess(x):
    return x.view(-1, 1, 28, 28)
The model created with Sequential is:
model = nn.Sequential(
    Lambda(preprocess),
    nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 16, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 10, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
    nn.AvgPool2d(4),
    Lambda(lambda x: x.view(x.size(0), -1)),
)
opt = optim.SGD(model.parameters(), lr=lr, momentum=0.9)
fit(epochs, model, loss_func, opt, train_dl, valid_dl)
Wrapping DataLoader
Our CNN is fairly concise, but it only works with MNIST, because:
- It assumes the input is a 28*28 long vector
- It assumes that the final CNN grid size is 4*4 (since that’s the average pooling kernel size we used)
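The 4*4 grid size follows from the standard convolution output-size formula applied three times to a 28*28 input. A small plain-Python check (the helper name is my own):

```python
def conv_output_size(size, kernel_size, stride, padding):
    """Spatial output size of a convolution:
    floor((size + 2*padding - kernel_size) / stride) + 1."""
    return (size + 2 * padding - kernel_size) // stride + 1

size = 28
sizes = [size]
for _ in range(3):                      # three stride-2 convolutions
    size = conv_output_size(size, kernel_size=3, stride=2, padding=1)
    sizes.append(size)

print(sizes)   # [28, 14, 7, 4]
```

Average pooling with a kernel of 4 then reduces the 4*4 grid to a single value per channel. For a different input size, say 64*64, the final grid would be 8*8 and the fixed pooling size would no longer give one value per channel.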
Let’s get rid of these two assumptions, so our model works with any 2d single channel image. First, we can remove the initial Lambda layer by moving the data preprocessing into a generator:
def preprocess(x, y):
    return x.view(-1, 1, 28, 28), y

class WrappedDataLoader:
    def __init__(self, dl, func):
        self.dl = dl
        self.func = func

    def __len__(self):
        return len(self.dl)

    def __iter__(self):
        batches = iter(self.dl)
        for b in batches:
            yield (self.func(*b))
train_dl, valid_dl = get_data(train_ds, valid_ds, bs)
train_dl = WrappedDataLoader(train_dl, preprocess)
valid_dl = WrappedDataLoader(valid_dl, preprocess)
Replace nn.AvgPool2d with nn.AdaptiveAvgPool2d, which allows us to define the size of the output tensor we want, rather than the input tensor we have. As a result, our model will work with any size input.
model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 16, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 10, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    Lambda(lambda x: x.view(x.size(0), -1)),
)
opt = optim.SGD(model.parameters(), lr=lr, momentum=0.9)
fit(epochs, model, loss_func, opt, train_dl, valid_dl)
Using your GPU
check that your GPU is working in Pytorch:
print(torch.cuda.is_available())
create a device object for it:
dev = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
Update preprocess to move batches to the GPU:
def preprocess(x, y):
    return x.view(-1, 1, 28, 28).to(dev), y.to(dev)
train_dl, valid_dl = get_data(train_ds, valid_ds, bs)
train_dl = WrappedDataLoader(train_dl, preprocess)
valid_dl = WrappedDataLoader(valid_dl, preprocess)
move our model to the GPU
model.to(dev)
opt = optim.SGD(model.parameters(), lr=lr, momentum=0.9)
It runs faster now:
fit(epochs, model, loss_func, opt, train_dl, valid_dl)
Summary
- torch.nn:
  - Module: creates a callable which behaves like a function, but can also contain state (such as neural net layer weights). It knows what Parameter(s) it contains and can zero all their gradients, loop through them for weight updates, etc.
  - Parameter: a wrapper for a tensor that tells a Module that it has weights that need updating during backprop. Only tensors with the requires_grad attribute set are updated
  - functional: a module (usually imported into the F namespace by convention) which contains activation functions, loss functions, etc, as well as non-stateful versions of layers such as convolutional and linear layers.
- torch.optim: Contains optimizers such as SGD, which update the weights of Parameter during the backward step
- Dataset: An abstract interface of objects with a __len__ and a __getitem__, including classes provided with PyTorch such as TensorDataset
- DataLoader: Takes any Dataset and creates an iterator which returns batches of data.
Lesson 0: Jupyter notebook
Get lessons from https://github.com/fastai/course-v3
Course taught by Jeremy Howard GitHub
Make deep learning accessible: software, education, research, community
Some background on why Jupyter notebooks: https://paulromer.net/jupyter-mathematica-and-the-future-of-the-research-paper/
The possibility of Jupyter notebooks being used for papers https://www.theatlantic.com/science/archive/2018/04/the-scientific-paper-is-obsolete/556676/
Enter to enter edit mode and Esc to enter command mode
Your notebook is autosaved every 120 seconds
Esc -> s to save
Esc -> up arrow to toggle cells
Esc -> b to create a new cell
Esc -> 0 -> 0 to restart kernel
Esc -> m to convert a cell to markdown
Esc -> y to convert a cell to code
Esc -> d -> d to delete a cell
Esc -> o to hide output
?function-name: Shows the definition and docstring for that function
??function-name: Shows the source code for that function
doc(function-name): Shows the definition, docstring and links to the documentation of the function (only works with fastai library imported)
Use Tab to autocomplete a method and Shift + Tab to show the inputs to that method.
Lesson 1: Image classification
https://course.fast.ai/videos/?lesson=1
Build our first image classifier from scratch.
The fastai library provides many useful functions that enable us to quickly and easily build neural networks and train our models.
The course argues that import * is fine for interactive experimentation.
from fastai.vision import *
from fastai.metrics import error_rate
Known datasets. Academic datasets are often in good shape and used frequently. Provide baselines so you know if your model is good.
The dataset is Oxford-IIIT Pet Dataset by Parkhi et al 2012 which features 12 cat breeds and 25 dog breeds (fine grain classification). The best accuracy they got in their dataset was 59.21%.
Download and extract the data.
Use untar_data, whose path arguments are typed Union[pathlib.Path, str] (Union means that either type is accepted).
path = untar_data(URLs.PETS)
path.ls()
path_anno = path/'annotations' # can use / as path object
path_img = path/'images'
fnames = get_image_files(path_img)
pat = r'/([^/]+)_\d+.jpg$' # regular expression patterns
data = ImageDataBunch.from_name_re(path_img, fnames, pat, ds_tfms=get_transforms(), size=224, bs=bs).normalize(imagenet_stats) # specify size of images to scale to using get_transforms()
data.show_batch(rows=3, figsize=(7,6)) # grab middle bit and resize it
# Get unique classes
print(data.classes)
len(data.classes),data.c
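To see what the regular expression pat above extracts, here is a standalone check with Python's re module. The file paths are hypothetical examples in the style of the pets dataset, where each file name is the breed followed by an underscore and a number.

```python
import re

pat = r'/([^/]+)_\d+.jpg$'   # same pattern as in the notes above

# Hypothetical paths for illustration
examples = [
    "/data/oxford-iiit-pet/images/great_pyrenees_173.jpg",
    "/data/oxford-iiit-pet/images/Bombay_12.jpg",
]

# group(1) is the part of the file name before the final _<digits>.jpg
labels = [re.search(pat, path_str).group(1) for path_str in examples]
print(labels)   # ['great_pyrenees', 'Bombay']
```

The unescaped dot before jpg matches any character, so the pattern is slightly looser than it looks; writing \.jpg would match only a literal dot.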
Training: resnet34
A model is trained using a Learner; cnn_learner creates one for convolutional networks.
Use a convolutional neural network backbone and a fully connected head with a single hidden layer as a classifier. Use resnet34, pre-trained on 1,000 classes and 1.5m images (ImageNet). Downloading pre-trained weights (transfer learning) makes training much quicker and means you don't need a big dataset.
# Requires an ImageDataBunch and a model architecture. resnet works pretty well; there's 34 and 50 (start smaller: 34). Prints out error_rate.
learn = cnn_learner(data, models.resnet34, metrics=error_rate)
learn.model
learn.fit_one_cycle(4) # 4 epochs on last few layers
learn.save('stage-1')
You can also do resnet50 but don't forget to reduce the batch size.
https://dawn.cs.stanford.edu/benchmark/#imagenet - shows scores of image classification models.
To see what comes out of the model:
learn knows the data and the model
interp = ClassificationInterpretation.from_learner(learn)
losses,idxs = interp.top_losses()
interp.plot_top_losses(9, figsize=(15,11))
plot 4 things: prediction, actual, loss, and probability of actual class
Confusion matrix: for actual class how many times was it predicted to be that class
interp.plot_confusion_matrix(figsize=(12,12), dpi=60)
If you have lots of classes best to use
interp.most_confused(min_val=2)
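The idea behind the confusion matrix and most_confused can be sketched in plain Python: count every (actual, predicted) pair, then list the off-diagonal pairs that occur at least min_val times. This is an imitation of the idea with made-up labels, not fastai's implementation, and the function names are my own.

```python
from collections import Counter

def confusion_counts(actual, predicted):
    """Count how often each (actual, predicted) pair occurs."""
    return Counter(zip(actual, predicted))

def most_confused(actual, predicted, min_val=1):
    """Off-diagonal pairs seen at least min_val times, most frequent first."""
    counts = confusion_counts(actual, predicted)
    wrong = [(a, p, n) for (a, p), n in counts.items() if a != p and n >= min_val]
    return sorted(wrong, key=lambda item: item[2], reverse=True)

# Made-up labels for illustration
actual    = ["beagle", "beagle", "beagle", "pug", "pug", "ragdoll", "ragdoll", "ragdoll"]
predicted = ["beagle", "pug",    "pug",    "pug", "pug", "birman",  "ragdoll", "birman"]

confused = most_confused(actual, predicted, min_val=2)
print(confused)
```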
unfreeze our model and train the whole model
Zeiler and Fergus (2013), "Visualizing and Understanding Convolutional Networks": https://arxiv.org/abs/1311.2901
We want to change the last layers which are specific features
learn.unfreeze()
learn.fit_one_cycle(1)
Leads to a worse error as it is updating all layers at the same rate, including early ones that detect basic features e.g. diagonals.
learn.lr_find()
learn.recorder.plot()
Default lr is 0.003, which is where the loss increases a lot
learn.unfreeze()
learn.fit_one_cycle(2, max_lr=slice(1e-6,1e-4)) # this is dynamic for each layer between these two values
This gives a 10% increase in accuracy over before. You only really need these two layers.
Training: resnet50
resnet paper: https://arxiv.org/abs/1512.03385
bs = 16
data = ImageDataBunch.from_name_re(path_img, fnames, pat, ds_tfms=get_transforms(), size=299, bs=bs//2).normalize(imagenet_stats)
learn = cnn_learner(data, models.resnet50, metrics=error_rate)
learn.lr_find()
learn.recorder.plot()
learn.fit_one_cycle(8)
learn.save('stage-1-50')
learn.unfreeze()
learn.fit_one_cycle(3, max_lr=slice(1e-6,1e-4))
learn.load('stage-1-50');
interp = ClassificationInterpretation.from_learner(learn)
interp.most_confused(min_val=2)
Other data formats
path = untar_data(URLs.MNIST_SAMPLE); path
tfms = get_transforms(do_flip=False)
The labels are what the folders are called so you can use from_folder
data = ImageDataBunch.from_folder(path, ds_tfms=tfms, size=26)
data.show_batch(rows=3, figsize=(5,5))
learn = cnn_learner(data, models.resnet18, metrics=accuracy)
learn.fit(2)
df = pd.read_csv(path/'labels.csv')
df.head()
if label is in csv file can use:
data = ImageDataBunch.from_csv(path, ds_tfms=tfms, size=28)
from df
data = ImageDataBunch.from_df(path, df, ds_tfms=tfms, size=24)
data.classes
grab label using regex
fn_paths = [path/name for name in df['name']] # full path for each image listed in labels.csv
pat = r"/(\d)/\d+\.png$"
data = ImageDataBunch.from_name_re(path, fn_paths, pat=pat, ds_tfms=tfms, size=24)
data.classes
can write your own function
data = ImageDataBunch.from_name_func(path, fn_paths, ds_tfms=tfms, size=24,
label_func = lambda x: '3' if '/3/' in str(x) else '7')
data.classes
from lists
labels = [('3' if '/3/' in str(x) else '7') for x in fn_paths]
labels[:5]
data = ImageDataBunch.from_lists(path, fn_paths, labels=labels, ds_tfms=tfms, size=24)
data.classes
How to download your own image dataset
https://forums.fast.ai/t/tips-for-building-large-image-datasets/26688/3
https://lpdaacsvc.cr.usgs.gov/appeears/ - to get MERIS, ESA Sentinel-3
https://github.com/prairie-guy/ai_utilities
https://github.com/svenski/duckgoose
https://forums.fast.ai/t/generating-image-datasets-quickly/19079/9
https://forums.fast.ai/t/how-to-scrape-the-web-for-images/7446/3
Lesson 2a: Creating your own dataset from Google Images
Inspired by https://www.pyimagesearch.com/2017/12/04/how-to-create-a-deep-learning-dataset-using-google-images/
from fastai.vision import *
Create folders for the files
path = Path('data/bears')
folder = 'black'
dest = path/folder
dest.mkdir(parents=True, exist_ok=True)
folder = 'teddys'
dest = path/folder
dest.mkdir(parents=True, exist_ok=True)
folder = 'grizzly'
dest = path/folder
dest.mkdir(parents=True, exist_ok=True)
Go to Google Images and type your search term. If you want to exclude anything you can do "canis lupus lupus" -dog to search for wolves, for example. Tools -> Type -> Photo to only get photos. Type "black bear"
Run some JavaScript code in your browser which will save the URLs of all the images you want for your dataset.
Turn off ad-block, press Ctrl + Shift + J in your browser and run the following (press Enter to run)
urls = Array.from(document.querySelectorAll('.rg_di .rg_meta')).map(el=>JSON.parse(el.textContent).ou);
window.open('data:text/csv;charset=utf-8,' + escape(urls.join('\n')));
Move the 'download' file into the bear folder and name the file urls_black.csv
Do the same for teddys and grizzly
Download the files
path = Path('data/bears')
file = 'urls_black.csv'
folder = 'black'
dest = path/folder
download_images(path/file, dest, max_pics=200)
file = 'urls_teddys.csv'
folder = 'teddys'
dest = path/folder
download_images(path/file, dest, max_pics=200)
file = 'urls_grizzly.csv'
folder = 'grizzly'
dest = path/folder
download_images(path/file, dest, max_pics=200)
Then clean up files that can't be opened:
classes = ['teddys','grizzly','black']
for c in classes:
    print(c)
    verify_images(path/c, delete=True, max_size=500)
May have to go through this a couple of times to get all the images.
View the data:
np.random.seed(42)
data = ImageDataBunch.from_folder(path, train=".", valid_pct=0.2,
ds_tfms=get_transforms(), size=224, num_workers=4).normalize(imagenet_stats)
data.classes
data.show_batch(rows=3, figsize=(7,8))
Train the model:
learn = cnn_learner(data, models.resnet34, metrics=error_rate)
learn.fit_one_cycle(4)
learn.save('stage-1')
learn.unfreeze()
learn.lr_find()
learn.recorder.plot()
learn.fit_one_cycle(2, max_lr=slice(3e-5,3e-4))
learn.save('stage-2')
Interpret the model:
learn.load('stage-2');
interp = ClassificationInterpretation.from_learner(learn)
interp.plot_confusion_matrix()
Cleaning up the data:
Using the ImageCleaner widget from fastai.widgets we can prune our top losses, removing photos that don't belong.
from fastai.widgets import *
Get the file paths from our top_losses using .from_toplosses, then feed the top-loss indexes and the corresponding dataset to ImageCleaner.
The widget will not delete images directly from disk; instead it creates a new csv file, cleaned.csv, from which you can create a new ImageDataBunch with the corrected labels to continue training your model.
https://ipywidgets.readthedocs.io/en/latest/
create a new dataset without the split.
db = (ImageList.from_folder(path)
.split_none()
.label_from_folder()
.transform(get_transforms(), size=224)
.databunch()
)
Create a new learner to use our new databunch with all the images.
learn_cln = cnn_learner(db, models.resnet34, metrics=error_rate)
learn_cln.load('stage-2');
ds, idxs = DatasetFormatter().from_toplosses(learn_cln)
ImageCleaner(ds, idxs, path)
Flag photos for deletion by clicking 'Delete'
Find duplicates in your dataset and delete them! Run .from_similars to get the potential duplicates' ids and then run ImageCleaner with duplicates=True
db = (ImageList.from_csv(path, 'cleaned.csv', folder='.')
.split_none()
.label_from_df()
.transform(get_transforms(), size=224)
.databunch()
)
learn_cln = cnn_learner(db, models.resnet34, metrics=error_rate)
learn_cln.load('stage-2');
ds, idxs = DatasetFormatter().from_similars(learn_cln)
ImageCleaner(ds, idxs, path, duplicates=True)
db = (ImageList.from_csv(path, 'cleaned.csv', folder='.')
.split_none()
.label_from_df()
.transform(get_transforms(), size=224)
.databunch()
)
learn = cnn_learner(db, models.resnet34, metrics=error_rate)
Put the model into production
Export the content of our Learner object for production
learn.export()
This will create a file named 'export.pkl' in the directory where we were working that contains everything we need to deploy our model (the model, the weights but also some metadata like the classes or the transforms/normalization used).
Putting the model in production
For serving, a CPU is usually better than a GPU: it scales more easily, and a GPU only pays off when it can process a whole batch (e.g. 64 images) at once, so users would be kept waiting while a batch builds up.
Test your model on CPU like so:
defaults.device = torch.device('cpu')
img = open_image(path/'black'/'00000021.jpg')
img
learn = load_learner(path)
pred_class,pred_idx,outputs = learn.predict(img)
pred_class
Starlette app
$ pip install starlette
$ pip install uvicorn
Create a file called hello_world.py
from starlette.applications import Starlette
from starlette.responses import JSONResponse
import uvicorn
app = Starlette(debug=True)
@app.route('/')
async def homepage(request):
    return JSONResponse({'hello': 'world'})
if __name__ == '__main__':
    uvicorn.run(app, host='0.0.0.0', port=8000)
$ git clone https://github.com/encode/starlette-example.git
$ cd starlette-example
$ pip install aiofiles
$ scripts/run
https://github.com/simonw/cougar-or-not/blob/master/cougar.py
Parameters
You can play with the learning rate and the number of epochs
Learning rate too high:
learn = cnn_learner(data, models.resnet34, metrics=error_rate)
learn.fit_one_cycle(1, max_lr=0.5)
error = 0.69
Your validation loss will explode.
Learning rate too low:
learn = cnn_learner(data, models.resnet34, metrics=error_rate)
# error rate goes down then back up
learn.fit_one_cycle(5, max_lr=1e-5)
learn.recorder.plot_losses()
Train loss is higher than validation loss.
Too few epochs
learn = cnn_learner(data, models.resnet34, metrics=error_rate, pretrained=False)
learn.fit_one_cycle(1)
Training loss is much higher than validation loss.
Too many epochs
np.random.seed(42)
data = ImageDataBunch.from_folder(path, train=".", valid_pct=0.9, bs=32,
ds_tfms=get_transforms(do_flip=False, max_rotate=0, max_zoom=1, max_lighting=0, max_warp=0
),size=224, num_workers=4).normalize(imagenet_stats)
learn = cnn_learner(data, models.resnet50, metrics=error_rate, ps=0, wd=0)
learn.unfreeze()
learn.fit_one_cycle(40, slice(1e-6,1e-4))
Overfitting. Error rate improves for a while then gets worse again. A well-trained model should have train loss lower than validation loss.
Metric on validation dataset.
https://forums.fast.ai/t/lesson-1-official-resources-and-updates/27936
Lesson 2b: Data cleaning and production; SGD from scratch
https://course.fast.ai/videos/?lesson=2
Make sure you have the latest version of the code and the latest version of the course
$ conda update conda
$ conda update anaconda
$ conda activate fastai
$ conda update --all
Compare https://github.com/fastai/course-v3 to my local course.
cd /home/ray/Documents/COURSES/fastai/course-v3/nbs/dl1
jupyter notebook
Cool fast.ai work
Deep Convolutional Neural Networks and Data Augmentation for Environmental Sound Classification - https://arxiv.org/abs/1608.04363
https://forums.fast.ai/t/share-your-work-here/27676/38
Learning CNNs
http://matrixmultiplication.xyz/
Linear regression with SGD
y = a1x1 + a2x2
%matplotlib inline
from fastai.basics import *
n=100
# x1 is noisy and x2 is full of ones.
x = torch.ones(n, 2)
# _ means replace values (i.e. don't return).
x[:,0].uniform_(-1., 1)
x[:5]
# a1 = 3; a2 = 2
a = tensor(3., 2.); a
# x@a is a matrix product
y = x@a + torch.rand(n)
plt.scatter(x[:,0], y);
Try and work out a1 and a2.
Find parameters (weights) a such that you minimize the error between the points and the line x@a. For a regression problem the most common error function or loss function is the mean squared error.
def mse(y_hat, y): return ((y_hat - y) ** 2).mean()
Start with a guess of -1, 1.
a = tensor(-1., 1.)
y_hat = x@a
mse(y_hat, y)
plt.scatter(x[:,0], y)
plt.scatter(x[:,0], y_hat);
Change intercept and gradient. Four possibilities...
Derivative tells you how to move the line of best fit.
Call .grad to get the derivative.
a = nn.Parameter(a); a
lr = 1e-1
def update(lr, a):
    y_hat = x@a
    loss = mse(y_hat, y)
    if t % 10 == 0:
        print(loss)
    loss.backward() # calculate gradient
    # Turn gradient calculation off when you do the SGD update
    with torch.no_grad():
        # subtract the gradient from a
        # _ means inplace
        # subtract i.e. go in other 'direction' of loss.
        a.sub_(lr * a.grad) # the derivative gets assigned to .grad
        a.grad.zero_() # 0 out the gradient
for t in range(100):
    update(lr, a)
plt.scatter(x[:, 0], y)
plt.scatter(x[:, 0], (x@a).detach()); # detach so matplotlib can convert the tensor
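The same update loop can be sketched without autograd by writing the MSE gradient out by hand. This is a minimal pure-Python illustration under the same setup (a = (3, 2), uniform noise in [0, 1)); the helper names and the seed are my own choices, not from the course notebook.

```python
import random

random.seed(0)
n = 100
# Each row is [x1, 1.0]: a random input and a constant column for the intercept
xs = [[random.uniform(-1.0, 1.0), 1.0] for _ in range(n)]
# Targets from a = (3, 2) plus uniform noise in [0, 1)
ys = [3.0 * x1 + 2.0 * x2 + random.random() for x1, x2 in xs]

def predict(a, row):
    return a[0] * row[0] + a[1] * row[1]

def mse(a):
    return sum((predict(a, r) - y) ** 2 for r, y in zip(xs, ys)) / n

def grad(a):
    # d(MSE)/da_j = (2/n) * sum((y_hat - y) * x_j)
    g = [0.0, 0.0]
    for r, y in zip(xs, ys):
        err = predict(a, r) - y
        g[0] += 2.0 * err * r[0] / n
        g[1] += 2.0 * err * r[1] / n
    return g

a = [-1.0, 1.0]  # same starting guess as above
lr = 1e-1
first_loss = mse(a)
for t in range(200):
    g = grad(a)
    # step against the gradient, i.e. in the direction that lowers the loss
    a = [a[0] - lr * g[0], a[1] - lr * g[1]]
final_loss = mse(a)
print(a, first_loss, final_loss)
```

Because the noise has a mean of 0.5 rather than 0, the fitted intercept settles near 2.5 rather than 2, while the slope settles near 3.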
from matplotlib import animation, rc
rc('animation', html='jshtml')
a = nn.Parameter(tensor(-1., 1.))
fig = plt.figure()
plt.scatter(x[:, 0], y, c='orange')
line, = plt.plot(x[:, 0], (x@a).detach())
plt.close()
def animate(i):
    update(lr, a)
    line.set_ydata((x@a).detach())
    return line,
animation.FuncAnimation(fig, animate, np.arange(0, 100), interval=20)
In practice we use mini-batches.
a + bx (high bias - underfit)
a + bx + cx^2 (just right)
a + bx + cx^2 + dx^3 + ex^4 (high variance - overfit).
See also
You can deploy computer vision models at https://course.fast.ai/deployment_render.html
https://github.com/hiromis/notes/blob/master/Lesson2.md
https://responder.readthedocs.io/en/latest/
https://www.christianwerner.net/tech/Build-your-image-dataset-faster/
A systematic study of the class imbalance problem in convolutional neural networks - https://arxiv.org/abs/1710.05381
https://www.youtube.com/watch?v=q6DGVGJ1WP4 - There's no such thing as not a math person.
https://www.fast.ai/2017/11/13/validation-sets/ how (and why) to create a good validation set.
Lesson 3: Data blocks; Multi-label classification; Segmentation
https://course.fast.ai/videos/?lesson=3
More info on deploying webapps here - https://course.fast.ai/deployment_render.html
Classifiers people made are in - https://github.com/hiromis/notes/blob/master/Lesson3.md
Multi-label prediction with Planet Amazon dataset
%reload_ext autoreload
%autoreload 2
%matplotlib inline
# https://github.com/fastai/course-v3/blob/master/nbs/dl1/lesson3-planet.ipynb
from fastai.vision import *
Install kaggle
https://www.kaggle.com/c/planet-understanding-the-amazon-from-space
path = Config.data_path()/'planet'
path.mkdir(parents=True, exist_ok=True)
Download data
$ kaggle competitions download -c planet-understanding-the-amazon-from-space -f train-jpg.tar.7z -p /home/ray/.fastai/data/planet
$ kaggle competitions download -c planet-understanding-the-amazon-from-space -f train_v2.csv -p /home/ray/.fastai/data/planet
$ unzip -q -n /home/ray/.fastai/data/planet/train_v2.csv.zip -d /home/ray/.fastai/data/planet
Install 7zip and uncompress
$ sudo apt-get update
$ sudo apt install p7zip-full
$ 7za -bd -y -so x /home/ray/.fastai/data/planet/train-jpg.tar.7z | tar xf - -C /home/ray/.fastai/data/planet
Multiple labels for each tile
df = pd.read_csv(path/'train_v2.csv')
df.head()
Put this into a DataBunch and use ImageList (and not ImageDataBunch). The Dataset class (https://pytorch.org/docs/stable/data.html#map-style-datasets) has __getitem__() (e.g. object[3]) and __len__() (len(object)); "dunder" means double underscore. This gives indexed access to the images and their labels.
Create a mini-batch using a DataLoader(dataset). Use DataBunch to bind the train DataLoader and the valid DataLoader.
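The Dataset / DataLoader split described above can be sketched in plain Python. The class and function names here are hypothetical stand-ins for illustration, not the PyTorch or fastai classes, and the file names and tags are made up.

```python
import random

class ListDataset:
    """Minimal map-style dataset: supports ds[i] and len(ds)."""
    def __init__(self, items, labels):
        self.items, self.labels = items, labels
    def __getitem__(self, i):
        return self.items[i], self.labels[i]
    def __len__(self):
        return len(self.items)

def iter_batches(ds, bs, shuffle=False, seed=42):
    """Yield lists of (item, label) pairs of size bs (the last may be smaller)."""
    idxs = list(range(len(ds)))
    if shuffle:
        random.Random(seed).shuffle(idxs)
    for start in range(0, len(idxs), bs):
        yield [ds[i] for i in idxs[start:start + bs]]

# Hypothetical file names and tags, for illustration only
ds = ListDataset([f'img_{i}.jpg' for i in range(10)], ['clear'] * 5 + ['haze'] * 5)
batches = list(iter_batches(ds, bs=4))
print(len(ds), ds[3], [len(b) for b in batches])
```

The dataset only knows how to return one item by index and how many items it has; batching and shuffling live in the loader, which is why the two are separate objects.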
np.random.seed(42)
src = (ImageList.from_csv(path, 'train_v2.csv', folder='train-jpg', suffix='.jpg') # Get images
.split_by_rand_pct(0.2)
.label_from_df(label_delim=' ')) # Get labels
# Flip vert to true as satellite data. warp (look at from above or below) set to 0 as satellite is from top.
tfms = get_transforms(flip_vert=True, max_lighting=0.1, max_zoom=1.05, max_warp=0.)
data = (src.transform(tfms, size=128)
.databunch().normalize(imagenet_stats))
data.show_batch(rows=3, figsize=(12,9))
Set up the CNN and metrics (accuracy, which normally uses argmax, so use a thresholded version here; and F-score). fbeta with beta=2 is equal to F2. Use a threshold to keep the classes that we think exist in a sample. Use partial to create a version of a function with some keyword arguments fixed.
data.c
data.classes
arch = models.resnet50
acc_02 = partial(accuracy_thresh, thresh=0.2)
f_score = partial(fbeta, thresh=0.2)
learn = cnn_learner(data, arch, metrics=[acc_02, f_score])
learn.lr_find()
learn.recorder.plot()
lr = 0.01
learn.fit_one_cycle(5, slice(lr))
learn.save('stage-1-rn50')
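The thresholded metrics built with partial above can be sketched for a single sample in plain Python. The two functions below are simplified, hypothetical re-implementations for illustration (note the _sketch suffix), not fastai's accuracy_thresh or fbeta, and the probabilities are made up.

```python
from functools import partial

def accuracy_thresh_sketch(probs, targets, thresh=0.5):
    """Fraction of labels where (prob > thresh) agrees with the 0/1 target."""
    preds = [1 if p > thresh else 0 for p in probs]
    return sum(int(p == t) for p, t in zip(preds, targets)) / len(targets)

def fbeta_sketch(probs, targets, thresh=0.5, beta=2.0):
    """F-beta = (1 + beta^2) * P * R / (beta^2 * P + R) on thresholded predictions."""
    preds = [1 if p > thresh else 0 for p in probs]
    tp = sum(1 for p, t in zip(preds, targets) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(preds, targets) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(preds, targets) if p == 0 and t == 1)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return (1 + beta ** 2) * precision * recall / (beta ** 2 * precision + recall)

# partial fixes the thresh keyword so the metric can be called with just (probs, targets)
acc_02 = partial(accuracy_thresh_sketch, thresh=0.2)
f2_02 = partial(fbeta_sketch, thresh=0.2)

probs = [0.9, 0.3, 0.25, 0.05]   # hypothetical per-class probabilities
targets = [1, 0, 1, 1]           # hypothetical multi-label target
acc = acc_02(probs, targets)
f2 = f2_02(probs, targets)
print(acc, f2)
```

With beta=2 recall is weighted more heavily than precision, which suits a multi-label problem where missing a present tag is the costlier mistake.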
Fine tune/fit a bit more. You could create a DataBunch with the misclassified images and fit on them (e.g. with a higher learning rate or more epochs).
learn.unfreeze()
learn.lr_find()
learn.recorder.plot()
learn.fit_one_cycle(5, slice(1e-5, lr / 5))
learn.save('stage-2-rn50')
This model was fit to images with size 128. Use transfer learning to fit to 256.
data = (src.transform(tfms, size=256)
.databunch(bs=32).normalize(imagenet_stats))
# Put new data into learn
learn.data = data
data.train_ds[0][0].shape
learn.freeze()
learn.lr_find()
learn.recorder.plot()
Train last two layers.
lr = 1e-2 / 2
learn.fit_one_cycle(5, slice(lr))
Image Segmentation with CamVid
%reload_ext autoreload
%autoreload 2
%matplotlib inline
from fastai.vision import *
from fastai.callbacks.hooks import *
from fastai.utils.mem import *
You can see the datasets at https://course.fast.ai/datasets
path = untar_data(URLs.CAMVID)
path.ls()
path_lbl = path/'labels'
path_img = path/'images'
fnames = get_image_files(path_img)
fnames[:3]
lbl_names = get_image_files(path_lbl)
lbl_names[:3]
img_f = fnames[0]
img = open_image(img_f)
img.show(figsize=(5, 5))
get_y_fn = lambda x: path_lbl/f'{x.stem}_P{x.suffix}'
mask = open_mask(get_y_fn(img_f))
mask.show(figsize=(5,5), alpha=1)
src_size = np.array(mask.shape[1:])
src_size,mask.data
codes = np.loadtxt(path/'codes.txt', dtype=str); codes
Datasets
size = src_size//2
free = gpu_mem_get_free_no_cache()
# the max size of bs depends on the available GPU RAM
if free > 8200: bs=8
else: bs=4
print(f"using bs={bs}, have {free}MB of GPU RAM free")
src = (SegmentationItemList.from_folder(path_img)
.split_by_fname_file('../valid.txt')
.label_from_func(get_y_fn, classes=codes))
# Transform y (the dependent variable) as well
data = (src.transform(get_transforms(), size=size, tfm_y=True)
.databunch(bs=bs)
.normalize(imagenet_stats))
data.show_batch(2, figsize=(10,7))
data.show_batch(2, figsize=(10,7), ds_type=DatasetType.Valid)
Model
name2id = {v:k for k,v in enumerate(codes)}
# Some pixels are labelled Void. Exclude these from the metric.
void_code = name2id['Void']
# Custom metric
def acc_camvid(input, target):
    target = target.squeeze(1)
    mask = target != void_code
    return (input.argmax(dim=1)[mask]==target[mask]).float().mean()
metrics=acc_camvid
# metrics=accuracy
wd=1e-2
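What acc_camvid does (accuracy over the pixels that are not Void) can be sketched on flat lists. The class codes and predictions below are made up for illustration, and masked_accuracy is a hypothetical helper.

```python
def masked_accuracy(preds, targets, ignore_code):
    """Accuracy computed only over positions whose target is not ignore_code."""
    kept = [(p, t) for p, t in zip(preds, targets) if t != ignore_code]
    correct = sum(1 for p, t in kept if p == t)
    return correct / len(kept)

void = 3                          # hypothetical code for the Void class
targets = [0, 1, 3, 2, 3, 1]      # hypothetical per-pixel labels
preds   = [0, 2, 0, 2, 1, 1]      # hypothetical per-pixel predictions

acc_masked = masked_accuracy(preds, targets, void)
acc_plain = sum(int(p == t) for p, t in zip(preds, targets)) / len(targets)
print(acc_masked, acc_plain)
```

Void pixels can never be predicted correctly in a meaningful way, so leaving them in would drag the metric down for reasons unrelated to the model.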
For segmentation model
https://docs.fast.ai/vision.learner.html#unet_learner
https://lmb.informatik.uni-freiburg.de/people/ronneber/u-net/
learn = unet_learner(data, models.resnet34, metrics=metrics, wd=wd)
lr_find(learn)
learn.recorder.plot()
lr=3e-3
learn.fit_one_cycle(10, slice(lr), pct_start=0.9)
learn.save('stage-1')
learn.load('stage-1');
learn.show_results(rows=3, figsize=(8,9))
learn.unfreeze()
lrs = slice(lr/400,lr/4)
learn.fit_one_cycle(12, lrs, pct_start=0.8)
learn.save('stage-2');
Go Big
learn.destroy()
size = src_size
free = gpu_mem_get_free_no_cache()
# the max size of bs depends on the available GPU RAM
if free > 8200: bs=3
else: bs=1
print(f"using bs={bs}, have {free}MB of GPU RAM free")
data = (src.transform(get_transforms(), size=size, tfm_y=True)
.databunch(bs=bs)
.normalize(imagenet_stats))
learn = unet_learner(data, models.resnet34, metrics=metrics, wd=wd)
learn.load('stage-2');
lr_find(learn)
learn.recorder.plot()
learn.recorder.plot_lr()
Increase LR at start and decrease at end.
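The shape described here (learning rate rising first, then falling) can be sketched as follows. This is a simplified cosine schedule with made-up defaults (div, final_div); it is an assumption for illustration and not fastai's exact implementation. pct_start is the fraction of steps spent increasing the rate, as in the fit_one_cycle calls above.

```python
import math

def one_cycle_lr(step, total, lr_max, pct_start=0.3, div=25.0, final_div=1e4):
    """Cosine warm-up from lr_max/div to lr_max, then cosine decay towards lr_max/final_div."""
    warm = int(total * pct_start)
    if step < warm:
        pct, start, end = step / warm, lr_max / div, lr_max
    else:
        pct, start, end = (step - warm) / (total - warm), lr_max, lr_max / final_div
    cos_out = (1.0 + math.cos(math.pi * pct)) / 2.0  # goes from 1 to 0 as pct goes 0 to 1
    return end + (start - end) * cos_out

lrs = [one_cycle_lr(s, total=100, lr_max=1e-3, pct_start=0.8) for s in range(100)]
print(lrs[0], max(lrs), lrs[-1])
```

With pct_start=0.8 the peak comes late, so most of the run is spent warming up and only the final 20% anneals the rate back down.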
lr=1e-3
learn.fit_one_cycle(10, slice(lr), pct_start=0.8)
learn.save('stage-1-big')
learn.load('stage-1-big');
learn.unfreeze()
lrs = slice(1e-6,lr/10)
learn.fit_one_cycle(10, lrs)
learn.save('stage-2-big')
learn.load('stage-2-big');
learn.show_results(rows=3, figsize=(10,10))
Results are in https://arxiv.org/abs/1611.09326 (The One Hundred Layers Tiramisu: Fully Convolutional DenseNets for Semantic Segmentation).
Regression with BIWI head pose dataset
Find the center of a face (x and y pixels). Regression model.
%reload_ext autoreload
%autoreload 2
%matplotlib inline
from fastai.vision import *
path = untar_data(URLs.BIWI_HEAD_POSE)
Convert the data
cal = np.genfromtxt(path/'01'/'rgb.cal', skip_footer=6); cal
fname = '09/frame_00667_rgb.jpg'
def img2txt_name(f): return path/f'{str(f)[:-7]}pose.txt'
img = open_image(path/fname)
img.show()
ctr = np.genfromtxt(img2txt_name(fname), skip_header=3); ctr
def convert_biwi(coords):
    c1 = coords[0] * cal[0][0]/coords[2] + cal[0][2]
    c2 = coords[1] * cal[1][1]/coords[2] + cal[1][2]
    return tensor([c2,c1])
def get_ctr(f):
    ctr = np.genfromtxt(img2txt_name(f), skip_header=3)
    return convert_biwi(ctr)
def get_ip(img,pts): return ImagePoints(FlowField(img.size, pts), scale=True)
get_ctr(fname)
ctr = get_ctr(fname)
img.show(y=get_ip(img, ctr), figsize=(6, 6))
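convert_biwi above has the form of a pinhole-camera projection: scale X and Y by the focal length, divide by depth Z, and add the principal point offset. A minimal sketch, assuming that reading is right; the intrinsics and the 3D point below are made-up numbers, not values from rgb.cal.

```python
def project(point_3d, fx, fy, cx, cy):
    """Project a camera-space point (X, Y, Z) to pixel coordinates (row, col)."""
    X, Y, Z = point_3d
    col = X * fx / Z + cx   # corresponds to c1 in convert_biwi
    row = Y * fy / Z + cy   # corresponds to c2 in convert_biwi
    return row, col         # (row, col) order, matching tensor([c2, c1])

# Hypothetical intrinsics and head-centre position, for illustration only
row, col = project((0.1, -0.2, 1.0), fx=500.0, fy=500.0, cx=320.0, cy=240.0)
print(row, col)
```

Returning (row, col) rather than (x, y) mirrors the tensor([c2,c1]) ordering in the notebook code.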
Validate on a single person (folder '13'). The label is a set of points, so set tfm_y=True to transform the points together with the image.
data = (PointsItemList.from_folder(path)
.split_by_valid_func(lambda o: o.parent.name=='13')
.label_from_func(get_ctr)
.transform(get_transforms(), tfm_y=True, size=(120,160))
.databunch().normalize(imagenet_stats)
)
Train model.
learn = cnn_learner(data, models.resnet34)
learn.lr_find()
learn.recorder.plot()
lr = 2e-2
learn.fit_one_cycle(5, slice(lr))
learn.save('stage-1')
learn.load('stage-1');
learn.show_results()
IMDB
from fastai.text import *
path = untar_data(URLs.IMDB_SAMPLE)
path.ls()
Create DataBunch
data_lm = TextDataBunch.from_csv(path, 'texts.csv')
data_lm.save()
data = load_data(path)
The steps that happen here are tokenization: split the text into tokens (e.g. words), lower-case it, handle spaces, and replace rare words with an unknown token.
data = TextClasDataBunch.from_csv(path, 'texts.csv')
data.show_batch()
Numericalization: replace each token with a number. Use a vocab size of 60,000.
data.vocab.itos[:10]
data.train_ds[0][0]
data.train_ds[0][0].data[:10]
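The two steps above (tokenize, then replace tokens with numbers) can be sketched in plain Python. This is a deliberately crude illustration: whitespace splitting stands in for a real tokenizer, and the helper names, the min_freq value and the example texts are my own assumptions.

```python
from collections import Counter

UNK = 'xxunk'  # marker for the unknown token, used here for any rare or unseen word

def tokenize(text):
    # Crude stand-in for a real tokenizer: lower-case and split on whitespace
    return text.lower().split()

def build_vocab(texts, max_vocab=60000, min_freq=2):
    """Keep at most max_vocab tokens that appear at least min_freq times."""
    counts = Counter(tok for t in texts for tok in tokenize(t))
    itos = [UNK] + [tok for tok, n in counts.most_common(max_vocab) if n >= min_freq]
    stoi = {tok: i for i, tok in enumerate(itos)}
    return itos, stoi

def numericalize(text, stoi):
    """Map each token to its id, falling back to the unknown token's id."""
    return [stoi.get(tok, stoi[UNK]) for tok in tokenize(text)]

texts = ['I liked this movie', 'I hated this movie', 'this plot was thin']
itos, stoi = build_vocab(texts)
ids = numericalize('I liked this plot', stoi)
print(itos, ids)
```

itos (int to string) is the list shown by data.vocab.itos; stoi is its inverse, and anything outside the vocabulary collapses onto the unknown token.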
Use the data block API
data = (TextList.from_csv(path, 'texts.csv', cols='text')
.split_from_df(col=2)
.label_from_df(cols=0)
.databunch())
Create a language model. Use a pre-trained model on https://blog.einstein.ai/the-wikitext-long-term-dependency-language-modeling-dataset/ to guess what the next word is. Use transfer learning. ~1 billion tokens. Self supervised learning.
bs=48
path = untar_data(URLs.IMDB)
path.ls()
(path/'train').ls()
# lm = 'Language model'
data_lm = (TextList.from_folder(path)
#Inputs: all the text files in path
.filter_by_folder(include=['train', 'test', 'unsup'])
#We may have other temp folders that contain text files so we only keep what's in train and test
.split_by_rand_pct(0.1)
#We randomly split and keep 10% (10,000 reviews) for validation
.label_for_lm()
#We want to do a language model so we label accordingly
.databunch(bs=bs))
data_lm.save('data_lm.pkl')
Ignore labels and shuffle data
data_lm = load_data(path, 'data_lm.pkl', bs=bs)
data_lm.show_batch()
# https://docs.fast.ai/text.models.html#AWD_LSTM
learn = language_model_learner(data_lm, AWD_LSTM, drop_mult=0.3)
learn.lr_find()
learn.recorder.plot(skip_end=15)
learn.fit_one_cycle(1, 1e-2, moms=(0.8,0.7)) # moms is momentum
learn.save('fit_head')
learn.load('fit_head');
Fine tune
learn.unfreeze()
learn.fit_one_cycle(10, 1e-3, moms=(0.8,0.7))
learn.save('fine_tuned')
Test model output
learn.load('fine_tuned');
TEXT = "I liked this movie because"
N_WORDS = 40
N_SENTENCES = 2
print("\n".join(learn.predict(TEXT, N_WORDS, temperature=0.75) for _ in range(N_SENTENCES)))
learn.save_encoder('fine_tuned_enc')
Create classifier to predict review
path = untar_data(URLs.IMDB)
data_clas = (TextList.from_folder(path, vocab=data_lm.vocab)
#grab all the text files in path
.split_by_folder(valid='test')
#split by train and valid folder (that only keeps 'train' and 'test' so no need to filter)
.label_from_folder(classes=['neg', 'pos'])
#label them all with their folders
.databunch(bs=bs))
data_clas.save('data_clas.pkl')
data_clas = load_data(path, 'data_clas.pkl', bs=bs)
data_clas.show_batch()
Create a model to classify those reviews and load the encoder we saved before
learn = text_classifier_learner(data_clas, AWD_LSTM, drop_mult=0.5)
learn.load_encoder('fine_tuned_enc')
learn.lr_find()
learn.recorder.plot()
learn.fit_one_cycle(1, 2e-2, moms=(0.8,0.7))
learn.save('first')
learn.load('first');
learn.freeze_to(-2) # unfreeze last two layers.
learn.fit_one_cycle(1, slice(1e-2/(2.6**4),1e-2), moms=(0.8,0.7))
# two values for slice and how quickly the lowest and highest layers learn.
# 2.6 comes from doing a RF on hyperparameters to predict accuracy
learn.save('second')
learn.load('second');
learn.freeze_to(-3) # unfreeze last three
learn.fit_one_cycle(1, slice(5e-3/(2.6**4),5e-3), moms=(0.8,0.7))
learn.save('third')
learn.load('third');
learn.unfreeze() # unfreeze the whole thing
learn.fit_one_cycle(2, slice(1e-3/(2.6**4),1e-3), moms=(0.8,0.7))
learn.predict("I really loved that movie, it was awesome!")
See also
https://www.kaggle.com/c/planet-understanding-the-amazon-from-space
https://docs.fast.ai/data_block.html
https://stackoverflow.com/questions/29133085/what-are-keypoints-in-image-processing
https://forums.fast.ai/t/deep-learning-lesson-3-notes/29829
https://github.com/hiromis/notes/blob/master/Lesson3.md
https://forums.fast.ai/t/lesson-3-in-class-discussion/29733
https://forums.fast.ai/t/lesson-3-links-to-different-parts-in-video/30077
https://www.coursera.org/learn/machine-learning
https://course.fast.ai/deployment_render.html
https://mmiakashs.github.io/blog/2018-09-20-kaggle-api-google-colab/
https://docs.python.org/3/library/functools.html#functools.partial
https://zulko.github.io/moviepy/
https://www.meetup.com/sfmachinelearning/events/255566613/
https://docs.fast.ai/vision.transform.html#List-of-transforms
https://arxiv.org/abs/1506.01186 - Cyclical Learning Rates for Training Neural Networks
Lesson 4: NLP; Tabular data; Collaborative filtering; Embeddings
https://course.fast.ai/videos/?lesson=4
Basic steps are:
- Create (or, preferred, download a pre-trained) language model trained on a large corpus such as Wikipedia (a "language model" is any model that learns to predict the next word of a sentence)
- Fine-tune this language model using your target corpus (in this case, IMDb movie reviews)
- Extract the encoder from this fine tuned language model, and pair it with a classifier. Then fine-tune this model for the final classification task (in this case, sentiment analysis).
Tabular
Use NN instead of GBT/RF as less feature engineering.
https://github.com/fastai/course-v3/blob/master/nbs/dl1/lesson4-tabular.ipynb
from fastai.tabular import *
Download the adult dataset https://archive.ics.uci.edu/ml/datasets/adult
path = untar_data(URLs.ADULT_SAMPLE)
df = pd.read_csv(path/'adult.csv')
dep_var = 'salary'
cat_names = ['workclass', 'education', 'marital-status', 'occupation', 'relationship', 'race']
cont_names = ['age', 'fnlwgt', 'education-num']
procs = [FillMissing, Categorify, Normalize]
test = TabularList.from_df(df.iloc[800:1000].copy(), path=path, cat_names=cat_names, cont_names=cont_names)
data = (TabularList.from_df(df, path=path, cat_names=cat_names, cont_names=cont_names, procs=procs)
.split_by_idx(list(range(800,1000)))
.label_from_df(cols=dep_var)
.add_test(test)
.databunch())
learn = tabular_learner(data, layers=[200, 100], metrics=accuracy)
learn.lr_find()
learn.recorder.plot()
learn.fit(1, 1e-2)
Make prediction
row = df.iloc[0]
learn.predict(row)
Collab filtering
You can either have the data as a table e.g. User | movie | number of stars.
Or sparse matrix Users x Movies.
There is an up-to-date version of the MovieLens dataset at https://grouplens.org/datasets/movielens/
https://github.com/fastai/course-v3/blob/master/nbs/dl1/lesson4-collab.ipynb
from fastai.collab import *
from fastai.tabular import *
user,item,title = 'userId','movieId','title'
path = untar_data(URLs.ML_SAMPLE)
ratings = pd.read_csv(path/'ratings.csv')
train a model
data = CollabDataBunch.from_df(ratings, seed=42)
y_range = [0, 5.5] # Range of scores
https://github.com/fastai/fastai/blob/master/fastai/collab.py#L96
https://github.com/fastai/fastai/blob/c498a576214edc9f7d91e16ef51988f26327e04e/fastai/collab.py#L36
https://pytorch.org/docs/stable/_modules/torch/nn/modules/sparse.html#Embedding
learn = collab_learner(data, n_factors=50, y_range=y_range)
learn.fit_one_cycle(3, 5e-3)
Movielens 100k
path=Config.data_path()/'ml-100k'
ratings = pd.read_csv(path/'u.data', delimiter='\t', header=None,
names=[user,item,'rating','timestamp'])
ratings.head()
movies = pd.read_csv(path/'u.item', delimiter='|', encoding='latin-1', header=None,
names=[item, 'title', 'date', 'N', 'url', *[f'g{i}' for i in range(19)]])
movies.head()
len(ratings)
rating_movie = ratings.merge(movies[[item, title]])
rating_movie.head()
data = CollabDataBunch.from_df(rating_movie, seed=42, valid_pct=0.1, item_name=title)
data.show_batch()
y_range = [0, 5.5] # range for the sigmoid output (ratings go from 0.5 to 5); add a bit to the top so that 5 is reachable.
# n_factors is the width of the embedding.
# wd is weight decay (regularization). Sum the squares of the parameters (some are + and some are -) and multiply it by wd.
learn = collab_learner(data, n_factors=40, y_range=y_range, wd=1e-1)
learn.lr_find()
learn.recorder.plot(skip_end=15)
learn.fit_one_cycle(5, 5e-3)
learn.save('dotprod')
Interpret
learn.load('dotprod');
learn.model
g = rating_movie.groupby(title)['rating'].count()
top_movies = g.sort_values(ascending=False).index.values[:1000]
top_movies[:10]
Movie bias
movie_bias = learn.bias(top_movies, is_item=True) # is_item=True gives movies and is_item=False gives users
movie_bias.shape
mean_ratings = rating_movie.groupby(title)['rating'].mean()
movie_ratings = [(b, i, mean_ratings.loc[i]) for i,b in zip(top_movies,movie_bias)]
item0 = lambda o:o[0]
sorted(movie_ratings, key=item0)[:15]
sorted(movie_ratings, key=item0, reverse=True)[:15]
Movie weights
movie_w = learn.weight(top_movies, is_item=True)
movie_w.shape
# squish those 40 factors into 3
movie_pca = movie_w.pca(3)
movie_pca.shape
fac0,fac1,fac2 = movie_pca.t()
movie_comp = [(f, i) for f,i in zip(fac0, top_movies)]
# Some aspect of taste and movie feature
sorted(movie_comp, key=itemgetter(0), reverse=True)[:10]
sorted(movie_comp, key=itemgetter(0))[:10]
movie_comp = [(f, i) for f,i in zip(fac1, top_movies)]
sorted(movie_comp, key=itemgetter(0), reverse=True)[:10]
sorted(movie_comp, key=itemgetter(0))[:10]
# PCA plot
idxs = np.random.choice(len(top_movies), 50, replace=False)
idxs = list(range(50))
X = fac0[idxs]
Y = fac2[idxs]
plt.figure(figsize=(15,15))
plt.scatter(X, Y)
for i, x, y in zip(top_movies[idxs], X, Y):
    plt.text(x,y,i, color=np.random.rand(3)*0.7, fontsize=11)
plt.show()
Cold start problem - new user or new movie. Need a metadata (e.g. age and sex) model for new users and new movies.
Predict whether user 1 will like movie 1.
Create 5 random numbers for each movie and 5 random numbers for each user. Take the dot product to get a predicted rating. Then update these numbers so the predictions match the matrix of movie ratings, using the RMSE over the matrix as the loss.
Embedding - matrix of weights. Bias - e.g. how much a user likes movies in general.
Use sigmoid to restrict the output to between 0 and 5.
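The prediction described above (dot product of user and movie factors, plus biases, squashed by a sigmoid into the rating range) can be sketched as follows. The factor values and biases are made up, and the function names are hypothetical rather than fastai's.

```python
import math

def sigmoid_range(x, lo, hi):
    """Squash x into the open interval (lo, hi) with a sigmoid."""
    return lo + (hi - lo) / (1.0 + math.exp(-x))

def predict_rating(user_vec, movie_vec, user_bias, movie_bias, y_range=(0.0, 5.5)):
    # Dot product of the latent factors, plus one bias per user and per movie
    dot = sum(u * m for u, m in zip(user_vec, movie_vec))
    return sigmoid_range(dot + user_bias + movie_bias, *y_range)

# Hypothetical 5-factor embeddings and biases, for illustration only
user = [0.9, -0.2, 0.1, 0.0, 0.4]
movie = [1.1, 0.3, -0.5, 0.2, 0.6]
rating = predict_rating(user, movie, user_bias=0.1, movie_bias=0.3)
neutral = predict_rating([0.0] * 5, [0.0] * 5, 0.0, 0.0)
print(rating, neutral)
```

With all factors and biases at zero the prediction sits at the midpoint of y_range; training moves the factors and biases so that predictions line up with observed ratings.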
Here is some benchmarking for movie-lens - https://www.librec.net/release/v1.3/example.html
See also
https://www.nytimes.com/2018/11/18/technology/artificial-intelligence-language.html (ULMFiT).
https://forums.fast.ai/t/deep-learning-lesson-4-notes/30983
https://forums.fast.ai/t/lesson-4-in-class-discussion/30318
Lesson 5: Back propagation; Accelerated SGD; Neural net from scratch
https://course.fast.ai/videos/?lesson=5
https://github.com/fastai/course-v3/blob/master/nbs/dl1/lesson5-sgd-mnist.ipynb
The last layer is removed (e.g. in resnet) as we don't have ImageNet's 1,000 classes. The last layers are also trained on very specific things. Freeze the earlier layers as these recognize basic patterns.
Give different parts of the model different learning rates, with a small learning rate for earlier layers: discriminative learning rates. For fit you can use 1e-3, where every layer gets the same lr; slice(1e-3), where the final layer gets an lr of 1e-3 and earlier layers get 1e-3 / 3; or slice(1e-5, 1e-3), where the first layer group gets 1e-5, the last gets 1e-3, and the groups in between get values between those.
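For the slice(1e-5, 1e-3) case, one way to pick the in-between values is to space them evenly on a multiplicative scale. The sketch below assumes that geometric spacing and three layer groups; spread_lrs is a hypothetical helper, not a fastai function.

```python
def spread_lrs(start, stop, n_groups):
    """Learning rates from start to stop, each a constant multiple of the previous one."""
    if n_groups == 1:
        return [stop]
    ratio = (stop / start) ** (1.0 / (n_groups - 1))
    return [start * ratio ** i for i in range(n_groups)]

# Assumed: three layer groups, earliest group first
lrs = spread_lrs(1e-5, 1e-3, 3)
print(lrs)
```

Geometric spacing keeps the ratio between neighbouring groups constant, so no single boundary between groups sees a disproportionately large jump in learning rate.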
Affine function (http://mathworld.wolfram.com/AffineFunction.html)
Embedding - look something up in an array. A fast and efficient way of multiplying by a one-hot encoded (OHE) matrix.
This works when e.g. a movie has John Travolta and the user likes John Travolta (latent factors/hidden relationships). However, if the movie is really bad (even with John Travolta) you need to add in a bias.
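That an embedding lookup gives the same result as multiplying a one-hot vector by the weight matrix can be checked directly. The weight values below are arbitrary illustration numbers.

```python
# A tiny embedding matrix: 4 items, 3 factors each (arbitrary values)
emb = [
    [0.1, 0.2, 0.3],
    [0.4, 0.5, 0.6],
    [0.7, 0.8, 0.9],
    [1.0, 1.1, 1.2],
]

def one_hot(idx, n):
    return [1.0 if i == idx else 0.0 for i in range(n)]

def vec_mat(vec, mat):
    """Row vector times matrix."""
    return [sum(v * row[j] for v, row in zip(vec, mat)) for j in range(len(mat[0]))]

idx = 2
by_matmul = vec_mat(one_hot(idx, len(emb)), emb)  # multiply by the one-hot vector
by_lookup = emb[idx]                              # just index into the array
print(by_matmul, by_lookup)
```

The multiplication spends almost all of its work multiplying by zero, which is why indexing is the faster route to the same numbers.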
https://github.com/fastai/course-v3/blob/master/nbs/dl1/lesson5-sgd-mnist.ipynb
%matplotlib inline
from fastai.basics import *
Get data from http://deeplearning.net/data/mnist/mnist.pkl.gz
path = Config().data_path()/'mnist'
with gzip.open(path/'mnist.pkl.gz', 'rb') as f:
((x_train, y_train), (x_valid, y_valid), _) = pickle.load(f, encoding='latin-1')
plt.imshow(x_train[0].reshape((28,28)), cmap="gray")
x_train.shape
x_train,y_train,x_valid,y_valid = map(torch.tensor, (x_train,y_train,x_valid,y_valid))
n,c = x_train.shape
x_train.shape, y_train.min(), y_train.max()
bs=64
train_ds = TensorDataset(x_train, y_train) # you can index into it
valid_ds = TensorDataset(x_valid, y_valid)
data = DataBunch.create(train_ds, valid_ds, bs=bs)
x,y = next(iter(data.train_dl))
x.shape,y.shape
Sub classing
class Mnist_Logistic(nn.Module):
    def __init__(self):
        super().__init__() # Run nn.Module's own initializer first.
        self.lin = nn.Linear(784, 10, bias=True) # x@a + b
    # xb is a mini-batch of inputs
    def forward(self, xb): return self.lin(xb)
model = Mnist_Logistic().cuda()
model
model.lin
model(x).shape
# shows input and output size
[p.shape for p in model.parameters()]
lr=2e-2
# CrossEntropyLoss applies log-softmax followed by negative log-likelihood
loss_func = nn.CrossEntropyLoss()
def update(x,y,lr):
    # weight decay value
    wd = 1e-5
    y_hat = model(x)
    # weight decay: sum of the squared parameters
    w2 = 0.
    for p in model.parameters(): w2 += (p**2).sum()
    # add to regular loss
    loss = loss_func(y_hat, y) + w2*wd
    loss.backward()
    with torch.no_grad():
        # Loop through the parameters and take one SGD step
        for p in model.parameters():
            p.sub_(lr * p.grad)
            p.grad.zero_()
    return loss.item() # plain Python number
# run update on every mini-batch (one epoch)
losses = [update(x,y,lr) for x,y in data.train_dl]
plt.plot(losses);
class Mnist_NN(nn.Module):
    def __init__(self):
        super().__init__()
        self.lin1 = nn.Linear(784, 50, bias=True)
        self.lin2 = nn.Linear(50, 10, bias=True)
    def forward(self, xb):
        x = self.lin1(xb)
        x = F.relu(x)
        return self.lin2(x)
model = Mnist_NN().cuda()
losses = [update(x,y,lr) for x,y in data.train_dl]
plt.plot(losses);
model = Mnist_NN().cuda()
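def update(x,y,lr):
    # Note: the optimizer is re-created on every call, so its running averages do not carry over between mini-batches
    opt = optim.Adam(model.parameters(), lr)
    # opt = optim.SGD(model.parameters(), lr, momentum=0.9)
    y_hat = model(x)
    loss = loss_func(y_hat, y)
    loss.backward()
    opt.step()
    opt.zero_grad()
    return loss.item()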
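Momentum: each step is 90% the same direction as last time and 10% the new derivative.
s_t = alpha * g_t + (1 - alpha) * s_{t-1}, with alpha = 0.1 here.
RMSprop - the same kind of moving average, but of the gradient squared; the step is divided by its square root.
Adam - RMSprop and momentum combined.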
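The three update rules can be written out for a single scalar weight. This is a sketch under assumptions: the default values, the function names and the bias-correction form in adam_step follow the usual textbook description, not the PyTorch source.

```python
import math

def ema(prev, new, alpha):
    """Exponentially weighted moving average: alpha of the new value, (1 - alpha) of the old."""
    return alpha * new + (1 - alpha) * prev

def sgd_momentum_step(w, grad, s, lr=0.1, alpha=0.1):
    """Step along the moving average of the gradients."""
    s = ema(s, grad, alpha)
    return w - lr * s, s

def rmsprop_step(w, grad, v, lr=0.1, alpha=0.1, eps=1e-8):
    """Divide the gradient by the root of the moving average of squared gradients."""
    v = ema(v, grad ** 2, alpha)
    return w - lr * grad / (math.sqrt(v) + eps), v

def adam_step(w, grad, s, v, t, lr=0.1, alpha_s=0.1, alpha_v=0.001, eps=1e-8):
    """Momentum and RMSprop together, with bias correction for step number t (1-based)."""
    s = ema(s, grad, alpha_s)
    v = ema(v, grad ** 2, alpha_v)
    s_hat = s / (1 - (1 - alpha_s) ** t)
    v_hat = v / (1 - (1 - alpha_v) ** t)
    return w - lr * s_hat / (math.sqrt(v_hat) + eps), s, v

# One step on the loss w**2, starting from w = 1.0 (so the gradient is 2.0).
grad = 2 * 1.0
w_mom, s_mom = sgd_momentum_step(1.0, grad, 0.0)
w_rms, v_rms = rmsprop_step(1.0, grad, 0.0)
w_adam, s_adam, v_adam = adam_step(1.0, grad, 0.0, 0.0, t=1)
print(w_mom, w_rms, w_adam)
```

All three move the weight toward 0, the minimum of w**2, but by different amounts on the first step.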
losses = [update(x,y,1e-3) for x,y in data.train_dl]
plt.plot(losses);
learn = Learner(data, Mnist_NN(), loss_func=loss_func, metrics=accuracy)
learn.lr_find()
learn.recorder.plot()
learn.fit_one_cycle(1, 1e-2)
# Learning rate per batch
learn.recorder.plot_lr(show_moms=True)
learn.recorder.plot_losses()
Cross entropy loss on two categories
Cat | Dog | Pred(cat) | Pred(dog) | X-Entropy
1 | 0 | 0.5 | 0.5 | -1*log(0.5) - 0*log(0.5) = 0.69
Use softmax so that the predictions add up to 1.
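The row in the table above can be reproduced in a few lines. The helper names are made up for this sketch; the logarithm is the natural log.

```python
import math

def softmax(logits):
    """Turn raw activations into probabilities that sum to 1."""
    m = max(logits)                      # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(targets, preds):
    """Negative log of the probability given to the true class (one-hot targets)."""
    return -sum(t * math.log(p) for t, p in zip(targets, preds) if t > 0)

preds = softmax([2.0, 2.0])            # equal activations give 0.5 each
loss = cross_entropy([1, 0], preds)    # cat is the true class
print(preds, round(loss, 4))
```

A more confident correct prediction, e.g. 0.9 for cat, gives a smaller loss than the 0.5 case.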
See also
https://forums.fast.ai/t/deep-learning-lesson-5-notes/31298
https://github.com/hiromis/notes/blob/master/Lesson5.md
https://github.com/fastai/course-v3/blob/master/files/xl/collab_filter.xlsx
https://docs.google.com/spreadsheets/d/1oxY9bxgLPutRidhTrucFeg5Il0Jq7UdMJgR3igTtbPU/edit#gid=1748360111 - google sheets version
https://forums.fast.ai/t/google-sheets-versions-of-spreadsheets/10424/7
https://forums.fast.ai/t/lesson-5-discussion-thread/30864
Lesson 6: Regularization; Convolutions; Data ethics
https://course.fast.ai/videos/?lesson=6
Make sure you have the latest version of the code and the latest version of the course
$ conda update conda
$ conda update anaconda
$ conda activate fastai
$ conda update --all
Compare https://github.com/fastai/course-v3 to my local course.
cd /home/ray/Documents/COURSES/fastai/course-v3/nbs/dl1
jupyter notebook
Rossmann data clean
https://github.com/fastai/course-v3/blob/master/nbs/dl1/rossman_data_clean.ipynb
%reload_ext autoreload
%autoreload 2
from fastai.basics import *
Download data from here http://files.fast.ai/part2/lesson14/rossmann.tgz
PATH=Config().data_path()/Path('rossmann/')
table_names = ['train', 'store', 'store_states', 'state_names', 'googletrend', 'weather', 'test']
tables = [pd.read_csv(PATH/f'{fname}.csv', low_memory=False) for fname in table_names]
train, store, store_states, state_names, googletrend, weather, test = tables
len(train),len(test)
Turn state holidays into booleans
train.StateHoliday = train.StateHoliday!='0'
test.StateHoliday = test.StateHoliday!='0'
def join_df(left, right, left_on, right_on=None, suffix='_y'):
    if right_on is None: right_on = left_on
    return left.merge(right, how='left', left_on=left_on, right_on=right_on,
                      suffixes=("", suffix))
Join weather/state names
weather = join_df(weather, state_names, "file", "StateName")
Extracting dates and state names from the given data and adding those columns
googletrend['Date'] = googletrend.week.str.split(' - ', expand=True)[0]
googletrend['State'] = googletrend.file.str.split('_', expand=True)[2]
googletrend.loc[googletrend.State=='NI', "State"] = 'HB,NI'
Extracts particular date fields from a complete datetime for the purpose of constructing categoricals
def add_datepart(df, fldname, drop=True, time=False):
    "Helper function that adds columns relevant to a date."
    fld = df[fldname]
    fld_dtype = fld.dtype
    if isinstance(fld_dtype, pd.core.dtypes.dtypes.DatetimeTZDtype):
        fld_dtype = np.datetime64
    if not np.issubdtype(fld_dtype, np.datetime64):
        df[fldname] = fld = pd.to_datetime(fld, infer_datetime_format=True)
    targ_pre = re.sub('[Dd]ate$', '', fldname)
    attr = ['Year', 'Month', 'Week', 'Day', 'Dayofweek', 'Dayofyear',
            'Is_month_end', 'Is_month_start', 'Is_quarter_end', 'Is_quarter_start', 'Is_year_end', 'Is_year_start']
    if time: attr = attr + ['Hour', 'Minute', 'Second']
    for n in attr: df[targ_pre + n] = getattr(fld.dt, n.lower())
    df[targ_pre + 'Elapsed'] = fld.astype(np.int64) // 10 ** 9
    if drop: df.drop(fldname, axis=1, inplace=True)
add_datepart(weather, "Date", drop=False)
add_datepart(googletrend, "Date", drop=False)
add_datepart(train, "Date", drop=False)
add_datepart(test, "Date", drop=False)
Google trends data has a special category for the whole of Germany - we'll pull that out so we can use it explicitly.
trend_de = googletrend[googletrend.file == 'Rossmann_DE']
Left outer join all of our data into a single dataframe, checking for nulls after each join to make sure it worked
store = join_df(store, store_states, "Store")
len(store[store.State.isnull()])
joined = join_df(train, store, "Store")
joined_test = join_df(test, store, "Store")
len(joined[joined.StoreType.isnull()]),len(joined_test[joined_test.StoreType.isnull()])
joined = join_df(joined, googletrend, ["State","Year", "Week"])
joined_test = join_df(joined_test, googletrend, ["State","Year", "Week"])
len(joined[joined.trend.isnull()]),len(joined_test[joined_test.trend.isnull()])
joined = joined.merge(trend_de, 'left', ["Year", "Week"], suffixes=('', '_DE'))
joined_test = joined_test.merge(trend_de, 'left', ["Year", "Week"], suffixes=('', '_DE'))
len(joined[joined.trend_DE.isnull()]),len(joined_test[joined_test.trend_DE.isnull()])
joined = join_df(joined, weather, ["State","Date"])
joined_test = join_df(joined_test, weather, ["State","Date"])
len(joined[joined.Mean_TemperatureC.isnull()]),len(joined_test[joined_test.Mean_TemperatureC.isnull()])
for df in (joined, joined_test):
    for c in df.columns:
        if c.endswith('_y'):
            if c in df.columns: df.drop(c, inplace=True, axis=1)
Fill in missing values to avoid complications with NA's. Use arbitrary placeholder values (here 1900 for years and 1 for months/weeks).
for df in (joined,joined_test):
    df['CompetitionOpenSinceYear'] = df.CompetitionOpenSinceYear.fillna(1900).astype(np.int32)
    df['CompetitionOpenSinceMonth'] = df.CompetitionOpenSinceMonth.fillna(1).astype(np.int32)
    df['Promo2SinceYear'] = df.Promo2SinceYear.fillna(1900).astype(np.int32)
    df['Promo2SinceWeek'] = df.Promo2SinceWeek.fillna(1).astype(np.int32)
Extract features "CompetitionOpenSince" and "CompetitionDaysOpen"
for df in (joined,joined_test):
    df["CompetitionOpenSince"] = pd.to_datetime(dict(year=df.CompetitionOpenSinceYear,
                                                     month=df.CompetitionOpenSinceMonth, day=15))
    df["CompetitionDaysOpen"] = df.Date.subtract(df.CompetitionOpenSince).dt.days
Replace some erroneous / outlying data
for df in (joined,joined_test):
    df.loc[df.CompetitionDaysOpen<0, "CompetitionDaysOpen"] = 0
    df.loc[df.CompetitionOpenSinceYear<1990, "CompetitionDaysOpen"] = 0
Add a "CompetitionMonthsOpen" field, capping it at 2 years (24 months) to limit the number of unique categories
for df in (joined,joined_test):
    df["CompetitionMonthsOpen"] = df["CompetitionDaysOpen"]//30
    df.loc[df.CompetitionMonthsOpen>24, "CompetitionMonthsOpen"] = 24
joined.CompetitionMonthsOpen.unique()
! pip install isoweek
from isoweek import Week
for df in (joined,joined_test):
    df["Promo2Since"] = pd.to_datetime(df.apply(lambda x: Week(
        x.Promo2SinceYear, x.Promo2SinceWeek).monday(), axis=1))
    df["Promo2Days"] = df.Date.subtract(df["Promo2Since"]).dt.days
for df in (joined,joined_test):
    df.loc[df.Promo2Days<0, "Promo2Days"] = 0
    df.loc[df.Promo2SinceYear<1990, "Promo2Days"] = 0
    df["Promo2Weeks"] = df["Promo2Days"]//7
    df.loc[df.Promo2Weeks<0, "Promo2Weeks"] = 0
    df.loc[df.Promo2Weeks>25, "Promo2Weeks"] = 25
    df.Promo2Weeks.unique()
joined.to_pickle(PATH/'joined')
joined_test.to_pickle(PATH/'joined_test')
It is common when working with time series data to extract data that explains relationships across rows as opposed to columns, e.g.:
- Running averages
- Time until next event
- Time since last event
Define a function get_elapsed for cumulative counting across a sorted dataframe.
def get_elapsed(fld, pre):
    # Reads and modifies the global dataframe df, which must already be sorted
    day1 = np.timedelta64(1, 'D')
    last_date = np.datetime64()
    last_store = 0
    res = []
    for s,v,d in zip(df.Store.values,df[fld].values, df.Date.values):
        if s != last_store:
            last_date = np.datetime64()
            last_store = s
        if v: last_date = d
        res.append(((d-last_date).astype('timedelta64[D]') / day1))
    df[pre+fld] = res
# Apply it to a subset of columns from train and test
columns = ["Date", "Store", "Promo", "StateHoliday", "SchoolHoliday"]
df = pd.concat([train[columns], test[columns]]) # DataFrame.append was removed in pandas 2.0
Say we're looking at School Holiday. We'll first sort by Store, then Date, and then call get_elapsed('SchoolHoliday', 'After'). This will:
- Be applied to every row of the dataframe in order of store and date
- Add to the dataframe the number of days since the last School Holiday
- If we sort in the other direction, count the days until the next holiday.
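The "days since the last event" idea in the list above can be shown on a toy example with the standard library only. The function name, the tuple layout and the dates are invented for the illustration; unlike the notebook version, rows with no earlier event get None instead of a missing-date value.

```python
from datetime import date

def days_since_event(rows):
    """rows: (store, day, flag) tuples already sorted by store, then day."""
    result = []
    last_store, last_event = None, None
    for store, day, flag in rows:
        if store != last_store:      # new store: forget the previous store's events
            last_store, last_event = store, None
        if flag:                     # the event happens today
            last_event = day
        result.append(None if last_event is None else (day - last_event).days)
    return result

rows = [
    (1, date(2015, 7, 1), False),
    (1, date(2015, 7, 2), True),
    (1, date(2015, 7, 3), False),
    (1, date(2015, 7, 5), False),
    (2, date(2015, 7, 1), False),
    (2, date(2015, 7, 4), True),
]
print(days_since_event(rows))
```

Sorting the rows by descending date within each store and taking the absolute value of the difference would give the days until the next event instead.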
fld = 'SchoolHoliday'
df = df.sort_values(['Store', 'Date'])
get_elapsed(fld, 'After')
df = df.sort_values(['Store', 'Date'], ascending=[True, False])
get_elapsed(fld, 'Before')
fld = 'StateHoliday'
df = df.sort_values(['Store', 'Date'])
get_elapsed(fld, 'After')
df = df.sort_values(['Store', 'Date'], ascending=[True, False])
get_elapsed(fld, 'Before')
fld = 'Promo'
df = df.sort_values(['Store', 'Date'])
get_elapsed(fld, 'After')
df = df.sort_values(['Store', 'Date'], ascending=[True, False])
get_elapsed(fld, 'Before')
Set the active index to Date
df = df.set_index("Date")
Set null values from elapsed field calculations to 0.
columns = ['SchoolHoliday', 'StateHoliday', 'Promo']
for o in ['Before', 'After']:
    for p in columns:
        a = o+p
        df[a] = df[a].fillna(0).astype(int)
Demonstrate window functions in pandas to calculate rolling quantities: sort by date (sort_index()) and count the number of events of interest (sum()) defined in columns in the following week (rolling()), grouped by Store (groupby()). Do the same in the opposite direction.
bwd = df[['Store']+columns].sort_index().groupby("Store").rolling(7, min_periods=1).sum()
fwd = df[['Store']+columns].sort_index(ascending=False).groupby("Store").rolling(7, min_periods=1).sum()
Drop the Store indices grouped together in the window function
bwd.drop('Store',axis=1,inplace=True)
bwd.reset_index(inplace=True)
fwd.drop('Store',axis=1,inplace=True)
fwd.reset_index(inplace=True)
df.reset_index(inplace=True)
Merge these values onto the df
df = df.merge(bwd, 'left', ['Date', 'Store'], suffixes=['', '_bw'])
df = df.merge(fwd, 'left', ['Date', 'Store'], suffixes=['', '_fw'])
df.drop(columns,axis=1,inplace=True)
Back up large tables of extracted / wrangled features before you join them onto another one
df.to_pickle(PATH/'df')
df["Date"] = pd.to_datetime(df.Date)
joined = pd.read_pickle(PATH/'joined')
joined_test = pd.read_pickle(PATH/f'joined_test')
joined = join_df(joined, df, ['Store', 'Date'])
joined_test = join_df(joined_test, df, ['Store', 'Date'])
Remove all instances where the store had zero sales / was closed
joined = joined[joined.Sales!=0]
joined.reset_index(inplace=True)
joined_test.reset_index(inplace=True)
joined.to_pickle(PATH/'train_clean')
joined_test.to_pickle(PATH/'test_clean')
Rossmann
https://github.com/fastai/course-v3/blob/master/nbs/dl1/lesson6-rossmann.ipynb
%reload_ext autoreload
%autoreload 2
from fastai.tabular import *
The most useful part of the data clean is:
add_datepart(train, "Date", drop=False)
This takes the date column and adds a bunch of metadata columns, e.g. year, month, day of week, month start or end, elapsed time since ... (Auto ML!). For example, purchasing behavior may change with the day of the week.
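A stripped-down version of the same idea using only the standard library, to show what kind of columns come out of one date. The function name date_parts is made up, and only a subset of the attributes from add_datepart is reproduced.

```python
import calendar
from datetime import date

def date_parts(d):
    """Expand one date into several features a model can use."""
    last_day = calendar.monthrange(d.year, d.month)[1]
    return {
        "Year": d.year,
        "Month": d.month,
        "Week": d.isocalendar()[1],          # ISO week number
        "Day": d.day,
        "Dayofweek": d.weekday(),            # Monday = 0
        "Dayofyear": d.timetuple().tm_yday,
        "Is_month_end": d.day == last_day,
        "Is_month_start": d.day == 1,
    }

parts = date_parts(date(2015, 7, 31))
print(parts)
```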
path = Config().data_path()/'rossmann'
train_df = pd.read_pickle(path/'train_clean')
train_df.head().T
n = len(train_df); n
Pre-processors - run once on the training set before you do any training. What they learn (e.g. categories, fill values) is then shared with the validation and test sets.
Create a small subset of the data:
idx = np.random.permutation(range(n))[:2000]
idx.sort()
small_train_df = train_df.iloc[idx[:1000]]
small_test_df = train_df.iloc[idx[1000:]]
small_cont_vars = ['CompetitionDistance', 'Mean_Humidity']
small_cat_vars = ['Store', 'DayOfWeek', 'PromoInterval']
small_train_df = small_train_df[small_cat_vars + small_cont_vars + ['Sales']]
small_test_df = small_test_df[small_cat_vars + small_cont_vars + ['Sales']]
small_train_df.head()
The first pre-processor takes the strings in PromoInterval, finds all the unique values, creates a list of them and converts them into numbers.
categorify = Categorify(small_cat_vars, small_cont_vars)
categorify(small_train_df)
categorify(small_test_df, test=True)
small_test_df.head()
see categories
small_train_df.PromoInterval.cat.categories
see codes
small_train_df['PromoInterval'].cat.codes[:5]
Another pre-processor fills missing values. It adds a boolean column with the suffix _na and replaces the missing values with the median.
fill_missing = FillMissing(small_cat_vars, small_cont_vars)
fill_missing(small_train_df)
fill_missing(small_test_df, test=True)
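What the fill-missing step does can be sketched without pandas. This is an assumption-laden illustration, not fastai's FillMissing: the two function names and the sample distances are invented, and missing values are represented by None.

```python
import statistics

def fit_fill_missing(train_values):
    """Learn the fill value: the median of the non-missing training values."""
    return statistics.median(v for v in train_values if v is not None)

def apply_fill_missing(values, fill_value):
    """Return (filled values, boolean _na flags) using a previously learned fill value."""
    na_flags = [v is None for v in values]
    filled = [fill_value if v is None else v for v in values]
    return filled, na_flags

train_dist = [570.0, None, 14130.0, 620.0]   # hypothetical CompetitionDistance values
test_dist = [None, 100.0]

fill_value = fit_fill_missing(train_dist)    # learned on the training set only
train_filled, train_na = apply_fill_missing(train_dist, fill_value)
test_filled, test_na = apply_fill_missing(test_dist, fill_value)
print(fill_value, test_filled, test_na)
```

The test set reuses the median learned from the training set, which mirrors the test=True call above.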
Read in full dataset
train_df = pd.read_pickle(path/'train_clean')
test_df = pd.read_pickle(path/'test_clean')
Specify pre-processors
procs=[FillMissing, Categorify, Normalize]
cat_vars = ['Store', 'DayOfWeek', 'Year', 'Month', 'Day', 'StateHoliday', 'CompetitionMonthsOpen',
'Promo2Weeks', 'StoreType', 'Assortment', 'PromoInterval', 'CompetitionOpenSinceYear', 'Promo2SinceYear',
'State', 'Week', 'Events', 'Promo_fw', 'Promo_bw', 'StateHoliday_fw', 'StateHoliday_bw',
'SchoolHoliday_fw', 'SchoolHoliday_bw']
cont_vars = ['CompetitionDistance', 'Max_TemperatureC', 'Mean_TemperatureC', 'Min_TemperatureC',
'Max_Humidity', 'Mean_Humidity', 'Min_Humidity', 'Max_Wind_SpeedKm_h',
'Mean_Wind_SpeedKm_h', 'CloudCover', 'trend', 'trend_DE',
'AfterStateHoliday', 'BeforeStateHoliday', 'Promo', 'SchoolHoliday']
dep_var = 'Sales'
df = train_df[cat_vars + cont_vars + [dep_var,'Date']].copy()
test_df['Date'].min(), test_df['Date'].max()
Use the date to create a validation set of the same length as the test set
cut = train_df['Date'][(train_df['Date'] == train_df['Date'][len(test_df)])].index.max()
valid_idx = range(cut)
df[dep_var].head()
The dependent variable is an int, so fastai will think it's a classification problem. Specify that it's a regression by passing label_cls=FloatList (a list of floats), with log=True to take the log of the dependent variable. Because the evaluation metric is RMSPE, taking the log of y turns it (approximately) into RMSE.
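A quick numeric check of why the log helps: for small relative errors, the RMSE of the logs is close to the RMSPE of the raw values. The sales figures and predictions are invented for this sketch.

```python
import math

def rmspe(y_true, y_pred):
    """Root mean squared percentage error."""
    return math.sqrt(sum(((p - t) / t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def rmse(a, b):
    """Root mean squared error."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

sales = [100.0, 200.0, 300.0]   # hypothetical targets
preds = [105.0, 190.0, 310.0]   # hypothetical predictions

pct_error = rmspe(sales, preds)
log_error = rmse([math.log(v) for v in sales], [math.log(v) for v in preds])
print(round(pct_error, 4), round(log_error, 4))
```

The two numbers drift apart as the relative errors grow, so this is an approximation rather than an identity.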
data = (TabularList.from_df(df, path=path, cat_names=cat_vars, cont_names=cont_vars, procs=procs,)
.split_by_idx(valid_idx)
.label_from_df(cols=dep_var, label_cls=FloatList, log=True)
.add_test(TabularList.from_df(test_df, path=path, cat_names=cat_vars, cont_names=cont_vars))
.databunch())
doc(FloatList)
Model
Pass in y_range, which puts a sigmoid on the output, scaled between 0 and an upper limit a little above the maximum of the (log) dependent variable.
max_log_y = np.log(np.max(train_df['Sales'])*1.2)
y_range = torch.tensor([0, max_log_y], device=defaults.device)
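Pass in the architecture. The weight matrix between the two hidden layers alone has 1000 * 500 parameters. This will overfit a dataset with only a few hundred thousand rows, so regularization is needed. ps gives the dropout probability for each layer.
Dropout: A Simple Way to Prevent Neural Networks from Overfitting
emb_drop also provides dropout, applied to the embeddings. An embedding is a matmul with an OHE input.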
learn = tabular_learner(data, layers=[1000,500], ps=[0.001,0.01], emb_drop=0.04,
y_range=y_range, metrics=exp_rmspe)
learn.model
In the first embedding layer, the first number is the number of stores (Store is the first categorical variable) and the second number is the size of the embedding. Then comes a batch norm of size 16 (the 16 continuous input variables).
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. With batch norm the loss surface is less bumpy, so you can increase your LR.
y^ = f(w1, ..., wn, x) * g + b
g and b are learned batch norm parameters that scale and shift the output to the expected range (mean and std).
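A training-time sketch of the normalize, scale and shift steps for one activation across a mini-batch. It is a simplification with assumed names: there are no running statistics, and g and b are passed in by hand rather than learned.

```python
import math

def batch_norm(xs, g=1.0, b=0.0, eps=1e-5):
    """Normalize a mini-batch of activations, then scale by g and shift by b."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return [g * (x - mean) / math.sqrt(var + eps) + b for x in xs]

acts = [10.0, 20.0, 30.0, 40.0]          # hypothetical activations for one feature
out = batch_norm(acts, g=2.0, b=5.0)

out_mean = sum(out) / len(out)
out_std = math.sqrt(sum((o - out_mean) ** 2 for o in out) / len(out))
print(round(out_mean, 3), round(out_std, 3))
```

Whatever the scale of the incoming activations, the output mean is set by b and the output spread by g, which is what lets the network move its outputs to the right range with two parameters.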
len(data.train_ds.cont_names)
learn.lr_find()
learn.recorder.plot()
learn.fit_one_cycle(5, 1e-3, wd=0.2)
learn.save('1')
learn.recorder.plot_losses(last=-1)
learn.load('1');
learn.fit_one_cycle(5, 3e-4)
learn.fit_one_cycle(5, 3e-4)
test_preds=learn.get_preds(DatasetType.Test)
test_df["Sales"]=np.exp(test_preds[0].data).numpy().T[0]
test_df[["Id","Sales"]]=test_df[["Id","Sales"]].astype("int")
test_df[["Id","Sales"]].to_csv("rossmann_submission.csv",index=False)
Pets
https://github.com/fastai/course-v3/blob/master/nbs/dl1/lesson6-pets-more.ipynb
%reload_ext autoreload
%autoreload 2
%matplotlib inline
from fastai.vision import *
bs = 64
path = untar_data(URLs.PETS)/'images'
Data augmentation
Ratchet up the defaults. What's the probability of an affine transform? What's the probability of a light transform?
https://docs.fast.ai/vision.transform.html#get_transforms
Look at validation dataset and see what the lighting looks like.
e.g. satellite data use rotated images.
Use flipped images.
Symmetric warp
tfms = get_transforms(max_rotate=20, max_zoom=1.3, max_lighting=0.4, max_warp=0.4,
p_affine=1., p_lighting=1.)
src = ImageList.from_folder(path).split_by_rand_pct(0.2, seed=2)
def get_data(size, bs, padding_mode='reflection'):
    return (src.label_from_re(r'([^/]+)_\d+.jpg$')
            .transform(tfms, size=size, padding_mode=padding_mode)
            .databunch(bs=bs).normalize(imagenet_stats))
data = get_data(224, bs, 'zeros')
def _plot(i,j,ax):
    x,y = data.train_ds[3]
    x.show(ax, y=y)
plot_multi(_plot, 3, 3, figsize=(8,8))
data = get_data(224,bs)
plot_multi(_plot, 3, 3, figsize=(8,8))
Train a model
gc.collect()
learn = cnn_learner(data, models.resnet34, metrics=error_rate, bn_final=True)
learn.fit_one_cycle(3, slice(1e-2), pct_start=0.8)
learn.unfreeze()
learn.fit_one_cycle(2, max_lr=slice(1e-6,1e-3), pct_start=0.8)
data = get_data(352,bs)
learn.data = data
learn.fit_one_cycle(2, max_lr=slice(1e-6,1e-4))
learn.save('352')
Convolutional kernel
data = get_data(352,16)
learn = cnn_learner(data, models.resnet34, metrics=error_rate, bn_final=True).load('352')
idx=0
x,y = data.valid_ds[idx]
x.show()
data.valid_ds.y[idx]
k = tensor([
[0. , -5/3, 1],
[-5/3, -5/3, 1],
[1., 1 ,1],
]).expand(1, 3, 3, 3) / 6
k
k.shape
t = data.valid_ds[0][0].data; t.shape
t[None].shape
edge = F.conv2d(t[None], k)
show_image(edge[0], figsize=(5,5));
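To see what a convolutional kernel computes, here is a pure-Python version on a tiny single-channel image. The image and the vertical-edge kernel are made up for the sketch and differ from the kernel k used above; there is no padding and the stride is 1.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image and sum the elementwise products at each position."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

# 5x5 image: dark (0) on the left two columns, bright (1) on the right three.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Responds where brightness increases from left to right.
kernel = [[-1, 0, 1]] * 3

edges = conv2d_valid(image, kernel)
for row in edges:
    print(row)
```

The output is large where the window straddles the dark-to-bright boundary and zero where the window is uniformly bright.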
data.c
learn.model
https://www.fast.ai/2018/07/02/adam-weight-decay/
print(learn.summary())
heatmap
m = learn.model.eval();
m[0] is the convolutional part
Create a mini-batch with 1 item in it
xb,_ = data.one_item(x)
xb_im = Image(data.denorm(xb)[0])
xb = xb.cuda()
from fastai.callbacks.hooks import *
A hook lets you attach your own Python code to a PyTorch module so that it runs during the forward or backward pass, e.g. to store the output of the convolutional part or of a certain layer. Here we hook the output of m[0].
def hooked_backward(cat=y):
    with hook_output(m[0]) as hook_a:
        with hook_output(m[0], grad=True) as hook_g:
            preds = m(xb)
            preds[0,int(cat)].backward()
    return hook_a,hook_g
hook_a,hook_g = hooked_backward()
acts = hook_a.stored[0].cpu()
acts.shape
Take mean of channel axis
avg_acts = acts.mean(0)
avg_acts.shape
def show_heatmap(hm):
    _,ax = plt.subplots()
    xb_im.show(ax)
    ax.imshow(hm, alpha=0.6, extent=(0,352,352,0),
              interpolation='bilinear', cmap='magma');
show_heatmap(avg_acts)
Grad-CAM
Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization
grad = hook_g.stored[0][0].cpu()
grad_chan = grad.mean(1).mean(1)
grad.shape,grad_chan.shape
mult = (acts*grad_chan[...,None,None]).mean(0)
show_heatmap(mult)
Ethics and Data Science
Generative models - create new text, new image, new video, new sound.
Artificial Intelligence needs all of us | Rachel Thomas P.h.D. | TEDxSanFrancisco
Some Healthy Principles About Ethics & Bias In AI | Rachel Thomas @ PyBay2018
Accuracy on lighter-skinned men vs. darker-skinned women - http://gendershades.org/
https://www.crunchbase.com/organization/deep-glint#section-overview - Facial AI for surveillance
Text translation e.g. English -> Turkish -> English gives 'He is a doctor. She is a nurse'.
COMPAS - an algorithm used in the legal system to suggest jail vs. bail.
Why?
- Input data - No Classification without Representation: Assessing Geodiversity Issues in Open Data Sets for the Developing World
Get humans back in the loop.
Talk to domain experts and those impacted - https://fatconference.org/
Evan Estola - When Recommendations Systems Go Bad - MLconf SEA 2016
Datasheets for Datasets - better documentation regarding datasets.
See also
https://github.com/hiromis/notes/blob/master/Lesson6.md
https://forums.fast.ai/t/lesson-6-in-class-discussion/31440
https://forums.fast.ai/t/lesson-6-advanced-discussion/31442
https://platform.ai/ - computer vision start-up. Upload pics and use it to help label them based on a deep learning model (e.g. choose a layer or choose a projection).
https://forums.fast.ai/t/platform-ai-discussion/31445
50 Years of Test (Un)fairness: Lessons for Machine Learning paper - https://128.84.21.199/pdf/1811.10104.pdf
Cornell conv course - http://www.cs.cornell.edu/courses/cs1114/2013sp/sections/S06_convolution.pdf
conv arithmetic - https://github.com/vdumoulin/conv_arithmetic/blob/master/README.md
https://arthurdouillard.com/post/normalization/ e.g. images
cross entropy loss - https://gombru.github.io/2018/05/23/cross_entropy_loss/
https://brohrer.github.io/how_convolutional_neural_networks_work.html
https://openframeworks.cc/ofBook/chapters/image_processing_computer_vision.html
https://medium.com/@karpathy/yes-you-should-understand-backprop-e2f06eab496b
https://towardsdatascience.com/bert-explained-state-of-the-art-language-model-for-nlp-f8b21a9b6270
https://knowingneurons.com/2014/10/29/hubel-and-wiesel-the-neural-basis-of-visual-perception/
https://grey.colorado.edu/CompCogNeuro/index.php/CCNBook/Perception
Lesson 7: Resnets from scratch; U-net; Generative (adversarial) networks
https://course.fast.ai/videos/?lesson=7
Make sure you have the latest version of the code and the latest version of the course
$ conda update conda
$ conda update anaconda
$ conda activate fastai
$ conda update --all
Compare https://github.com/fastai/course-v3 to my local course.
cd /home/ray/Documents/COURSES/fastai/course-v3/nbs/dl1
jupyter notebook
Resnet MNIST
https://github.com/fastai/course-v3/blob/master/nbs/dl1/lesson7-resnet-mnist.ipynb
%reload_ext autoreload
%autoreload 2
%matplotlib inline
from fastai.vision import *
path = untar_data(URLs.MNIST)
path.ls()
il = ImageList.from_folder(path, convert_mode='L') # Convert to grey scale
il.items[0]
defaults.cmap='binary'
il
il[0].show()
# The 'testing' folder has labels, so it is used as a validation set, not a test set
sd = il.split_by_folder(train='training', valid='testing')
sd
(path/'training').ls()
ll = sd.label_from_folder() # label list
ll
x,y = ll.train[0]
x.show()
print(y,x.shape)
# Transforms
tfms = ([*rand_pad(padding=3, size=28, mode='zeros')], [])
ll = ll.transform(tfms)
bs = 128
# not using imagenet_stats because not using pretrained model
data = ll.databunch(bs=bs).normalize()
x,y = data.train_ds[0]
x.show()
print(y)
def _plot(i,j,ax): data.train_ds[0][0].show(ax, cmap='gray')
plot_multi(_plot, 3, 3, figsize=(8,8))
xb,yb = data.one_batch()
xb.shape,yb.shape
data.show_batch(rows=3, figsize=(5,5))
Basic CNN with batchnorm
def conv(ni,nf): return nn.Conv2d(ni, nf, kernel_size=3, stride=2, padding=1)
# The number after each conv is the resulting grid size
model = nn.Sequential(
    conv(1, 8), # 14
    nn.BatchNorm2d(8),
    nn.ReLU(),
    conv(8, 16), # 7
    nn.BatchNorm2d(16),
    nn.ReLU(),
    conv(16, 32), # 4
    nn.BatchNorm2d(32),
    nn.ReLU(),
    conv(32, 16), # 2
    nn.BatchNorm2d(16),
    nn.ReLU(),
    conv(16, 10), # 1
    nn.BatchNorm2d(10),
    Flatten() # remove (1,1) grid
)
learn = Learner(data, model, loss_func = nn.CrossEntropyLoss(), metrics=accuracy)
print(learn.summary())
model(xb).shape
learn.lr_find(end_lr=100)
learn.recorder.plot()
learn.fit_one_cycle(3, max_lr=0.1)
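The grid sizes noted next to each conv in the basic CNN (14, 7, 4, 2, 1) follow from the standard output-size formula for a convolution. The helper name conv_out_size is made up; the defaults match the conv function defined above (kernel_size=3, stride=2, padding=1).

```python
def conv_out_size(n, kernel_size=3, stride=2, padding=1):
    """Output width/height of a convolution applied to an n x n input."""
    return (n + 2 * padding - kernel_size) // stride + 1

sizes = []
n = 28                      # MNIST images are 28 x 28
for _ in range(5):          # five stride-2 convs in the model
    n = conv_out_size(n)
    sizes.append(n)
print(sizes)
```

Each stride-2 conv roughly halves the grid (rounding up), so five of them take 28 x 28 down to 1 x 1, leaving one number per class after Flatten.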
Refactor
def conv2(ni,nf): return conv_layer(ni,nf,stride=2)
model = nn.Sequential(
conv2(1, 8), # 14
conv2(8, 16), # 7
conv2(16, 32), # 4
conv2(32, 16), # 2
conv2(16, 10), # 1
Flatten() # remove (1,1) grid
)
learn = Learner(data, model, loss_func = nn.CrossEntropyLoss(), metrics=accuracy)
learn.fit_one_cycle(10, max_lr=0.1)
Resnet-ish
x -> two layers (f(x)) -> f(x) + x. Identity/skip connection.
class ResBlock(nn.Module):
    def __init__(self, nf):
        super().__init__()
        self.conv1 = conv_layer(nf,nf)
        self.conv2 = conv_layer(nf,nf)
    def forward(self, x): return x + self.conv2(self.conv1(x))
help(res_block)
model = nn.Sequential(
conv2(1, 8),
res_block(8),
conv2(8, 16),
res_block(16),
conv2(16, 32),
res_block(32),
conv2(32, 16),
res_block(16),
conv2(16, 10),
Flatten()
)
def conv_and_res(ni,nf): return nn.Sequential(conv2(ni, nf), res_block(nf))
model = nn.Sequential(
conv_and_res(1, 8),
conv_and_res(8, 16),
conv_and_res(16, 32),
conv_and_res(32, 16),
conv2(16, 10),
Flatten()
)
learn = Learner(data, model, loss_func = nn.CrossEntropyLoss(), metrics=accuracy)
learn.lr_find(end_lr=100)
learn.recorder.plot()
learn.fit_one_cycle(12, max_lr=0.05)
print(learn.summary())
A guide to convolution arithmetic for deep learning
Could scale the image up using nearest-neighbor (NN) interpolation.
U-Net: Convolutional Networks for Biomedical Image Segmentation
Have to end up with something the same size as the image. Add padding outside the input and in between things. Use skip connections from the downsampling part of the U-Net.
Image restoration.
Pretrained GAN
https://github.com/fastai/course-v3/blob/master/nbs/dl1/lesson7-superres-gan.ipynb
import fastai
from fastai.vision import *
from fastai.callbacks import *
from fastai.vision.gan import *
path = untar_data(URLs.PETS)
path_hr = path/'images'
path_lr = path/'crappy'
Crappify
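Resize the image to be small, pick a random number, draw it on the image and save the JPEG with that number as its quality. The crappification should match the task, e.g. if you want to color black and white images, make the images black and white.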
from fastai.vision import *
from PIL import Image, ImageDraw, ImageFont
class crappifier(object):
    def __init__(self, path_lr, path_hr):
        self.path_lr = path_lr
        self.path_hr = path_hr
    def __call__(self, fn, i):
        dest = self.path_lr/fn.relative_to(self.path_hr)
        dest.parent.mkdir(parents=True, exist_ok=True)
        img = PIL.Image.open(fn)
        targ_sz = resize_to(img, 96, use_min=True)
        img = img.resize(targ_sz, resample=PIL.Image.BILINEAR).convert('RGB')
        w,h = img.size
        q = random.randint(10,70)
        ImageDraw.Draw(img).text((random.randint(0,w//2),random.randint(0,h//2)), str(q), fill=(255,255,255))
        img.save(dest, quality=q)
from crappify import *
il = ImageList.from_folder(path_hr)
parallel(crappifier(path_lr, path_hr), il.items)
bs,size=32, 128
# bs,size = 24,160
#bs,size = 8,256
Pre-train generator
arch = models.resnet34
src = ImageImageList.from_folder(path_lr).split_by_rand_pct(0.1, seed=42)
def get_data(bs,size):
    data = (src.label_from_func(lambda x: path_hr/x.name)
            .transform(get_transforms(max_zoom=2.), size=size, tfm_y=True)
            .databunch(bs=bs).normalize(imagenet_stats, do_y=True))
    data.c = 3
    return data
data_gen = get_data(bs,size)
data_gen.show_batch(4)
Make a U-net. Use a model with pre-trained weights
wd = 1e-3
y_range = (-3.,3.)
loss_gen = MSELossFlat() # flattens out images
def create_gen_learner():
    return unet_learner(data_gen, arch, wd=wd, blur=True, norm_type=NormType.Weight,
                        self_attention=True, y_range=y_range, loss_func=loss_gen)
learn_gen = create_gen_learner()
learn_gen.fit_one_cycle(2, pct_start=0.8)
learn_gen.unfreeze() # Un-freeze model (res-net) down sample part
learn_gen.fit_one_cycle(3, slice(1e-6,1e-3))
learn_gen.show_results(rows=4)
learn_gen.save('gen-pre2')
The model works but leaves some artifacts. With a GAN, the loss comes from a discriminator/critic, which is used to fine-tune the generator.
To create the critic, first save the generated images.
learn_gen.load('gen-pre2');
name_gen = 'image_gen'
path_gen = path/name_gen
path_gen.mkdir(exist_ok=True)
def save_preds(dl):
    i=0
    names = dl.dataset.items
    for b in dl:
        preds = learn_gen.pred_batch(batch=b, reconstruct=True)
        for o in preds:
            o.save(path_gen/names[i].name)
            i += 1
save_preds(data_gen.fix_dl)
PIL.Image.open(path_gen.ls()[0])
Train critic
learn_gen=None # Clear up GPU
gc.collect()
Pretrain the critic on crappy vs not crappy.
def get_crit_data(classes, bs, size):
    src = ImageList.from_folder(path, include=classes).split_by_rand_pct(0.1, seed=42)
    ll = src.label_from_folder(classes=classes)
    data = (ll.transform(get_transforms(max_zoom=2.), size=size)
            .databunch(bs=bs).normalize(imagenet_stats))
    data.c = 3
    return data
data_crit = get_crit_data([name_gen, 'images'], bs=bs, size=size)
data_crit.show_batch(rows=3, ds_type=DatasetType.Train, imgsize=3)
loss_critic = AdaptiveLoss(nn.BCEWithLogitsLoss()) # Binary cross-entropy
def create_critic_learner(data, metrics):
    return Learner(data, gan_critic(), metrics=metrics, loss_func=loss_critic, wd=wd)
learn_critic = create_critic_learner(data_crit, accuracy_thresh_expand)
learn_critic.fit_one_cycle(6, 1e-3)
learn_critic.save('critic-pre2')
GAN
Combine those pretrained models in a GAN
learn_crit=None
learn_gen=None
gc.collect()
data_crit = get_crit_data(['crappy', 'images'], bs=bs, size=size)
learn_crit = create_critic_learner(data_crit, metrics=None).load('critic-pre2')
learn_gen = create_gen_learner().load('gen-pre2')
switcher = partial(AdaptiveGANSwitcher, critic_thresh=0.65)
learn = GANLearner.from_learners(learn_gen, learn_crit, weights_gen=(1.,50.), show_img=False, switcher=switcher,
opt_func=partial(optim.Adam, betas=(0.,0.99)), wd=wd)
learn.callback_fns.append(partial(GANDiscriminativeLR, mult_lr=5.))
lr = 1e-4
learn.fit(40,lr)
learn.save('gan-1c')
learn.data=get_data(16,192)
learn.fit(10,lr/2)
learn.show_results(rows=16)
learn.save('gan-1c')
https://github.com/fastai/course-v3/blob/master/nbs/dl1/lesson7-wgan.ipynb
%reload_ext autoreload
%autoreload 2
%matplotlib inline
from fastai.vision import *
from fastai.vision.gan import *
LSun bedroom data https://github.com/fyu/lsun
path = untar_data(URLs.LSUN_BEDROOMS)
Random noise of size 100 by default as inputs and the images of bedrooms as targets. Use tfm_y=True in the transforms, then apply the normalization to the ys
def get_data(bs, size):
    return (GANItemList.from_folder(path, noise_sz=100)
            .split_none()
            .label_from_func(noop)
            .transform(tfms=[[crop_pad(size=size, row_pct=(0,1), col_pct=(0,1))], []], size=size, tfm_y=True)
            .databunch(bs=bs)
            .normalize(stats = [torch.tensor([0.5,0.5,0.5]), torch.tensor([0.5,0.5,0.5])], do_x=False, do_y=True))
Begin with a small size and use gradual resizing
data = get_data(128, 64)
data.show_batch(rows=5)
Generative Adversarial Nets - https://arxiv.org/pdf/1406.2661.pdf
Train two models at the same time: a generator and a critic. The generator will try to make new images similar to the ones in our dataset, and the critic will try to classify real images from the ones the generator does. The generator returns images, the critic a single number (usually 0. for fake images and 1. for real ones).
We train them against each other in the sense that at each step (more or less), we:
- Freeze the generator and train the critic for one step by:
  - getting one batch of true images (let's call that real)
  - generating one batch of fake images (let's call that fake)
  - having the critic evaluate each batch and computing a loss function from that; the important part is that it rewards the detection of real images and penalizes the fake ones
  - updating the weights of the critic with the gradients of this loss
- Freeze the critic and train the generator for one step by:
  - generating one batch of fake images
  - evaluating the critic on it
  - returning a loss that rewards the generator when the critic thinks those are real images
  - updating the weights of the generator with the gradients of this loss
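The alternating steps above can be sketched in plain PyTorch. This is a minimal illustration on toy 2-D points, not the fastai GANLearner implementation: the tiny networks, the binary cross-entropy loss, the learning rate, and the helper names (real_batch, critic_step, generator_step) are all assumptions made for the example.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

noise_sz = 8  # size of the random input vector (assumed for this toy example)
generator = nn.Sequential(nn.Linear(noise_sz, 16), nn.ReLU(), nn.Linear(16, 2))
critic = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 1))

opt_g = torch.optim.Adam(generator.parameters(), lr=1e-3, betas=(0., 0.99))
opt_c = torch.optim.Adam(critic.parameters(), lr=1e-3, betas=(0., 0.99))
bce = nn.BCEWithLogitsLoss()

def real_batch(bs):
    # Toy "real" data: points clustered around (2, 2).
    return torch.randn(bs, 2) * 0.1 + 2.0

def critic_step(bs=32):
    real = real_batch(bs)
    with torch.no_grad():  # the generator is frozen during this step
        fake = generator(torch.randn(bs, noise_sz))
    # Target 1. for real images, 0. for fake ones.
    loss = bce(critic(real), torch.ones(bs, 1)) + bce(critic(fake), torch.zeros(bs, 1))
    opt_c.zero_grad()
    loss.backward()
    opt_c.step()
    return loss.item()

def generator_step(bs=32):
    fake = generator(torch.randn(bs, noise_sz))
    # The generator is rewarded when the critic scores its fakes as real (target 1.).
    loss = bce(critic(fake), torch.ones(bs, 1))
    opt_g.zero_grad()
    loss.backward()
    opt_g.step()  # only the generator's optimizer steps, so the critic stays fixed
    return loss.item()

for step in range(200):
    c_loss = critic_step()
    g_loss = generator_step()
print(f"critic loss {c_loss:.3f}, generator loss {g_loss:.3f}")
```

Freezing is done here by choosing which optimizer steps; a WGAN would swap the cross-entropy loss for the difference of critic scores.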
Wasserstein GAN - https://arxiv.org/pdf/1701.07875.pdf
Create a generator and a critic that we pass to gan_learner. The noise_size is the size of the random vector from which our generator creates images.
generator = basic_generator(in_size=64, n_channels=3, n_extra_layers=1)
critic = basic_critic (in_size=64, n_channels=3, n_extra_layers=1)
learn = GANLearner.wgan(data, generator, critic, switch_eval=False,
                        opt_func = partial(optim.Adam, betas = (0.,0.99)), wd=0.)
learn.fit(30,2e-4)
learn.gan_trainer.switch(gen_mode=True)
learn.show_results(ds_type=DatasetType.Train, rows=16, figsize=(8,8))
Perceptual Losses for Real-Time Style Transfer and Super-Resolution
Downsample encoder and upsample decoder.
https://github.com/fastai/course-v3/blob/master/nbs/dl1/lesson7-superres.ipynb
Super resolution
import fastai
from fastai.vision import *
from fastai.callbacks import *
from fastai.utils.mem import *
from torchvision.models import vgg16_bn
path = untar_data(URLs.PETS)
path_hr = path/'images'
path_lr = path/'small-96'
path_mr = path/'small-256'
il = ImageList.from_folder(path_hr)
Crappify
def resize_one(fn, i, path, size):
    dest = path/fn.relative_to(path_hr)
    dest.parent.mkdir(parents=True, exist_ok=True)
    img = PIL.Image.open(fn)
    targ_sz = resize_to(img, size, use_min=True)
    img = img.resize(targ_sz, resample=PIL.Image.BILINEAR).convert('RGB')
    img.save(dest, quality=60)
# create smaller image sets the first time this nb is run
sets = [(path_lr, 96), (path_mr, 256)]
for p,size in sets:
    if not p.exists():
        print(f"resizing to {size} into {p}")
        parallel(partial(resize_one, path=p, size=size), il.items)
bs,size=32,128
arch = models.resnet34
src = ImageImageList.from_folder(path_lr).split_by_rand_pct(0.1, seed=42)
def get_data(bs,size):
    data = (src.label_from_func(lambda x: path_hr/x.name)
            .transform(get_transforms(max_zoom=2.), size=size, tfm_y=True)
            .databunch(bs=bs).normalize(imagenet_stats, do_y=True))
    data.c = 3
    return data
data = get_data(bs,size)
data.show_batch(ds_type=DatasetType.Valid, rows=2, figsize=(9,9))
Feature loss
t = data.valid_ds[0][1].data
t = torch.stack([t,t])
def gram_matrix(x):
    n,c,h,w = x.size()
    x = x.view(n, c, -1)
    return (x @ x.transpose(1,2))/(c*h*w)
gram_matrix(t)
MAE loss
base_loss = F.l1_loss
The features attribute holds the convolutional part of the VGG model. Put it in eval mode because it is not being trained, and turn off requires_grad because its weights are not updated.
vgg_m = vgg16_bn(True).features.cuda().eval()
requires_grad(vgg_m, False)
Find the layers just before each max pool layer (the ReLU activations).
blocks = [i-1 for i,o in enumerate(children(vgg_m)) if isinstance(o,nn.MaxPool2d)]
blocks, [vgg_m[i] for i in blocks]
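The same freeze-and-locate pattern can be tried without downloading VGG weights. The sketch below uses a small hypothetical stand-in for the VGG features module (the layer sizes are made up) and plain PyTorch calls in place of the fastai helpers requires_grad and children.

```python
import torch.nn as nn

# Hypothetical stand-in for vgg16_bn(...).features: conv/bn/relu groups, each
# followed by a max pool.
features = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.BatchNorm2d(8), nn.ReLU(inplace=True),
    nn.MaxPool2d(2),
    nn.Conv2d(8, 16, 3, padding=1), nn.BatchNorm2d(16), nn.ReLU(inplace=True),
    nn.MaxPool2d(2),
).eval()  # eval mode: batchnorm uses its running statistics

# Turn off gradients for every weight, since this network is only used to
# compute features for the loss.
for p in features.parameters():
    p.requires_grad_(False)

# Index of the layer just before each max pool layer.
blocks = [i - 1 for i, layer in enumerate(features.children())
          if isinstance(layer, nn.MaxPool2d)]
print(blocks)                         # [2, 6]
print([features[i] for i in blocks])  # the ReLU layers
```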
class FeatureLoss(nn.Module):
    def __init__(self, m_feat, layer_ids, layer_wgts):
        super().__init__()
        self.m_feat = m_feat
        self.loss_features = [self.m_feat[i] for i in layer_ids]
        self.hooks = hook_outputs(self.loss_features, detach=False)
        self.wgts = layer_wgts
        self.metric_names = ['pixel',] + [f'feat_{i}' for i in range(len(layer_ids))
              ] + [f'gram_{i}' for i in range(len(layer_ids))]
    def make_features(self, x, clone=False):
        self.m_feat(x)
        return [(o.clone() if clone else o) for o in self.hooks.stored]
    def forward(self, input, target):
        out_feat = self.make_features(target, clone=True)
        in_feat = self.make_features(input)
        self.feat_losses = [base_loss(input,target)]
        self.feat_losses += [base_loss(f_in, f_out)*w
                             for f_in, f_out, w in zip(in_feat, out_feat, self.wgts)]
        self.feat_losses += [base_loss(gram_matrix(f_in), gram_matrix(f_out))*w**2 * 5e3
                             for f_in, f_out, w in zip(in_feat, out_feat, self.wgts)]
        self.metrics = dict(zip(self.metric_names, self.feat_losses))
        return sum(self.feat_losses)
    def __del__(self): self.hooks.remove()
feat_loss = FeatureLoss(vgg_m, blocks[2:5], [5,15,2])
Train
wd = 1e-3
learn = unet_learner(data, arch, wd=wd, loss_func=feat_loss, callback_fns=LossMetrics,
                     blur=True, norm_type=NormType.Weight)
gc.collect();
learn.lr_find()
learn.recorder.plot()
lr = 1e-3
def do_fit(save_name, lrs=slice(lr), pct_start=0.9):
    learn.fit_one_cycle(10, lrs, pct_start=pct_start)
    learn.save(save_name)
    learn.show_results(rows=1, imgsize=5)
do_fit('1a', slice(lr*10)) # Quicker than a GAN
learn.unfreeze()
do_fit('1b', slice(1e-5,lr))
data = get_data(12,size*2)
learn.data = data
learn.freeze()
gc.collect()
learn.load('1b');
do_fit('2a')
learn.unfreeze()
do_fit('2b', slice(1e-6,1e-4), pct_start=0.3)
Test
learn = None
gc.collect();
256/320*1024
256/320*1600
free = gpu_mem_get_free_no_cache()
# the max size of the test image depends on the available GPU RAM
if free > 8000: size=(1280, 1600) # > 8GB RAM
else: size=( 820, 1024) # <= 8GB RAM
print(f"using size={size}, have {free}MB of GPU RAM free")
learn = unet_learner(data, arch, loss_func=F.l1_loss, blur=True, norm_type=NormType.Weight)
data_mr = (ImageImageList.from_folder(path_mr).split_by_rand_pct(0.1, seed=42)
           .label_from_func(lambda x: path_hr/x.name)
           .transform(get_transforms(), size=size, tfm_y=True)
           .databunch(bs=1).normalize(imagenet_stats, do_y=True))
data_mr.c = 3
learn.load('2b');
learn.data = data_mr
fn = data_mr.valid_ds.x.items[0]; fn
img = open_image(fn); img.shape
p,img_hr,b = learn.predict(img)
show_image(img, figsize=(18,15), interpolation='nearest');
Image(img_hr).show(figsize=(18,15))
Human numbers
https://github.com/fastai/course-v3/blob/master/nbs/dl1/lesson7-human-numbers.ipynb
from fastai.text import *
bs=64
path = untar_data(URLs.HUMAN_NUMBERS)
path.ls()
def readnums(d): return [', '.join(o.strip() for o in open(path/d).readlines())]
train_txt = readnums('train.txt'); train_txt[0][:80]
valid_txt = readnums('valid.txt'); valid_txt[0][-80:]
train = TextList(train_txt, path=path)
valid = TextList(valid_txt, path=path)
src = ItemLists(path=path, train=train, valid=valid).label_for_lm()
data = src.databunch(bs=bs)
train[0].text[:80] # one document so 0
# The text starts with xxbos. The xx prefix marks fastai's special tokens; bos is beginning of stream (the unknown token is xxunk).
len(data.valid_ds[0][0].data)
data.bptt, len(data.valid_dl) # bptt is backpropagation through time
https://github.com/fastai/fastai/blob/f93a5f028e2cf73448dda188682d437c610424c3/fastai/text/learner.py#L248
The 13017 validation tokens are split across 64 rows (bs) and cut into chunks of 70 tokens (bptt), which gives about 3 batches:
13017/70/bs
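The arithmetic behind the batch count can be checked directly. This sketch assumes 13017 is the number of tokens in the validation set and that the last, partly filled batch still counts as a batch.

```python
import math

n_tokens = 13017  # validation tokens (taken from the division above)
bptt = 70         # tokens per row in one mini-batch
bs = 64           # rows per mini-batch

tokens_per_batch = bptt * bs  # 4480 tokens consumed by each full mini-batch
n_batches = math.ceil(n_tokens / tokens_per_batch)

print(tokens_per_batch)            # 4480
print(round(n_tokens / bptt / bs, 2))  # 2.91
print(n_batches)                   # 3
```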
it = iter(data.valid_dl)
x1,y1 = next(it)
x2,y2 = next(it)
x3,y3 = next(it)
it.close()
x1.numel()+x2.numel()+x3.numel()
x1.shape,y1.shape
x2.shape,y2.shape
x1[:,0]
y1[:,0]
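Grab the vocab to turn token ids back into text. Every mini-batch joins up with the next mini-batch: row i of a mini-batch continues where row i of the previous mini-batch ended.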
v = data.valid_ds.vocab
v.textify(x1[0])
v.textify(y1[0])
v.textify(x2[0])
v.textify(x3[0])
v.textify(x1[1])
v.textify(x2[1])
v.textify(x3[1])
v.textify(x3[-1])
data.show_batch(ds_type=DatasetType.Valid)
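How the rows of consecutive mini-batches join up can be illustrated with a simplified batching function. This is a sketch of the idea only, not fastai's actual data loader; batchify is a hypothetical helper and the token stream is just the integers 0 to 24.

```python
def batchify(tokens, bs, bptt):
    """Split one token stream into bs parallel rows, then cut the rows into
    bptt-wide chunks. Targets are the inputs shifted by one token."""
    row_len = (len(tokens) - 1) // bs  # keep one extra token for the last target
    rows = [tokens[r * row_len:(r + 1) * row_len + 1] for r in range(bs)]
    batches = []
    for start in range(0, row_len, bptt):
        end = min(start + bptt, row_len)
        x = [row[start:end] for row in rows]
        y = [row[start + 1:end + 1] for row in rows]
        batches.append((x, y))
    return batches

tokens = list(range(25))
batches = batchify(tokens, bs=2, bptt=4)

for x, y in batches:
    print("x row 0:", x[0], "| y row 0:", y[0])
# x row 0: [0, 1, 2, 3] | y row 0: [1, 2, 3, 4]
# x row 0: [4, 5, 6, 7] | y row 0: [5, 6, 7, 8]
# x row 0: [8, 9, 10, 11] | y row 0: [9, 10, 11, 12]
```

Row 0 of the second batch starts at token 4, right after row 0 of the first batch ends at token 3, which is why hidden state can be carried from one mini-batch to the next.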
Single fully connected model
data = src.databunch(bs=bs, bptt=3)
x,y = data.one_batch()
x.shape,y.shape
nv = len(v.itos); nv
nh=64
def loss4(input,target): return F.cross_entropy(input, target[:,-1])
def acc4 (input,target): return accuracy(input, target[:,-1])
class Model0(nn.Module):
    def __init__(self):
        super().__init__()
        self.i_h = nn.Embedding(nv,nh)  # green arrow
        self.h_h = nn.Linear(nh,nh)     # brown arrow
        self.h_o = nn.Linear(nh,nv)     # blue arrow
        self.bn = nn.BatchNorm1d(nh)
    def forward(self, x):
        h = self.bn(F.relu(self.h_h(self.i_h(x[:,0]))))
        if x.shape[1]>1:
            h = h + self.i_h(x[:,1])
            h = self.bn(F.relu(self.h_h(h)))
        if x.shape[1]>2:
            h = h + self.i_h(x[:,2])
            h = self.bn(F.relu(self.h_h(h)))
        return self.h_o(h)
learn = Learner(data, Model0(), loss_func=loss4, metrics=acc4)
learn.fit_one_cycle(6, 1e-4)
Same thing with a loop
class Model1(nn.Module):
    def __init__(self):
        super().__init__()
        self.i_h = nn.Embedding(nv,nh)  # green arrow
        self.h_h = nn.Linear(nh,nh)     # brown arrow
        self.h_o = nn.Linear(nh,nv)     # blue arrow
        self.bn = nn.BatchNorm1d(nh)
    def forward(self, x):
        h = torch.zeros(x.shape[0], nh).to(device=x.device)
        for i in range(x.shape[1]):
            h = h + self.i_h(x[:,i])
            h = self.bn(F.relu(self.h_h(h)))
        return self.h_o(h)
learn = Learner(data, Model1(), loss_func=loss4, metrics=acc4)
learn.fit_one_cycle(6, 1e-4)
Multi-fully connected model
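Set bptt to 20 and predict every word, not just the last one: after each of the 20 input words the model outputs a prediction for the next word, so the output is an array of predictions, one per position.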
data = src.databunch(bs=bs, bptt=20)
x,y = data.one_batch()
x.shape,y.shape
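When a prediction is made at every position, the output has one set of scores per token and the loss is taken over all of them. The sketch below shows the shapes with random tensors standing in for model output and targets; the small bs and nv values are made up, and flattening before cross-entropy is one common way to compute the loss, not necessarily what the Learner does internally.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
bs, bptt, nv = 4, 20, 10  # made-up sizes for illustration

# Stand-ins for the model output (one score per vocab entry at every position)
# and for the targets (the next token at every position).
logits = torch.randn(bs, bptt, nv)
targets = torch.randint(0, nv, (bs, bptt))

# Flatten the batch and sequence dimensions so every position counts as one
# classification example.
loss = F.cross_entropy(logits.view(-1, nv), targets.view(-1))

preds = logits.argmax(dim=-1)                # predicted token at every position
acc = (preds == targets).float().mean()      # accuracy over all positions
print(logits.shape, preds.shape, loss.item(), acc.item())
```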
class Model2(nn.Module):
    def __init__(self):
        super().__init__()
        self.i_h = nn.Embedding(nv,nh)
        self.h_h = nn.Linear(nh,nh)
        self.h_o = nn.Linear(nh,nv)
        self.bn = nn.BatchNorm1d(nh)
    def forward(self, x):
        h = torch.zeros(x.shape[0], nh).to(device=x.device)
        res = []
        for i in range(x.shape[1]):
            h = h + self.i_h(x[:,i])
            h = F.relu(self.h_h(h))
            res.append(self.h_o(self.bn(h)))
        return torch.stack(res, dim=1)
learn = Learner(data, Model2(), metrics=accuracy)
Maintain state
class Model3(nn.Module):
    def __init__(self):
        super().__init__()
        self.i_h = nn.Embedding(nv,nh)
        self.h_h = nn.Linear(nh,nh)
        self.h_o = nn.Linear(nh,nv)
        self.bn = nn.BatchNorm1d(nh)
        self.h = torch.zeros(bs, nh).cuda()
    def forward(self, x):
        res = []
        h = self.h
        for i in range(x.shape[1]):
            h = h + self.i_h(x[:,i])
            h = F.relu(self.h_h(h))
            res.append(self.bn(h))
        self.h = h.detach()
        res = torch.stack(res, dim=1)
        res = self.h_o(res)
        return res
learn = Learner(data, Model3(), metrics=accuracy)
learn.fit_one_cycle(20, 3e-3)
Stack RNNs
class Model4(nn.Module):
    def __init__(self):
        super().__init__()
        self.i_h = nn.Embedding(nv,nh)
        self.rnn = nn.RNN(nh,nh, batch_first=True)
        self.h_o = nn.Linear(nh,nv)
        self.bn = BatchNorm1dFlat(nh)
        self.h = torch.zeros(1, bs, nh).cuda()
    def forward(self, x):
        res,h = self.rnn(self.i_h(x), self.h)
        self.h = h.detach()
        return self.h_o(self.bn(res))
learn = Learner(data, Model4(), metrics=accuracy)
learn.fit_one_cycle(20, 3e-3)
GRU/LSTM
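GRU and LSTM cells add learned gates that decide how much of the hidden state to keep or discard at each step (loosely, a learned way of dropping information).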
class Model5(nn.Module):
    def __init__(self):
        super().__init__()
        self.i_h = nn.Embedding(nv,nh)
        self.rnn = nn.GRU(nh, nh, 2, batch_first=True)
        self.h_o = nn.Linear(nh,nv)
        self.bn = BatchNorm1dFlat(nh)
        self.h = torch.zeros(2, bs, nh).cuda()
    def forward(self, x):
        res,h = self.rnn(self.i_h(x), self.h)
        self.h = h.detach()
        return self.h_o(self.bn(res))
learn = Learner(data, Model5(), metrics=accuracy)
learn.fit_one_cycle(10, 1e-2)
The same kind of model can also be used for sequence labeling.
Document and test code
https://forums.fast.ai/t/dev-projects-index/29849
See also
Visualizing the Loss Landscape of Neural Nets
https://github.com/vdumoulin/conv_arithmetic
Perceptual Losses for Real-Time Style Transfer and Super-Resolution