fast.ai

October 2019 - December 2019

Getting started

https://www.fast.ai/2019/01/24/course-v3/

https://forums.fast.ai/t/faq-resources-and-official-course-updates/27934

A blog post on what you need for deep learning: https://www.fast.ai/2017/11/16/what-you-need/

Seven lessons, each around 2 hours long, and you should plan to spend about 10 hours on assignments for each lesson.

Notes on how to set up the course on GCP here: https://course.fast.ai/start_gcp.html

Notes on how to set up the course on Azure here: https://course.fast.ai/start_azure.html

The forum for the course is here: https://forums.fast.ai/c/part1-v3

The key applications covered are:

  • Computer vision (e.g. classify pet photos by breed)
    • Image classification
    • Image localization (segmentation and activation maps)
    • Image key-points
  • NLP (e.g. movie review sentiment analysis)
    • Language modeling
    • Document classification
  • Tabular data (e.g. sales prediction)
    • Categorical data
    • Continuous data
  • Collaborative filtering (e.g. movie recommendation)

The course uses PyTorch and the fastai library, a wrapper around PyTorch.

The videos can be found in a YouTube playlist.


Lesson summaries

Ideas taken from https://www.gse.harvard.edu/news/uk/09/01/education-bat-seven-principles-educators

Lesson 1 - Image classification

Recognize pet breeds.

Use of transfer learning.

Set the most important hyper-parameter when training neural networks: the learning rate, using Leslie Smith’s fantastic learning rate finder method.

Features that fastai provides to let you easily add labels to your images.

Lesson 2 - Data cleaning and production; SGD from scratch

Put a model in production e.g. https://course.fast.ai/deployment_render.html

Using the model to find and fix mislabeled or incorrectly-collected images.

Create a model and our own gradient descent loop.

Lesson 3 - Data blocks; Multi-label classification; Segmentation

Use the Planet dataset (https://www.kaggle.com/c/planet-understanding-the-amazon-from-space)

Use the data block API to get the data into shape (more info at https://docs.fast.ai/data_block.html).

Image segmentation - the process of labeling every pixel in an image with a category indicating what kind of object that pixel belongs to.

Use CamVid dataset

Predict face keypoints (interesting areas)

Lesson 4 - NLP; Tabular data; Collaborative filtering; Embeddings

Predict whether a movie review is positive or negative using ULMFiT. Here's a popular science article on the model

  1. Create (or use pretrained) language model (predict the next word of a sentence)
  2. Fine-tune this language model using your target corpus
  3. Remove the encoder in this fine tuned language model, and replace it with a classifier. Then fine-tune this model for the final classification task
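
As a compressed sketch, the three steps map onto fastai v1 roughly like this (assuming data_lm and data_clas DataBunches already exist; the full walkthrough is in the IMDB section below):

learn = language_model_learner(data_lm, AWD_LSTM)    # 1. pretrained language model
learn.fit_one_cycle(1)                               # 2. fine-tune the LM on the target corpus
learn.save_encoder('ft_enc')                         #    keep the fine-tuned encoder
learn = text_classifier_learner(data_clas, AWD_LSTM)
learn.load_encoder('ft_enc')                         # 3. reuse the encoder under a classifier head
learn.fit_one_cycle(1)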

Cover tabular data (such as spreadsheets and database tables). Work with the fastai.tabular module to set up and train a model.

Collaborative filtering (recommendation systems).

An “embedding” is simply a computational shortcut for a particular type of matrix multiplication (a multiplication by a one-hot encoded matrix; e.g. word vectors).
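
A quick way to see this (a minimal sketch in plain PyTorch): an embedding lookup gives the same result as multiplying a one-hot matrix by the embedding's weight matrix.

import torch
from torch import nn

emb = nn.Embedding(5, 3)      # 5 categories, 3-dimensional embedding
idx = torch.tensor([2, 4])    # two category ids
one_hot = torch.eye(5)[idx]   # the same ids, one-hot encoded
# The lookup and the matrix multiplication agree:
assert torch.allclose(emb(idx), one_hot @ emb.weight)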

Lesson 5 - Back propagation; Accelerated SGD; Neural net from scratch

Create a simple NN from scratch.

Look inside the weights of an embedding layer, to find out what our model has learned about our categorical variables.

Although embeddings are most widely known in the context of word embeddings for NLP, they are at least as important for categorical variables in general, such as for tabular data or collaborative filtering.

Lesson 6 - Regularization; Convolutions; Data ethics

Discuss some powerful techniques for improving training and avoiding over-fitting:

  • Dropout: remove activations at random during training in order to regularize the model
  • Data augmentation: modify model inputs during training in order to effectively increase data size
  • Batch normalization: adjust the parameterization of a model in order to make the loss surface smoother.

Learn all about convolutions, which can be thought of as a variant of matrix multiplication with tied weights, and are the operation at the heart of modern computer vision models (and, increasingly, other types of models too).
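
One way to see the "matrix multiplication with tied weights" view (a sketch in plain PyTorch): unfold the image into patch columns, and the convolution becomes an ordinary matmul with the flattened kernel.

import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8)              # one 3-channel 8x8 image
w = torch.randn(4, 3, 3, 3)              # 4 filters of size 3x3
conv = F.conv2d(x, w)                    # shape (1, 4, 6, 6)
cols = F.unfold(x, kernel_size=3)        # patches as columns: (1, 27, 36)
matmul = (w.view(4, -1) @ cols).view(1, 4, 6, 6)
assert torch.allclose(conv, matmul, atol=1e-5)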

Create a class activation map, which is a heat-map that shows which parts of an image were most important in making a prediction.

Learn about some of the ways in which models can go wrong, with a particular focus on feedback loops, why they cause problems, and how to avoid them.

ways in which bias in data can lead to biased algorithms

discuss questions that data scientists can and should be asking to help ensure that their work doesn’t lead to unexpected negative outcomes

Lesson 7 - Resnets from scratch; U-net; Generative (adversarial) networks

One of the most important techniques in modern architectures: the skip connection. It is most famously used in the ResNet, which is the architecture we’ve used throughout this course for image classification.
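
A minimal sketch of a residual block in plain PyTorch (illustrative, not the exact lesson code): the skip connection just adds the block's input back onto its output.

import torch.nn.functional as F
from torch import nn

class ResBlock(nn.Module):
    def __init__(self, nf):
        super().__init__()
        self.conv1 = nn.Conv2d(nf, nf, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(nf, nf, kernel_size=3, padding=1)

    def forward(self, x):
        # x + f(x): the identity path lets gradients flow straight through
        return x + self.conv2(F.relu(self.conv1(x)))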

Look at the U-net architecture, which uses a different type of skip connection to greatly improve segmentation results (and also helps similar tasks where the output structure is similar to the input).

Use the U-net architecture to train a super-resolution model. This is a model which can increase the resolution of a low-quality image. Our model won’t only increase resolution—it will also remove jpeg artifacts and unwanted text watermarks.

In order to make our model produce high quality results, we will need to create a custom loss function which incorporates feature loss (also known as perceptual loss), along with gram loss. These techniques can be used for many other types of image generation task, such as image colorization.
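
For reference, the Gram matrix used in gram/style loss can be computed like this (a sketch of the common formulation; the lesson's full loss also includes feature terms):

def gram_matrix(x):
    # x: activations of shape (batch, channels, height, width)
    b, c, h, w = x.shape
    f = x.view(b, c, h * w)                      # flatten the spatial dimensions
    return f @ f.transpose(1, 2) / (c * h * w)   # channel-by-channel correlations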

Learn about a recent loss function known as generative adversarial loss (used in generative adversarial networks, or GANs), which can improve the quality of generative models in some contexts, at the cost of speed.

Train GANs more quickly and reliably than standard approaches, by leveraging transfer learning.

This combines architectural innovations and loss function approaches that haven’t been used in this way before.

Learn how to create a recurrent neural net (RNN) from scratch. An RNN is a simple refactoring of a regular multi-layer network.
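
A minimal sketch of that idea in plain PyTorch (names are illustrative): the same layers are applied at every time step, and the loop is the "refactoring".

import torch
from torch import nn

class TinyRNN(nn.Module):
    def __init__(self, vocab_sz, n_hidden):
        super().__init__()
        self.i_h = nn.Embedding(vocab_sz, n_hidden)  # input -> hidden
        self.h_h = nn.Linear(n_hidden, n_hidden)     # hidden -> hidden (reused each step)
        self.h_o = nn.Linear(n_hidden, vocab_sz)     # hidden -> output

    def forward(self, x):                            # x: (batch, seq_len) of token ids
        h = x.new_zeros(x.size(0), self.h_h.in_features, dtype=torch.float)
        for t in range(x.size(1)):
            h = torch.tanh(self.h_h(h + self.i_h(x[:, t])))
        return self.h_o(h)                           # predict from the final hidden state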

Installing fast.ai

https://docs.fast.ai/#Installation-and-updating

$ conda create -n fastai python=3.7
$ conda activate fastai
$ conda install -c pytorch -c fastai fastai

$ python
>>> from fastai.vision import *
>>> path = untar_data(URLs.MNIST_SAMPLE)
>>> data = ImageDataBunch.from_folder(path)
>>> learn = cnn_learner(data, models.resnet18, metrics=accuracy)
>>> learn.fit(1)

Lesson 0: PyTorch intro

Here is a PyTorch tutorial:

What is torch.nn really?

Download the MNIST data:

from pathlib import Path
import requests

DATA_PATH = Path("data")
PATH = DATA_PATH / "mnist"

PATH.mkdir(parents=True, exist_ok=True)

URL = "http://deeplearning.net/data/mnist/"
FILENAME = "mnist.pkl.gz"

if not (PATH / FILENAME).exists():
        content = requests.get(URL + FILENAME).content
        (PATH / FILENAME).open("wb").write(content)

This dataset is in numpy array format, and has been stored using pickle, a python-specific format for serializing data.

import pickle
import gzip

with gzip.open((PATH / FILENAME).as_posix(), "rb") as f:
        ((x_train, y_train), (x_valid, y_valid), _) = pickle.load(f, encoding="latin-1")

Each image is 28 x 28, and is being stored as a flattened row of length 784 (=28x28). Let’s take a look at one; we need to reshape it to 2d first.

from matplotlib import pyplot
import numpy as np

pyplot.imshow(x_train[0].reshape((28, 28)), cmap="gray")
print(x_train.shape)

Convert to torch.tensor

import torch

x_train, y_train, x_valid, y_valid = map(
    torch.tensor, (x_train, y_train, x_valid, y_valid)
)
n, c = x_train.shape
print(x_train, y_train)
print(x_train.shape)
print(y_train.min(), y_train.max())

Neural net from scratch. PyTorch provides methods to create random or zero-filled tensors, which we will use to create our weights and bias for a simple linear model. We tell PyTorch that these tensors require a gradient: this causes PyTorch to record all of the operations done on the tensor, so that it can calculate the gradient during back-propagation automatically!

For the weights, we set requires_grad after the initialization, since we don’t want that step included in the gradient. (Note that a trailing _ in PyTorch signifies that the operation is performed in-place.)

Initialize weights with Xavier initialisation (by multiplying with 1/sqrt(n)).

import math

weights = torch.randn(784, 10) / math.sqrt(784)
weights.requires_grad_()
bias = torch.zeros(10, requires_grad=True)

We can use any standard Python function (or callable object) as a model, so let’s just write a plain matrix multiplication and broadcasted addition to create a simple linear model. We’ll also write our own log_softmax activation and use it. PyTorch will even create fast GPU or vectorized CPU code for your function automatically. (The @ stands for matrix multiplication.)

def log_softmax(x):
    # log(softmax(x)) = x - log(sum(exp(x))), broadcast over the last dimension
    return x - x.exp().sum(-1).log().unsqueeze(-1)

def model(xb):
    return log_softmax(xb @ weights + bias)

call our function on one batch of data (in this case, 64 images). This is one forward pass. Note that our predictions won’t be any better than random at this stage, since we start with random weights.

bs = 64  # batch size

xb = x_train[0:bs]  # a mini-batch from x
preds = model(xb)  # predictions
print(preds[0], preds.shape)

the preds tensor contains not only the tensor values, but also a gradient function. We’ll use this later to do backprop.

implement negative log-likelihood to use as the loss function

def nll(input, target):
    return -input[range(target.shape[0]), target].mean()

loss_func = nll

check our loss with our random model, so we can see if we improve after a backprop pass later.

yb = y_train[0:bs]
print(loss_func(preds, yb))

Calculate the accuracy of our model. For each prediction, if the index with the largest value matches the target value, then the prediction was correct.

def accuracy(out, yb):
    preds = torch.argmax(out, dim=1)
    return (preds == yb).float().mean()

check the accuracy of our random model, so we can see if our accuracy improves as our loss improves.

print(accuracy(preds, yb))

run a training loop. For each iteration:

  • select a mini-batch of data (of size bs)
  • use the model to make predictions
  • calculate the loss
  • loss.backward() updates the gradients of the model, in this case, weights and bias

use these gradients to update the weights and bias. We do this within the torch.no_grad() context manager, because we do not want these actions to be recorded for our next calculation of the gradient. You can read more about how PyTorch's Autograd records operations here

set the gradients to zero, so that we are ready for the next loop. Otherwise, our gradients would record a running tally of all the operations that had happened (i.e. loss.backward() adds the gradients to whatever is already stored, rather than replacing them).

You can use the standard python debugger to step through PyTorch code, allowing you to check the various variable values at each step. Uncomment set_trace() below to try it out.

from IPython.core.debugger import set_trace

lr = 0.5  # learning rate
epochs = 2  # how many epochs to train for

for epoch in range(epochs):
    for i in range((n - 1) // bs + 1):
        #         set_trace()
        start_i = i * bs
        end_i = start_i + bs
        xb = x_train[start_i:end_i]
        yb = y_train[start_i:end_i]
        pred = model(xb)
        loss = loss_func(pred, yb)

        loss.backward()
        with torch.no_grad():
            weights -= weights.grad * lr
            bias -= bias.grad * lr
            weights.grad.zero_()
            bias.grad.zero_()

That’s it: we’ve created and trained a minimal neural network (in this case, a logistic regression, since we have no hidden layers) entirely from scratch!

Let’s check the loss and accuracy and compare those to what we got earlier. We expect that the loss will have decreased and accuracy to have increased.

print(loss_func(model(xb), yb), accuracy(model(xb), yb))

Using torch.nn.functional

Refactor our code, so that it does the same thing as before, only we'll start taking advantage of PyTorch's nn classes to make it more concise and flexible. At each step from here, we should be making our code one or more of: shorter, more understandable, and/or more flexible.

make our code shorter by replacing our hand-written activation and loss functions with those from torch.nn.functional (which is generally imported into the namespace F by convention). This module contains all the functions in the torch.nn library (whereas other parts of the library contain classes). As well as a wide range of loss and activation functions, you'll also find here some convenient functions for creating neural nets, such as pooling functions. (There are also functions for doing convolutions, linear layers, etc, but as we'll see, these are usually better handled using other parts of the library.)

Pytorch provides a single function, F.cross_entropy, that combines negative log-likelihood loss and log-softmax activation.

import torch.nn.functional as F

loss_func = F.cross_entropy

def model(xb):
    return xb @ weights + bias

No longer call log_softmax in the model function. Let's confirm that our loss and accuracy are the same as before:

print(loss_func(model(xb), yb), accuracy(model(xb), yb))
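
As a quick sanity check (a sketch reusing the nll and log_softmax we wrote above), F.cross_entropy on raw logits matches the hand-written pair:

logits = torch.randn(5, 10)
target = torch.randint(0, 10, (5,))
assert torch.allclose(F.cross_entropy(logits, target),
                      nll(log_softmax(logits), target), atol=1e-6)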

Refactor using nn.Module

use nn.Module and nn.Parameter, for a clearer and more concise training loop. subclass nn.Module (which itself is a class and able to keep track of state). In this case, we want to create a class that holds our weights, bias, and method for the forward step. nn.Module has a number of attributes and methods (such as .parameters() and .zero_grad()) which we will be using.

from torch import nn

class Mnist_Logistic(nn.Module):
    def __init__(self):
        super().__init__()
        self.weights = nn.Parameter(torch.randn(784, 10) / math.sqrt(784))
        self.bias = nn.Parameter(torch.zeros(10))

    def forward(self, xb):
        return xb @ self.weights + self.bias

Since we are now using an object instead of just a function, we first have to instantiate our model:

model = Mnist_Logistic()

calculate the loss in the same way as before. Note that nn.Module objects are used as if they are functions (i.e. they are callable), but behind the scenes Pytorch will call our forward method automatically.

print(loss_func(model(xb), yb))

Previously for our training loop we had to update the values for each parameter by name, and manually zero out the grads for each parameter separately. Now we can take advantage of model.parameters() and model.zero_grad() (which are both defined by PyTorch for nn.Module) to make those steps more concise and less prone to the error of forgetting some of our parameters, particularly if we had a more complicated model:

with torch.no_grad():
    for p in model.parameters(): p -= p.grad * lr
    model.zero_grad()

We’ll wrap our little training loop in a fit function so we can run it again later.

def fit():
    for epoch in range(epochs):
        for i in range((n - 1) // bs + 1):
            start_i = i * bs # bs is batch size
            end_i = start_i + bs
            xb = x_train[start_i:end_i]
            yb = y_train[start_i:end_i]
            pred = model(xb)
            loss = loss_func(pred, yb)

            loss.backward()
            with torch.no_grad():
                for p in model.parameters():
                    p -= p.grad * lr
                model.zero_grad()

fit()

Double-check that our loss has gone down:

print(loss_func(model(xb), yb))

Refactor using nn.Linear

We continue to refactor our code. Instead of manually defining and initializing self.weights and self.bias, and calculating xb @ self.weights + self.bias, we will instead use the Pytorch class nn.Linear for a linear layer, which does all that for us. Pytorch has many types of predefined layers that can greatly simplify our code, and often makes it faster too.

class Mnist_Logistic(nn.Module):
    def __init__(self):
        super().__init__()
        self.lin = nn.Linear(784, 10)

    def forward(self, xb):
        return self.lin(xb)

We instantiate our model and calculate the loss in the same way as before:

model = Mnist_Logistic()
print(loss_func(model(xb), yb))

We are still able to use our same fit method as before.

fit()

print(loss_func(model(xb), yb))

Refactor using optim

Pytorch also has a package with various optimization algorithms, torch.optim. We can use the step method from our optimizer to take an optimization step, instead of manually updating each parameter.

This will let us replace our previous manually coded optimization step and instead use just:

opt.step()
opt.zero_grad()

optim.zero_grad() resets the gradient to 0 and we need to call it before computing the gradient for the next minibatch.

from torch import optim

define a little function to create our model and optimizer so we can reuse it in the future

def get_model():
    model = Mnist_Logistic()
    return model, optim.SGD(model.parameters(), lr=lr) # lr is learning rate

model, opt = get_model()
print(loss_func(model(xb), yb))

for epoch in range(epochs):
    for i in range((n - 1) // bs + 1):
        start_i = i * bs
        end_i = start_i + bs
        xb = x_train[start_i:end_i]
        yb = y_train[start_i:end_i]
        pred = model(xb)
        loss = loss_func(pred, yb)

        loss.backward()
        opt.step()
        opt.zero_grad()

print(loss_func(model(xb), yb))

Refactor using Dataset

PyTorch has an abstract Dataset class. A Dataset can be anything that has a __len__ function (called by Python’s standard len function) and a __getitem__ function as a way of indexing into it. This tutorial walks through a nice example of creating a custom FacialLandmarkDataset class as a subclass of Dataset.
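
A minimal sketch of that interface (an illustrative class, not from the tutorial):

from torch.utils.data import Dataset

class PairDataset(Dataset):
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __len__(self):            # called by len(dataset)
        return len(self.x)

    def __getitem__(self, i):     # called by dataset[i]
        return self.x[i], self.y[i]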

PyTorch’s TensorDataset is a Dataset wrapping tensors. By defining a length and way of indexing, this also gives us a way to iterate, index, and slice along the first dimension of a tensor. This will make it easier to access both the independent and dependent variables in the same line as we train.

from torch.utils.data import TensorDataset

Both x_train and y_train can be combined in a single TensorDataset, which will be easier to iterate over and slice.

train_ds = TensorDataset(x_train, y_train)

Previously, we had to iterate through minibatches of x and y values separately. Now, we can do these two steps together:

xb,yb = train_ds[i*bs : i*bs+bs]

model, opt = get_model()

for epoch in range(epochs):
    for i in range((n - 1) // bs + 1):
        xb, yb = train_ds[i * bs: i * bs + bs]
        pred = model(xb)
        loss = loss_func(pred, yb)

        loss.backward()
        opt.step()
        opt.zero_grad()

print(loss_func(model(xb), yb))

Refactor using DataLoader

Pytorch’s DataLoader is responsible for managing batches. You can create a DataLoader from any Dataset. DataLoader makes it easier to iterate over batches. Rather than having to use train_ds[i*bs : i*bs+bs], the DataLoader gives us each minibatch automatically.

from torch.utils.data import DataLoader

train_ds = TensorDataset(x_train, y_train)
train_dl = DataLoader(train_ds, batch_size=bs)

Now, our loop is much cleaner, as (xb, yb) are loaded automatically from the data loader:

for xb,yb in train_dl:
    pred = model(xb)

model, opt = get_model()

for epoch in range(epochs):
    for xb, yb in train_dl:
        pred = model(xb)
        loss = loss_func(pred, yb)

        loss.backward()
        opt.step()
        opt.zero_grad()

print(loss_func(model(xb), yb))

Thanks to Pytorch’s nn.Module, nn.Parameter, Dataset, and DataLoader, our training loop is now dramatically smaller and easier to understand. Let’s now try to add the basic features necessary to create effective models in practice.

Add validation

In section 1, we were just trying to get a reasonable training loop set up for use on our training data. In reality, you always should also have a validation set, in order to identify if you are overfitting.

Shuffling the training data is important to prevent correlation between batches and overfitting. On the other hand, the validation loss will be identical whether we shuffle the validation set or not. Since shuffling takes extra time, it makes no sense to shuffle the validation data.

We’ll use a batch size for the validation set that is twice as large as that for the training set. This is because the validation set does not need backpropagation and thus takes less memory (it doesn’t need to store the gradients). We take advantage of this to use a larger batch size and compute the loss more quickly.

train_ds = TensorDataset(x_train, y_train)
train_dl = DataLoader(train_ds, batch_size=bs, shuffle=True)

valid_ds = TensorDataset(x_valid, y_valid)
valid_dl = DataLoader(valid_ds, batch_size=bs * 2)

calculate and print the validation loss at the end of each epoch.

(Note that we always call model.train() before training, and model.eval() before inference, because these are used by layers such as nn.BatchNorm2d and nn.Dropout to ensure appropriate behaviour for these different phases.)

model, opt = get_model()

for epoch in range(epochs):
    model.train()
    for xb, yb in train_dl:
        pred = model(xb)
        loss = loss_func(pred, yb)

        loss.backward()
        opt.step()
        opt.zero_grad()

    model.eval()
    with torch.no_grad():
        valid_loss = sum(loss_func(model(xb), yb) for xb, yb in valid_dl)

    print(epoch, valid_loss / len(valid_dl))

Create fit() and get_data()

Since we go through a similar process twice of calculating the loss for both the training set and the validation set, let’s make that into its own function, loss_batch, which computes the loss for one batch.

pass an optimizer in for the training set, and use it to perform backprop. For the validation set, we don’t pass an optimizer, so the method doesn’t perform backprop.

def loss_batch(model, loss_func, xb, yb, opt=None):
    loss = loss_func(model(xb), yb)

    if opt is not None:
        loss.backward()
        opt.step()
        opt.zero_grad()

    return loss.item(), len(xb)

fit runs the necessary operations to train our model and compute the training and validation losses for each epoch.

import numpy as np

def fit(epochs, model, loss_func, opt, train_dl, valid_dl):
    for epoch in range(epochs):
        model.train()
        for xb, yb in train_dl:
            loss_batch(model, loss_func, xb, yb, opt)

        model.eval()
        with torch.no_grad():
            losses, nums = zip(
                *[loss_batch(model, loss_func, xb, yb) for xb, yb in valid_dl]
            )
        val_loss = np.sum(np.multiply(losses, nums)) / np.sum(nums)

        print(epoch, val_loss)

get_data returns dataloaders for the training and validation sets.

def get_data(train_ds, valid_ds, bs):
    return (
        DataLoader(train_ds, batch_size=bs, shuffle=True),
        DataLoader(valid_ds, batch_size=bs * 2),
    )

Now our whole process of obtaining the data loaders and fitting the model can be run in 3 lines of code:

train_dl, valid_dl = get_data(train_ds, valid_ds, bs)
model, opt = get_model()
fit(epochs, model, loss_func, opt, train_dl, valid_dl)

You can use these basic 3 lines of code to train a wide variety of models. Let’s see if we can use them to train a convolutional neural network (CNN)!

Switch to CNN

We are now going to build our neural network with three convolutional layers. Because none of the functions in the previous section assume anything about the model form, we’ll be able to use them to train a CNN without any modification.

We will use Pytorch’s predefined Conv2d class as our convolutional layer. We define a CNN with 3 convolutional layers. Each convolution is followed by a ReLU. At the end, we perform an average pooling. (Note that view is PyTorch’s version of numpy’s reshape)

class Mnist_CNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1)
        self.conv2 = nn.Conv2d(16, 16, kernel_size=3, stride=2, padding=1)
        self.conv3 = nn.Conv2d(16, 10, kernel_size=3, stride=2, padding=1)

    def forward(self, xb):
        xb = xb.view(-1, 1, 28, 28)
        xb = F.relu(self.conv1(xb))
        xb = F.relu(self.conv2(xb))
        xb = F.relu(self.conv3(xb))
        xb = F.avg_pool2d(xb, 4)
        return xb.view(-1, xb.size(1))

lr = 0.1

Momentum is a variation on stochastic gradient descent that takes previous updates into account as well and generally leads to faster training.
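
A sketch of what momentum does inside opt.step() (this mirrors PyTorch's SGD formula with dampening 0; illustrative, not the library source):

def momentum_step(param, velocity, lr, mu=0.9):
    # velocity starts as torch.zeros_like(param)
    velocity.mul_(mu).add_(param.grad)   # v = mu * v + grad
    with torch.no_grad():
        param.sub_(lr * velocity)        # p = p - lr * v
    return velocity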

model = Mnist_CNN()
opt = optim.SGD(model.parameters(), lr=lr, momentum=0.9)

fit(epochs, model, loss_func, opt, train_dl, valid_dl)

nn.Sequential

torch.nn has another handy class we can use to simplify our code: Sequential. A Sequential object runs each of the modules contained within it, in a sequential manner. This is a simpler way of writing our neural network.

To take advantage of this, we need to be able to easily define a custom layer from a given function. For instance, PyTorch doesn’t have a view layer, and we need to create one for our network. Lambda will create a layer that we can then use when defining a network with Sequential.

class Lambda(nn.Module):
    def __init__(self, func):
        super().__init__()
        self.func = func

    def forward(self, x):
        return self.func(x)


def preprocess(x):
    return x.view(-1, 1, 28, 28)

The model created with Sequential is:

model = nn.Sequential(
    Lambda(preprocess),
    nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 16, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 10, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
    nn.AvgPool2d(4),
    Lambda(lambda x: x.view(x.size(0), -1)),
)

opt = optim.SGD(model.parameters(), lr=lr, momentum=0.9)

fit(epochs, model, loss_func, opt, train_dl, valid_dl)

Wrapping DataLoader

Our CNN is fairly concise, but it only works with MNIST, because:

  • It assumes the input is a 28*28 long vector
  • It assumes that the final CNN grid size is 4*4 (since that’s the average pooling kernel size we used)

Let’s get rid of these two assumptions, so our model works with any 2d single channel image. First, we can remove the initial Lambda layer by moving the data preprocessing into a generator:

def preprocess(x, y):
    return x.view(-1, 1, 28, 28), y


class WrappedDataLoader:
    def __init__(self, dl, func):
        self.dl = dl
        self.func = func

    def __len__(self):
        return len(self.dl)

    def __iter__(self):
        batches = iter(self.dl)
        for b in batches:
            yield (self.func(*b))

train_dl, valid_dl = get_data(train_ds, valid_ds, bs)
train_dl = WrappedDataLoader(train_dl, preprocess)
valid_dl = WrappedDataLoader(valid_dl, preprocess)

replace nn.AvgPool2d with nn.AdaptiveAvgPool2d, which allows us to define the size of the output tensor we want, rather than the input tensor we have. As a result, our model will work with any size input.

model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 16, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 10, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    Lambda(lambda x: x.view(x.size(0), -1)),
)

opt = optim.SGD(model.parameters(), lr=lr, momentum=0.9)

fit(epochs, model, loss_func, opt, train_dl, valid_dl)

Using your GPU

check that your GPU is working in Pytorch:

print(torch.cuda.is_available())

create a device object for it:

dev = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")

Update preprocess to move batches to the GPU:

def preprocess(x, y):
    return x.view(-1, 1, 28, 28).to(dev), y.to(dev)


train_dl, valid_dl = get_data(train_ds, valid_ds, bs)
train_dl = WrappedDataLoader(train_dl, preprocess)
valid_dl = WrappedDataLoader(valid_dl, preprocess)

move our model to the GPU

model.to(dev)
opt = optim.SGD(model.parameters(), lr=lr, momentum=0.9)

It runs faster now:

fit(epochs, model, loss_func, opt, train_dl, valid_dl)

Summary

  • torch.nn:
    • Module: creates a callable which behaves like a function, but can also contain state (such as neural net layer weights). It knows what Parameter(s) it contains and can zero all their gradients, loop through them for weight updates, etc.
    • Parameter: a wrapper for a tensor that tells a Module that it has weights that need updating during backprop. Only tensors with the requires_grad attribute set are updated
    • functional: a module (usually imported into the F namespace by convention) which contains activation functions, loss functions, etc, as well as non-stateful versions of layers such as convolutional and linear layers.
  • torch.optim: Contains optimizers such as SGD, which update the weights of Parameter during the backward step
  • Dataset: An abstract interface of objects with a __len__ and a __getitem__, including classes provided with Pytorch such as TensorDataset
  • DataLoader: Takes any Dataset and creates an iterator which returns batches of data.

Lesson 0: Jupyter notebook

Get lessons from https://github.com/fastai/course-v3

Course taught by Jeremy Howard.

Make deep learning accessible: software, education, research, community

Some background on why Jupyter notebooks: https://paulromer.net/jupyter-mathematica-and-the-future-of-the-research-paper/

The possibility of Jupyter notebooks being used for papers https://www.theatlantic.com/science/archive/2018/04/the-scientific-paper-is-obsolete/556676/

Esc to enter command mode and Enter to enter edit mode

Your notebook is autosaved every 120 seconds

Esc -> s to save

Esc -> up/down arrows to move between cells

Esc -> b to create a new cell

Esc -> 0 -> 0 to restart kernel

Esc -> m to convert a cell to markdown

Esc -> y to convert a cell to code

Esc -> d -> d to delete a cell

Esc -> o to toggle output

?function-name: Shows the definition and docstring for that function

??function-name: Shows the source code for that function

doc(function-name): Shows the definition, docstring and links to the documentation of the function (only works with fastai library imported)

Use Tab to autocomplete a method name and Shift + Tab to show the signature and docstring of that method.

Lesson 1: Image classification

https://course.fast.ai/videos/?lesson=1

Build our first image classifier from scratch.

The fastai library provides many useful functions that enable us to quickly and easily build neural networks and train our models.

The course argues that import * is fine for experimental notebook work like this.

from fastai.vision import *
from fastai.metrics import error_rate

Use known datasets: academic datasets are often in good shape and used frequently, and they provide baselines so you know if your model is good.

The dataset is the Oxford-IIIT Pet Dataset by Parkhi et al. 2012, which features 12 cat breeds and 25 dog breeds (fine-grained classification). The best accuracy they got on their dataset was 59.21%.

Download and extract the data.

Use untar_data, which takes a Union[pathlib.Path, str] (i.e. either a Path or a str).

path = untar_data(URLs.PETS)

path.ls()
path_anno = path/'annotations' # can use / as path object
path_img = path/'images'

fnames = get_image_files(path_img)

pat = r'/([^/]+)_\d+.jpg$' # regular expression patterns
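
To see what the pattern extracts (a quick illustration with a hypothetical filename):

import re
# the group captures everything between the last '/' and the trailing _<digits>.jpg
re.search(pat, '/images/great_pyrenees_173.jpg').group(1)  # -> 'great_pyrenees'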

bs = 64 # batch size
data = ImageDataBunch.from_name_re(path_img, fnames, pat, ds_tfms=get_transforms(), size=224, bs=bs).normalize(imagenet_stats) # size specifies the size images are scaled to; get_transforms() applies the default augmentations

data.show_batch(rows=3, figsize=(7,6)) # grab middle bit and resize it

# Get unique classes
print(data.classes)
len(data.classes),data.c

Training: resnet34

Models are trained using a Learner; a constructor such as cnn_learner builds one for a specific task.

Use a convolutional neural network backbone and a fully connected head with a single hidden layer as a classifier. Use resnet34, trained on 1,000 classes and ~1.5m images (ImageNet). Download pre-trained weights (transfer learning): much quicker to train, and you don't need a big dataset.

learn = cnn_learner(data, models.resnet34, metrics=error_rate) # Requires an ImageDataBunch and a model architecture. resnets work pretty well; there's resnet34 and resnet50 (start smaller: 34). Report error_rate during training.

learn.model

learn.fit_one_cycle(4) # 4 epochs on last few layers

learn.save('stage-1')

You can also do resnet50 but don't forget to reduce the batch size.

https://dawn.cs.stanford.edu/benchmark/#imagenet - shows scores of image classification models.

To see what comes out of the model:

learn knows data and model

interp = ClassificationInterpretation.from_learner(learn)

losses,idxs = interp.top_losses()

interp.plot_top_losses(9, figsize=(15,11))

Plots four things per image: prediction, actual, loss, and probability of the actual class

Confusion matrix: for actual class how many times was it predicted to be that class

interp.plot_confusion_matrix(figsize=(12,12), dpi=60)

If you have lots of classes it's best to use

interp.most_confused(min_val=2)

unfreeze our model and train the whole model

Zeiler and Fergus 2013: Visualizing and Understanding Convolutional Networks.

We want to change the last layers, which learn task-specific features

learn.unfreeze()
learn.fit_one_cycle(1)

Leads to a worse error rate, as it updates all layers at the same rate, including early layers that learn generic features (e.g. diagonal edges).

learn.lr_find()
learn.recorder.plot()

The default lr is 0.003, which here is where the loss starts to increase a lot

learn.unfreeze()
learn.fit_one_cycle(2, max_lr=slice(1e-6,1e-4)) # this is dynamic for each layer between these two values

This gives a ~10% improvement in accuracy over before. You only really need these two stages.

Training: resnet50

Zeiler and Fergus paper: https://arxiv.org/abs/1311.2901 (the ResNet paper itself is https://arxiv.org/abs/1512.03385)

bs = 16
data = ImageDataBunch.from_name_re(path_img, fnames, pat, ds_tfms=get_transforms(), size=299, bs=bs//2).normalize(imagenet_stats)

learn = cnn_learner(data, models.resnet50, metrics=error_rate)

learn.lr_find()
learn.recorder.plot()

learn.fit_one_cycle(8)

learn.save('stage-1-50')

learn.unfreeze()
learn.fit_one_cycle(3, max_lr=slice(1e-6,1e-4))

learn.load('stage-1-50');

interp = ClassificationInterpretation.from_learner(learn)

interp.most_confused(min_val=2)

Other data formats

path = untar_data(URLs.MNIST_SAMPLE); path
tfms = get_transforms(do_flip=False)

The labels are what the folders are called so you can use from_folder

data = ImageDataBunch.from_folder(path, ds_tfms=tfms, size=26)
data.show_batch(rows=3, figsize=(5,5))

learn = cnn_learner(data, models.resnet18, metrics=accuracy)
learn.fit(2)

df = pd.read_csv(path/'labels.csv')
df.head()

if label is in csv file can use:

data = ImageDataBunch.from_csv(path, ds_tfms=tfms, size=28)

from df

data = ImageDataBunch.from_df(path, df, ds_tfms=tfms, size=24)
data.classes

grab label using regex

pat = r"/(\d)/\d+\.png$"
data = ImageDataBunch.from_name_re(path, fn_paths, pat=pat, ds_tfms=tfms, size=24)
data.classes

can write your own function

data = ImageDataBunch.from_name_func(path, fn_paths, ds_tfms=tfms, size=24,
        label_func = lambda x: '3' if '/3/' in str(x) else '7')
data.classes

from lists

labels = [('3' if '/3/' in str(x) else '7') for x in fn_paths]
labels[:5]
data = ImageDataBunch.from_lists(path, fn_paths, labels=labels, ds_tfms=tfms, size=24)
data.classes

How to download your own image dataset

https://forums.fast.ai/t/tips-for-building-large-image-datasets/26688/3

https://lpdaacsvc.cr.usgs.gov/appeears/ - to get MERIS, ESA Sentinel-3

https://github.com/prairie-guy/ai_utilities

https://github.com/svenski/duckgoose

https://forums.fast.ai/t/generating-image-datasets-quickly/19079/9

https://forums.fast.ai/t/how-to-scrape-the-web-for-images/7446/3

Lesson 2a: Creating your own dataset from Google Images

Inspired by https://www.pyimagesearch.com/2017/12/04/how-to-create-a-deep-learning-dataset-using-google-images/

from fastai.vision import *

Create folders for the files

path = Path('data/bears')
folder = 'black'
dest = path/folder
dest.mkdir(parents=True, exist_ok=True)

folder = 'teddys'
dest = path/folder
dest.mkdir(parents=True, exist_ok=True)

folder = 'grizzly'
dest = path/folder
dest.mkdir(parents=True, exist_ok=True)

Go to Google Images and type your search term. If you want to exclude anything you can do "canis lupus lupus" -dog to search for wolves, for example. Tools -> Type -> Photo to only get photos. Type "black bear"

Run some JavaScript code in your browser which will save the URLs of all the images you want for your dataset.

Turn off ad-block, press Ctrl + Shift + J in your browser to open the JavaScript console, and run the following (press Enter to run)

urls = Array.from(document.querySelectorAll('.rg_di .rg_meta')).map(el=>JSON.parse(el.textContent).ou);
window.open('data:text/csv;charset=utf-8,' + escape(urls.join('\n')));

Move the 'download' file into the bear folder and name the file urls_black.csv

Do the same for teddys and grizzly

Download the files

path = Path('data/bears')

file = 'urls_black.csv'
folder = 'black'
dest = path/folder
download_images(path/file, dest, max_pics=200)

file = 'urls_teddys.csv'
folder = 'teddys'
dest = path/folder
download_images(path/file, dest, max_pics=200)

file = 'urls_grizzly.csv'
folder = 'grizzly'
dest = path/folder
download_images(path/file, dest, max_pics=200)

Then clean up files that can't be opened:

classes = ['teddys','grizzly','black']
for c in classes:
    print(c)
    verify_images(path/c, delete=True, max_size=500)

May have to go through this a couple of times to get all the images.

View the data:

np.random.seed(42)
data = ImageDataBunch.from_folder(path, train=".", valid_pct=0.2,
        ds_tfms=get_transforms(), size=224, num_workers=4).normalize(imagenet_stats)
data.classes
data.show_batch(rows=3, figsize=(7,8))

Train the model:

learn = cnn_learner(data, models.resnet34, metrics=error_rate)
learn.fit_one_cycle(4)
learn.save('stage-1')
learn.unfreeze()
learn.lr_find()
learn.recorder.plot()
learn.fit_one_cycle(2, max_lr=slice(3e-5,3e-4))
learn.save('stage-2')

Interpret the model:

learn.load('stage-2');
interp = ClassificationInterpretation.from_learner(learn)
interp.plot_confusion_matrix()

Cleaning up the data:

Using the ImageCleaner widget from fastai.widgets we can prune our top losses, removing photos that don't belong.

from fastai.widgets import *

Get the file paths from our top losses using .from_toplosses, then feed the top-loss indexes and the corresponding dataset to ImageCleaner.

The widget will not delete images directly from disk but it will create a new csv file cleaned.csv from where you can create a new ImageDataBunch with the corrected labels to continue training your model.

https://ipywidgets.readthedocs.io/en/latest/

create a new dataset without the split.

db = (ImageList.from_folder(path)
                   .split_none()
                   .label_from_folder()
                   .transform(get_transforms(), size=224)
                   .databunch()
     )

Create a new learner to use our new databunch with all the images.

learn_cln = cnn_learner(db, models.resnet34, metrics=error_rate)
learn_cln.load('stage-2');
ds, idxs = DatasetFormatter().from_toplosses(learn_cln)
ImageCleaner(ds, idxs, path)

Flag photos for deletion by clicking 'Delete'

Find duplicates in your dataset and delete them! Run .from_similars to get the potential duplicates' ids and then run ImageCleaner with duplicates=True

db = (ImageList.from_csv(path, 'cleaned.csv', folder='.')
                    .split_none()
                    .label_from_df()
                    .transform(get_transforms(), size=224)
                    .databunch()
      )
learn_cln = cnn_learner(db, models.resnet34, metrics=error_rate)
learn_cln.load('stage-2');

ds, idxs = DatasetFormatter().from_similars(learn_cln)

ImageCleaner(ds, idxs, path, duplicates=True)

db = (ImageList.from_csv(path, 'cleaned.csv', folder='.')
                    .split_none()
                    .label_from_df()
                    .transform(get_transforms(), size=224)
                    .databunch()
      )

learn = cnn_learner(db, models.resnet34, metrics=error_rate)

Put the model into production

Export the content of our Learner object for production

learn.export()

This will create a file named 'export.pkl' in the directory where we were working that contains everything we need to deploy our model (the model, the weights but also some metadata like the classes or the transforms/normalization used).

Putting the model in production

For serving at scale it is often better to use a CPU than a GPU: unlike a GPU, a CPU doesn't need to wait to build up a batch of 64 images before running, so individual users aren't kept waiting.

Test your model on CPU like so:

defaults.device = torch.device('cpu')
img = open_image(path/'black'/'00000021.jpg')
img

learn = load_learner(path)

pred_class,pred_idx,outputs = learn.predict(img)
pred_class

Starlette app

https://www.starlette.io/

$ pip install starlette
$ pip install uvicorn

Create a file called hello_world.py

from starlette.applications import Starlette
from starlette.responses import JSONResponse
import uvicorn

app = Starlette(debug=True)

@app.route('/')
async def homepage(request):
    return JSONResponse({'hello': 'world'})

if __name__ == '__main__':
    uvicorn.run(app, host='0.0.0.0', port=8000)

$ git clone https://github.com/encode/starlette-example.git
$ cd starlette-example
$ pip install aiofiles
$ scripts/run

https://github.com/simonw/cougar-or-not/blob/master/cougar.py

Parameters

You can play with the learning rate and the number of epochs

Learning rate too high:

learn = cnn_learner(data, models.resnet34, metrics=error_rate)
learn.fit_one_cycle(1, max_lr=0.5)
# error rate: 0.69

Your validation loss will explode.

Learning rate too low:

learn = cnn_learner(data, models.resnet34, metrics=error_rate)
# error rate goes down then back up

learn.fit_one_cycle(5, max_lr=1e-5)
learn.recorder.plot_losses()

Train loss is higher than validation loss.

Too few epochs

learn = cnn_learner(data, models.resnet34, metrics=error_rate, pretrained=False)
learn.fit_one_cycle(1)

Training loss is much higher than validation loss.

Too many epochs

np.random.seed(42)
data = ImageDataBunch.from_folder(path, train=".", valid_pct=0.9, bs=32, 
        ds_tfms=get_transforms(do_flip=False, max_rotate=0, max_zoom=1, max_lighting=0, max_warp=0
                              ),size=224, num_workers=4).normalize(imagenet_stats)

learn = cnn_learner(data, models.resnet50, metrics=error_rate, ps=0, wd=0)
learn.unfreeze()
learn.fit_one_cycle(40, slice(1e-6,1e-4))

Overfitting. The error rate improves for a while, then gets worse again. A well-trained model should have train loss lower than validation loss.

Metrics are computed on the validation dataset.

https://forums.fast.ai/t/lesson-1-official-resources-and-updates/27936

Lesson 2b: Data cleaning and production; SGD from scratch

https://course.fast.ai/videos/?lesson=2

Make sure you have the latest version of the code and the latest version of the course

$ conda update conda
$ conda update anaconda
$ conda activate fastai
$ conda update --all

Compare https://github.com/fastai/course-v3 to my local course.

cd /home/ray/Documents/COURSES/fastai/course-v3/nbs/dl1
jupyter notebook

Cool fast.ai work

Deep Convolutional Neural Networks and Data Augmentation for Environmental Sound Classification - https://arxiv.org/abs/1608.04363

https://forums.fast.ai/t/share-your-work-here/27676/38

Learning CNNs

https://medium.com/@ageitgey/machine-learning-is-fun-part-3-deep-learning-and-convolutional-neural-networks-f40359318721

http://matrixmultiplication.xyz/

Linear regression with SGD

y = a1x1 + a2x2

%matplotlib inline
from fastai.basics import *

n=100

# x1 is noisy and x2 is full of ones.
x = torch.ones(n, 2)
# _ means replace values (i.e. don't return).
x[:,0].uniform_(-1., 1)
x[:5]

# a1 = 3; a2 = 2
a = tensor(3., 2.); a

# x@a is a matrix product
y = x@a + torch.rand(n)

plt.scatter(x[:,0], y);

Try and work out a1 and a2.

find parameters (weights) a such that you minimize the error between the points and the line x@a. For a regression problem the most common error function or loss function is the mean squared error.

def mse(y_hat, y): return ((y_hat - y) ** 2).mean()

Start with a guess of -1, 1.

a = tensor(-1., 1.)
y_hat = x@a
mse(y_hat, y)
plt.scatter(x[:,0], y)
plt.scatter(x[:,0], y_hat);

Change intercept and gradient. Four possibilities...

Derivative tells you how to move the line of best fit.

Call .grad to get the derivative.

a = nn.Parameter(a); a
lr = 1e-1

def update(lr, a):
    y_hat = x@a
    loss = mse(y, y_hat)
    if t % 10 == 0:
        print(loss)
    loss.backward() # calculate gradient
    # Turn gradient calculation off when you do the SGD update
    with torch.no_grad():
        # subtract the gradient from a
        # _ means inplace
        # subtract i.e. go in other 'direction' of loss.
        a.sub_(lr * a.grad) # the derivative gets assigned to .grad
        a.grad.zero_() # 0 out the gradient

for t in range(100):
    update(lr, a)

plt.scatter(x[:, 0], y)
plt.scatter(x[:, 0], x@a);

from matplotlib import animation, rc
rc('animation', html='jshtml')

a = nn.Parameter(tensor(-1., 1.))

fig = plt.figure()
plt.scatter(x[:, 0], y, c='orange')
line, = plt.plot(x[:, 0], x@a)
plt.close()

def animate(i):
    update(lr, a)
    line.set_ydata(x@a)
    return line,

animation.FuncAnimation(fig, animate, np.arange(0, 100), interval=20)

In practice we use mini-batches.

a + bx (high bias - underfit)

a + bx + cx^2 (just right)

a + bx + cx^2 + dx^3 + ex^4 (high variance - overfit).

See also

You can deploy computer vision models at https://course.fast.ai/deployment_render.html

https://github.com/hiromis/notes/blob/master/Lesson2.md

https://www.youtube.com/watch?v=q6DGVGJ1WP4

https://responder.readthedocs.io/en/latest/

https://www.christianwerner.net/tech/Build-your-image-dataset-faster/

https://forums.fast.ai/t/tool-for-deleting-files-on-the-google-image-search-page-before-downloading/28900

https://medium.com/@ageitgey/machine-learning-is-fun-part-3-deep-learning-and-convolutional-neural-networks-f40359318721

A systematic study of the class imbalance problem in convolutional neural networks - https://arxiv.org/abs/1710.05381

https://www.youtube.com/watch?v=q6DGVGJ1WP4 - There's no such thing as not a math person.

https://www.fast.ai/2017/11/13/validation-sets/ how (and why) to create a good validation set.

Lesson 3: Data blocks; Multi-label classification; Segmentation

https://course.fast.ai/videos/?lesson=3

Make sure you have the latest version of the code and the latest version of the course

$ conda update conda
$ conda update anaconda
$ conda activate fastai
$ conda update --all

Compare https://github.com/fastai/course-v3 to my local course.

cd /home/ray/Documents/COURSES/fastai/course-v3/nbs/dl1
jupyter notebook

More info on deploying webapps here - https://course.fast.ai/deployment_render.html

Classifiers people made are in - https://github.com/hiromis/notes/blob/master/Lesson3.md

Multi-label prediction with Planet Amazon dataset

%reload_ext autoreload
%autoreload 2
%matplotlib inline

# https://github.com/fastai/course-v3/blob/master/nbs/dl1/lesson3-planet.ipynb
from fastai.vision import *

Install kaggle

https://www.kaggle.com/c/planet-understanding-the-amazon-from-space

path = Config.data_path()/'planet'
path.mkdir(parents=True, exist_ok=True)

Download data

$ kaggle competitions download -c planet-understanding-the-amazon-from-space -f train-jpg.tar.7z -p /home/ray/.fastai/data/planet
$ kaggle competitions download -c planet-understanding-the-amazon-from-space -f train_v2.csv -p /home/ray/.fastai/data/planet
$ unzip -q -n /home/ray/.fastai/data/planet/train_v2.csv.zip -d /home/ray/.fastai/data/planet

Install 7zip and uncompress

$ sudo apt-get update
$ sudo apt install p7zip-full
$ 7za -bd -y -so x /home/ray/.fastai/data/planet/train-jpg.tar.7z | tar xf - -C /home/ray/.fastai/data/planet

Multiple labels for each tile

df = pd.read_csv(path/'train_v2.csv')
df.head()

Put this into a DataBunch and use ImageList (and not ImageDataBunch). The Dataset class: https://pytorch.org/docs/stable/data.html#map-style-datasets

A Dataset has __getitem__() (e.g. object[3]) and __len__() (used by len(object)); these double-underscore methods are called "dunders". The Dataset provides the images and labels.

Create a mini-batch using a DataLoader(dataset). Use DataBunch to bind train DataLoader and valid DataLoader.
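
Schematically (a sketch using fastai v1 names, assuming train_ds and valid_ds Dataset objects; the real pipeline is built with the data block API below):

train_dl = DataLoader(train_ds, batch_size=64, shuffle=True)  # yields mini-batches
valid_dl = DataLoader(valid_ds, batch_size=64)
data = DataBunch(train_dl, valid_dl)  # binds the two loaders together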

np.random.seed(42)
src = (ImageList.from_csv(path, 'train_v2.csv', folder='train-jpg', suffix='.jpg') # Get images
       .split_by_rand_pct(0.2)
       .label_from_df(label_delim=' ')) # Get labels

# Flip vert set to True as this is satellite data; warp (perspective change, as if viewed from above or below) set to 0 as satellite images are taken from directly above.
tfms = get_transforms(flip_vert=True, max_lighting=0.1, max_zoom=1.05, max_warp=0.)
data = (src.transform(tfms, size=128)
        .databunch().normalize(imagenet_stats))

data.show_batch(rows=3, figsize=(12,9))

Set up the CNN and metrics: accuracy (thresholded rather than argmax, since multiple labels can apply) and F-score. fbeta with beta=2 is equal to F2. Use a threshold to keep the classes that we think exist in a sample. Use partial to create a version of a function with some keyword arguments pre-set.

data.c
data.classes

arch = models.resnet50
acc_02 = partial(accuracy_thresh, thresh=0.2)
f_score = partial(fbeta, thresh=0.2)
learn = cnn_learner(data, arch, metrics=[acc_02, f_score])

learn.lr_find()
learn.recorder.plot()
lr = 0.01
learn.fit_one_cycle(5, slice(lr))

learn.save('stage-1-rn50')

Fine-tune/fit a bit more. You could create a DataBunch with the wrongly classified images and fit on them (e.g. with a higher learning rate or more epochs).

learn.unfreeze()
learn.lr_find()
learn.recorder.plot()
learn.fit_one_cycle(5, slice(1e-5, lr / 5))
learn.save('stage-2-rn50')

This model was fit to images with size 128. Use transfer learning to fit to 256.

data = (src.transform(tfms, size=256)
        .databunch(bs=32).normalize(imagenet_stats))

# Put new data into learn
learn.data = data
data.train_ds[0][0].shape

learn.freeze()

learn.lr_find()
learn.recorder.plot()

Train last two layers.

lr = 1e-2 / 2
learn.fit_one_cycle(5, slice(lr))

Image Segmentation with CamVid

%reload_ext autoreload
%autoreload 2
%matplotlib inline

from fastai.vision import *
from fastai.callbacks.hooks import *
from fastai.utils.mem import *

You can see the datasets at https://course.fast.ai/datasets

path = untar_data(URLs.CAMVID)
path.ls()

path_lbl = path/'labels'
path_img = path/'images'

fnames = get_image_files(path_img)
fnames[:3]

lbl_names = get_image_files(path_lbl)
lbl_names[:3]

img_f = fnames[0]
img = open_image(img_f)
img.show(figsize=(5, 5))

get_y_fn = lambda x: path_lbl/f'{x.stem}_P{x.suffix}'

mask = open_mask(get_y_fn(img_f))
mask.show(figsize=(5,5), alpha=1)

src_size = np.array(mask.shape[1:])
src_size,mask.data

codes = np.loadtxt(path/'codes.txt', dtype=str); codes

Datasets

size = src_size//2

free = gpu_mem_get_free_no_cache()
# the max size of bs depends on the available GPU RAM
if free > 8200: bs=8
else:           bs=4
print(f"using bs={bs}, have {free}MB of GPU RAM free")

src = (SegmentationItemList.from_folder(path_img)
       .split_by_fname_file('../valid.txt')
       .label_from_func(get_y_fn, classes=codes))

# Transform y (the independent variable as well)
data = (src.transform(get_transforms(), size=size, tfm_y=True)
        .databunch(bs=bs)
        .normalize(imagenet_stats))

data.show_batch(2, figsize=(10,7))

data.show_batch(2, figsize=(10,7), ds_type=DatasetType.Valid)

Model

name2id = {v:k for k,v in enumerate(codes)}
# Some pixels are labeled 'Void'. Exclude these from the metric.
void_code = name2id['Void']

# Custom metric
def acc_camvid(input, target):
    target = target.squeeze(1)
    mask = target != void_code
    return (input.argmax(dim=1)[mask]==target[mask]).float().mean()
metrics=acc_camvid
# metrics=accuracy
wd=1e-2

For segmentation model

https://docs.fast.ai/vision.learner.html#unet_learner

https://lmb.informatik.uni-freiburg.de/people/ronneber/u-net/

learn = unet_learner(data, models.resnet34, metrics=metrics, wd=wd)
lr_find(learn)
learn.recorder.plot()
lr=3e-3
learn.fit_one_cycle(10, slice(lr), pct_start=0.9)
learn.save('stage-1')
learn.load('stage-1');
learn.show_results(rows=3, figsize=(8,9))
learn.unfreeze()
lrs = slice(lr/400,lr/4)
learn.fit_one_cycle(12, lrs, pct_start=0.8)
learn.save('stage-2');

Go Big

learn.destroy()

size = src_size

free = gpu_mem_get_free_no_cache()
# the max size of bs depends on the available GPU RAM
if free > 8200: bs=3
else:           bs=1
print(f"using bs={bs}, have {free}MB of GPU RAM free")
data = (src.transform(get_transforms(), size=size, tfm_y=True)
        .databunch(bs=bs)
        .normalize(imagenet_stats))
learn = unet_learner(data, models.resnet34, metrics=metrics, wd=wd)
learn.load('stage-2');
lr_find(learn)
learn.recorder.plot()
learn.recorder.plot_lr()

Increase LR at start and decrease at end.

lr=1e-3
learn.fit_one_cycle(10, slice(lr), pct_start=0.8)
learn.save('stage-1-big')
learn.load('stage-1-big');
learn.unfreeze()
lrs = slice(1e-6,lr/10)
learn.fit_one_cycle(10, lrs)
learn.save('stage-2-big')
learn.load('stage-2-big');
learn.show_results(rows=3, figsize=(10,10))

Results are in https://arxiv.org/abs/1611.09326 (The One Hundred Layers Tiramisu: Fully Convolutional DenseNets for Semantic Segmentation).

Regression with BIWI head pose dataset

Find the center of a face (x and y pixels). Regression model.

%reload_ext autoreload
%autoreload 2
%matplotlib inline
from fastai.vision import *
path = untar_data(URLs.BIWI_HEAD_POSE)

Convert the data

cal = np.genfromtxt(path/'01'/'rgb.cal', skip_footer=6); cal
fname = '09/frame_00667_rgb.jpg'
def img2txt_name(f): return path/f'{str(f)[:-7]}pose.txt'
img = open_image(path/fname)
img.show()
ctr = np.genfromtxt(img2txt_name(fname), skip_header=3); ctr
def convert_biwi(coords):
    c1 = coords[0] * cal[0][0]/coords[2] + cal[0][2]
    c2 = coords[1] * cal[1][1]/coords[2] + cal[1][2]
    return tensor([c2,c1])

def get_ctr(f):
    ctr = np.genfromtxt(img2txt_name(f), skip_header=3)
    return convert_biwi(ctr)

def get_ip(img,pts): return ImagePoints(FlowField(img.size, pts), scale=True)
get_ctr(fname)
ctr = get_ctr(fname)
img.show(y=get_ip(img, ctr), figsize=(6, 6))

Validate on one person (id '13'). The label is a set of points; transform y as well (tfm_y=True).

data = (PointsItemList.from_folder(path)
        .split_by_valid_func(lambda o: o.parent.name=='13')
        .label_from_func(get_ctr)
        .transform(get_transforms(), tfm_y=True, size=(120,160))
        .databunch().normalize(imagenet_stats)
       )

Train model.

learn = cnn_learner(data, models.resnet34)
learn.lr_find()
learn.recorder.plot()
lr = 2e-2
learn.fit_one_cycle(5, slice(lr))
learn.save('stage-1')
learn.load('stage-1');
learn.show_results()

IMDB

from fastai.text import *
path = untar_data(URLs.IMDB_SAMPLE)
path.ls()

Create DataBunch

data_lm = TextDataBunch.from_csv(path, 'texts.csv')
data_lm.save()
data = load_data(path)

The steps that happen here are tokenization (take the text and convert it to tokens, e.g. words), replacing rare words with an 'unknown' token, lower-casing, and handling spaces.

data = TextClasDataBunch.from_csv(path, 'texts.csv')
data.show_batch()

Replace each token with a number (numericalization). Use a vocab size of 60,000.

data.vocab.itos[:10]
data.train_ds[0][0]
data.train_ds[0][0].data[:10]

Use the data block API

data = (TextList.from_csv(path, 'texts.csv', cols='text')
                .split_from_df(col=2)
                .label_from_df(cols=0)
                .databunch())

Create a language model. Use a model pre-trained on WikiText-103 (https://blog.einstein.ai/the-wikitext-long-term-dependency-language-modeling-dataset/) to guess what the next word of a sentence is: transfer learning from ~1 billion tokens. This is self-supervised learning.

bs=48
path = untar_data(URLs.IMDB)
path.ls()
(path/'train').ls()

# lm = 'Language model'
data_lm = (TextList.from_folder(path)
           #Inputs: all the text files in path
            .filter_by_folder(include=['train', 'test', 'unsup']) 
           #We may have other temp folders that contain text files so we only keep what's in train and test
            .split_by_rand_pct(0.1)
           #We randomly split and keep 10% (10,000 reviews) for validation
            .label_for_lm()           
           #We want to do a language model so we label accordingly
            .databunch(bs=bs))
data_lm.save('data_lm.pkl')

Ignore labels and shuffle data

data_lm = load_data(path, 'data_lm.pkl', bs=bs)
data_lm.show_batch()

# https://docs.fast.ai/text.models.html#AWD_LSTM
learn = language_model_learner(data_lm, AWD_LSTM, drop_mult=0.3)
learn.lr_find()
learn.recorder.plot(skip_end=15)

learn.fit_one_cycle(1, 1e-2, moms=(0.8,0.7)) # moms is momentum

learn.save('fit_head')
learn.load('fit_head');

Fine tune

learn.unfreeze()
learn.fit_one_cycle(10, 1e-3, moms=(0.8,0.7))
learn.save('fine_tuned')

Test model output

learn.load('fine_tuned');
TEXT = "I liked this movie because"
N_WORDS = 40
N_SENTENCES = 2
print("\n".join(learn.predict(TEXT, N_WORDS, temperature=0.75) for _ in range(N_SENTENCES)))

learn.save_encoder('fine_tuned_enc')

Create a classifier to predict review sentiment

path = untar_data(URLs.IMDB)

data_clas = (TextList.from_folder(path, vocab=data_lm.vocab)
             #grab all the text files in path
             .split_by_folder(valid='test')
             #split by train and valid folder (that only keeps 'train' and 'test' so no need to filter)
             .label_from_folder(classes=['neg', 'pos'])
             #label them all with their folders
             .databunch(bs=bs))

data_clas.save('data_clas.pkl')
data_clas = load_data(path, 'data_clas.pkl', bs=bs)
data_clas.show_batch()

Create a model to classify those reviews and load the encoder we saved before

learn = text_classifier_learner(data_clas, AWD_LSTM, drop_mult=0.5)
learn.load_encoder('fine_tuned_enc')
learn.lr_find()
learn.recorder.plot()

learn.fit_one_cycle(1, 2e-2, moms=(0.8,0.7))
learn.save('first')
learn.load('first');
learn.freeze_to(-2) # unfreeze the last two layer groups

learn.fit_one_cycle(1, slice(1e-2/(2.6**4),1e-2), moms=(0.8,0.7))
# the two values in slice set how quickly the lowest and highest layer groups learn
# (discriminative learning rates); the factor 2.6 comes from fitting a random forest
# on hyperparameters to predict what gives the best accuracy
learn.save('second')
learn.load('second');
learn.freeze_to(-3) # unfreeze the last three layer groups

learn.fit_one_cycle(1, slice(5e-3/(2.6**4),5e-3), moms=(0.8,0.7))
learn.save('third')
learn.load('third');
learn.unfreeze() # unfreeze the whole thing

learn.fit_one_cycle(2, slice(1e-3/(2.6**4),1e-3), moms=(0.8,0.7))
learn.predict("I really loved that movie, it was awesome!")

See also

https://www.kaggle.com/c/planet-understanding-the-amazon-from-space

https://docs.fast.ai/data_block.html

https://blog.usejournal.com/finding-data-block-nirvana-a-journey-through-the-fastai-data-block-api-c38210537fe4

https://stackoverflow.com/questions/29133085/what-are-keypoints-in-image-processing

https://forums.fast.ai/t/deep-learning-lesson-3-notes/29829

https://github.com/hiromis/notes/blob/master/Lesson3.md

https://forums.fast.ai/t/lesson-3-in-class-discussion/29733

https://forums.fast.ai/t/lesson-3-links-to-different-parts-in-video/30077

http://course18.fast.ai/ml

https://www.coursera.org/learn/machine-learning

https://course.fast.ai/deployment_render.html

https://mmiakashs.github.io/blog/2018-09-20-kaggle-api-google-colab/

https://docs.python.org/3/library/functools.html#functools.partial

https://zulko.github.io/moviepy/

https://www.meetup.com/sfmachinelearning/events/255566613/

https://docs.fast.ai/vision.transform.html#List-of-transforms

https://arxiv.org/abs/1506.01186 - Cyclical Learning Rates for Training Neural Networks

http://nlp.fast.ai/category/classification.html

http://neuralnetworksanddeeplearning.com/

Lesson 4: NLP; Tabular data; Collaborative filtering; Embeddings

https://course.fast.ai/videos/?lesson=4

Make sure you have the latest version of the code and the latest version of the course

$ conda update conda
$ conda update anaconda
$ conda activate fastai
$ conda update --all

Compare https://github.com/fastai/course-v3 to my local course.

cd /home/ray/Documents/COURSES/fastai/course-v3/nbs/dl1
jupyter notebook

Basic steps are:

  1. Create (or, preferred, download a pre-trained) language model trained on a large corpus such as Wikipedia (a "language model" is any model that learns to predict the next word of a sentence)
  2. Fine-tune this language model using your target corpus (in this case, IMDb movie reviews)
  3. Extract the encoder from this fine-tuned language model and pair it with a classifier head. Then fine-tune this model for the final classification task (in this case, sentiment analysis).
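
In fastai terms, the three steps condense to roughly the following (reusing the data_lm and data_clas bunches from the IMDB code in these notes):

# 1.-2. start from the WikiText-103 pretrained LM and fine-tune it on IMDB
learn = language_model_learner(data_lm, AWD_LSTM, drop_mult=0.3)
learn.fit_one_cycle(1, 1e-2)
learn.save_encoder('fine_tuned_enc')
# 3. reuse the encoder inside a classifier and fine-tune for sentiment
learn = text_classifier_learner(data_clas, AWD_LSTM, drop_mult=0.5)
learn.load_encoder('fine_tuned_enc')
learn.fit_one_cycle(1, 2e-2)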

Tabular

Use a neural net instead of GBTs/random forests, since it needs less feature engineering.

https://github.com/fastai/course-v3/blob/master/nbs/dl1/lesson4-tabular.ipynb

from fastai.tabular import *

Download the adult dataset https://archive.ics.uci.edu/ml/datasets/adult

path = untar_data(URLs.ADULT_SAMPLE)
df = pd.read_csv(path/'adult.csv')

dep_var = 'salary'
cat_names = ['workclass', 'education', 'marital-status', 'occupation', 'relationship', 'race']
cont_names = ['age', 'fnlwgt', 'education-num']
procs = [FillMissing, Categorify, Normalize]

test = TabularList.from_df(df.iloc[800:1000].copy(), path=path, cat_names=cat_names, cont_names=cont_names)

data = (TabularList.from_df(df, path=path, cat_names=cat_names, cont_names=cont_names, procs=procs)
                           .split_by_idx(list(range(800,1000)))
                           .label_from_df(cols=dep_var)
                           .add_test(test)
                           .databunch())

https://github.com/fastai/fastai/blob/c498a576214edc9f7d91e16ef51988f26327e04e/fastai/tabular/models.py#L6

learn = tabular_learner(data, layers=[200, 100], metrics=accuracy)

learn.lr_find()
learn.recorder.plot()

learn.fit(1, 1e-2)

Make a prediction

row = df.iloc[0]
learn.predict(row)

Collab filtering

You can either have the data long-form as a table, e.g. user | movie | number of stars, or as a (sparse) users x movies matrix.
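
A toy illustration of the two layouts (made-up values):

import pandas as pd

ratings_long = pd.DataFrame({'userId': [1, 1, 2], 'movieId': [10, 20, 10],
                             'rating': [4.0, 3.5, 5.0]})
# same data as a users x movies matrix; missing pairs become NaN (hence "sparse")
print(ratings_long.pivot(index='userId', columns='movieId', values='rating'))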

There is an up-to-date version of the MovieLens dataset at https://grouplens.org/datasets/movielens/.

https://github.com/fastai/course-v3/blob/master/nbs/dl1/lesson4-collab.ipynb

from fastai.collab import *
from fastai.tabular import *
user,item,title = 'userId','movieId','title'
path = untar_data(URLs.ML_SAMPLE)
ratings = pd.read_csv(path/'ratings.csv')

Train a model

data = CollabDataBunch.from_df(ratings, seed=42)
y_range = [0, 5.5] # Range of scores

https://github.com/fastai/fastai/blob/master/fastai/collab.py#L96

https://github.com/fastai/fastai/blob/c498a576214edc9f7d91e16ef51988f26327e04e/fastai/collab.py#L36

https://pytorch.org/docs/stable/_modules/torch/nn/modules/sparse.html#Embedding

learn = collab_learner(data, n_factors=50, y_range=y_range)
learn.fit_one_cycle(3, 5e-3)

Movielens 100k

path=Config.data_path()/'ml-100k'
ratings = pd.read_csv(path/'u.data', delimiter='\t', header=None,
                      names=[user,item,'rating','timestamp'])
ratings.head()
movies = pd.read_csv(path/'u.item',  delimiter='|', encoding='latin-1', header=None,
                    names=[item, 'title', 'date', 'N', 'url', *[f'g{i}' for i in range(19)]])

movies.head()
len(ratings)
rating_movie = ratings.merge(movies[[item, title]])
rating_movie.head()

data = CollabDataBunch.from_df(rating_movie, seed=42, valid_pct=0.1, item_name=title)

data.show_batch()
y_range = [0, 5.5] # range of the scaled sigmoid output; ratings run 0.5-5, so go a bit above the max
# n_factors is the width of the embedding (the number of latent factors).
# wd is weight decay (regularization): sum the squares of the parameters
# (some are + and some are -, hence squaring) and multiply by wd before adding to the loss.
learn = collab_learner(data, n_factors=40, y_range=y_range, wd=1e-1)
learn.lr_find()
learn.recorder.plot(skip_end=15)
learn.fit_one_cycle(5, 5e-3)
learn.save('dotprod') 

Interpret

learn.load('dotprod');
learn.model
g = rating_movie.groupby(title)['rating'].count()
top_movies = g.sort_values(ascending=False).index.values[:1000]
top_movies[:10]

Movie bias

movie_bias = learn.bias(top_movies, is_item=True) # is_item=True gives movie biases; is_item=False gives user biases
movie_bias.shape

mean_ratings = rating_movie.groupby(title)['rating'].mean()
movie_ratings = [(b, i, mean_ratings.loc[i]) for i,b in zip(top_movies,movie_bias)]

item0 = lambda o:o[0]

sorted(movie_ratings, key=item0)[:15]

sorted(movie_ratings, key=item0, reverse=True)[:15]

Movie weights

movie_w = learn.weight(top_movies, is_item=True) 
movie_w.shape

# squish those 40 factors down to 3 with PCA
movie_pca = movie_w.pca(3)
movie_pca.shape

fac0,fac1,fac2 = movie_pca.t()
movie_comp = [(f, i) for f,i in zip(fac0, top_movies)]

# Some aspect of taste and movie feature
sorted(movie_comp, key=itemgetter(0), reverse=True)[:10]

sorted(movie_comp, key=itemgetter(0))[:10]

movie_comp = [(f, i) for f,i in zip(fac1, top_movies)]

sorted(movie_comp, key=itemgetter(0), reverse=True)[:10]

sorted(movie_comp, key=itemgetter(0))[:10]

# PCA plot
idxs = np.random.choice(len(top_movies), 50, replace=False)
idxs = list(range(50))
X = fac0[idxs]
Y = fac2[idxs]
plt.figure(figsize=(15,15))
plt.scatter(X, Y)
for i, x, y in zip(top_movies[idxs], X, Y):
    plt.text(x,y,i, color=np.random.rand(3)*0.7, fontsize=11)
plt.show()

Cold-start problem: new users and new movies have no ratings yet. You need a metadata model (e.g. using age and sex) for new users and new movies.

To predict whether user 1 will like movie 1: create 5 random numbers (latent factors) for each movie and 5 for each user, and take their dot product to get a predicted rating. Then update those numbers by gradient descent so the predicted matrix of ratings matches the known ratings, minimizing RMSE.

Embedding: a matrix of weights you look rows up in. Bias: e.g. how much a user likes movies in general, or how well liked a movie is overall.

Use a sigmoid to restrict the output to the rating range (0 to 5.5).
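
A minimal sketch of that dot-product model in plain PyTorch (the class and argument names are mine, not fastai's):

import torch
import torch.nn as nn

class DotProdBias(nn.Module):
    def __init__(self, n_users, n_movies, n_factors=5, y_range=(0, 5.5)):
        super().__init__()
        self.u_w = nn.Embedding(n_users, n_factors)   # user latent factors
        self.m_w = nn.Embedding(n_movies, n_factors)  # movie latent factors
        self.u_b = nn.Embedding(n_users, 1)           # user bias
        self.m_b = nn.Embedding(n_movies, 1)          # movie bias
        self.y_range = y_range

    def forward(self, users, movies):
        dot = (self.u_w(users) * self.m_w(movies)).sum(1)
        res = dot + self.u_b(users).squeeze(1) + self.m_b(movies).squeeze(1)
        lo, hi = self.y_range
        return torch.sigmoid(res) * (hi - lo) + lo    # squash into the rating range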

Here is some benchmarking for movie-lens - https://www.librec.net/release/v1.3/example.html

See also

https://www.nytimes.com/2018/11/18/technology/artificial-intelligence-language.html (ULMFiT).

https://forums.fast.ai/t/deep-learning-lesson-4-notes/30983

https://forums.fast.ai/t/lesson-4-in-class-discussion/30318

https://forums.fast.ai/t/lesson-4-advanced-discussion/30319

https://www.youtube.com/watch?v=25nC0n9ERq4

Lesson 5: Back propagation; Accelerated SGD; Neural net from scratch

https://course.fast.ai/videos/?lesson=5

Make sure you have the latest version of the code and the latest version of the course

$ conda update conda
$ conda update anaconda
$ conda activate fastai
$ conda update --all

Compare https://github.com/fastai/course-v3 to my local course.

cd /home/ray/Documents/COURSES/fastai/course-v3/nbs/dl1
jupyter notebook

https://github.com/fastai/course-v3/blob/master/nbs/dl1/lesson5-sgd-mnist.ipynb

For transfer learning the last layer is removed (e.g. a ResNet head predicts ImageNet's 1,000 classes, which we don't have), and the later layers are trained on very specific features anyway. Freeze the earlier layers, as these recognize basic patterns (edges, textures, simple shapes).

Give different parts of the model different learning rates: a smaller learning rate for the earlier layers. These are discriminative learning rates. For fit you can pass 1e-3, so every layer gets the same lr; slice(1e-3), so the final layer group gets 1e-3 and the earlier groups get 1e-3 / 3; or slice(1e-5, 1e-3), so the first group gets 1e-5, the last gets 1e-3, and the groups in between get rates spread between those values.
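
For example (illustrative values):

learn.fit_one_cycle(1, slice(1e-5, 1e-3))  # first group 1e-5, last group 1e-3, the rest spread in between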

Affine function (http://mathworld.wolfram.com/AffineFunction.html)

Embedding: look something up in an array by index. It's a fast and memory-efficient way of doing the equivalent multiplication by a one-hot-encoded (OHE) matrix.
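
A quick check of that equivalence (toy sizes):

import torch
import torch.nn as nn

emb = nn.Embedding(num_embeddings=5, embedding_dim=3)
idx = torch.tensor([2])
one_hot = torch.zeros(1, 5)
one_hot[0, 2] = 1.
# looking up row 2 equals multiplying by the one-hot-encoded vector
assert torch.allclose(emb(idx), one_hot @ emb.weight)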

This works via latent factors/hidden relationships, e.g. a movie has John Travolta and a user likes John Travolta. But if a particular John Travolta movie is really bad, you need a bias term to capture that.

https://github.com/fastai/course-v3/blob/master/nbs/dl1/lesson5-sgd-mnist.ipynb

%matplotlib inline
from fastai.basics import *

Get data from http://deeplearning.net/data/mnist/mnist.pkl.gz

path = Config().data_path()/'mnist'

with gzip.open(path/'mnist.pkl.gz', 'rb') as f:
    ((x_train, y_train), (x_valid, y_valid), _) = pickle.load(f, encoding='latin-1')

plt.imshow(x_train[0].reshape((28,28)), cmap="gray")
x_train.shape

x_train,y_train,x_valid,y_valid = map(torch.tensor, (x_train,y_train,x_valid,y_valid))
n,c = x_train.shape
x_train.shape, y_train.min(), y_train.max()

bs=64
train_ds = TensorDataset(x_train, y_train) # you can index into it
valid_ds = TensorDataset(x_valid, y_valid)
data = DataBunch.create(train_ds, valid_ds, bs=bs)

x,y = next(iter(data.train_dl))
x.shape,y.shape

Sub classing

class Mnist_Logistic(nn.Module):
    def __init__(self):
        super().__init__() # initialise nn.Module
        self.lin = nn.Linear(784, 10, bias=True) # x@a + b

    # forward pass on a mini-batch
    def forward(self, xb): return self.lin(xb)

model = Mnist_Logistic().cuda()

model

model.lin

model(x).shape

# shows input and output size
[p.shape for p in model.parameters()]

lr=2e-2

# combines log-softmax and negative log-likelihood
loss_func = nn.CrossEntropyLoss()

def update(x,y,lr):
    # weight decay coefficient
    wd = 1e-5
    y_hat = model(x)
    # weight decay: sum of squared parameters
    w2 = 0.
    for p in model.parameters(): w2 += (p**2).sum()
    # add to the regular loss
    loss = loss_func(y_hat, y) + w2*wd
    loss.backward()
    with torch.no_grad():
        # loop through the parameters and take a gradient step
        for p in model.parameters():
            p.sub_(lr * p.grad)
            p.grad.zero_()
    return loss.item() # plain Python number

# run update once per mini-batch (one pass over the training set)
losses = [update(x,y,lr) for x,y in data.train_dl]

plt.plot(losses);

class Mnist_NN(nn.Module):
    def __init__(self):
        super().__init__()
        self.lin1 = nn.Linear(784, 50, bias=True)
        self.lin2 = nn.Linear(50, 10, bias=True)

    def forward(self, xb):
        x = self.lin1(xb)
        x = F.relu(x)
        return self.lin2(x)

model = Mnist_NN().cuda()

losses = [update(x,y,lr) for x,y in data.train_dl]

plt.plot(losses);

model = Mnist_NN().cuda()

def update(x,y,lr):
    opt = optim.Adam(model.parameters(), lr)
    # opt = optim.SGD(model.parameters(), lr, momentum=0.9)
    # (note: creating the optimizer inside update resets Adam's running
    # statistics every batch; creating it once outside would preserve them)
    y_hat = model(x)
    loss = loss_func(y_hat, y)
    loss.backward()
    opt.step()
    opt.zero_grad()
    return loss.item()

Momentum: step 90% in the same direction as last time and 10% in the direction of the current gradient, i.e. keep an exponentially weighted moving average of the steps:

s_t = alpha * g_t + (1 - alpha) * s_(t-1), with alpha = 0.1 here.

RMSprop: the same moving average, but of the gradient squared; divide the step by its square root, so parameters with consistently small gradients get bigger steps.

Adam: RMSprop and momentum combined.
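
A hand-rolled version of that momentum step (a sketch; it assumes the model and torch imports from the update loop above, and that loss.backward() has already run; alpha = 0.1 matches the 90/10 intuition):

alpha = 0.1
state = {p: torch.zeros_like(p) for p in model.parameters()}  # one moving average per parameter

def momentum_step(lr):
    with torch.no_grad():
        for p in model.parameters():
            state[p] = alpha * p.grad + (1 - alpha) * state[p]  # s_t = alpha*g + (1-alpha)*s_(t-1)
            p.sub_(lr * state[p])  # step along the smoothed direction
            p.grad.zero_()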

losses = [update(x,y,1e-3) for x,y in data.train_dl]

plt.plot(losses);

learn = Learner(data, Mnist_NN(), loss_func=loss_func, metrics=accuracy)
learn.lr_find()
learn.recorder.plot()

learn.fit_one_cycle(1, 1e-2)

# Learning rate per batch
learn.recorder.plot_lr(show_moms=True)

learn.recorder.plot_losses()

Cross-entropy loss on two categories:

Cat | Dog | Pred(Cat) | Pred(Dog) | X-Entropy
1   | 0   | 0.5       | 0.5       | -1*log(0.5) - 0*log(0.5)

Use softmax on the final activations so the predictions add up to 1.
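
The same calculation in code, checked against PyTorch's built-in loss:

import torch
import torch.nn.functional as F

logits = torch.tensor([[0.0, 0.0]])        # equal scores for cat and dog
target = torch.tensor([0])                 # true class: cat
probs = logits.softmax(dim=1)              # -> [[0.5, 0.5]], sums to 1
manual = -torch.log(probs[0, target[0]])   # -1*log(0.5) - 0*log(0.5)
assert torch.allclose(manual, F.cross_entropy(logits, target))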

See also

https://forums.fast.ai/t/deep-learning-lesson-5-notes/31298

https://github.com/hiromis/notes/blob/master/Lesson5.md

https://github.com/fastai/course-v3/blob/master/files/xl/collab_filter.xlsx

https://docs.google.com/spreadsheets/d/1oxY9bxgLPutRidhTrucFeg5Il0Jq7UdMJgR3igTtbPU/edit#gid=1748360111 - google sheets version

https://forums.fast.ai/t/google-sheets-versions-of-spreadsheets/10424/7

https://forums.fast.ai/t/lesson-5-discussion-thread/30864

https://forums.fast.ai/t/lesson-5-further-discussion/30865

https://towardsdatascience.com/netflix-and-chill-building-a-recommendation-system-in-excel-c69b33c914f4

http://ruder.io/optimizing-gradient-descent/

Lesson 6: Regularization; Convolutions; Data ethics

https://course.fast.ai/videos/?lesson=6

Make sure you have the latest version of the code and the latest version of the course

$ conda update conda
$ conda update anaconda
$ conda activate fastai
$ conda update --all

Compare https://github.com/fastai/course-v3 to my local course.

cd /home/ray/Documents/COURSES/fastai/course-v3/nbs/dl1
jupyter notebook

Rossmann data clean

https://github.com/fastai/course-v3/blob/master/nbs/dl1/rossman_data_clean.ipynb

%reload_ext autoreload
%autoreload 2
from fastai.basics import *

Download data from here http://files.fast.ai/part2/lesson14/rossmann.tgz

PATH=Config().data_path()/Path('rossmann/')
table_names = ['train', 'store', 'store_states', 'state_names', 'googletrend', 'weather', 'test']
tables = [pd.read_csv(PATH/f'{fname}.csv', low_memory=False) for fname in table_names]
train, store, store_states, state_names, googletrend, weather, test = tables
len(train),len(test)

Turn state holidays into booleans

train.StateHoliday = train.StateHoliday!='0'
test.StateHoliday = test.StateHoliday!='0'

def join_df(left, right, left_on, right_on=None, suffix='_y'):
    if right_on is None: right_on = left_on
    return left.merge(right, how='left', left_on=left_on, right_on=right_on, 
                      suffixes=("", suffix))

Join weather/state names

weather = join_df(weather, state_names, "file", "StateName")

Extracting dates and state names from the given data and adding those columns

googletrend['Date'] = googletrend.week.str.split(' - ', expand=True)[0]
googletrend['State'] = googletrend.file.str.split('_', expand=True)[2]
googletrend.loc[googletrend.State=='NI', "State"] = 'HB,NI'

Extracts particular date fields from a complete datetime for the purpose of constructing categoricals

def add_datepart(df, fldname, drop=True, time=False):
    "Helper function that adds columns relevant to a date."
    fld = df[fldname]
    fld_dtype = fld.dtype
    if isinstance(fld_dtype, pd.core.dtypes.dtypes.DatetimeTZDtype):
        fld_dtype = np.datetime64

    if not np.issubdtype(fld_dtype, np.datetime64):
        df[fldname] = fld = pd.to_datetime(fld, infer_datetime_format=True)
    targ_pre = re.sub('[Dd]ate$', '', fldname)
    attr = ['Year', 'Month', 'Week', 'Day', 'Dayofweek', 'Dayofyear',
            'Is_month_end', 'Is_month_start', 'Is_quarter_end', 'Is_quarter_start', 'Is_year_end', 'Is_year_start']
    if time: attr = attr + ['Hour', 'Minute', 'Second']
    for n in attr: df[targ_pre + n] = getattr(fld.dt, n.lower())
    df[targ_pre + 'Elapsed'] = fld.astype(np.int64) // 10 ** 9
    if drop: df.drop(fldname, axis=1, inplace=True)

add_datepart(weather, "Date", drop=False)
add_datepart(googletrend, "Date", drop=False)
add_datepart(train, "Date", drop=False)
add_datepart(test, "Date", drop=False)

Google trends data has a special category for the whole of Germany - we'll pull that out so we can use it explicitly.

trend_de = googletrend[googletrend.file == 'Rossmann_DE']

Outer join all of our data into a single dataframe and check for nulls to make sure it works

store = join_df(store, store_states, "Store")
len(store[store.State.isnull()])
joined = join_df(train, store, "Store")
joined_test = join_df(test, store, "Store")
len(joined[joined.StoreType.isnull()]),len(joined_test[joined_test.StoreType.isnull()])
joined = join_df(joined, googletrend, ["State","Year", "Week"])
joined_test = join_df(joined_test, googletrend, ["State","Year", "Week"])
len(joined[joined.trend.isnull()]),len(joined_test[joined_test.trend.isnull()])
joined = joined.merge(trend_de, 'left', ["Year", "Week"], suffixes=('', '_DE'))
joined_test = joined_test.merge(trend_de, 'left', ["Year", "Week"], suffixes=('', '_DE'))
len(joined[joined.trend_DE.isnull()]),len(joined_test[joined_test.trend_DE.isnull()])
joined = join_df(joined, weather, ["State","Date"])
joined_test = join_df(joined_test, weather, ["State","Date"])
len(joined[joined.Mean_TemperatureC.isnull()]),len(joined_test[joined_test.Mean_TemperatureC.isnull()])

for df in (joined, joined_test):
    for c in df.columns:
        if c.endswith('_y'):
            if c in df.columns: df.drop(c, inplace=True, axis=1)

Fill in missing values to avoid complications with NAs, using sentinel values that are clearly out of range (1900 for years, 1 for months/weeks)

for df in (joined,joined_test):
    df['CompetitionOpenSinceYear'] = df.CompetitionOpenSinceYear.fillna(1900).astype(np.int32)
    df['CompetitionOpenSinceMonth'] = df.CompetitionOpenSinceMonth.fillna(1).astype(np.int32)
    df['Promo2SinceYear'] = df.Promo2SinceYear.fillna(1900).astype(np.int32)
    df['Promo2SinceWeek'] = df.Promo2SinceWeek.fillna(1).astype(np.int32)

extract features "CompetitionOpenSince" and "CompetitionDaysOpen"

for df in (joined,joined_test):
    df["CompetitionOpenSince"] = pd.to_datetime(dict(year=df.CompetitionOpenSinceYear, 
                                                     month=df.CompetitionOpenSinceMonth, day=15))
    df["CompetitionDaysOpen"] = df.Date.subtract(df.CompetitionOpenSince).dt.days

replace some erroneous / outlying data

for df in (joined,joined_test):
    df.loc[df.CompetitionDaysOpen<0, "CompetitionDaysOpen"] = 0
    df.loc[df.CompetitionOpenSinceYear<1990, "CompetitionDaysOpen"] = 0

add "CompetitionMonthsOpen" field, limiting the maximum to 2 years to limit number of unique categories

for df in (joined,joined_test):
    df["CompetitionMonthsOpen"] = df["CompetitionDaysOpen"]//30
    df.loc[df.CompetitionMonthsOpen>24, "CompetitionMonthsOpen"] = 24
joined.CompetitionMonthsOpen.unique()

! pip install isoweek

from isoweek import Week
for df in (joined,joined_test):
    df["Promo2Since"] = pd.to_datetime(df.apply(lambda x: Week(
        x.Promo2SinceYear, x.Promo2SinceWeek).monday(), axis=1))
    df["Promo2Days"] = df.Date.subtract(df["Promo2Since"]).dt.days

for df in (joined,joined_test):
    df.loc[df.Promo2Days<0, "Promo2Days"] = 0
    df.loc[df.Promo2SinceYear<1990, "Promo2Days"] = 0
    df["Promo2Weeks"] = df["Promo2Days"]//7
    df.loc[df.Promo2Weeks<0, "Promo2Weeks"] = 0
    df.loc[df.Promo2Weeks>25, "Promo2Weeks"] = 25
    df.Promo2Weeks.unique()

joined.to_pickle(PATH/'joined')
joined_test.to_pickle(PATH/'joined_test')

It is common when working with time series data to extract data that explains relationships across rows as opposed to columns, e.g.:

  • Running averages
  • Time until next event
  • Time since last event

Define a function get_elapsed for cumulative counting across a sorted dataframe.

def get_elapsed(fld, pre):
    "Days since (or until) fld was last true, computed per store."
    day1 = np.timedelta64(1, 'D')
    last_date = np.datetime64()   # NaT until we see the first event
    last_store = 0
    res = []

    # rows must already be sorted by Store then Date
    for s,v,d in zip(df.Store.values,df[fld].values, df.Date.values):
        if s != last_store:       # new store: reset the running date
            last_date = np.datetime64()
            last_store = s
        if v: last_date = d       # event seen: remember its date
        res.append(((d-last_date).astype('timedelta64[D]') / day1))
    df[pre+fld] = res

# apply it to these columns, over train and test together
columns = ["Date", "Store", "Promo", "StateHoliday", "SchoolHoliday"]

df = train[columns].append(test[columns])

Say we're looking at SchoolHoliday. We'll first sort by Store, then Date, and then call get_elapsed('SchoolHoliday', 'After'). This:

  • is applied to every row of the dataframe, in order of store and date
  • adds to the dataframe the number of days since last seeing a school holiday
  • if we sort in the other direction, counts the days until the next holiday instead.
fld = 'SchoolHoliday'
df = df.sort_values(['Store', 'Date'])
get_elapsed(fld, 'After')
df = df.sort_values(['Store', 'Date'], ascending=[True, False])
get_elapsed(fld, 'Before')

fld = 'StateHoliday'
df = df.sort_values(['Store', 'Date'])
get_elapsed(fld, 'After')
df = df.sort_values(['Store', 'Date'], ascending=[True, False])
get_elapsed(fld, 'Before')

fld = 'Promo'
df = df.sort_values(['Store', 'Date'])
get_elapsed(fld, 'After')
df = df.sort_values(['Store', 'Date'], ascending=[True, False])
get_elapsed(fld, 'Before')

Set the active index to Date

df = df.set_index("Date")

Set null values from elapsed field calculations to 0.

columns = ['SchoolHoliday', 'StateHoliday', 'Promo']

for o in ['Before', 'After']:
    for p in columns:
        a = o+p
        df[a] = df[a].fillna(0).astype(int)

Demonstrate window functions in pandas to calculate rolling quantities:

sort by date (sort_index()), group by Store (groupby()), and count the number of events of interest defined in columns over the trailing week (rolling(7).sum()). Then do the same in the opposite direction.

bwd = df[['Store']+columns].sort_index().groupby("Store").rolling(7, min_periods=1).sum()

fwd = df[['Store']+columns].sort_index(ascending=False).groupby("Store").rolling(7, min_periods=1).sum()
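
A toy illustration of what that rolling sum produces (made-up data):

import pandas as pd

toy = pd.DataFrame({'Store': [1, 1, 1, 1], 'Promo': [0, 1, 1, 0]},
                   index=pd.date_range('2015-01-01', periods=4))
# promo days seen in the trailing 7-row window, per store -> 0, 1, 2, 2
print(toy.groupby('Store')['Promo'].rolling(7, min_periods=1).sum())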

Drop the Store grouping from the window-function results and flatten the indexes back out

bwd.drop('Store',1,inplace=True)
bwd.reset_index(inplace=True)
fwd.drop('Store',1,inplace=True)
fwd.reset_index(inplace=True)
df.reset_index(inplace=True)

Merge these values onto the df

df = df.merge(bwd, 'left', ['Date', 'Store'], suffixes=['', '_bw'])
df = df.merge(fwd, 'left', ['Date', 'Store'], suffixes=['', '_fw'])
df.drop(columns,1,inplace=True)

Back up large tables of extracted / wrangled features before you join them onto another one

df.to_pickle(PATH/'df')

df["Date"] = pd.to_datetime(df.Date)

joined = pd.read_pickle(PATH/'joined')
joined_test = pd.read_pickle(PATH/f'joined_test')

joined = join_df(joined, df, ['Store', 'Date'])

joined_test = join_df(joined_test, df, ['Store', 'Date'])

Remove all instances where the store had zero sales / was closed

joined = joined[joined.Sales!=0]

joined.reset_index(inplace=True)
joined_test.reset_index(inplace=True)

joined.to_pickle(PATH/'train_clean')
joined_test.to_pickle(PATH/'test_clean')

Rossmann

https://github.com/fastai/course-v3/blob/master/nbs/dl1/lesson6-rossmann.ipynb

%reload_ext autoreload
%autoreload 2
from fastai.tabular import *

The most useful part of the data clean is:

add_datepart(train, "Date", drop=False)

Take the date column and add a bunch of metadata from it, e.g. year, month, day of week, month start/end, elapsed time since the epoch (a bit of automatic feature engineering). Purchasing behavior may change with, e.g., the day of the week.

path = Config().data_path()/'rossmann'

train_df = pd.read_pickle(path/'train_clean')

train_df.head().T

n = len(train_df); n

Pre-processors run once on the training set before any training; their state (categories, medians, normalization statistics) is then shared with the validation and test sets.

Create a small subset of the data:

idx = np.random.permutation(range(n))[:2000]
idx.sort()
small_train_df = train_df.iloc[idx[:1000]]
small_test_df = train_df.iloc[idx[1000:]]
small_cont_vars = ['CompetitionDistance', 'Mean_Humidity']
small_cat_vars =  ['Store', 'DayOfWeek', 'PromoInterval']
small_train_df = small_train_df[small_cat_vars + small_cont_vars + ['Sales']]
small_test_df = small_test_df[small_cat_vars + small_cont_vars + ['Sales']]
small_train_df.head()

The first pre-processor, Categorify, takes the strings in PromoInterval, finds all the unique values, creates a category list, and converts them into numbers.

categorify = Categorify(small_cat_vars, small_cont_vars)
categorify(small_train_df)
categorify(small_test_df, test=True)

small_test_df.head()

see categories

small_train_df.PromoInterval.cat.categories

see codes

small_train_df['PromoInterval'].cat.codes[:5]

Another pre-processor, FillMissing, fills missing values: it adds a boolean column suffixed _na and fills the missing entries with the median.

fill_missing = FillMissing(small_cat_vars, small_cont_vars)
fill_missing(small_train_df)
fill_missing(small_test_df, test=True)

Read in full dataset

train_df = pd.read_pickle(path/'train_clean')
test_df = pd.read_pickle(path/'test_clean')

Specify pre-processors

procs=[FillMissing, Categorify, Normalize]

cat_vars = ['Store', 'DayOfWeek', 'Year', 'Month', 'Day', 'StateHoliday', 'CompetitionMonthsOpen',
    'Promo2Weeks', 'StoreType', 'Assortment', 'PromoInterval', 'CompetitionOpenSinceYear', 'Promo2SinceYear',
    'State', 'Week', 'Events', 'Promo_fw', 'Promo_bw', 'StateHoliday_fw', 'StateHoliday_bw',
    'SchoolHoliday_fw', 'SchoolHoliday_bw']

cont_vars = ['CompetitionDistance', 'Max_TemperatureC', 'Mean_TemperatureC', 'Min_TemperatureC',
   'Max_Humidity', 'Mean_Humidity', 'Min_Humidity', 'Max_Wind_SpeedKm_h', 
   'Mean_Wind_SpeedKm_h', 'CloudCover', 'trend', 'trend_DE',
   'AfterStateHoliday', 'BeforeStateHoliday', 'Promo', 'SchoolHoliday']

dep_var = 'Sales'
df = train_df[cat_vars + cont_vars + [dep_var,'Date']].copy()

test_df['Date'].min(), test_df['Date'].max()

Use the date to create a validation set of the same length as the test set

cut = train_df['Date'][(train_df['Date'] == train_df['Date'][len(test_df)])].index.max()

valid_idx = range(cut)

df[dep_var].head()

The dependent variable is an int, so fastai will assume a classification problem. Specify regression by passing label_cls=FloatList, with log=True to take the log of the dependent variable. Because the evaluation metric is RMSPE, predicting log(y) effectively turns it into RMSE.

data = (TabularList.from_df(df, path=path, cat_names=cat_vars, cont_names=cont_vars, procs=procs,)
                .split_by_idx(valid_idx)
                .label_from_df(cols=dep_var, label_cls=FloatList, log=True)
                .add_test(TabularList.from_df(test_df, path=path, cat_names=cat_vars, cont_names=cont_vars))
                .databunch())

doc(FloatList)

Model

Pass in y_range, which applies a sigmoid scaled between 0 and an upper limit just above the maximum of the (log) dependent variable.

max_log_y = np.log(np.max(train_df['Sales'])*1.2)
y_range = torch.tensor([0, max_log_y], device=defaults.device)

Pass in the architecture: hidden layers of 1,000 and 500 units. That is a lot of parameters for a dataset with a few hundred thousand rows and would overfit, so the ps (dropout probabilities) add dropout.

Dropout: A Simple Way to Prevent Neural Networks from Overfitting

emb_drop adds dropout on the embedding layers. (An embedding is a matmul with a one-hot-encoded matrix.)

learn = tabular_learner(data, layers=[1000,500], ps=[0.001,0.01], emb_drop=0.04, 
                        y_range=y_range, metrics=exp_rmspe)

learn.model

In the first embedding layer, the first number is the number of stores (the first categorical variable) and the second is the size of that embedding. Then there is a batch norm over 16 inputs (the 16 continuous variables).

Batch normalization ("Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift") makes the loss surface less bumpy, so you can increase your learning rate.

y_hat = f(w_1, ..., w_n, x) * g + b

g and b are learned batch-norm parameters that scale and shift the output towards the expected range (mean and std).
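
A simplified sketch of what BatchNorm1d computes in training mode (running statistics and the nn.Module plumbing omitted):

import torch

def batchnorm1d(x, g, b, eps=1e-5):
    mean = x.mean(0, keepdim=True)                 # per-feature mean over the batch
    var = x.var(0, unbiased=False, keepdim=True)   # per-feature variance over the batch
    x_hat = (x - mean) / (var + eps).sqrt()        # normalize
    return x_hat * g + b                           # learned scale g and shift b

x = torch.randn(32, 16)                            # batch of 16 continuous features
g, b = torch.ones(16), torch.zeros(16)             # initial values of the learned params
print(batchnorm1d(x, g, b).std(0))                 # ~1 per feature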

len(data.train_ds.cont_names)
learn.lr_find()
learn.recorder.plot()

learn.fit_one_cycle(5, 1e-3, wd=0.2)

learn.save('1')

learn.recorder.plot_losses(last=-1)

learn.load('1');

learn.fit_one_cycle(5, 3e-4)

learn.fit_one_cycle(5, 3e-4)

test_preds=learn.get_preds(DatasetType.Test)
test_df["Sales"]=np.exp(test_preds[0].data).numpy().T[0]
test_df[["Id","Sales"]]=test_df[["Id","Sales"]].astype("int")
test_df[["Id","Sales"]].to_csv("rossmann_submission.csv",index=False)

Pets

https://github.com/fastai/course-v3/blob/master/nbs/dl1/lesson6-pets-more.ipynb

%reload_ext autoreload
%autoreload 2
%matplotlib inline

from fastai.vision import *
bs = 64
path = untar_data(URLs.PETS)/'images'

Data augmentation

Ratchet up the defaults: max rotation, zoom, lighting and warp, plus the probability of applying an affine transform (p_affine) and of a lighting transform (p_lighting).

https://docs.fast.ai/vision.transform.html#get_transforms

Look at the validation set to see what realistic lighting looks like.

For satellite data, rotated images make sense as augmentation.

Use flipped images where flips preserve the label.

Symmetric warp simulates viewing the object from slightly different angles.

tfms = get_transforms(max_rotate=20, max_zoom=1.3, max_lighting=0.4, max_warp=0.4,
                      p_affine=1., p_lighting=1.)

src = ImageList.from_folder(path).split_by_rand_pct(0.2, seed=2)

def get_data(size, bs, padding_mode='reflection'):
    return (src.label_from_re(r'([^/]+)_\d+.jpg$')
           .transform(tfms, size=size, padding_mode=padding_mode)
           .databunch(bs=bs).normalize(imagenet_stats))

data = get_data(224, bs, 'zeros')

def _plot(i,j,ax):
    x,y = data.train_ds[3]
    x.show(ax, y=y)

plot_multi(_plot, 3, 3, figsize=(8,8))

data = get_data(224,bs)
plot_multi(_plot, 3, 3, figsize=(8,8))

Train a model

gc.collect()
learn = cnn_learner(data, models.resnet34, metrics=error_rate, bn_final=True)

learn.fit_one_cycle(3, slice(1e-2), pct_start=0.8)

learn.unfreeze()
learn.fit_one_cycle(2, max_lr=slice(1e-6,1e-3), pct_start=0.8)

data = get_data(352,bs)
learn.data = data

learn.fit_one_cycle(2, max_lr=slice(1e-6,1e-4))

learn.save('352')

Convolutional kernel

data = get_data(352,16)

learn = cnn_learner(data, models.resnet34, metrics=error_rate, bn_final=True).load('352')

idx=0
x,y = data.valid_ds[idx]
x.show()
data.valid_ds.y[idx]

k = tensor([
    [0.  , -5/3, 1],
    [-5/3, -5/3, 1],
    [1., 1   ,1],
]).expand(1, 3, 3, 3) / 6
k
k.shape

t = data.valid_ds[0][0].data; t.shape

t[None].shape

edge = F.conv2d(t[None], k)

show_image(edge[0], figsize=(5,5));

data.c

learn.model

https://www.fast.ai/2018/07/02/adam-weight-decay/

print(learn.summary())

heatmap

m = learn.model.eval();

m[0] is the convolutional part of the model

Create a mini-batch with one item in it

xb,_ = data.one_item(x)
xb_im = Image(data.denorm(xb)[0])
xb = xb.cuda()

from fastai.callbacks.hooks import *

A hook lets you hook into PyTorch's forward/backward passes and run arbitrary Python, e.g. capture the output of the convolutional part or of a certain layer. Here we hook the output of m[0]

def hooked_backward(cat=y):
    with hook_output(m[0]) as hook_a: 
        with hook_output(m[0], grad=True) as hook_g:
            preds = m(xb)
            preds[0,int(cat)].backward()
    return hook_a,hook_g

hook_a,hook_g = hooked_backward()

acts  = hook_a.stored[0].cpu()
acts.shape

Take mean of channel axis

avg_acts = acts.mean(0)
avg_acts.shape

def show_heatmap(hm):
    _,ax = plt.subplots()
    xb_im.show(ax)
    ax.imshow(hm, alpha=0.6, extent=(0,352,352,0),
              interpolation='bilinear', cmap='magma');

show_heatmap(avg_acts)

Grad-CAM

Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization

grad = hook_g.stored[0][0].cpu()
grad_chan = grad.mean(1).mean(1)
grad.shape,grad_chan.shape

mult = (acts*grad_chan[...,None,None]).mean(0)

show_heatmap(mult)

Ethics and Data Science

Generative models - create new text, new image, new video, new sound.

Artificial Intelligence needs all of us | Rachel Thomas P.h.D. | TEDxSanFrancisco

Some Healthy Principles About Ethics & Bias In AI | Rachel Thomas @ PyBay2018

Accuracy on lighter-skinned males vs darker-skinned females - http://gendershades.org/

https://www.crunchbase.com/organization/deep-glint#section-overview - Facial AI for surveillance

Text translation bias, e.g. English -> Turkish -> English gives 'He is a doctor. She is a nurse', because Turkish pronouns are gender-neutral and the model fills in stereotyped genders.

COMPAS - recidivism-risk software used by courts to suggest jail vs. bail.

Why?

Get humans back in the loop.

Talk to domain experts and those impacted - https://fatconference.org/

Evan Estola - When Recommendations Systems Go Bad - MLconf SEA 2016

Datasheets for Datasets - better documentation regarding datasets.

See also

https://github.com/hiromis/notes/blob/master/Lesson6.md

https://forums.fast.ai/t/lesson-6-in-class-discussion/31440

https://forums.fast.ai/t/lesson-6-advanced-discussion/31442

https://platform.ai/ - computer-vision start-up: upload pics and use a deep learning model to help label them (e.g. choose a layer or a projection).

https://forums.fast.ai/t/platform-ai-discussion/31445

50 Years of Test (Un)fairness: Lessons for Machine Learning paper - https://128.84.21.199/pdf/1811.10104.pdf

Cornell conv course - http://www.cs.cornell.edu/courses/cs1114/2013sp/sections/S06_convolution.pdf

conv arithmetic - https://github.com/vdumoulin/conv_arithmetic/blob/master/README.md

https://arthurdouillard.com/post/normalization/ e.g. images

cross entropy loss - https://gombru.github.io/2018/05/23/cross_entropy_loss/

https://brohrer.github.io/how_convolutional_neural_networks_work.html

https://openframeworks.cc/ofBook/chapters/image_processing_computer_vision.html

https://medium.com/@karpathy/yes-you-should-understand-backprop-e2f06eab496b

https://towardsdatascience.com/bert-explained-state-of-the-art-language-model-for-nlp-f8b21a9b6270

https://knowingneurons.com/2014/10/29/hubel-and-wiesel-the-neural-basis-of-visual-perception/

https://grey.colorado.edu/CompCogNeuro/index.php/CCNBook/Perception

https://www.ted.com/talks/jeremy_howard_the_wonderful_and_terrifying_implications_of_computers_that_can_learn/up-next?language=en

http://setosa.io/ev/image-kernels/

Lesson 7: Resnets from scratch; U-net; Generative (adversarial) networks

https://course.fast.ai/videos/?lesson=7

Make sure you have the latest version of the code and the latest version of the course

$ conda update conda
$ conda update anaconda
$ conda activate fastai
$ conda update --all

Compare https://github.com/fastai/course-v3 to my local course.

cd /home/ray/Documents/COURSES/fastai/course-v3/nbs/dl1
jupyter notebook

Resnet MNIST

https://github.com/fastai/course-v3/blob/master/nbs/dl1/lesson7-resnet-mnist.ipynb

%reload_ext autoreload
%autoreload 2
%matplotlib inline
from fastai.vision import *

path = untar_data(URLs.MNIST)
path.ls()
il = ImageList.from_folder(path, convert_mode='L') # Convert to grey scale
il.items[0]
defaults.cmap='binary'

il
il[0].show()

# this split has labels, so it's a validation set, not a test set
sd = il.split_by_folder(train='training', valid='testing')
sd

(path/'training').ls()

ll = sd.label_from_folder() # label list

ll

x,y = ll.train[0]
x.show()
print(y,x.shape)

# Transforms
tfms = ([*rand_pad(padding=3, size=28, mode='zeros')], [])
ll = ll.transform(tfms)
bs = 128

# not using imagenet_stats because not using pretrained model
data = ll.databunch(bs=bs).normalize()

x,y = data.train_ds[0]
x.show()
print(y)

def _plot(i,j,ax): data.train_ds[0][0].show(ax, cmap='gray')
plot_multi(_plot, 3, 3, figsize=(8,8))

xb,yb = data.one_batch()
xb.shape,yb.shape

data.show_batch(rows=3, figsize=(5,5))

Basic CNN with batchnorm

def conv(ni,nf): return nn.Conv2d(ni, nf, kernel_size=3, stride=2, padding=1)
model = nn.Sequential(
    conv(1, 8), # 14
    nn.BatchNorm2d(8),
    nn.ReLU(),
    conv(8, 16), # 7
    nn.BatchNorm2d(16),
    nn.ReLU(),
    conv(16, 32), # 4
    nn.BatchNorm2d(32),
    nn.ReLU(),
    conv(32, 16), # 2
    nn.BatchNorm2d(16),
    nn.ReLU(),
    conv(16, 10), # 1
    nn.BatchNorm2d(10),
    Flatten()     # remove (1,1) grid
)

learn = Learner(data, model, loss_func = nn.CrossEntropyLoss(), metrics=accuracy)

print(learn.summary())

model(xb).shape

learn.lr_find(end_lr=100)

learn.recorder.plot()

learn.fit_one_cycle(3, max_lr=0.1)

Refactor

def conv2(ni,nf): return conv_layer(ni,nf,stride=2)
model = nn.Sequential(
    conv2(1, 8),   # 14
    conv2(8, 16),  # 7
    conv2(16, 32), # 4
    conv2(32, 16), # 2
    conv2(16, 10), # 1
    Flatten()      # remove (1,1) grid
)
learn = Learner(data, model, loss_func = nn.CrossEntropyLoss(), metrics=accuracy)
learn.fit_one_cycle(10, max_lr=0.1)

Resnet-ish

x -> two conv layers (f(x)) -> output f(x) + x. The '+ x' is the identity/skip connection.

class ResBlock(nn.Module):
    def __init__(self, nf):
        super().__init__()
        self.conv1 = conv_layer(nf,nf)
        self.conv2 = conv_layer(nf,nf)
        
    def forward(self, x): return x + self.conv2(self.conv1(x))

help(res_block)

model = nn.Sequential(
    conv2(1, 8),
    res_block(8),
    conv2(8, 16),
    res_block(16),
    conv2(16, 32),
    res_block(32),
    conv2(32, 16),
    res_block(16),
    conv2(16, 10),
    Flatten()
)

def conv_and_res(ni,nf): return nn.Sequential(conv2(ni, nf), res_block(nf))

model = nn.Sequential(
    conv_and_res(1, 8),
    conv_and_res(8, 16),
    conv_and_res(16, 32),
    conv_and_res(32, 16),
    conv2(16, 10),
    Flatten()
)

learn = Learner(data, model, loss_func = nn.CrossEntropyLoss(), metrics=accuracy)

learn.lr_find(end_lr=100)
learn.recorder.plot()

learn.fit_one_cycle(12, max_lr=0.05)

print(learn.summary())

A guide to convolution arithmetic for deep learning

Could scale the image up and use nearest-neighbour interpolation.

U-Net: Convolutional Networks for Biomedical Image Segmentation

For segmentation we have to end up with an output the same size as the image. Transposed ("stride 1/2") convolutions add padding outside the input and between the pixels before convolving. U-Net adds skip connections from the downsampling path across to the upsampling path.
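
A quick sketch of the two upsampling options (illustrative sizes):

import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(1, 16, 7, 7)
# option 1: nearest-neighbour interpolation followed by a regular conv
up1 = nn.Conv2d(16, 16, 3, padding=1)(F.interpolate(x, scale_factor=2., mode='nearest'))
# option 2: a learned transposed ("stride 1/2") convolution
up2 = nn.ConvTranspose2d(16, 16, 2, stride=2)(x)
print(up1.shape, up2.shape)  # both torch.Size([1, 16, 14, 14])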

Image restoration.

Pretrained GAN

https://github.com/fastai/course-v3/blob/master/nbs/dl1/lesson7-superres-gan.ipynb

import fastai
from fastai.vision import *
from fastai.callbacks import *
from fastai.vision.gan import *

path = untar_data(URLs.PETS)
path_hr = path/'images'
path_lr = path/'crappy'

Crappify

Crappify: resize the image to be small, save it at a random low JPEG quality, and draw a random number on it. Match the crappification to your end goal, e.g. if you want to colorize black-and-white images, make the inputs black and white.

from fastai.vision import *
from PIL import Image, ImageDraw, ImageFont

class crappifier(object):
    def __init__(self, path_lr, path_hr):
        self.path_lr = path_lr
        self.path_hr = path_hr

    def __call__(self, fn, i):
        dest = self.path_lr/fn.relative_to(self.path_hr)
        dest.parent.mkdir(parents=True, exist_ok=True)
        img = PIL.Image.open(fn)
        targ_sz = resize_to(img, 96, use_min=True)  # shrink to ~96px
        img = img.resize(targ_sz, resample=PIL.Image.BILINEAR).convert('RGB')
        w,h = img.size
        q = random.randint(10,70)  # random low JPEG quality
        # draw the quality number somewhere on the image
        ImageDraw.Draw(img).text((random.randint(0,w//2),random.randint(0,h//2)), str(q), fill=(255,255,255))
        img.save(dest, quality=q)  # save the crappified copy

from crappify import *

il = ImageList.from_folder(path_hr)
parallel(crappifier(path_lr, path_hr), il.items)

bs,size=32, 128
# bs,size = 24,160
#bs,size = 8,256

Pre-train generator

arch = models.resnet34
src = ImageImageList.from_folder(path_lr).split_by_rand_pct(0.1, seed=42)

def get_data(bs,size):
    data = (src.label_from_func(lambda x: path_hr/x.name)
           .transform(get_transforms(max_zoom=2.), size=size, tfm_y=True)
           .databunch(bs=bs).normalize(imagenet_stats, do_y=True))

    data.c = 3
    return data

data_gen = get_data(bs,size)

data_gen.show_batch(4)

Make a U-net, using a model with pre-trained weights.

wd = 1e-3
y_range = (-3.,3.)
loss_gen = MSELossFlat() # flattens out images
def create_gen_learner():
    return unet_learner(data_gen, arch, wd=wd, blur=True, norm_type=NormType.Weight,
                         self_attention=True, y_range=y_range, loss_func=loss_gen)

learn_gen = create_gen_learner()
learn_gen.fit_one_cycle(2, pct_start=0.8)

learn_gen.unfreeze() # Un-freeze model (res-net) down sample part
learn_gen.fit_one_cycle(3, slice(1e-6,1e-3))

learn_gen.show_results(rows=4)

learn_gen.save('gen-pre2')

The model works but leaves some artifacts. A GAN replaces the loss with a discriminator/critic, which we then use to fine-tune the generator.

Create the critic's training data by saving the generated images.

learn_gen.load('gen-pre2');
name_gen = 'image_gen'
path_gen = path/name_gen
path_gen.mkdir(exist_ok=True)

def save_preds(dl):
    i=0
    names = dl.dataset.items
    
    for b in dl:
        preds = learn_gen.pred_batch(batch=b, reconstruct=True)
        for o in preds:
            o.save(path_gen/names[i].name)
            i += 1
save_preds(data_gen.fix_dl)

PIL.Image.open(path_gen.ls()[0])

Train critic

learn_gen=None # Clear up GPU
gc.collect()

Pretrain the critic on crappy vs not crappy.

def get_crit_data(classes, bs, size):
    src = ImageList.from_folder(path, include=classes).split_by_rand_pct(0.1, seed=42)
    ll = src.label_from_folder(classes=classes)
    data = (ll.transform(get_transforms(max_zoom=2.), size=size)
           .databunch(bs=bs).normalize(imagenet_stats))
    data.c = 3
    return data

data_crit = get_crit_data([name_gen, 'images'], bs=bs, size=size)

data_crit.show_batch(rows=3, ds_type=DatasetType.Train, imgsize=3)

loss_critic = AdaptiveLoss(nn.BCEWithLogitsLoss()) # Binary cross-entropy

def create_critic_learner(data, metrics):
    return Learner(data, gan_critic(), metrics=metrics, loss_func=loss_critic, wd=wd)

learn_critic = create_critic_learner(data_crit, accuracy_thresh_expand)

learn_critic.fit_one_cycle(6, 1e-3)

learn_critic.save('critic-pre2')

GAN

combine those pretrained model in a GAN

learn_crit=None
learn_gen=None
gc.collect()
data_crit = get_crit_data(['crappy', 'images'], bs=bs, size=size)
learn_crit = create_critic_learner(data_crit, metrics=None).load('critic-pre2')
learn_gen = create_gen_learner().load('gen-pre2')

switcher = partial(AdaptiveGANSwitcher, critic_thresh=0.65)
learn = GANLearner.from_learners(learn_gen, learn_crit, weights_gen=(1.,50.), show_img=False, switcher=switcher,
                                 opt_func=partial(optim.Adam, betas=(0.,0.99)), wd=wd)
learn.callback_fns.append(partial(GANDiscriminativeLR, mult_lr=5.))

lr = 1e-4

learn.fit(40,lr)

learn.save('gan-1c')

learn.data=get_data(16,192)

learn.fit(10,lr/2)

learn.show_results(rows=16)

learn.save('gan-1c')

https://github.com/fastai/course-v3/blob/master/nbs/dl1/lesson7-wgan.ipynb

%reload_ext autoreload
%autoreload 2
%matplotlib inline
from fastai.vision import *
from fastai.vision.gan import *

LSun bedroom data https://github.com/fyu/lsun

path = untar_data(URLs.LSUN_BEDROOMS)

Inputs are random noise (of size 100 by default) and the targets are the images of bedrooms. Use tfm_y=True in the transforms, and apply the normalization to the ys (do_y=True).

def get_data(bs, size):
    return (GANItemList.from_folder(path, noise_sz=100)
               .split_none()
               .label_from_func(noop)
               .transform(tfms=[[crop_pad(size=size, row_pct=(0,1), col_pct=(0,1))], []], size=size, tfm_y=True)
               .databunch(bs=bs)
               .normalize(stats = [torch.tensor([0.5,0.5,0.5]), torch.tensor([0.5,0.5,0.5])], do_x=False, do_y=True))

Begin with a small size and use gradual resizing

data = get_data(128, 64)
data.show_batch(rows=5)

Generative Adversarial Nets - https://arxiv.org/pdf/1406.2661.pdf

Train two models at the same time: a generator and a critic. The generator will try to make new images similar to the ones in our dataset, and the critic will try to tell real images apart from the ones the generator makes. The generator returns images, the critic a single number (usually 0. for fake images and 1. for real ones).

We train them against each other in the sense that at each step (more or less), we do the following (sketched in code after the list):

  1. Freeze the generator and train the critic for one step by:
    • getting one batch of true images (let's call that real)
    • generating one batch of fake images (let's call that fake)
    • have the critic evaluate each batch and compute a loss function from that; the important part is that it rewards positively the detection of real images and penalizes the fake ones
    • update the weights of the critic with the gradients of this loss
  2. Freeze the critic and train the generator for one step by:
    • generating one batch of fake images
    • having the critic evaluate it
    • returning a loss that rewards the generator when the critic thinks its fakes are real
    • updating the weights of the generator with the gradients of this loss
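
A skeletal version of that alternation in plain PyTorch - a sketch only, not fastai's GANLearner internals. It assumes a generator, a critic that ends in a sigmoid (outputs a probability of "real"), noise batches z, and one optimizer for each model:

def gan_step(real, z, generator, critic, opt_g, opt_c):
    # 1. critic step: push critic(real) towards 1 and critic(fake) towards 0
    fake = generator(z).detach()  # detach: don't backprop into the generator here
    loss_c = -(critic(real).log().mean() + (1 - critic(fake)).log().mean())
    opt_c.zero_grad(); loss_c.backward(); opt_c.step()
    # 2. generator step: reward the generator when the critic thinks its fakes are real
    fake = generator(z)
    loss_g = -critic(fake).log().mean()
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_c.item(), loss_g.item()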

Wasserstein GAN - https://arxiv.org/pdf/1701.07875.pdf

Create a generator and a critic that we pass to GANLearner. noise_sz is the size of the random vector from which our generator creates images.

generator = basic_generator(in_size=64, n_channels=3, n_extra_layers=1)
critic    = basic_critic   (in_size=64, n_channels=3, n_extra_layers=1)
learn = GANLearner.wgan(data, generator, critic, switch_eval=False,
                        opt_func = partial(optim.Adam, betas = (0.,0.99)), wd=0.)
learn.fit(30,2e-4)

learn.gan_trainer.switch(gen_mode=True)
learn.show_results(ds_type=DatasetType.Train, rows=16, figsize=(8,8))

Perceptual Losses for Real-Time Style Transfer and Super-Resolution

Downsample encoder and upsample decoder.

https://github.com/fastai/course-v3/blob/master/nbs/dl1/lesson7-superres.ipynb

Super resolution

import fastai
from fastai.vision import *
from fastai.callbacks import *
from fastai.utils.mem import *

from torchvision.models import vgg16_bn
path = untar_data(URLs.PETS)
path_hr = path/'images'
path_lr = path/'small-96'
path_mr = path/'small-256'

il = ImageList.from_folder(path_hr)

Crappify

def resize_one(fn, i, path, size):
    dest = path/fn.relative_to(path_hr)
    dest.parent.mkdir(parents=True, exist_ok=True)
    img = PIL.Image.open(fn)
    targ_sz = resize_to(img, size, use_min=True)
    img = img.resize(targ_sz, resample=PIL.Image.BILINEAR).convert('RGB')
    img.save(dest, quality=60)

# create smaller image sets the first time this nb is run
sets = [(path_lr, 96), (path_mr, 256)]
for p,size in sets:
    if not p.exists(): 
        print(f"resizing to {size} into {p}")
        parallel(partial(resize_one, path=p, size=size), il.items)

bs,size=32,128
arch = models.resnet34

src = ImageImageList.from_folder(path_lr).split_by_rand_pct(0.1, seed=42)

def get_data(bs,size):
    data = (src.label_from_func(lambda x: path_hr/x.name)
           .transform(get_transforms(max_zoom=2.), size=size, tfm_y=True)
           .databunch(bs=bs).normalize(imagenet_stats, do_y=True))

    data.c = 3
    return data

data = get_data(bs,size)

data.show_batch(ds_type=DatasetType.Valid, rows=2, figsize=(9,9))

Feature loss

t = data.valid_ds[0][1].data
t = torch.stack([t,t])

def gram_matrix(x):
    n,c,h,w = x.size()
    x = x.view(n, c, -1)
    return (x @ x.transpose(1,2))/(c*h*w)

gram_matrix(t)

MAE loss

base_loss = F.l1_loss

vgg16_bn(True).features holds the convolutional part of VGG. Put it in eval mode, as we're not training it, and turn off requires_grad, as we're not updating its weights.

vgg_m = vgg16_bn(True).features.cuda().eval()
requires_grad(vgg_m, False)

Find the layers just before each max-pool (the ReLUs); those activations are the features we'll compare.

blocks = [i-1 for i,o in enumerate(children(vgg_m)) if isinstance(o,nn.MaxPool2d)]
blocks, [vgg_m[i] for i in blocks]

class FeatureLoss(nn.Module):
    def __init__(self, m_feat, layer_ids, layer_wgts):
        super().__init__()
        self.m_feat = m_feat
        self.loss_features = [self.m_feat[i] for i in layer_ids]
        self.hooks = hook_outputs(self.loss_features, detach=False)
        self.wgts = layer_wgts
        self.metric_names = ['pixel',] + [f'feat_{i}' for i in range(len(layer_ids))
              ] + [f'gram_{i}' for i in range(len(layer_ids))]

    def make_features(self, x, clone=False):
        self.m_feat(x)
        return [(o.clone() if clone else o) for o in self.hooks.stored]
    
    def forward(self, input, target):
        out_feat = self.make_features(target, clone=True)
        in_feat = self.make_features(input)
        self.feat_losses = [base_loss(input,target)]
        self.feat_losses += [base_loss(f_in, f_out)*w
                             for f_in, f_out, w in zip(in_feat, out_feat, self.wgts)]
        self.feat_losses += [base_loss(gram_matrix(f_in), gram_matrix(f_out))*w**2 * 5e3
                             for f_in, f_out, w in zip(in_feat, out_feat, self.wgts)]
        self.metrics = dict(zip(self.metric_names, self.feat_losses))
        return sum(self.feat_losses)
    
    def __del__(self): self.hooks.remove()

feat_loss = FeatureLoss(vgg_m, blocks[2:5], [5,15,2])

Train

wd = 1e-3
learn = unet_learner(data, arch, wd=wd, loss_func=feat_loss, callback_fns=LossMetrics,
                     blur=True, norm_type=NormType.Weight)
gc.collect();

learn.lr_find()
learn.recorder.plot()

lr = 1e-3

def do_fit(save_name, lrs=slice(lr), pct_start=0.9):
    learn.fit_one_cycle(10, lrs, pct_start=pct_start)
    learn.save(save_name)
    learn.show_results(rows=1, imgsize=5)

do_fit('1a', slice(lr*10)) # Quicker than a GAN

learn.unfreeze()

do_fit('1b', slice(1e-5,lr))

data = get_data(12,size*2)

learn.data = data
learn.freeze()
gc.collect()

learn.load('1b');

do_fit('2a')

learn.unfreeze()

do_fit('2b', slice(1e-6,1e-4), pct_start=0.3)

Test

learn = None
gc.collect();

# scratch math for the test sizes below, keeping the 256:320 aspect ratio
256/320*1024

256/320*1600

free = gpu_mem_get_free_no_cache()
# the max size of the test image depends on the available GPU RAM 
if free > 8000: size=(1280, 1600) # >  8GB RAM
else:           size=( 820, 1024) # <= 8GB RAM
print(f"using size={size}, have {free}MB of GPU RAM free")

learn = unet_learner(data, arch, loss_func=F.l1_loss, blur=True, norm_type=NormType.Weight)

data_mr = (ImageImageList.from_folder(path_mr).split_by_rand_pct(0.1, seed=42)
          .label_from_func(lambda x: path_hr/x.name)
          .transform(get_transforms(), size=size, tfm_y=True)
          .databunch(bs=1).normalize(imagenet_stats, do_y=True))
data_mr.c = 3

learn.load('2b');

learn.data = data_mr

fn = data_mr.valid_ds.x.items[0]; fn

img = open_image(fn); img.shape

p,img_hr,b = learn.predict(img)

show_image(img, figsize=(18,15), interpolation='nearest');

Image(img_hr).show(figsize=(18,15))

Human numbers

https://github.com/fastai/course-v3/blob/master/nbs/dl1/lesson7-human-numbers.ipynb

from fastai.text import *

bs=64

path = untar_data(URLs.HUMAN_NUMBERS)
path.ls()

def readnums(d): return [', '.join(o.strip() for o in open(path/d).readlines())]

train_txt = readnums('train.txt'); train_txt[0][:80]
valid_txt = readnums('valid.txt'); valid_txt[0][-80:]

train = TextList(train_txt, path=path)
valid = TextList(valid_txt, path=path)

src = ItemLists(path=path, train=train, valid=valid).label_for_lm()
data = src.databunch(bs=bs)

train[0].text[:80] # one document, so index 0
# returns xxbos...: tokens starting with xx are special fastai tokens; bos = beginning of stream

len(data.valid_ds[0][0].data)

data.bptt, len(data.valid_dl) # bptt = back-propagation through time (tokens per sequence chunk)
https://github.com/fastai/fastai/blob/f93a5f028e2cf73448dda188682d437c610424c3/fastai/text/learner.py#L248

The 13,017 validation tokens are split into bs=64 sequences, read bptt (~70) tokens at a time, giving roughly 13017/70/64 ≈ 3 mini-batches:

13017/70/bs
it = iter(data.valid_dl)
x1,y1 = next(it)
x2,y2 = next(it)
x3,y3 = next(it)
it.close()

x1.numel()+x2.numel()+x3.numel()

x1.shape,y1.shape
x2.shape,y2.shape

x1[:,0]
y1[:,0]

Grab the vocab so we can turn ids back into text. Each row of a mini-batch joins up with the same row of the next mini-batch.

v = data.valid_ds.vocab
v.textify(x1[0])
v.textify(y1[0])
v.textify(x2[0])
v.textify(x3[0])
v.textify(x1[1])
v.textify(x2[1])
v.textify(x3[1])
v.textify(x3[-1])

data.show_batch(ds_type=DatasetType.Valid)

Single fully connected model

data = src.databunch(bs=bs, bptt=3)
x,y = data.one_batch()
x.shape,y.shape
nv = len(v.itos); nv
nh=64

def loss4(input,target): return F.cross_entropy(input, target[:,-1]) # score only the last word
def acc4 (input,target): return accuracy(input, target[:,-1])

class Model0(nn.Module):
    def __init__(self):
        super().__init__()
        self.i_h = nn.Embedding(nv,nh)  # green arrow
        self.h_h = nn.Linear(nh,nh)     # brown arrow
        self.h_o = nn.Linear(nh,nv)     # blue arrow
        self.bn = nn.BatchNorm1d(nh)
        
    def forward(self, x):
        h = self.bn(F.relu(self.h_h(self.i_h(x[:,0]))))
        if x.shape[1]>1:
            h = h + self.i_h(x[:,1])
            h = self.bn(F.relu(self.h_h(h)))
        if x.shape[1]>2:
            h = h + self.i_h(x[:,2])
            h = self.bn(F.relu(self.h_h(h)))
        return self.h_o(h)

learn = Learner(data, Model0(), loss_func=loss4, metrics=acc4)
learn.fit_one_cycle(6, 1e-4)

Same thing with a loop

class Model1(nn.Module):
    def __init__(self):
        super().__init__()
        self.i_h = nn.Embedding(nv,nh)  # green arrow
        self.h_h = nn.Linear(nh,nh)     # brown arrow
        self.h_o = nn.Linear(nh,nv)     # blue arrow
        self.bn = nn.BatchNorm1d(nh)
        
    def forward(self, x):
        h = torch.zeros(x.shape[0], nh).to(device=x.device)
        for i in range(x.shape[1]):
            h = h + self.i_h(x[:,i])
            h = self.bn(F.relu(self.h_h(h)))
        return self.h_o(h)

learn = Learner(data, Model1(), loss_func=loss4, metrics=acc4)
learn.fit_one_cycle(6, 1e-4)

Multi-fully connected model

Use bptt=20 (use up to 20 words of context to predict the next one). Instead of predicting only the final word, predict the next word at every position, so the target is a whole array of shifted words.

data = src.databunch(bs=bs, bptt=20)
x,y = data.one_batch()
x.shape,y.shape

class Model2(nn.Module):
    def __init__(self):
        super().__init__()
        self.i_h = nn.Embedding(nv,nh)
        self.h_h = nn.Linear(nh,nh)
        self.h_o = nn.Linear(nh,nv)
        self.bn = nn.BatchNorm1d(nh)
        
    def forward(self, x):
        h = torch.zeros(x.shape[0], nh).to(device=x.device)
        res = []
        for i in range(x.shape[1]):
            h = h + self.i_h(x[:,i])
            h = F.relu(self.h_h(h))
            res.append(self.h_o(self.bn(h)))
        return torch.stack(res, dim=1)

learn = Learner(data, Model2(), metrics=accuracy)

Maintain state

class Model3(nn.Module):
    def __init__(self):
        super().__init__()
        self.i_h = nn.Embedding(nv,nh)
        self.h_h = nn.Linear(nh,nh)
        self.h_o = nn.Linear(nh,nv)
        self.bn = nn.BatchNorm1d(nh)
        self.h = torch.zeros(bs, nh).cuda()
        
    def forward(self, x):
        res = []
        h = self.h
        for i in range(x.shape[1]):
            h = h + self.i_h(x[:,i])
            h = F.relu(self.h_h(h))
            res.append(self.bn(h))
        self.h = h.detach()
        res = torch.stack(res, dim=1)
        res = self.h_o(res)
        return res

learn = Learner(data, Model3(), metrics=accuracy)

learn.fit_one_cycle(20, 3e-3)

Stack RNNs

class Model4(nn.Module):
    def __init__(self):
        super().__init__()
        self.i_h = nn.Embedding(nv,nh)
        self.rnn = nn.RNN(nh,nh, batch_first=True)
        self.h_o = nn.Linear(nh,nv)
        self.bn = BatchNorm1dFlat(nh)
        self.h = torch.zeros(1, bs, nh).cuda()
        
    def forward(self, x):
        res,h = self.rnn(self.i_h(x), self.h)
        self.h = h.detach()
        return self.h_o(self.bn(res))

learn = Learner(data, Model4(), metrics=accuracy)

learn.fit_one_cycle(20, 3e-3)

GRU/LSTM

Gates give the loop a learned way to decide how much of the hidden state to keep and how much of the new input to let in (and stacked RNNs also support dropout between layers).

class Model5(nn.Module):
    def __init__(self):
        super().__init__()
        self.i_h = nn.Embedding(nv,nh)
        self.rnn = nn.GRU(nh, nh, 2, batch_first=True)
        self.h_o = nn.Linear(nh,nv)
        self.bn = BatchNorm1dFlat(nh)
        self.h = torch.zeros(2, bs, nh).cuda()
        
    def forward(self, x):
        res,h = self.rnn(self.i_h(x), self.h)
        self.h = h.detach()
        return self.h_o(self.bn(res))

learn = Learner(data, Model5(), metrics=accuracy)

learn.fit_one_cycle(10, 1e-2)

Can also use for sequence labeling.

Document and test code

https://forums.fast.ai/t/dev-projects-index/29849

See also

Visualizing the Loss Landscape of Neural Nets

https://github.com/vdumoulin/conv_arithmetic

Perceptual Losses for Real-Time Style Transfer and Super-Resolution

The Future of Software Intelligence: a Fireside Chat

https://github.com/stas00/ipyexperiments/