October 2019 - December 2019
https://www.fast.ai/2019/01/24/course-v3/
https://forums.fast.ai/t/faq-resources-and-official-course-updates/27934
A blog post on what you need for deep learning: https://www.fast.ai/2017/11/16/what-you-need/
Seven lessons, each around 2 hours long, and you should plan to spend about 10 hours on assignments for each lesson.
Notes on how to set up the course in GCP here: https://course.fast.ai/start_gcp.html
Notes on how to set up the course in Azure here: https://course.fast.ai/start_azure.html
The forum for the course is here: https://forums.fast.ai/c/part1-v3
The key applications covered are computer vision, natural language processing (NLP), tabular data, and collaborative filtering.
The course uses PyTorch and the fastai library, which wraps it.
The videos can be found in a YouTube playlist.
ideas taken from https://www.gse.harvard.edu/news/uk/09/01/education-bat-seven-principles-educators
Lesson 1 - Image classification
Recognize pet breeds.
Use of transfer learning.
Set the most important hyper-parameter when training neural networks: the learning rate, using Leslie Smith’s fantastic learning rate finder method.
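For example (assuming a fastai v1 Learner named learn, created as in the lesson 1 code later in these notes):

learn.lr_find()        # trains briefly while growing the learning rate exponentially
learn.recorder.plot()  # plot loss vs. lr; pick a rate where the loss is still falling steeply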
Features that fastai provides to let you easily add labels to your images.
Lesson 2 - Data cleaning and production; SGD from scratch
Put a model in production e.g. https://course.fast.ai/deployment_render.html
Using the model to find and fix mislabeled or incorrectly-collected images.
Create a model and our own gradient descent loop.
Lesson 3 - Data blocks; Multi-label classification; Segmentation
Use the Planet dataset (https://www.kaggle.com/c/planet-understanding-the-amazon-from-space)
Use the data block API to get the data into shape (more info here).
image segmentation - process of labeling every pixel in an image with a category that shows what kind of object is portrayed by that pixel.
Use CamVid dataset
Predict face keypoints (interesting areas)
Lesson 4 - NLP; Tabular data; Collaborative filtering; Embeddings
Predict whether a movie review is positive or negative using ULMFiT. Here's a popular science article on the model
Cover tabular data (such as spreadsheets and database tables). Work with the fastai.tabular module to set up and train a model.
Collaborative filtering (recommendation systems).
An “embedding” is simply a computational shortcut for a particular type of matrix multiplication (a multiplication by a one-hot encoded matrix; e.g. word vectors).
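A minimal PyTorch sketch of this equivalence (the sizes here are illustrative, not from the course):

import torch
from torch import nn

emb = nn.Embedding(5, 3)   # 5 "words", each mapped to a 3-dimensional vector
idx = torch.tensor([2])
one_hot = torch.zeros(1, 5)
one_hot[0, 2] = 1.0
# Multiplying by a one-hot matrix just selects a row of the weight matrix --
# exactly what the embedding lookup does, but without the wasted multiplies.
assert torch.allclose(one_hot @ emb.weight, emb(idx))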
Lesson 5 - Back propagation; Accelerated SGD; Neural net from scratch
Create a simple NN from scratch.
Look inside the weights of an embedding layer, to find out what our model has learned about our categorical variables.
Although embeddings are most widely known in the context of word embeddings for NLP, they are at least as important for categorical variables in general, such as for tabular data or collaborative filtering.
Lesson 6 - Regularization; Convolutions; Data ethics
Discuss some powerful techniques for improving training and avoiding over-fitting, such as dropout, data augmentation, and batch normalization.
Learn all about convolutions, which can be thought of as a variant of matrix multiplication with tied weights, and are the operation at the heart of modern computer vision models (and, increasingly, other types of models too).
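To make the "tied weights" view concrete, here is a hedged 1D sketch (toy sizes, not from the lesson): the same three kernel weights appear in every row of an explicit matrix, shifted one position per row, and multiplying by that matrix reproduces the convolution.

import torch
import torch.nn.functional as F

x = torch.randn(6)               # input signal
w = torch.tensor([1., 2., 3.])   # kernel of size 3

# Standard convolution (cross-correlation) via PyTorch: output length 4
conv = F.conv1d(x.view(1, 1, -1), w.view(1, 1, -1)).view(-1)

# The same operation as a matrix multiply: each row holds the same three
# weights, shifted along by one position -- "tied" weights.
M = torch.zeros(4, 6)
for i in range(4):
    M[i, i:i+3] = w
assert torch.allclose(conv, M @ x)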
Create a class activation map (CAM), which is a heat-map that shows which parts of an image were most important in making a prediction.
Learn about some of the ways in which models can go wrong, with a particular focus on feedback loops, why they cause problems, and how to avoid them.
ways in which bias in data can lead to biased algorithms
discuss questions that data scientists can and should be asking to help ensure that their work doesn’t lead to unexpected negative outcomes
Lesson 7 - Resnets from scratch; U-net; Generative (adversarial) networks
One of the most important techniques in modern architectures: the skip connection, most famously used in the ResNet, which is the architecture we’ve used throughout this course for image classification.
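A hedged sketch of a basic residual block (simplified: no batchnorm, and the channel count is illustrative):

import torch
from torch import nn
import torch.nn.functional as F

class ResBlock(nn.Module):
    def __init__(self, nf):
        super().__init__()
        self.conv1 = nn.Conv2d(nf, nf, 3, padding=1)
        self.conv2 = nn.Conv2d(nf, nf, 3, padding=1)

    def forward(self, x):
        # The "skip connection": add the input back onto the conv output,
        # so the block only has to learn a residual correction.
        return F.relu(x + self.conv2(F.relu(self.conv1(x))))

out = ResBlock(16)(torch.randn(1, 16, 8, 8))  # shape is preserved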
Look at the U-net architecture, which uses a different type of skip connection to greatly improve segmentation results (and also similar tasks where the output structure is similar to the input).
Use the U-net architecture to train a super-resolution model. This is a model which can increase the resolution of a low-quality image. Our model won’t only increase resolution—it will also remove jpeg artifacts and unwanted text watermarks.
In order to make our model produce high quality results, we will need to create a custom loss function which incorporates feature loss (also known as perceptual loss), along with gram loss. These techniques can be used for many other types of image generation task, such as image colorization.
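As a hedged illustration of the gram matrix at the heart of "gram loss" (style loss) — a generic sketch, not fastai's implementation:

import torch

def gram(feats):
    # feats: (batch, channels, h, w) activations from some network layer
    b, c, h, w = feats.shape
    f = feats.view(b, c, h * w)
    # Channel-by-channel correlations, normalized by the layer size; the
    # loss then compares gram matrices of the output and target images.
    return f @ f.transpose(1, 2) / (c * h * w)

g = gram(torch.randn(2, 8, 4, 4))  # shape (2, 8, 8)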
Learn about a recent loss function known as generative adversarial loss (used in generative adversarial networks, or GANs), which can improve the quality of generative models in some contexts, at the cost of speed.
Train GANs more quickly and reliably than standard approaches, by leveraging transfer learning.
This combines architectural innovations and loss-function approaches that haven’t been used in this way before.
Learn how to create a recurrent neural net (RNN) from scratch. They are a simple refactoring of a regular multi-layer network.
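A hedged from-scratch sketch of that refactoring (hypothetical sizes, not the course's exact code): the same two linear layers are reused at every time step, instead of having one distinct layer per step.

import torch
from torch import nn

class MinimalRNN(nn.Module):
    def __init__(self, n_in, n_hidden):
        super().__init__()
        self.i_h = nn.Linear(n_in, n_hidden)       # input  -> hidden
        self.h_h = nn.Linear(n_hidden, n_hidden)   # hidden -> hidden, tied across steps

    def forward(self, xs):
        # xs: (seq_len, batch, n_in)
        h = torch.zeros(xs.shape[1], self.h_h.out_features)
        for x in xs:  # the loop that "refactors" a stack of layers
            h = torch.tanh(self.i_h(x) + self.h_h(h))
        return h

h = MinimalRNN(8, 16)(torch.randn(5, 4, 8))  # 5 steps, batch of 4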
https://docs.fast.ai/#Installation-and-updating
$ conda create -n fastai python=3.7
$ conda activate fastai
$ conda install -c pytorch -c fastai fastai
$ python
>>> from fastai.vision import *
>>> path = untar_data(URLs.MNIST_SAMPLE)
>>> data = ImageDataBunch.from_folder(path)
>>> learn = cnn_learner(data, models.resnet18, metrics=accuracy)
>>> learn.fit(1)

Here is a PyTorch tutorial:
Download the MNIST data:
from pathlib import Path
import requests

DATA_PATH = Path("data")
PATH = DATA_PATH / "mnist"
PATH.mkdir(parents=True, exist_ok=True)

URL = "http://deeplearning.net/data/mnist/"
FILENAME = "mnist.pkl.gz"

if not (PATH / FILENAME).exists():
    content = requests.get(URL + FILENAME).content
    (PATH / FILENAME).open("wb").write(content)

This dataset is in numpy array format, and has been stored using pickle, a python-specific format for serializing data.
import pickle
import gzip

with gzip.open((PATH / FILENAME).as_posix(), "rb") as f:
    ((x_train, y_train), (x_valid, y_valid), _) = pickle.load(f, encoding="latin-1")

Each image is 28 x 28, and is being stored as a flattened row of length 784 (= 28 x 28). Let’s take a look at one; we need to reshape it to 2d first.
from matplotlib import pyplot
import numpy as np

pyplot.imshow(x_train[0].reshape((28, 28)), cmap="gray")
print(x_train.shape)

Convert to torch.tensor:
import torch

x_train, y_train, x_valid, y_valid = map(
    torch.tensor, (x_train, y_train, x_valid, y_valid)
)
n, c = x_train.shape
print(x_train, y_train)
print(x_train.shape)
print(y_train.min(), y_train.max())

Neural net from scratch. PyTorch provides methods to create random or zero-filled tensors, which we will use to create the weights and bias for a simple linear model. We tell PyTorch that they require a gradient: this causes PyTorch to record all of the operations done on the tensor, so that it can calculate the gradient during back-propagation automatically!
For the weights, we set requires_grad after the initialization, since we don’t want that step included in the gradient. (Note that a trailing _ in PyTorch signifies that the operation is performed in-place.)
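A tiny illustration of that convention:

import torch

t = torch.ones(3)
t.add_(1)      # trailing _: in-place, t is now tensor([2., 2., 2.])
u = t.add(1)   # no underscore: returns a new tensor, t is unchanged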
Initialize weights with Xavier initialisation (by multiplying with 1/sqrt(n)).
import math

weights = torch.randn(784, 10) / math.sqrt(784)
weights.requires_grad_()
bias = torch.zeros(10, requires_grad=True)

We can use any standard Python function (or callable object) as a model! So let’s just write a plain matrix multiplication and broadcasted addition to create a simple linear model. We’ll write log_softmax and use it. PyTorch will even create fast GPU or vectorized CPU code for your function automatically. The @ stands for the matrix multiplication (dot product) operation.
def log_softmax(x):
    return x - x.exp().sum(-1).log().unsqueeze(-1)

def model(xb):
    return log_softmax(xb @ weights + bias)

Call our function on one batch of data (in this case, 64 images). This is one forward pass. Note that our predictions won’t be any better than random at this stage, since we start with random weights.
bs = 64             # batch size
xb = x_train[0:bs]  # a mini-batch from x
preds = model(xb)   # predictions
print(preds[0], preds.shape)

The preds tensor contains not only the tensor values, but also a gradient function. We’ll use this later to do backprop.
implement negative log-likelihood to use as the loss function
def nll(input, target):
    return -input[range(target.shape[0]), target].mean()

loss_func = nll

Check our loss with our random model, so we can see if we improve after a backprop pass later.
yb = y_train[0:bs]
print(loss_func(preds, yb))

Calculate the accuracy of our model. For each prediction, if the index with the largest value matches the target value, then the prediction was correct.
def accuracy(out, yb):
    preds = torch.argmax(out, dim=1)
    return (preds == yb).float().mean()

Check the accuracy of our random model, so we can see if our accuracy improves as our loss improves.
print(accuracy(preds, yb))

Run a training loop. For each iteration:
select a mini-batch of data (of size bs), use the model to make predictions, and calculate the loss. loss.backward() updates the gradients of the model, in this case, weights and bias. We then use these gradients to update the weights and bias. We do this within the torch.no_grad() context manager, because we do not want these actions to be recorded for our next calculation of the gradient. You can read more about how PyTorch's Autograd records operations here.
set the gradients to zero, so that we are ready for the next loop. Otherwise, our gradients would record a running tally of all the operations that had happened (i.e. loss.backward() adds the gradients to whatever is already stored, rather than replacing them).
You can use the standard python debugger to step through PyTorch code, allowing you to check the various variable values at each step. Uncomment set_trace() below to try it out.
from IPython.core.debugger import set_trace

lr = 0.5     # learning rate
epochs = 2   # how many epochs to train for

for epoch in range(epochs):
    for i in range((n - 1) // bs + 1):
        # set_trace()
        start_i = i * bs
        end_i = start_i + bs
        xb = x_train[start_i:end_i]
        yb = y_train[start_i:end_i]
        pred = model(xb)
        loss = loss_func(pred, yb)
        loss.backward()
        with torch.no_grad():
            weights -= weights.grad * lr
            bias -= bias.grad * lr
            weights.grad.zero_()
            bias.grad.zero_()

That’s it: we’ve created and trained a minimal neural network (in this case, a logistic regression, since we have no hidden layers) entirely from scratch!
Let’s check the loss and accuracy and compare those to what we got earlier. We expect that the loss will have decreased and accuracy to have increased.
print(loss_func(model(xb), yb), accuracy(model(xb), yb))

Refactor our code so that it does the same thing as before, only we'll start taking advantage of PyTorch's nn classes to make it more concise and flexible. At each step from here, we should be making our code one or more of: shorter, more understandable, and/or more flexible.
Make our code shorter by replacing our hand-written activation and loss functions with those from torch.nn.functional (which is generally imported into the namespace F by convention). This module contains all the functions in the torch.nn library (whereas other parts of the library contain classes). As well as a wide range of loss and activation functions, you'll also find here some convenient functions for creating neural nets, such as pooling functions. (There are also functions for doing convolutions, linear layers, etc, but as we'll see, these are usually better handled using other parts of the library.)
PyTorch provides a single function F.cross_entropy that combines negative log-likelihood loss and log softmax activation.

import torch.nn.functional as F

loss_func = F.cross_entropy

def model(xb):
    return xb @ weights + bias

We no longer call log_softmax in the model function. Let's confirm that our loss and accuracy are the same as before:
print(loss_func(model(xb), yb), accuracy(model(xb), yb))

Use nn.Module and nn.Parameter for a clearer and more concise training loop. We subclass nn.Module (which itself is a class and able to keep track of state). In this case, we want to create a class that holds our weights, bias, and method for the forward step. nn.Module has a number of attributes and methods (such as .parameters() and .zero_grad()) which we will be using.
from torch import nn

class Mnist_Logistic(nn.Module):
    def __init__(self):
        super().__init__()
        self.weights = nn.Parameter(torch.randn(784, 10) / math.sqrt(784))
        self.bias = nn.Parameter(torch.zeros(10))

    def forward(self, xb):
        return xb @ self.weights + self.bias

Now that we are using an object instead of just a function, we first have to instantiate our model:
model = Mnist_Logistic()

Calculate the loss in the same way as before. Note that nn.Module objects are used as if they are functions (i.e. they are callable), but behind the scenes PyTorch will call our forward method automatically.
print(loss_func(model(xb), yb))

Previously for our training loop we had to update the values for each parameter by name, and manually zero out the grads for each parameter separately. Now we can take advantage of model.parameters() and model.zero_grad() (which are both defined by PyTorch for nn.Module) to make those steps more concise and less prone to the error of forgetting some of our parameters, particularly if we had a more complicated model:
with torch.no_grad():
    for p in model.parameters():
        p -= p.grad * lr
    model.zero_grad()

We’ll wrap our little training loop in a fit function so we can run it again later.
def fit():
    for epoch in range(epochs):
        for i in range((n - 1) // bs + 1):
            start_i = i * bs  # bs is batch size
            end_i = start_i + bs
            xb = x_train[start_i:end_i]
            yb = y_train[start_i:end_i]
            pred = model(xb)
            loss = loss_func(pred, yb)
            loss.backward()
            with torch.no_grad():
                for p in model.parameters():
                    p -= p.grad * lr
                model.zero_grad()

fit()

Double-check that our loss has gone down:
print(loss_func(model(xb), yb))

We continue to refactor our code. Instead of manually defining and initializing self.weights and self.bias, and calculating xb @ self.weights + self.bias, we will instead use the PyTorch class nn.Linear for a linear layer, which does all that for us. PyTorch has many types of predefined layers that can greatly simplify our code, and often make it faster too.
class Mnist_Logistic(nn.Module):
    def __init__(self):
        super().__init__()
        self.lin = nn.Linear(784, 10)

    def forward(self, xb):
        return self.lin(xb)

We instantiate our model and calculate the loss in the same way as before:
model = Mnist_Logistic()
print(loss_func(model(xb), yb))

We are still able to use our same fit method as before.
fit()
print(loss_func(model(xb), yb))

PyTorch also has a package with various optimization algorithms, torch.optim. We can use the step method from our optimizer to take a forward step, instead of manually updating each parameter.
This will let us replace our previous manually coded optimization step and instead use just:
opt.step()
opt.zero_grad()

opt.zero_grad() resets the gradient to 0; we need to call it before computing the gradient for the next minibatch.
from torch import optim

Define a little function to create our model and optimizer so we can reuse it in the future:
def get_model():
    model = Mnist_Logistic()
    return model, optim.SGD(model.parameters(), lr=lr)  # lr is learning rate

model, opt = get_model()
print(loss_func(model(xb), yb))

for epoch in range(epochs):
    for i in range((n - 1) // bs + 1):
        start_i = i * bs
        end_i = start_i + bs
        xb = x_train[start_i:end_i]
        yb = y_train[start_i:end_i]
        pred = model(xb)
        loss = loss_func(pred, yb)
        loss.backward()
        opt.step()
        opt.zero_grad()

print(loss_func(model(xb), yb))

PyTorch has an abstract Dataset class. A Dataset can be anything that has a __len__ function (called by Python’s standard len function) and a __getitem__ function as a way of indexing into it. This tutorial walks through a nice example of creating a custom FacialLandmarkDataset class as a subclass of Dataset.
PyTorch’s TensorDataset is a Dataset wrapping tensors. By defining a length and way of indexing, this also gives us a way to iterate, index, and slice along the first dimension of a tensor. This will make it easier to access both the independent and dependent variables in the same line as we train.
from torch.utils.data import TensorDataset

Both x_train and y_train can be combined in a single TensorDataset, which will be easier to iterate over and slice.
train_ds = TensorDataset(x_train, y_train)

Previously, we had to iterate through minibatches of x and y values separately. Now, we can do these two steps together:
xb,yb = train_ds[i*bs : i*bs+bs]
model, opt = get_model()

for epoch in range(epochs):
    for i in range((n - 1) // bs + 1):
        xb, yb = train_ds[i * bs: i * bs + bs]
        pred = model(xb)
        loss = loss_func(pred, yb)
        loss.backward()
        opt.step()
        opt.zero_grad()

print(loss_func(model(xb), yb))

PyTorch’s DataLoader is responsible for managing batches. You can create a DataLoader from any Dataset. DataLoader makes it easier to iterate over batches. Rather than having to use train_ds[i*bs : i*bs+bs], the DataLoader gives us each minibatch automatically.
from torch.utils.data import DataLoader

train_ds = TensorDataset(x_train, y_train)
train_dl = DataLoader(train_ds, batch_size=bs)

Now, our loop is much cleaner, as (xb, yb) are loaded automatically from the data loader:
for xb,yb in train_dl: pred = model(xb)
model, opt = get_model()

for epoch in range(epochs):
    for xb, yb in train_dl:
        pred = model(xb)
        loss = loss_func(pred, yb)
        loss.backward()
        opt.step()
        opt.zero_grad()

print(loss_func(model(xb), yb))

Thanks to PyTorch’s nn.Module, nn.Parameter, Dataset, and DataLoader, our training loop is now dramatically smaller and easier to understand. Let’s now try to add the basic features necessary to create effective models in practice.
In section 1, we were just trying to get a reasonable training loop set up for use on our training data. In reality, you always should also have a validation set, in order to identify if you are overfitting.
Shuffling the training data is important to prevent correlation between batches and overfitting. On the other hand, the validation loss will be identical whether we shuffle the validation set or not. Since shuffling takes extra time, it makes no sense to shuffle the validation data.
We’ll use a batch size for the validation set that is twice as large as that for the training set. This is because the validation set does not need backpropagation and thus takes less memory (it doesn’t need to store the gradients). We take advantage of this to use a larger batch size and compute the loss more quickly.
train_ds = TensorDataset(x_train, y_train)
train_dl = DataLoader(train_ds, batch_size=bs, shuffle=True)

valid_ds = TensorDataset(x_valid, y_valid)
valid_dl = DataLoader(valid_ds, batch_size=bs * 2)

Calculate and print the validation loss at the end of each epoch.
(Note that we always call model.train() before training, and model.eval() before inference, because these are used by layers such as nn.BatchNorm2d and nn.Dropout to ensure appropriate behaviour for these different phases.)
model, opt = get_model()

for epoch in range(epochs):
    model.train()
    for xb, yb in train_dl:
        pred = model(xb)
        loss = loss_func(pred, yb)
        loss.backward()
        opt.step()
        opt.zero_grad()

    model.eval()
    with torch.no_grad():
        valid_loss = sum(loss_func(model(xb), yb) for xb, yb in valid_dl)

    print(epoch, valid_loss / len(valid_dl))

Since we go through a similar process twice of calculating the loss for both the training set and the validation set, let’s make that into its own function, loss_batch, which computes the loss for one batch.
pass an optimizer in for the training set, and use it to perform backprop. For the validation set, we don’t pass an optimizer, so the method doesn’t perform backprop.
def loss_batch(model, loss_func, xb, yb, opt=None):
    loss = loss_func(model(xb), yb)
    if opt is not None:
        loss.backward()
        opt.step()
        opt.zero_grad()
    return loss.item(), len(xb)

fit runs the necessary operations to train our model and compute the training and validation losses for each epoch.
import numpy as np

def fit(epochs, model, loss_func, opt, train_dl, valid_dl):
    for epoch in range(epochs):
        model.train()
        for xb, yb in train_dl:
            loss_batch(model, loss_func, xb, yb, opt)

        model.eval()
        with torch.no_grad():
            losses, nums = zip(
                *[loss_batch(model, loss_func, xb, yb) for xb, yb in valid_dl]
            )
        val_loss = np.sum(np.multiply(losses, nums)) / np.sum(nums)
        print(epoch, val_loss)

get_data returns dataloaders for the training and validation sets.
def get_data(train_ds, valid_ds, bs):
    return (
        DataLoader(train_ds, batch_size=bs, shuffle=True),
        DataLoader(valid_ds, batch_size=bs * 2),
    )

Now our whole process of obtaining the data loaders and fitting the model can be run in 3 lines of code:
train_dl, valid_dl = get_data(train_ds, valid_ds, bs)
model, opt = get_model()
fit(epochs, model, loss_func, opt, train_dl, valid_dl)

You can use these basic 3 lines of code to train a wide variety of models. Let’s see if we can use them to train a convolutional neural network (CNN)!
We are now going to build our neural network with three convolutional layers. Because none of the functions in the previous section assume anything about the model form, we’ll be able to use them to train a CNN without any modification.
We will use Pytorch’s predefined Conv2d class as our convolutional layer. We define a CNN with 3 convolutional layers. Each convolution is followed by a ReLU. At the end, we perform an average pooling. (Note that view is PyTorch’s version of numpy’s reshape)
class Mnist_CNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1)
        self.conv2 = nn.Conv2d(16, 16, kernel_size=3, stride=2, padding=1)
        self.conv3 = nn.Conv2d(16, 10, kernel_size=3, stride=2, padding=1)

    def forward(self, xb):
        xb = xb.view(-1, 1, 28, 28)
        xb = F.relu(self.conv1(xb))
        xb = F.relu(self.conv2(xb))
        xb = F.relu(self.conv3(xb))
        xb = F.avg_pool2d(xb, 4)
        return xb.view(-1, xb.size(1))

lr = 0.1

Momentum is a variation on stochastic gradient descent that takes previous updates into account as well and generally leads to faster training.
model = Mnist_CNN()
opt = optim.SGD(model.parameters(), lr=lr, momentum=0.9)
fit(epochs, model, loss_func, opt, train_dl, valid_dl)

torch.nn has another handy class we can use to simplify our code: Sequential. A Sequential object runs each of the modules contained within it, in a sequential manner. This is a simpler way of writing our neural network.
To take advantage of this, we need to be able to easily define a custom layer from a given function. For instance, PyTorch doesn’t have a view layer, and we need to create one for our network. Lambda will create a layer that we can then use when defining a network with Sequential.
class Lambda(nn.Module):
    def __init__(self, func):
        super().__init__()
        self.func = func

    def forward(self, x):
        return self.func(x)

def preprocess(x):
    return x.view(-1, 1, 28, 28)

The model created with Sequential is:
model = nn.Sequential(
    Lambda(preprocess),
    nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 16, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 10, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
    nn.AvgPool2d(4),
    Lambda(lambda x: x.view(x.size(0), -1)),
)

opt = optim.SGD(model.parameters(), lr=lr, momentum=0.9)
fit(epochs, model, loss_func, opt, train_dl, valid_dl)

Our CNN is fairly concise, but it only works with MNIST, because:
it assumes the input is a 28 x 28 long vector, and it assumes the final CNN grid size is 4 x 4 (since that’s the average pooling kernel size we used). Let’s get rid of these two assumptions, so our model works with any 2d single channel image. First, we can remove the initial Lambda layer by moving the data preprocessing into a generator:
def preprocess(x, y):
    return x.view(-1, 1, 28, 28), y

class WrappedDataLoader:
    def __init__(self, dl, func):
        self.dl = dl
        self.func = func

    def __len__(self):
        return len(self.dl)

    def __iter__(self):
        batches = iter(self.dl)
        for b in batches:
            yield (self.func(*b))

train_dl, valid_dl = get_data(train_ds, valid_ds, bs)
train_dl = WrappedDataLoader(train_dl, preprocess)
valid_dl = WrappedDataLoader(valid_dl, preprocess)

Next, replace nn.AvgPool2d with nn.AdaptiveAvgPool2d, which allows us to define the size of the output tensor we want, rather than the input tensor we have. As a result, our model will work with any size input.
model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 16, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 10, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    Lambda(lambda x: x.view(x.size(0), -1)),
)

opt = optim.SGD(model.parameters(), lr=lr, momentum=0.9)
fit(epochs, model, loss_func, opt, train_dl, valid_dl)

Check that your GPU is working in PyTorch:
print(torch.cuda.is_available())

Then create a device object for it:
dev = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")

Update preprocess to move batches to the GPU:
def preprocess(x, y):
    return x.view(-1, 1, 28, 28).to(dev), y.to(dev)

train_dl, valid_dl = get_data(train_ds, valid_ds, bs)
train_dl = WrappedDataLoader(train_dl, preprocess)
valid_dl = WrappedDataLoader(valid_dl, preprocess)

Move our model to the GPU:
model.to(dev)
opt = optim.SGD(model.parameters(), lr=lr, momentum=0.9)

It runs faster now:
fit(epochs, model, loss_func, opt, train_dl, valid_dl)

Module: creates a callable which behaves like a function, but can also contain state (such as neural net layer weights). It knows what Parameter(s) it contains and can zero all their gradients, loop through them for weight updates, etc.
Parameter: a wrapper for a tensor that tells a Module that it has weights that need updating during backprop. Only tensors with the requires_grad attribute set are updated.
functional: a module (usually imported into the F namespace by convention) which contains activation functions, loss functions, etc, as well as non-stateful versions of layers such as convolutional and linear layers.
torch.optim: contains optimizers such as SGD, which update the weights of Parameter during the backward step.
Dataset: an abstract interface of objects with a __len__ and a __getitem__, including classes provided with PyTorch such as TensorDataset.
DataLoader: takes any Dataset and creates an iterator which returns batches of data.

Get the lessons from https://github.com/fastai/course-v3
Course taught by Jeremy Howard GitHub
Make deep learning accessible: software, education, research, community
Some background on why Jupyter notebooks: https://paulromer.net/jupyter-mathematica-and-the-future-of-the-research-paper/
The possibility of Jupyter notebooks being used for papers https://www.theatlantic.com/science/archive/2018/04/the-scientific-paper-is-obsolete/556676/
Enter to enter edit mode and Esc to enter command mode
Your notebook is autosaved every 120 seconds
Esc -> s to save
Esc -> up arrow to toggle cells
Esc -> b to create a new cell
Esc -> 0 -> 0 to restart kernel
Esc -> m to convert a cell to markdown
Esc -> y to convert a cell to code
Esc -> d -> d to delete a cell
Esc -> o to hide output
?function-name: Shows the definition and docstring for that function
??function-name: Shows the source code for that function
doc(function-name): Shows the definition, docstring and links to the documentation of the function (only works with fastai library imported)
Use Tab to autocomplete method names and Shift + Tab to show a method's signature and inputs.
https://course.fast.ai/videos/?lesson=1
Build our first image classifier from scratch.
The fastai library provides many useful functions that enable us to quickly and easily build neural networks and train our models.
The course argues that import * is fine for experimenting and interactive work (though not for production code).
from fastai.vision import *
from fastai.metrics import error_rate

Known datasets. Academic datasets are often in good shape and used frequently. They provide baselines so you know if your model is good.
The dataset is Oxford-IIIT Pet Dataset by Parkhi et al 2012 which features 12 cat breeds and 25 dog breeds (fine grain classification). The best accuracy they got in their dataset was 59.21%.
Download and extract the data.
Use untar_data, which takes a Union (meaning either) of pathlib.Path or str.
path = untar_data(URLs.PETS)
path.ls()
path_anno = path/'annotations'  # can use / on a path object
path_img = path/'images'
fnames = get_image_files(path_img)
pat = r'/([^/]+)_\d+.jpg$'      # regular expression pattern
data = ImageDataBunch.from_name_re(path_img, fnames, pat, ds_tfms=get_transforms(),
                                   size=224, bs=bs).normalize(imagenet_stats)  # specify the size to scale images to
data.show_batch(rows=3, figsize=(7,6))  # grab the middle bit and resize it
# Get unique classes
print(data.classes)
len(data.classes), data.c

A model is trained using a Learner; use a subclass such as cnn_learner.
Use a convolutional neural network backbone and a fully connected head with a single hidden layer as a classifier. Use resnet34, trained on 1,000 classes and ~1.5m images (ImageNet). Download the pre-trained weights (transfer learning): much quicker to train and you don't need a big dataset.
# Requires an ImageDataBunch and a model architecture. ResNets work pretty well;
# there's resnet34 and resnet50 (start smaller: 34). Print out error_rate.
learn = cnn_learner(data, models.resnet34, metrics=error_rate)
learn.model
learn.fit_one_cycle(4)  # 4 epochs on the last few layers
learn.save('stage-1')

You can also use resnet50, but don't forget to reduce the batch size.
https://dawn.cs.stanford.edu/benchmark/#imagenet - shows scores of image classification models.
To see what comes out of the model:
learn knows data and model
interp = ClassificationInterpretation.from_learner(learn)
losses, idxs = interp.top_losses()
interp.plot_top_losses(9, figsize=(15,11))

This plots 4 things: prediction, actual, loss, and probability of the actual class.
Confusion matrix: for each actual class, how many times it was predicted to be each class.
interp.plot_confusion_matrix(figsize=(12,12), dpi=60)

If you have lots of classes it's best to use:
interp.most_confused(min_val=2)

Unfreeze our model and train the whole model.
Zeiler and Fergus 2013 on understanding and visualizing CNNs.
We want to change the last layers, which learn very specific features.
learn.unfreeze()
learn.fit_one_cycle(1)

This leads to a worse error, as it updates all layers at the same rate, including the early ones that detect simple features (e.g. diagonal edges) and don't need much updating.
learn.lr_find()
learn.recorder.plot()

The default lr is 0.003, which is where the loss increases a lot.
learn.unfreeze()
learn.fit_one_cycle(2, max_lr=slice(1e-6, 1e-4))  # the lr varies per layer group between these two values

This gives a 10% improvement in accuracy over before. You only really need these two stages.
Visualizing and Understanding Convolutional Networks (Zeiler and Fergus): https://arxiv.org/abs/1311.2901
bs = 16
data = ImageDataBunch.from_name_re(path_img, fnames, pat, ds_tfms=get_transforms(),
                                   size=299, bs=bs//2).normalize(imagenet_stats)
learn = cnn_learner(data, models.resnet50, metrics=error_rate)
learn.lr_find()
learn.recorder.plot()
learn.fit_one_cycle(8)
learn.save('stage-1-50')
learn.unfreeze()
learn.fit_one_cycle(3, max_lr=slice(1e-6, 1e-4))
learn.load('stage-1-50');
interp = ClassificationInterpretation.from_learner(learn)
interp.most_confused(min_val=2)

path = untar_data(URLs.MNIST_SAMPLE); path
tfms = get_transforms(do_flip=False)

The labels are what the folders are called, so you can use from_folder.
data = ImageDataBunch.from_folder(path, ds_tfms=tfms, size=26)
data.show_batch(rows=3, figsize=(5,5))
learn = cnn_learner(data, models.resnet18, metrics=accuracy)
learn.fit(2)
df = pd.read_csv(path/'labels.csv')
df.head()

If the label is in a csv file you can use:
data = ImageDataBunch.from_csv(path, ds_tfms=tfms, size=28)

From a DataFrame:
data = ImageDataBunch.from_df(path, df, ds_tfms=tfms, size=24)
data.classes

Grab the label using a regex:
pat = r"/(\d)/\d+\.png$"data = ImageDataBunch.from_name_re(path, fn_paths, pat=pat, ds_tfms=tfms, size=24)data.classescan write your own function
data = ImageDataBunch.from_name_func(path, fn_paths, ds_tfms=tfms, size=24,
                                     label_func=lambda x: '3' if '/3/' in str(x) else '7')
data.classes

From lists:
labels = [('3' if '/3/' in str(x) else '7') for x in fn_paths]
labels[:5]
data = ImageDataBunch.from_lists(path, fn_paths, labels=labels, ds_tfms=tfms, size=24)
data.classes

https://forums.fast.ai/t/tips-for-building-large-image-datasets/26688/3
https://lpdaacsvc.cr.usgs.gov/appeears/ - to get MERIS, ESA Sentinel-3
https://github.com/prairie-guy/ai_utilities
https://github.com/svenski/duckgoose
https://forums.fast.ai/t/generating-image-datasets-quickly/19079/9
https://forums.fast.ai/t/how-to-scrape-the-web-for-images/7446/3
Inspired by https://www.pyimagesearch.com/2017/12/04/how-to-create-a-deep-learning-dataset-using-google-images/
from fastai.vision import *

Create folders for the files:
path = Path('data/bears')
folder = 'black'
dest = path/folder
dest.mkdir(parents=True, exist_ok=True)
folder = 'teddys'
dest = path/folder
dest.mkdir(parents=True, exist_ok=True)
folder = 'grizzly'
dest = path/folder
dest.mkdir(parents=True, exist_ok=True)

Go to Google Images and type your search term. If you want to exclude anything you can do "canis lupus lupus" -dog to search for wolves, for example. Tools -> Type -> Photo to only get photos. Type "black bear".
Run some Javascript code in your browser which will save the URLs of all the images you want for your dataset.
Turn off ad-block, then press Ctrl + Shift + J in your browser and run the following (press Enter to run):
urls = Array.from(document.querySelectorAll('.rg_di .rg_meta')).map(el=>JSON.parse(el.textContent).ou);
window.open('data:text/csv;charset=utf-8,' + escape(urls.join('\n')));

Move the 'download' file into the bear folder and name the file urls_black.csv.
Do the same for teddys and grizzly
Download the files
path = Path('data/bears')

file = 'urls_black.csv'
folder = 'black'
dest = path/folder
download_images(path/file, dest, max_pics=200)

file = 'urls_teddys.csv'
folder = 'teddys'
dest = path/folder
download_images(path/file, dest, max_pics=200)

file = 'urls_grizzly.csv'
folder = 'grizzly'
dest = path/folder
download_images(path/file, dest, max_pics=200)

Then clean up files that can't be opened:
classes = ['teddys','grizzly','black']
for c in classes:
    print(c)
    verify_images(path/c, delete=True, max_size=500)

You may have to go through this a couple of times to get all the images.
View the data:
np.random.seed(42)
data = ImageDataBunch.from_folder(path, train=".", valid_pct=0.2,
                                  ds_tfms=get_transforms(), size=224,
                                  num_workers=4).normalize(imagenet_stats)
data.classes
data.show_batch(rows=3, figsize=(7,8))

Train the model:
learn = cnn_learner(data, models.resnet34, metrics=error_rate)
learn.fit_one_cycle(4)
learn.save('stage-1')
learn.unfreeze()
learn.lr_find()
learn.recorder.plot()
learn.fit_one_cycle(2, max_lr=slice(3e-5, 3e-4))
learn.save('stage-2')

Interpret the model:
learn.load('stage-2');
interp = ClassificationInterpretation.from_learner(learn)
interp.plot_confusion_matrix()

Cleaning up the data:
Using the ImageCleaner widget from fastai.widgets we can prune our top losses, removing photos that don't belong.
from fastai.widgets import *

Get the file paths from our top_losses using .from_toplosses, then feed the top-loss indexes and the corresponding dataset to ImageCleaner.
The widget will not delete images directly from disk but it will create a new csv file cleaned.csv from where you can create a new ImageDataBunch with the corrected labels to continue training your model.
https://ipywidgets.readthedocs.io/en/latest/
create a new dataset without the split.
db = (ImageList.from_folder(path)
      .split_none()
      .label_from_folder()
      .transform(get_transforms(), size=224)
      .databunch()
     )

Create a new learner to use our new databunch with all the images.
learn_cln = cnn_learner(db, models.resnet34, metrics=error_rate)
learn_cln.load('stage-2');
ds, idxs = DatasetFormatter().from_toplosses(learn_cln)
ImageCleaner(ds, idxs, path)

Flag photos for deletion by clicking 'Delete'.
Find duplicates in your dataset and delete them! run .from_similars to get the potential duplicates' ids and then run ImageCleaner with duplicates=True
db = (ImageList.from_csv(path, 'cleaned.csv', folder='.')
      .split_none()
      .label_from_df()
      .transform(get_transforms(), size=224)
      .databunch()
     )
learn_cln = cnn_learner(db, models.resnet34, metrics=error_rate)
learn.load('stage-2');
ds, idxs = DatasetFormatter().from_similars(learn_cln)
ImageCleaner(ds, idxs, path, duplicates=True)

db = (ImageList.from_csv(path, 'cleaned.csv', folder='.')
      .split_none()
      .label_from_df()
      .transform(get_transforms(), size=224)
      .databunch()
     )
learn = cnn_learner(db, models.resnet34, metrics=error_rate)

Put the model into production
Export the content of our Learner object for production
learn.export()

This will create a file named 'export.pkl' in the directory where we were working that contains everything we need to deploy our model (the model, the weights, but also some metadata like the classes or the transforms/normalization used).
For inference it is often better to use a CPU than a GPU: it scales more easily, and unlike a GPU it won't have to wait to build up a batch of e.g. 64 images before running, which would make users wait.
Test your model on CPU like so:
defaults.device = torch.device('cpu')
img = open_image(path/'black'/'00000021.jpg')
img
learn = load_learner(path)
pred_class, pred_idx, outputs = learn.predict(img)
pred_class

$ pip install starlette
$ pip install uvicorn

Create a file called hello_world.py:
from starlette.applications import Starlette
from starlette.responses import JSONResponse
import uvicorn

app = Starlette(debug=True)

@app.route('/')
async def homepage(request):
    return JSONResponse({'hello': 'world'})

if __name__ == '__main__':
    uvicorn.run(app, host='0.0.0.0', port=8000)
$ git clone https://github.com/encode/starlette-example.git
$ cd starlette-example
$ pip install aiofiles
$ scripts/run

https://github.com/simonw/cougar-or-not/blob/master/cougar.py
You can play with the learning rate and the number of epochs.
Learning rate too high:
learn = cnn_learner(data, models.resnet34, metrics=error_rate)
learn.fit_one_cycle(1, max_lr=0.5)  # error = 0.69

Your validation loss will explode.
Learning rate too low:
learn = cnn_learner(data, models.resnet34, metrics=error_rate)
# error rate goes down then back up
learn.fit_one_cycle(5, max_lr=1e-5)
learn.recorder.plot_losses()

Train loss is higher than validation loss.
Too few epochs
learn = cnn_learner(data, models.resnet34, metrics=error_rate, pretrained=False)
learn.fit_one_cycle(1)

Training loss is much higher than validation loss.
Too many epochs
np.random.seed(42)
data = ImageDataBunch.from_folder(path, train=".", valid_pct=0.9, bs=32,
                                  ds_tfms=get_transforms(do_flip=False, max_rotate=0,
                                                         max_zoom=1, max_lighting=0,
                                                         max_warp=0),
                                  size=224, num_workers=4).normalize(imagenet_stats)
learn = cnn_learner(data, models.resnet50, metrics=error_rate, ps=0, wd=0)
learn.unfreeze()
learn.fit_one_cycle(40, slice(1e-6, 1e-4))

Overfitting. The error rate improves for a while, then gets worse again. A well-trained model should have a train loss lower than its validation loss.
Metrics are always calculated on the validation dataset.
https://forums.fast.ai/t/lesson-1-official-resources-and-updates/27936
https://course.fast.ai/videos/?lesson=2
Make sure you have the latest version of the code and the latest version of the course
$ conda update conda
$ conda update anaconda
$ conda activate fastai
$ conda update --all

Compare https://github.com/fastai/course-v3 to my local course.
cd /home/ray/Documents/COURSES/fastai/course-v3/nbs/dl1
jupyter notebook

Deep Convolutional Neural Networks and Data Augmentation for Environmental Sound Classification - https://arxiv.org/abs/1608.04363
https://forums.fast.ai/t/share-your-work-here/27676/38
http://matrixmultiplication.xyz/
y = a1x1 + a2x2
%matplotlib inline
from fastai.basics import *

n = 100
# x1 is noisy and x2 is full of ones.
x = torch.ones(n, 2)
# a trailing _ means replace values in-place (i.e. don't return a new tensor).
x[:,0].uniform_(-1., 1)
x[:5]
# a1 = 3; a2 = 2
a = tensor(3., 2.); a
# x@a is a matrix product
y = x@a + torch.rand(n)
plt.scatter(x[:,0], y);

Try to work out a1 and a2.
find parameters (weights) a such that you minimize the error between the points and the line x@a. For a regression problem the most common error function or loss function is the mean squared error.
def mse(y_hat, y): return ((y_hat - y) ** 2).mean()Start with a guess of -1, 1.
a = tensor(-1., 1.)
y_hat = x@a
mse(y_hat, y)
plt.scatter(x[:,0], y)
plt.scatter(x[:,0], y_hat);

Change the intercept and gradient. Four possibilities...
Derivative tells you how to move the line of best fit.
Call .grad to get the derivative.
a = nn.Parameter(a); a
lr = 1e-1

def update(lr, a):
    y_hat = x@a
    loss = mse(y, y_hat)
    if t % 10 == 0: print(loss)
    loss.backward()  # calculate the gradient
    # Turn gradient calculation off when you do the SGD update
    with torch.no_grad():
        # subtract the gradient from a (_ means in-place),
        # i.e. go in the other 'direction' of the loss
        a.sub_(lr * a.grad)  # the derivative gets assigned to .grad
        a.grad.zero_()       # zero out the gradient

for t in range(100): update(lr, a)

plt.scatter(x[:, 0], y)
plt.scatter(x[:, 0], x@a);

from matplotlib import animation, rc
rc('animation', html='jshtml')

a = nn.Parameter(tensor(-1., 1.))
fig = plt.figure()
plt.scatter(x[:, 0], y, c='orange')
line, = plt.plot(x[:, 0], x@a)
plt.close()

def animate(i):
    update(lr, a)
    line.set_ydata(x@a)
    return line,

animation.FuncAnimation(fig, animate, np.arange(0, 100), interval=20)

In practice we use mini-batches.
a + bx (high bias - underfit)
a + bx + cx^2 (just right)
a + bx + cx^2 + dx^3 + ex^4 (high variance - overfit).
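A hedged toy sketch of those three cases using numpy's polyfit (the data here is synthetic and purely illustrative):

import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 20)
y = 1 + 2 * x + 3 * x**2 + rng.normal(0, 0.2, x.shape)  # quadratic + noise

for degree in (1, 2, 9):  # underfit, just right, overfit
    coeffs = np.polyfit(x, y, degree)
    preds = np.polyval(coeffs, x)
    print(degree, ((preds - y) ** 2).mean())

Training error keeps falling as the degree grows, but the degree-9 fit is chasing the noise and would generalize badly: that is the high-variance case.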
You can deploy computer vision models at https://course.fast.ai/deployment_render.html
https://github.com/hiromis/notes/blob/master/Lesson2.md
https://www.youtube.com/watch?v=q6DGVGJ1WP4
https://responder.readthedocs.io/en/latest/
https://www.christianwerner.net/tech/Build-your-image-dataset-faster/
A systematic study of the class imbalance problem in convolutional neural networks - https://arxiv.org/abs/1710.05381
https://www.youtube.com/watch?v=q6DGVGJ1WP4 - There's no such thing as not a math person.
https://www.fast.ai/2017/11/13/validation-sets/ how (and why) to create a good validation set.
https://course.fast.ai/videos/?lesson=3
Make sure you have the latest version of the code and the latest version of the course
$ conda update conda
$ conda update anaconda
$ conda activate fastai
$ conda update --all

Compare https://github.com/fastai/course-v3 to my local course.
cd /home/ray/Documents/COURSES/fastai/course-v3/nbs/dl1
jupyter notebook

More info on deploying webapps here - https://course.fast.ai/deployment_render.html
Classifiers people made are in - https://github.com/hiromis/notes/blob/master/Lesson3.md
%reload_ext autoreload
%autoreload 2
%matplotlib inline

# https://github.com/fastai/course-v3/blob/master/nbs/dl1/lesson3-planet.ipynb
from fastai.vision import *

Install kaggle.
https://www.kaggle.com/c/planet-understanding-the-amazon-from-space
path = Config.data_path()/'planet'
path.mkdir(parents=True, exist_ok=True)

Download the data:
$ kaggle competitions download -c planet-understanding-the-amazon-from-space -f train-jpg.tar.7z -p /home/ray/.fastai/data/planet
$ kaggle competitions download -c planet-understanding-the-amazon-from-space -f train_v2.csv -p /home/ray/.fastai/data/planet
$ unzip -q -n /home/ray/.fastai/data/planet/train_v2.csv.zip -d /home/ray/.fastai/data/planet

Install 7zip and uncompress:
$ sudo apt-get update
$ sudo apt install p7zip-full
$ 7za -bd -y -so x /home/ray/.fastai/data/planet/train-jpg.tar.7z | tar xf - -C /home/ray/.fastai/data/planet

Each tile has multiple labels.
df = pd.read_csv(path/'train_v2.csv')
df.head()

Put this into a DataBunch and use ImageList (not ImageDataBunch). The Dataset class: https://pytorch.org/docs/stable/data.html#map-style-datasets
It has __getitem__() (e.g. object[3]) and __len__() (len(object)) — "dunder" (double underscore) methods. This provides the images and labels.
Create a mini-batch using a DataLoader(dataset). Use DataBunch to bind train DataLoader and valid DataLoader.
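A minimal sketch of that protocol (the class and names here are illustrative, not fastai's):

from torch.utils.data import Dataset, DataLoader

class PairDataset(Dataset):
    # Anything with __len__ and __getitem__ satisfies the Dataset protocol.
    def __init__(self, xs, ys):
        self.xs, self.ys = xs, ys

    def __len__(self):
        return len(self.xs)

    def __getitem__(self, i):
        return self.xs[i], self.ys[i]

dl = DataLoader(PairDataset(list(range(10)), list(range(10))), batch_size=4)
for xb, yb in dl:
    print(xb, yb)  # batches of up to 4 items, collated into tensors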
np.random.seed(42)
src = (ImageList.from_csv(path, 'train_v2.csv', folder='train-jpg', suffix='.jpg')  # Get images
       .split_by_rand_pct(0.2)
       .label_from_df(label_delim=' '))  # Get labels

# flip_vert=True because this is satellite data. Warp (looking from above or
# below) is set to 0, as a satellite always looks from the top.
tfms = get_transforms(flip_vert=True, max_lighting=0.1, max_zoom=1.05, max_warp=0.)

data = (src.transform(tfms, size=128)
        .databunch().normalize(imagenet_stats))
data.show_batch(rows=3, figsize=(12,9))

Set up the CNN and metrics (accuracy (argmax) and f-score). fbeta with beta=2 equals F2. Use a threshold to keep the classes we think exist in a sample. Use partial to call a function with the same keywords.
data.c
data.classes

arch = models.resnet50
acc_02 = partial(accuracy_thresh, thresh=0.2)
f_score = partial(fbeta, thresh=0.2)
learn = cnn_learner(data, arch, metrics=[acc_02, f_score])
learn.lr_find()
learn.recorder.plot()
lr = 0.01
learn.fit_one_cycle(5, slice(lr))
learn.save('stage-1-rn50')

Fine-tune / fit a bit more. You could create a DataBunch with just the wrongly classified images and fit on them (e.g. with a higher learning rate or more epochs).
learn.unfreeze()
learn.lr_find()
learn.recorder.plot()
learn.fit_one_cycle(5, slice(1e-5, lr / 5))
learn.save('stage-2-rn50')

This model was fit to images of size 128. Use transfer learning to fit to size 256.
data = (src.transform(tfms, size=256)
        .databunch(bs=32).normalize(imagenet_stats))

# Put the new data into learn
learn.data = data
data.train_ds[0][0].shape
learn.freeze()
learn.lr_find()
learn.recorder.plot()

Train the last layers.
lr = 1e-2 / 2
learn.fit_one_cycle(5, slice(lr))

%reload_ext autoreload
%autoreload 2
%matplotlib inline
from fastai.vision import *
from fastai.callbacks.hooks import *
from fastai.utils.mem import *

You can see the datasets at https://course.fast.ai/datasets
path = untar_data(URLs.CAMVID)
path.ls()
path_lbl = path/'labels'
path_img = path/'images'
fnames = get_image_files(path_img)
fnames[:3]
lbl_names = get_image_files(path_lbl)
lbl_names[:3]
img_f = fnames[0]
img = open_image(img_f)
img.show(figsize=(5, 5))
get_y_fn = lambda x: path_lbl/f'{x.stem}_P{x.suffix}'
mask = open_mask(get_y_fn(img_f))
mask.show(figsize=(5,5), alpha=1)
src_size = np.array(mask.shape[1:])
src_size, mask.data
codes = np.loadtxt(path/'codes.txt', dtype=str); codes

Datasets
size = src_size//2

free = gpu_mem_get_free_no_cache()
# the max size of bs depends on the available GPU RAM
if free > 8200: bs=8
else: bs=4
print(f"using bs={bs}, have {free}MB of GPU RAM free")

src = (SegmentationItemList.from_folder(path_img)
       .split_by_fname_file('../valid.txt')
       .label_from_func(get_y_fn, classes=codes))

# Transform y (the dependent variable) as well
data = (src.transform(get_transforms(), size=size, tfm_y=True)
        .databunch(bs=bs)
        .normalize(imagenet_stats))
data.show_batch(2, figsize=(10,7))
data.show_batch(2, figsize=(10,7), ds_type=DatasetType.Valid)

Model
name2id = {v:k for k,v in enumerate(codes)}
# Some pixels are labeled 'Void'. Remove these from the metric.
void_code = name2id['Void']

# Custom metric
def acc_camvid(input, target):
    target = target.squeeze(1)
    mask = target != void_code
    return (input.argmax(dim=1)[mask]==target[mask]).float().mean()

metrics = acc_camvid
# metrics = accuracy
wd = 1e-2

For a segmentation model:
https://docs.fast.ai/vision.learner.html#unet_learner
https://lmb.informatik.uni-freiburg.de/people/ronneber/u-net/
learn = unet_learner(data, models.resnet34, metrics=metrics, wd=wd)
lr_find(learn)
learn.recorder.plot()
lr = 3e-3
learn.fit_one_cycle(10, slice(lr), pct_start=0.9)
learn.save('stage-1')
learn.load('stage-1');
learn.show_results(rows=3, figsize=(8,9))
learn.unfreeze()
lrs = slice(lr/400, lr/4)
learn.fit_one_cycle(12, lrs, pct_start=0.8)
learn.save('stage-2');

Go Big
learn.destroy()
size = src_size

free = gpu_mem_get_free_no_cache()
# the max size of bs depends on the available GPU RAM
if free > 8200: bs=3
else: bs=1
print(f"using bs={bs}, have {free}MB of GPU RAM free")

data = (src.transform(get_transforms(), size=size, tfm_y=True)
        .databunch(bs=bs)
        .normalize(imagenet_stats))
learn = unet_learner(data, models.resnet34, metrics=metrics, wd=wd)
learn.load('stage-2');
lr_find(learn)
learn.recorder.plot()
learn.recorder.plot_lr()

One-cycle training increases the LR at the start and decreases it at the end.
lr = 1e-3
learn.fit_one_cycle(10, slice(lr), pct_start=0.8)
learn.save('stage-1-big')
learn.load('stage-1-big');
learn.unfreeze()
lrs = slice(1e-6, lr/10)
learn.fit_one_cycle(10, lrs)
learn.save('stage-2-big')
learn.load('stage-2-big');
learn.show_results(rows=3, figsize=(10,10))

Results are in https://arxiv.org/abs/1611.09326 (The One Hundred Layers Tiramisu: Fully Convolutional DenseNets for Semantic Segmentation).
Find the center of a face (x and y pixels). Regression model.
%reload_ext autoreload
%autoreload 2
%matplotlib inline
from fastai.vision import *

path = untar_data(URLs.BIWI_HEAD_POSE)

Convert the data:
cal = np.genfromtxt(path/'01'/'rgb.cal', skip_footer=6); cal
fname = '09/frame_00667_rgb.jpg'

def img2txt_name(f):
    return path/f'{str(f)[:-7]}pose.txt'

img = open_image(path/fname)
img.show()
ctr = np.genfromtxt(img2txt_name(fname), skip_header=3); ctr

def convert_biwi(coords):
    c1 = coords[0] * cal[0][0]/coords[2] + cal[0][2]
    c2 = coords[1] * cal[1][1]/coords[2] + cal[1][2]
    return tensor([c2,c1])

def get_ctr(f):
    ctr = np.genfromtxt(img2txt_name(f), skip_header=3)
    return convert_biwi(ctr)

def get_ip(img, pts):
    return ImagePoints(FlowField(img.size, pts), scale=True)

get_ctr(fname)
ctr = get_ctr(fname)
img.show(y=get_ip(img, ctr), figsize=(6, 6))

We validate on one person; the labels are sets of points; transform y as well (tfm_y=True).
data = (PointsItemList.from_folder(path)
        .split_by_valid_func(lambda o: o.parent.name=='13')
        .label_from_func(get_ctr)
        .transform(get_transforms(), tfm_y=True, size=(120,160))
        .databunch().normalize(imagenet_stats)
       )

Train the model.
learn = cnn_learner(data, models.resnet34)
learn.lr_find()
learn.recorder.plot()
lr = 2e-2
learn.fit_one_cycle(5, slice(lr))
learn.save('stage-1')
learn.load('stage-1');
learn.show_results()

from fastai.text import *
path = untar_data(URLs.IMDB_SAMPLE)
path.ls()

Create a DataBunch:
data_lm = TextDataBunch.from_csv(path, 'texts.csv')
data_lm.save()
data = load_data(path)

The steps that happen here are tokenization: take the words and convert them to tokens (e.g. a word, or a lemmatized piece of one), replace rare words with an unknown token, lower-case everything, and handle spaces.
data = TextClasDataBunch.from_csv(path, 'texts.csv')
data.show_batch()

Replace each token with a number. Use a vocab size of 60,000.
data.vocab.itos[:10]
data.train_ds[0][0]
data.train_ds[0][0].data[:10]

Use the data block API:
data = (TextList.from_csv(path, 'texts.csv', cols='text')
        .split_from_df(col=2)
        .label_from_df(cols=0)
        .databunch())
bs = 48
path = untar_data(URLs.IMDB)
path.ls()
(path/'train').ls()

# lm = 'Language model'
data_lm = (TextList.from_folder(path)
           # Inputs: all the text files in path
           .filter_by_folder(include=['train', 'test', 'unsup'])
           # We may have other temp folders that contain text files so we only keep what's in train and test
           .split_by_rand_pct(0.1)
           # We randomly split and keep 10% (10,000 reviews) for validation
           .label_for_lm()
           # We want to do a language model so we label accordingly
           .databunch(bs=bs))
data_lm.save('data_lm.pkl')

For the language model we ignore the labels and shuffle the data.
data_lm = load_data(path, 'data_lm.pkl', bs=bs)
data_lm.show_batch()

# https://docs.fast.ai/text.models.html#AWD_LSTM
learn = language_model_learner(data_lm, AWD_LSTM, drop_mult=0.3)
learn.lr_find()
learn.recorder.plot(skip_end=15)
learn.fit_one_cycle(1, 1e-2, moms=(0.8,0.7))  # moms is momentum
learn.save('fit_head')
learn.load('fit_head');

Fine-tune:
learn.unfreeze()
learn.fit_one_cycle(10, 1e-3, moms=(0.8,0.7))
learn.save('fine_tuned')

Test the model output:
learn.load('fine_tuned');
TEXT = "I liked this movie because"
N_WORDS = 40
N_SENTENCES = 2
print("\n".join(learn.predict(TEXT, N_WORDS, temperature=0.75) for _ in range(N_SENTENCES)))
learn.save_encoder('fine_tuned_enc')

Create a classifier to predict review sentiment:
path = untar_data(URLs.IMDB)

data_clas = (TextList.from_folder(path, vocab=data_lm.vocab)
             # grab all the text files in path
             .split_by_folder(valid='test')
             # split by train and valid folder (that only keeps 'train' and 'test' so no need to filter)
             .label_from_folder(classes=['neg', 'pos'])
             # label them all with their folders
             .databunch(bs=bs))
data_clas.save('data_clas.pkl')

data_clas = load_data(path, 'data_clas.pkl', bs=bs)
data_clas.show_batch()

Create a model to classify those reviews and load the encoder we saved before.
learn = text_classifier_learner(data_clas, AWD_LSTM, drop_mult=0.5)
learn.load_encoder('fine_tuned_enc')
learn.lr_find()
learn.recorder.plot()
learn.fit_one_cycle(1, 2e-2, moms=(0.8,0.7))
learn.save('first')
learn.load('first');

learn.freeze_to(-2)  # unfreeze the last two layers
learn.fit_one_cycle(1, slice(1e-2/(2.6**4), 1e-2), moms=(0.8,0.7))
# the two values for slice set how quickly the lowest and highest layers learn;
# 2.6 comes from doing a random forest on hyperparameters to predict accuracy
learn.save('second')
learn.load('second');

learn.freeze_to(-3)  # unfreeze the last three
learn.fit_one_cycle(1, slice(5e-3/(2.6**4), 5e-3), moms=(0.8,0.7))
learn.save('third')
learn.load('third');

learn.unfreeze()  # unfreeze the whole thing
learn.fit_one_cycle(2, slice(1e-3/(2.6**4), 1e-3), moms=(0.8,0.7))
learn.predict("I really loved that movie, it was awesome!")

https://www.kaggle.com/c/planet-understanding-the-amazon-from-space
https://docs.fast.ai/data_block.html
https://stackoverflow.com/questions/29133085/what-are-keypoints-in-image-processing
https://forums.fast.ai/t/deep-learning-lesson-3-notes/29829
https://github.com/hiromis/notes/blob/master/Lesson3.md
https://forums.fast.ai/t/lesson-3-in-class-discussion/29733
https://forums.fast.ai/t/lesson-3-links-to-different-parts-in-video/30077
https://www.coursera.org/learn/machine-learning
https://course.fast.ai/deployment_render.html
https://mmiakashs.github.io/blog/2018-09-20-kaggle-api-google-colab/
https://docs.python.org/3/library/functools.html#functools.partial
https://zulko.github.io/moviepy/
https://www.meetup.com/sfmachinelearning/events/255566613/
https://docs.fast.ai/vision.transform.html#List-of-transforms
https://arxiv.org/abs/1506.01186 - Cyclical Learning Rates for Training Neural Networks
https://course.fast.ai/videos/?lesson=4
Make sure you have the latest version of the code and the latest version of the course
$ conda update conda
$ conda update anaconda
$ conda activate fastai
$ conda update --all

Compare https://github.com/fastai/course-v3 to my local course.
cd /home/ray/Documents/COURSES/fastai/course-v3/nbs/dl1
jupyter notebook

Basic steps are:
Use a NN instead of GBTs/RFs, as it needs less feature engineering.
https://github.com/fastai/course-v3/blob/master/nbs/dl1/lesson4-tabular.ipynb
from fastai.tabular import *Download the adult dataset https://archive.ics.uci.edu/ml/datasets/adult
path = untar_data(URLs.ADULT_SAMPLE)
df = pd.read_csv(path/'adult.csv')
dep_var = 'salary'
cat_names = ['workclass', 'education', 'marital-status', 'occupation', 'relationship', 'race']
cont_names = ['age', 'fnlwgt', 'education-num']
procs = [FillMissing, Categorify, Normalize]

test = TabularList.from_df(df.iloc[800:1000].copy(), path=path,
                           cat_names=cat_names, cont_names=cont_names)

data = (TabularList.from_df(df, path=path, cat_names=cat_names,
                            cont_names=cont_names, procs=procs)
        .split_by_idx(list(range(800,1000)))
        .label_from_df(cols=dep_var)
        .add_test(test)
        .databunch())

learn = tabular_learner(data, layers=[200, 100], metrics=accuracy)
learn.lr_find()
learn.recorder.plot()
learn.fit(1, 1e-2)

Make a prediction:
row = df.iloc[0]
learn.predict(row)

You can either have the data as a table, e.g. user | movie | number of stars.
Or sparse matrix Users x Movies.
https://grouplens.org/datasets/movielens/
There is an up-to-date version of the MovieLens dataset (https://grouplens.org/datasets/movielens/).
https://github.com/fastai/course-v3/blob/master/nbs/dl1/lesson4-collab.ipynb
from fastai.collab import *
from fastai.tabular import *

user, item, title = 'userId', 'movieId', 'title'
path = untar_data(URLs.ML_SAMPLE)
ratings = pd.read_csv(path/'ratings.csv')

Train a model:
data = CollabDataBunch.from_df(ratings, seed=42)
y_range = [0, 5.5]  # range of scores

https://github.com/fastai/fastai/blob/master/fastai/collab.py#L96
https://github.com/fastai/fastai/blob/c498a576214edc9f7d91e16ef51988f26327e04e/fastai/collab.py#L36
https://pytorch.org/docs/stable/_modules/torch/nn/modules/sparse.html#Embedding
learn = collab_learner(data, n_factors=50, y_range=y_range)
learn.fit_one_cycle(3, 5e-3)

Movielens 100k
path = Config.data_path()/'ml-100k'
ratings = pd.read_csv(path/'u.data', delimiter='\t', header=None,
                      names=[user, item, 'rating', 'timestamp'])
ratings.head()

movies = pd.read_csv(path/'u.item', delimiter='|', encoding='latin-1', header=None,
                     names=[item, 'title', 'date', 'N', 'url', *[f'g{i}' for i in range(19)]])
movies.head()

len(ratings)
rating_movie = ratings.merge(movies[[item, title]])
rating_movie.head()

data = CollabDataBunch.from_df(rating_movie, seed=42, valid_pct=0.1, item_name=title)
data.show_batch()

y_range = [0, 5.5]  # input to sigmoid (ratings are 0.5 to 5); add a bit to make the range larger
# n_factors is the width of the embedding.
# wd is weight decay (regularization): sum the squares of the parameters
# (some are + and some are -) and multiply by wd.
learn = collab_learner(data, n_factors=40, y_range=y_range, wd=1e-1)
learn.lr_find()
learn.recorder.plot(skip_end=15)
learn.fit_one_cycle(5, 5e-3)
learn.save('dotprod')

Interpret
learn.load('dotprod');
learn.model
g = rating_movie.groupby(title)['rating'].count()
top_movies = g.sort_values(ascending=False).index.values[:1000]
top_movies[:10]
Movie bias
movie_bias = learn.bias(top_movies, is_item=True)  # is_item=True gives movie bias; is_item=False gives user bias
movie_bias.shape
mean_ratings = rating_movie.groupby(title)['rating'].mean()
movie_ratings = [(b, i, mean_ratings.loc[i]) for i,b in zip(top_movies,movie_bias)]
item0 = lambda o:o[0]
sorted(movie_ratings, key=item0)[:15]
sorted(movie_ratings, key=item0, reverse=True)[:15]
Movie weights
movie_w = learn.weight(top_movies, is_item=True)
movie_w.shape
# squish those 40 factors down to 3
movie_pca = movie_w.pca(3)
movie_pca.shape
fac0,fac1,fac2 = movie_pca.t()
movie_comp = [(f, i) for f,i in zip(fac0, top_movies)]
# some aspect of taste and the matching movie feature
sorted(movie_comp, key=itemgetter(0), reverse=True)[:10]
sorted(movie_comp, key=itemgetter(0))[:10]
movie_comp = [(f, i) for f,i in zip(fac1, top_movies)]
sorted(movie_comp, key=itemgetter(0), reverse=True)[:10]
sorted(movie_comp, key=itemgetter(0))[:10]
# PCA plot
idxs = np.random.choice(len(top_movies), 50, replace=False)
idxs = list(range(50))
X = fac0[idxs]
Y = fac2[idxs]
plt.figure(figsize=(15,15))
plt.scatter(X, Y)
for i, x, y in zip(top_movies[idxs], X, Y):
    plt.text(x, y, i, color=np.random.rand(3)*0.7, fontsize=11)
plt.show()
Cold start problem - a new user or a new movie. You need a metadata model (e.g. age and sex) for new users and new movies.
Predict whether user 1 will like movie 1.
Create 5 random numbers for each movie and 5 random numbers for each user. Take the dot product to get a predicted rating, then update these values by gradient descent to fit the matrix of actual movie ratings, minimising RMSE.
Embedding - a matrix of weights you index into. Bias - e.g. how much a user likes movies in general.
Use a sigmoid to restrict the output to between 0 and 5.
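A minimal sketch of that dot-product-plus-bias model in plain PyTorch; the names and sizes here are illustrative, not fastai's internals (collab_learner builds something equivalent):
import torch
import torch.nn as nn

class DotProdBias(nn.Module):
    def __init__(self, n_users, n_items, n_factors=40, y_range=(0, 5.5)):
        super().__init__()
        self.u_w = nn.Embedding(n_users, n_factors)  # user latent factors
        self.i_w = nn.Embedding(n_items, n_factors)  # item latent factors
        self.u_b = nn.Embedding(n_users, 1)          # how much this user likes movies in general
        self.i_b = nn.Embedding(n_items, 1)          # how good this movie is in general
        self.y_range = y_range

    def forward(self, users, items):
        dot = (self.u_w(users) * self.i_w(items)).sum(1)
        res = dot + self.u_b(users).squeeze(1) + self.i_b(items).squeeze(1)
        lo, hi = self.y_range
        return torch.sigmoid(res) * (hi - lo) + lo  # squash into the rating range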
Here is some benchmarking for movie-lens - https://www.librec.net/release/v1.3/example.html
https://www.nytimes.com/2018/11/18/technology/artificial-intelligence-language.html (ULMFiT).
https://forums.fast.ai/t/deep-learning-lesson-4-notes/30983
https://forums.fast.ai/t/lesson-4-in-class-discussion/30318
https://course.fast.ai/videos/?lesson=5
Make sure you have the latest version of the code and the latest version of the course
$ conda update conda
$ conda update anaconda
$ conda activate fastai
$ conda update --all
Compare https://github.com/fastai/course-v3 to my local course.
cd /home/ray/Documents/COURSES/fastai/course-v3/nbs/dl1
jupyter notebook
https://github.com/fastai/course-v3/blob/master/nbs/dl1/lesson5-sgd-mnist.ipynb
In transfer learning the last layer is removed (e.g. ResNet's 1,000-way ImageNet classifier), since we don't have those 1,000 classes. The last layers are also trained on very specific features, so we replace them; earlier layers are frozen since they recognize basic patterns.
Give different parts of the model different learning rates: small learning rates for earlier layers (discriminative learning rates). For fit you can pass 1e-3 (every layer gets the same LR), slice(1e-3) (the final layers get 1e-3 and earlier layers get 1e-3 / 3), or slice(1e-5, 1e-3) (the first layer group gets 1e-5, the last gets 1e-3, and the groups in between get LRs spread between those values), as in the example below.
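For example, using the learn object from the notes above:
# three ways to pass learning rates to a fastai learner
learn.fit_one_cycle(5, 1e-3)               # every layer group gets 1e-3
learn.fit_one_cycle(5, slice(1e-3))        # last group 1e-3, earlier groups 1e-3/3
learn.fit_one_cycle(5, slice(1e-5, 1e-3))  # LRs spread from 1e-5 (first) to 1e-3 (last)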
Affine function (http://mathworld.wolfram.com/AffineFunction.html)
Embedding - look something up in an array by index. A fast and memory-efficient way of multiplying by a one-hot-encoded matrix.
This works because, e.g., a movie features John Travolta and a user likes John Travolta (latent factors / hidden relationships). However, if the movie is really bad (despite John Travolta), you need to add in a bias.
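A quick check in plain PyTorch that an embedding lookup gives the same result as multiplying by a one-hot row (toy sizes, just for illustration):
import torch
import torch.nn as nn

emb = nn.Embedding(num_embeddings=5, embedding_dim=3)  # e.g. 5 users, 3 latent factors
idx = torch.tensor([2])                                # look up user 2
one_hot = torch.zeros(1, 5)
one_hot[0, 2] = 1.
print(emb(idx))              # array lookup
print(one_hot @ emb.weight)  # same result via matmul with a one-hot row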
https://github.com/fastai/course-v3/blob/master/nbs/dl1/lesson5-sgd-mnist.ipynb
%matplotlib inline
from fastai.basics import *
Get the data from http://deeplearning.net/data/mnist/mnist.pkl.gz
path = Config().data_path()/'mnist'
with gzip.open(path/'mnist.pkl.gz', 'rb') as f:
    ((x_train, y_train), (x_valid, y_valid), _) = pickle.load(f, encoding='latin-1')
plt.imshow(x_train[0].reshape((28,28)), cmap="gray")
x_train.shape
x_train,y_train,x_valid,y_valid = map(torch.tensor, (x_train,y_train,x_valid,y_valid))
n,c = x_train.shape
x_train.shape, y_train.min(), y_train.max()
bs = 64
train_ds = TensorDataset(x_train, y_train)  # you can index into it
valid_ds = TensorDataset(x_valid, y_valid)
data = DataBunch.create(train_ds, valid_ds, bs=bs)
x,y = next(iter(data.train_dl))
x.shape,y.shape
Subclassing
class Mnist_Logistic(nn.Module):
    def __init__(self):
        super().__init__()  # initialise nn.Module
        self.lin = nn.Linear(784, 10, bias=True)  # x@a + b

    def forward(self, xb):  # applied to each mini-batch
        return self.lin(xb)

model = Mnist_Logistic().cuda()
model
model.lin
model(x).shape
[p.shape for p in model.parameters()]  # shows input and output sizes
lr = 2e-2
loss_func = nn.CrossEntropyLoss()  # adds a softmax

def update(x,y,lr):
    wd = 1e-5  # weight decay value
    y_hat = model(x)
    # weight decay
    w2 = 0.
    for p in model.parameters(): w2 += (p**2).sum()
    # add to the regular loss
    loss = loss_func(y_hat, y) + w2*wd
    loss.backward()
    with torch.no_grad():
        # loop through the parameters
        for p in model.parameters():
            p.sub_(lr * p.grad)
            p.grad.zero_()
    return loss.item()  # plain Python number

# one loss value per mini-batch
losses = [update(x,y,lr) for x,y in data.train_dl]
plt.plot(losses);

class Mnist_NN(nn.Module):
    def __init__(self):
        super().__init__()
        self.lin1 = nn.Linear(784, 50, bias=True)
        self.lin2 = nn.Linear(50, 10, bias=True)

    def forward(self, xb):
        x = self.lin1(xb)
        x = F.relu(x)
        return self.lin2(x)

model = Mnist_NN().cuda()
losses = [update(x,y,lr) for x,y in data.train_dl]
plt.plot(losses);
model = Mnist_NN().cuda()

def update(x,y,lr):
    opt = optim.Adam(model.parameters(), lr)
    # opt = optim.SGD(model.parameters(), lr, momentum=0.9)
    y_hat = model(x)
    loss = loss_func(y_hat, y)
    loss.backward()
    opt.step()
    opt.zero_grad()
    return loss.item()

Momentum: each step is 90% the same direction as last time plus 10% the current gradient.
s_t = alpha * g + (1 - alpha) * s_{t-1} (an exponentially weighted moving average of the gradient).
RMSprop - the same moving average, but of the gradient squared; the step is divided by its square root.
Adam - RMSprop and momentum combined.
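Toy versions of these update rules, assuming p is a parameter tensor and g its gradient (a sketch, not fastai's actual optimizer code; Adam's bias correction is omitted):
import torch

def momentum_step(p, g, state, lr, alpha=0.1):
    # s_t = alpha*g + (1-alpha)*s_{t-1}: moving average of the gradient
    state['v'] = alpha * g + (1 - alpha) * state.get('v', 0.)
    return p - lr * state['v']

def rmsprop_step(p, g, state, lr, alpha=0.1, eps=1e-8):
    # moving average of the squared gradient; divide the step by its square root
    state['s'] = alpha * g**2 + (1 - alpha) * state.get('s', 0.)
    return p - lr * g / (state['s'] + eps).sqrt()

def adam_step(p, g, state, lr, eps=1e-8):
    # Adam = momentum (average of g) + RMSprop (average of g^2)
    state['v'] = 0.1 * g + 0.9 * state.get('v', 0.)
    state['s'] = 0.01 * g**2 + 0.99 * state.get('s', 0.)
    return p - lr * state['v'] / (state['s'] + eps).sqrt()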
losses = [update(x,y,1e-3) for x,y in data.train_dl]
plt.plot(losses);
learn = Learner(data, Mnist_NN(), loss_func=loss_func, metrics=accuracy)
learn.lr_find()
learn.recorder.plot()
learn.fit_one_cycle(1, 1e-2)
learn.recorder.plot_lr(show_moms=True)  # learning rate per batch
learn.recorder.plot_losses()
Cross entropy loss on two categories:
Cat | Dog | Pred(Cat) | Pred(Dog) | X-Entropy
1   | 0   | 0.5       | 0.5       | -1*log(0.5) - 0*log(0.5)
Use a softmax so the predictions add up to 1.
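A small numeric check of softmax plus cross entropy (note F.cross_entropy applies the softmax itself):
import torch
import torch.nn.functional as F

z = torch.tensor([[2.0, 0.0]])     # logits for one example over two classes (cat, dog)
p = F.softmax(z, dim=1)            # tensor([[0.8808, 0.1192]]) - sums to 1
target = torch.tensor([0])         # true class is cat
loss = F.cross_entropy(z, target)  # = -log(0.8808) ~ 0.127
print(p, loss)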
https://forums.fast.ai/t/deep-learning-lesson-5-notes/31298
https://github.com/hiromis/notes/blob/master/Lesson5.md
https://github.com/fastai/course-v3/blob/master/files/xl/collab_filter.xlsx
https://docs.google.com/spreadsheets/d/1oxY9bxgLPutRidhTrucFeg5Il0Jq7UdMJgR3igTtbPU/edit#gid=1748360111 - google sheets version
https://forums.fast.ai/t/google-sheets-versions-of-spreadsheets/10424/7
https://forums.fast.ai/t/lesson-5-discussion-thread/30864
https://course.fast.ai/videos/?lesson=6
Make sure you have the latest version of the code and the latest version of the course
$ conda update conda
$ conda update anaconda
$ conda activate fastai
$ conda update --all
Compare https://github.com/fastai/course-v3 to my local course.
cd /home/ray/Documents/COURSES/fastai/course-v3/nbs/dl1
jupyter notebook
https://github.com/fastai/course-v3/blob/master/nbs/dl1/rossman_data_clean.ipynb
%reload_ext autoreload
%autoreload 2
from fastai.basics import *
Download the data from here: http://files.fast.ai/part2/lesson14/rossmann.tgz
PATH = Config().data_path()/Path('rossmann/')
table_names = ['train', 'store', 'store_states', 'state_names', 'googletrend', 'weather', 'test']
tables = [pd.read_csv(PATH/f'{fname}.csv', low_memory=False) for fname in table_names]
train, store, store_states, state_names, googletrend, weather, test = tables
len(train),len(test)
Turn state holidays into booleans:
train.StateHoliday = train.StateHoliday!='0'
test.StateHoliday = test.StateHoliday!='0'

def join_df(left, right, left_on, right_on=None, suffix='_y'):
    if right_on is None: right_on = left_on
    return left.merge(right, how='left', left_on=left_on, right_on=right_on, suffixes=("", suffix))

Join weather/state names:
weather = join_df(weather, state_names, "file", "StateName")
Extract dates and state names from the given data and add those columns:
googletrend['Date'] = googletrend.week.str.split(' - ', expand=True)[0]
googletrend['State'] = googletrend.file.str.split('_', expand=True)[2]
googletrend.loc[googletrend.State=='NI', "State"] = 'HB,NI'
Extract particular date fields from a complete datetime for the purpose of constructing categoricals:
def add_datepart(df, fldname, drop=True, time=False):
    "Helper function that adds columns relevant to a date."
    fld = df[fldname]
    fld_dtype = fld.dtype
    if isinstance(fld_dtype, pd.core.dtypes.dtypes.DatetimeTZDtype):
        fld_dtype = np.datetime64
    if not np.issubdtype(fld_dtype, np.datetime64):
        df[fldname] = fld = pd.to_datetime(fld, infer_datetime_format=True)
    targ_pre = re.sub('[Dd]ate$', '', fldname)
    attr = ['Year', 'Month', 'Week', 'Day', 'Dayofweek', 'Dayofyear',
            'Is_month_end', 'Is_month_start', 'Is_quarter_end', 'Is_quarter_start',
            'Is_year_end', 'Is_year_start']
    if time: attr = attr + ['Hour', 'Minute', 'Second']
    for n in attr: df[targ_pre + n] = getattr(fld.dt, n.lower())
    df[targ_pre + 'Elapsed'] = fld.astype(np.int64) // 10 ** 9
    if drop: df.drop(fldname, axis=1, inplace=True)

add_datepart(weather, "Date", drop=False)
add_datepart(googletrend, "Date", drop=False)
add_datepart(train, "Date", drop=False)
add_datepart(test, "Date", drop=False)

The Google trends data has a special category for the whole of Germany - we'll pull that out so we can use it explicitly:
trend_de = googletrend[googletrend.file == 'Rossmann_DE']
Outer join all of our data into a single dataframe and check for nulls to make sure the joins worked:
store = join_df(store, store_states, "Store")
len(store[store.State.isnull()])
joined = join_df(train, store, "Store")
joined_test = join_df(test, store, "Store")
len(joined[joined.StoreType.isnull()]),len(joined_test[joined_test.StoreType.isnull()])
joined = join_df(joined, googletrend, ["State","Year", "Week"])
joined_test = join_df(joined_test, googletrend, ["State","Year", "Week"])
len(joined[joined.trend.isnull()]),len(joined_test[joined_test.trend.isnull()])
joined = joined.merge(trend_de, 'left', ["Year", "Week"], suffixes=('', '_DE'))
joined_test = joined_test.merge(trend_de, 'left', ["Year", "Week"], suffixes=('', '_DE'))
len(joined[joined.trend_DE.isnull()]),len(joined_test[joined_test.trend_DE.isnull()])
joined = join_df(joined, weather, ["State","Date"])
joined_test = join_df(joined_test, weather, ["State","Date"])
len(joined[joined.Mean_TemperatureC.isnull()]),len(joined_test[joined_test.Mean_TemperatureC.isnull()])
for df in (joined, joined_test):
    for c in df.columns:
        if c.endswith('_y'):
            if c in df.columns: df.drop(c, inplace=True, axis=1)

Fill in missing values to avoid complications with NAs, using arbitrary sentinel values:
for df in (joined,joined_test):
    df['CompetitionOpenSinceYear'] = df.CompetitionOpenSinceYear.fillna(1900).astype(np.int32)
    df['CompetitionOpenSinceMonth'] = df.CompetitionOpenSinceMonth.fillna(1).astype(np.int32)
    df['Promo2SinceYear'] = df.Promo2SinceYear.fillna(1900).astype(np.int32)
    df['Promo2SinceWeek'] = df.Promo2SinceWeek.fillna(1).astype(np.int32)

Extract the features "CompetitionOpenSince" and "CompetitionDaysOpen":
for df in (joined,joined_test):
    df["CompetitionOpenSince"] = pd.to_datetime(dict(year=df.CompetitionOpenSinceYear,
                                                     month=df.CompetitionOpenSinceMonth, day=15))
    df["CompetitionDaysOpen"] = df.Date.subtract(df.CompetitionOpenSince).dt.days

Replace some erroneous / outlying data:
for df in (joined,joined_test):
    df.loc[df.CompetitionDaysOpen<0, "CompetitionDaysOpen"] = 0
    df.loc[df.CompetitionOpenSinceYear<1990, "CompetitionDaysOpen"] = 0

Add a "CompetitionMonthsOpen" field, capped at 2 years to limit the number of unique categories:
for df in (joined,joined_test):
    df["CompetitionMonthsOpen"] = df["CompetitionDaysOpen"]//30
    df.loc[df.CompetitionMonthsOpen>24, "CompetitionMonthsOpen"] = 24
joined.CompetitionMonthsOpen.unique()

! pip install isoweek
from isoweek import Week
for df in (joined,joined_test):
    df["Promo2Since"] = pd.to_datetime(df.apply(lambda x: Week(
        x.Promo2SinceYear, x.Promo2SinceWeek).monday(), axis=1))
    df["Promo2Days"] = df.Date.subtract(df["Promo2Since"]).dt.days
for df in (joined,joined_test):
    df.loc[df.Promo2Days<0, "Promo2Days"] = 0
    df.loc[df.Promo2SinceYear<1990, "Promo2Days"] = 0
    df["Promo2Weeks"] = df["Promo2Days"]//7
    df.loc[df.Promo2Weeks<0, "Promo2Weeks"] = 0
    df.loc[df.Promo2Weeks>25, "Promo2Weeks"] = 25
    df.Promo2Weeks.unique()
joined.to_pickle(PATH/'joined')
joined_test.to_pickle(PATH/'joined_test')

It is common when working with time series data to extract features that explain relationships across rows, as opposed to columns, e.g.:
Define a function get_elapsed for cumulative counting across a sorted dataframe.
def get_elapsed(fld, pre):
    day1 = np.timedelta64(1, 'D')
    last_date = np.datetime64()
    last_store = 0
    res = []
    for s,v,d in zip(df.Store.values, df[fld].values, df.Date.values):
        if s != last_store:
            last_date = np.datetime64()
            last_store = s
        if v: last_date = d
        res.append(((d-last_date).astype('timedelta64[D]') / day1))
    df[pre+fld] = res

# apply it to these columns
columns = ["Date", "Store", "Promo", "StateHoliday", "SchoolHoliday"]
df = train[columns].append(test[columns])

Say we're looking at SchoolHoliday. We'll first sort by Store, then Date, and then call get_elapsed('SchoolHoliday', 'After'), which fills in, for each row, the time elapsed since the last school holiday:
fld = 'SchoolHoliday'
df = df.sort_values(['Store', 'Date'])
get_elapsed(fld, 'After')
df = df.sort_values(['Store', 'Date'], ascending=[True, False])
get_elapsed(fld, 'Before')
fld = 'StateHoliday'
df = df.sort_values(['Store', 'Date'])
get_elapsed(fld, 'After')
df = df.sort_values(['Store', 'Date'], ascending=[True, False])
get_elapsed(fld, 'Before')
fld = 'Promo'
df = df.sort_values(['Store', 'Date'])
get_elapsed(fld, 'After')
df = df.sort_values(['Store', 'Date'], ascending=[True, False])
get_elapsed(fld, 'Before')

Set the active index to Date:
df = df.set_index("Date")Set null values from elapsed field calculations to 0.
columns = ['SchoolHoliday', 'StateHoliday', 'Promo']
for o in ['Before', 'After']:
    for p in columns:
        a = o+p
        df[a] = df[a].fillna(0).astype(int)

Demonstrate window functions in pandas to calculate rolling quantities:
Sort by date (sort_index()), group by Store (groupby()), and count the number of events of interest defined in columns over a 7-day window (rolling(7).sum()). Then do the same in the opposite direction.
bwd = df[['Store']+columns].sort_index().groupby("Store").rolling(7, min_periods=1).sum()
fwd = df[['Store']+columns].sort_index(ascending=False).groupby("Store").rolling(7, min_periods=1).sum()
Drop the Store indices grouped together in the window function:
bwd.drop('Store',1,inplace=True)
bwd.reset_index(inplace=True)
fwd.drop('Store',1,inplace=True)
fwd.reset_index(inplace=True)
df.reset_index(inplace=True)
Merge these values onto the df:
df = df.merge(bwd, 'left', ['Date', 'Store'], suffixes=['', '_bw'])
df = df.merge(fwd, 'left', ['Date', 'Store'], suffixes=['', '_fw'])
df.drop(columns,1,inplace=True)
Back up large tables of extracted / wrangled features before you join them onto another one:
df.to_pickle(PATH/'df')
df["Date"] = pd.to_datetime(df.Date)
joined = pd.read_pickle(PATH/'joined')
joined_test = pd.read_pickle(PATH/'joined_test')
joined = join_df(joined, df, ['Store', 'Date'])
joined_test = join_df(joined_test, df, ['Store', 'Date'])
Remove all instances where the store had zero sales / was closed:
joined = joined[joined.Sales!=0]
joined.reset_index(inplace=True)
joined_test.reset_index(inplace=True)
joined.to_pickle(PATH/'train_clean')
joined_test.to_pickle(PATH/'test_clean')
https://github.com/fastai/course-v3/blob/master/nbs/dl1/lesson6-rossmann.ipynb
%reload_ext autoreload
%autoreload 2
from fastai.tabular import *
The most useful part of the data clean is:
add_datepart(train, "Date", drop=False)
Take the date column and add a bunch of metadata, e.g. year, month, day of week, month start or end, elapsed time since the epoch (automatic feature engineering!). E.g. purchasing behaviour may change with the day of the week.
path = Config().data_path()/'rossmann'
train_df = pd.read_pickle(path/'train_clean')
train_df.head().T
n = len(train_df); n
Pre-processors run once, before you do any training, on the training set; the same transforms are then shared with the validation set.
Create a small subset of the data:
idx = np.random.permutation(range(n))[:2000]
idx.sort()
small_train_df = train_df.iloc[idx[:1000]]
small_test_df = train_df.iloc[idx[1000:]]
small_cont_vars = ['CompetitionDistance', 'Mean_Humidity']
small_cat_vars = ['Store', 'DayOfWeek', 'PromoInterval']
small_train_df = small_train_df[small_cat_vars + small_cont_vars + ['Sales']]
small_test_df = small_test_df[small_cat_vars + small_cont_vars + ['Sales']]
small_train_df.head()
The first pre-processor takes the strings in PromoInterval, finds all the unique values, creates a list of categories, and converts them into numbers.
categorify = Categorify(small_cat_vars, small_cont_vars)
categorify(small_train_df)
categorify(small_test_df, test=True)
small_test_df.head()
See the categories:
small_train_df.PromoInterval.cat.categories
See the codes:
small_train_df['PromoInterval'].cat.codes[:5]
Another pre-processor fills missing values: it adds a boolean column ending in _na and fills the missing entries with the median.
fill_missing = FillMissing(small_cat_vars, small_cont_vars)
fill_missing(small_train_df)
fill_missing(small_test_df, test=True)
Read in the full dataset:
train_df = pd.read_pickle(path/'train_clean')
test_df = pd.read_pickle(path/'test_clean')
Specify the pre-processors:
procs = [FillMissing, Categorify, Normalize]
cat_vars = ['Store', 'DayOfWeek', 'Year', 'Month', 'Day', 'StateHoliday', 'CompetitionMonthsOpen',
    'Promo2Weeks', 'StoreType', 'Assortment', 'PromoInterval', 'CompetitionOpenSinceYear',
    'Promo2SinceYear', 'State', 'Week', 'Events', 'Promo_fw', 'Promo_bw', 'StateHoliday_fw',
    'StateHoliday_bw', 'SchoolHoliday_fw', 'SchoolHoliday_bw']
cont_vars = ['CompetitionDistance', 'Max_TemperatureC', 'Mean_TemperatureC', 'Min_TemperatureC',
    'Max_Humidity', 'Mean_Humidity', 'Min_Humidity', 'Max_Wind_SpeedKm_h', 'Mean_Wind_SpeedKm_h',
    'CloudCover', 'trend', 'trend_DE', 'AfterStateHoliday', 'BeforeStateHoliday', 'Promo', 'SchoolHoliday']
dep_var = 'Sales'
df = train_df[cat_vars + cont_vars + [dep_var,'Date']].copy()
test_df['Date'].min(), test_df['Date'].max()
Use the date to create a validation set of the same length as the test set:
cut = train_df['Date'][(train_df['Date'] == train_df['Date'][len(test_df)])].index.max()
valid_idx = range(cut)
df[dep_var].head()
The dependent variable is an int, so fastai will assume it's a classification problem. Specify that it's a regression by passing label_cls=FloatList with log=True, which takes the log of the dependent variable. Because the eval metric is RMSPE, taking the log of y makes it (approximately) RMSE.
data = (TabularList.from_df(df, path=path, cat_names=cat_vars, cont_names=cont_vars, procs=procs)
        .split_by_idx(valid_idx)
        .label_from_df(cols=dep_var, label_cls=FloatList, log=True)
        .add_test(TabularList.from_df(test_df, path=path, cat_names=cat_vars, cont_names=cont_vars))
        .databunch())
doc(FloatList)
Model
Pass in y_range, which applies a sigmoid between 0 and an upper limit on the dependent variable.
max_log_y = np.log(np.max(train_df['Sales'])*1.2)
y_range = torch.tensor([0, max_log_y], device=defaults.device)
Pass in the architecture. Layers of 1,000 and 500 activations mean a 1000 x 500 weight matrix - lots of parameters, which will overfit a dataset with only a few hundred thousand rows, so the ps (dropout probabilities) provide dropout.
Dropout: A Simple Way to Prevent Neural Networks from Overfitting
emb_drop also provides dropout, on the embedding layers. An embedding is a matmul by a one-hot-encoded matrix.
learn = tabular_learner(data, layers=[1000,500], ps=[0.001,0.01], emb_drop=0.04,
                        y_range=y_range, metrics=exp_rmspe)
learn.model
The first embedding layer's first number is the number of stores (the first categorical variable); the second is the size of that embedding. Then batch norm over 16 inputs (the 16 continuous variables).
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. The loss surface becomes less bumpy, so you can increase your learning rate.
y_hat = f(w_1, ..., w_n, x) * g + b
g and b are the batch norm parameters that scale and shift the output towards the expected range (in effect learning the mean and std directly).
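A simplified, training-mode-only batch norm for intuition (real BatchNorm also tracks running statistics for inference):
import torch
import torch.nn as nn

class SimpleBatchNorm(nn.Module):
    "Normalise the batch, then scale by g and shift by b."
    def __init__(self, nf, eps=1e-5):
        super().__init__()
        self.g = nn.Parameter(torch.ones(nf))   # learned scale
        self.b = nn.Parameter(torch.zeros(nf))  # learned shift
        self.eps = eps

    def forward(self, x):  # x: (batch, nf)
        mean, var = x.mean(0), x.var(0)
        x_hat = (x - mean) / (var + self.eps).sqrt()
        return x_hat * self.g + self.b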
len(data.train_ds.cont_names)
learn.lr_find()
learn.recorder.plot()
learn.fit_one_cycle(5, 1e-3, wd=0.2)
learn.save('1')
learn.recorder.plot_losses(last=-1)
learn.load('1');
learn.fit_one_cycle(5, 3e-4)
learn.fit_one_cycle(5, 3e-4)
test_preds = learn.get_preds(DatasetType.Test)
test_df["Sales"] = np.exp(test_preds[0].data).numpy().T[0]
test_df[["Id","Sales"]] = test_df[["Id","Sales"]].astype("int")
test_df[["Id","Sales"]].to_csv("rossmann_submission.csv", index=False)
https://github.com/fastai/course-v3/blob/master/nbs/dl1/lesson6-pets-more.ipynb
%reload_ext autoreload
%autoreload 2
%matplotlib inline
from fastai.vision import *
bs = 64
path = untar_data(URLs.PETS)/'images'
Ratchet up the augmentation defaults: what's the probability of an affine transform? What's the probability of a lighting transform?
https://docs.fast.ai/vision.transform.html#get_transforms
Look at validation dataset and see what the lighting looks like.
e.g. satellite data use rotated images.
Use flipped images.
Symmetric warp
tfms = get_transforms(max_rotate=20, max_zoom=1.3, max_lighting=0.4, max_warp=0.4,
                      p_affine=1., p_lighting=1.)
src = ImageList.from_folder(path).split_by_rand_pct(0.2, seed=2)

def get_data(size, bs, padding_mode='reflection'):
    return (src.label_from_re(r'([^/]+)_\d+.jpg$')
            .transform(tfms, size=size, padding_mode=padding_mode)
            .databunch(bs=bs).normalize(imagenet_stats))

data = get_data(224, bs, 'zeros')

def _plot(i,j,ax):
    x,y = data.train_ds[3]
    x.show(ax, y=y)

plot_multi(_plot, 3, 3, figsize=(8,8))
data = get_data(224,bs)
plot_multi(_plot, 3, 3, figsize=(8,8))
Train a model
gc.collect()
learn = cnn_learner(data, models.resnet34, metrics=error_rate, bn_final=True)
learn.fit_one_cycle(3, slice(1e-2), pct_start=0.8)
learn.unfreeze()
learn.fit_one_cycle(2, max_lr=slice(1e-6,1e-3), pct_start=0.8)
data = get_data(352,bs)
learn.data = data
learn.fit_one_cycle(2, max_lr=slice(1e-6,1e-4))
learn.save('352')
Convolutional kernel
data = get_data(352,16)
learn = cnn_learner(data, models.resnet34, metrics=error_rate, bn_final=True).load('352')
idx = 0
x,y = data.valid_ds[idx]
x.show()
data.valid_ds.y[idx]
k = tensor([
    [0.  , -5/3, 1],
    [-5/3, -5/3, 1],
    [1.  ,  1  , 1],
]).expand(1, 3, 3, 3) / 6
k
k.shape
t = data.valid_ds[0][0].data; t.shape
t[None].shape
edge = F.conv2d(t[None], k)
show_image(edge[0], figsize=(5,5));
data.c
learn.model
https://www.fast.ai/2018/07/02/adam-weight-decay/
print(learn.summary())
Heatmap
m = learn.model.eval();
m[0] is the convolutional part.
Create a mini-batch with one thing in it:
xb,_ = data.one_item(x)
xb_im = Image(data.denorm(xb)[0])
xb = xb.cuda()
from fastai.callbacks.hooks import *
A hook lets you hook into the fastai/PyTorch machinery and run your own Python, e.g. grab the output of a particular layer. Hook the output of m[0]:
def hooked_backward(cat=y):
    with hook_output(m[0]) as hook_a:
        with hook_output(m[0], grad=True) as hook_g:
            preds = m(xb)
            preds[0,int(cat)].backward()
    return hook_a,hook_g

hook_a,hook_g = hooked_backward()
acts = hook_a.stored[0].cpu()
acts.shape
Take the mean over the channel axis:
avg_acts = acts.mean(0)
avg_acts.shape

def show_heatmap(hm):
    _,ax = plt.subplots()
    xb_im.show(ax)
    ax.imshow(hm, alpha=0.6, extent=(0,352,352,0), interpolation='bilinear', cmap='magma');

show_heatmap(avg_acts)
Grad-CAM
Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization
grad = hook_g.stored[0][0].cpu()
grad_chan = grad.mean(1).mean(1)
grad.shape,grad_chan.shape
mult = (acts*grad_chan[...,None,None]).mean(0)
show_heatmap(mult)
Generative models - create new text, images, video, or sound.
Artificial Intelligence needs all of us | Rachel Thomas P.h.D. | TEDxSanFrancisco
Some Healthy Principles About Ethics & Bias In AI | Rachel Thomas @ PyBay2018
Accuracy on lighter-skinned males vs darker-skinned females - http://gendershades.org/
https://www.crunchbase.com/organization/deep-glint#section-overview - Facial AI for surveillance
Text translation, e.g. round-tripping English -> Turkish -> English produces 'He is a doctor. She is a nurse.' (Turkish pronouns are gender-neutral, so the model reintroduces gender stereotypes.)
COMPAS - used in law to suggest jail vs. bail (recidivism risk scores).
Why?
Get humans back in the loop.
Talk to domain experts and those impacted - https://fatconference.org/
Evan Estola - When Recommendations Systems Go Bad - MLconf SEA 2016
Datasheets for Datasets - better documentation regarding datasets.
https://github.com/hiromis/notes/blob/master/Lesson6.md
https://forums.fast.ai/t/lesson-6-in-class-discussion/31440
https://forums.fast.ai/t/lesson-6-advanced-discussion/31442
https://platform.ai/ - computer vision start-up. Upload pics and use it to help label your pics based on a deep learning model (e.g. choose a layer or choose a projection).
https://forums.fast.ai/t/platform-ai-discussion/31445
50 Years of Test (Un)fairness: Lessons for Machine Learning paper - https://128.84.21.199/pdf/1811.10104.pdf
Cornell conv course - http://www.cs.cornell.edu/courses/cs1114/2013sp/sections/S06_convolution.pdf
conv arithmetic - https://github.com/vdumoulin/conv_arithmetic/blob/master/README.md
https://arthurdouillard.com/post/normalization/ e.g. images
cross entropy loss - https://gombru.github.io/2018/05/23/cross_entropy_loss/
https://brohrer.github.io/how_convolutional_neural_networks_work.html
https://openframeworks.cc/ofBook/chapters/image_processing_computer_vision.html
https://medium.com/@karpathy/yes-you-should-understand-backprop-e2f06eab496b
https://towardsdatascience.com/bert-explained-state-of-the-art-language-model-for-nlp-f8b21a9b6270
https://knowingneurons.com/2014/10/29/hubel-and-wiesel-the-neural-basis-of-visual-perception/
https://grey.colorado.edu/CompCogNeuro/index.php/CCNBook/Perception
https://course.fast.ai/videos/?lesson=7
Make sure you have the latest version of the code and the latest version of the course
$ conda update conda
$ conda update anaconda
$ conda activate fastai
$ conda update --all
Compare https://github.com/fastai/course-v3 to my local course.
cd /home/ray/Documents/COURSES/fastai/course-v3/nbs/dl1
jupyter notebook
https://github.com/fastai/course-v3/blob/master/nbs/dl1/lesson7-resnet-mnist.ipynb
%reload_ext autoreload
%autoreload 2
%matplotlib inline
from fastai.vision import *
path = untar_data(URLs.MNIST)
path.ls()
il = ImageList.from_folder(path, convert_mode='L')  # convert to greyscale
il.items[0]
defaults.cmap = 'binary'
il
il[0].show()
# has labels, therefore it's a validation set, not a test set
sd = il.split_by_folder(train='training', valid='testing')
sd
(path/'training').ls()
ll = sd.label_from_folder()  # label list
ll
x,y = ll.train[0]
x.show()
print(y,x.shape)
# transforms
tfms = ([*rand_pad(padding=3, size=28, mode='zeros')], [])
ll = ll.transform(tfms)
bs = 128
# not using imagenet_stats because we're not using a pretrained model
data = ll.databunch(bs=bs).normalize()
x,y = data.train_ds[0]
x.show()
print(y)
def _plot(i,j,ax): data.train_ds[0][0].show(ax, cmap='gray')
plot_multi(_plot, 3, 3, figsize=(8,8))
xb,yb = data.one_batch()
xb.shape,yb.shape
data.show_batch(rows=3, figsize=(5,5))
Basic CNN with batchnorm
def conv(ni,nf): return nn.Conv2d(ni, nf, kernel_size=3, stride=2, padding=1)

model = nn.Sequential(
    conv(1, 8),   # 14
    nn.BatchNorm2d(8),
    nn.ReLU(),
    conv(8, 16),  # 7
    nn.BatchNorm2d(16),
    nn.ReLU(),
    conv(16, 32), # 4
    nn.BatchNorm2d(32),
    nn.ReLU(),
    conv(32, 16), # 2
    nn.BatchNorm2d(16),
    nn.ReLU(),
    conv(16, 10), # 1
    nn.BatchNorm2d(10),
    Flatten()     # remove (1,1) grid
)

learn = Learner(data, model, loss_func=nn.CrossEntropyLoss(), metrics=accuracy)
print(learn.summary())
model(xb).shape
learn.lr_find(end_lr=100)
learn.recorder.plot()
learn.fit_one_cycle(3, max_lr=0.1)
Refactor
def conv2(ni,nf): return conv_layer(ni,nf,stride=2)

model = nn.Sequential(
    conv2(1, 8),   # 14
    conv2(8, 16),  # 7
    conv2(16, 32), # 4
    conv2(32, 16), # 2
    conv2(16, 10), # 1
    Flatten()      # remove (1,1) grid
)

learn = Learner(data, model, loss_func=nn.CrossEntropyLoss(), metrics=accuracy)
learn.fit_one_cycle(10, max_lr=0.1)
Resnet-ish
x -> two layers (f(x)) -> f(x) + x: an identity / skip connection.
class ResBlock(nn.Module):
    def __init__(self, nf):
        super().__init__()
        self.conv1 = conv_layer(nf,nf)
        self.conv2 = conv_layer(nf,nf)

    def forward(self, x): return x + self.conv2(self.conv1(x))

help(res_block)

model = nn.Sequential(
    conv2(1, 8),
    res_block(8),
    conv2(8, 16),
    res_block(16),
    conv2(16, 32),
    res_block(32),
    conv2(32, 16),
    res_block(16),
    conv2(16, 10),
    Flatten()
)

def conv_and_res(ni,nf): return nn.Sequential(conv2(ni, nf), res_block(nf))

model = nn.Sequential(
    conv_and_res(1, 8),
    conv_and_res(8, 16),
    conv_and_res(16, 32),
    conv_and_res(32, 16),
    conv2(16, 10),
    Flatten()
)

learn = Learner(data, model, loss_func=nn.CrossEntropyLoss(), metrics=accuracy)
learn.lr_find(end_lr=100)
learn.recorder.plot()
learn.fit_one_cycle(12, max_lr=0.05)
print(learn.summary())
A guide to convolution arithmetic for deep learning
Could scale the image up and use nearest-neighbour interpolation.
U-Net: Convolutional Networks for Biomedical Image Segmentation
Have to end up with something the same size as the image. Add padding outside the input and in between cells (stride-half / transposed convolutions). Use skip connections from the downsampling part of the U-Net.
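A minimal sketch of one U-Net-style up block, assuming nearest-neighbour upsampling plus a skip connection from the down path (fastai's unet_learner actually uses a fancier pixel-shuffle upsampling):
import torch
import torch.nn as nn
import torch.nn.functional as F

class UpBlock(nn.Module):
    "Upsample, then merge with the matching down-path activation (skip connection)."
    def __init__(self, up_c, skip_c, out_c):
        super().__init__()
        self.conv1 = nn.Conv2d(up_c + skip_c, out_c, 3, padding=1)
        self.conv2 = nn.Conv2d(out_c, out_c, 3, padding=1)

    def forward(self, x, skip):
        x = F.interpolate(x, scale_factor=2, mode='nearest')  # double the grid size
        x = torch.cat([x, skip], dim=1)                       # skip connection from the down path
        return F.relu(self.conv2(F.relu(self.conv1(x))))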
Image restoration.
https://github.com/fastai/course-v3/blob/master/nbs/dl1/lesson7-superres-gan.ipynb
import fastai
from fastai.vision import *
from fastai.callbacks import *
from fastai.vision.gan import *
path = untar_data(URLs.PETS)
path_hr = path/'images'
path_lr = path/'crappy'
Crappify
Resize the image to be small, pick a random number, draw it on the image, and save at that (random, low) JPEG quality. E.g. if you instead want to colorise black-and-white images, make the input black and white.
from fastai.vision import *
from PIL import Image, ImageDraw, ImageFont

class crappifier(object):
    def __init__(self, path_lr, path_hr):
        self.path_lr = path_lr
        self.path_hr = path_hr

    def __call__(self, fn, i):
        dest = self.path_lr/fn.relative_to(self.path_hr)
        dest.parent.mkdir(parents=True, exist_ok=True)
        img = PIL.Image.open(fn)
        targ_sz = resize_to(img, 96, use_min=True)
        img = img.resize(targ_sz, resample=PIL.Image.BILINEAR).convert('RGB')
        w,h = img.size
        q = random.randint(10,70)
        ImageDraw.Draw(img).text((random.randint(0,w//2),random.randint(0,h//2)), str(q), fill=(255,255,255))
        img.save(dest, quality=q)

from crappify import *
il = ImageList.from_folder(path_hr)
parallel(crappifier(path_lr, path_hr), il.items)
bs,size = 32, 128
# bs,size = 24,160
# bs,size = 8,256
Pre-train generator
arch = models.resnet34
src = ImageImageList.from_folder(path_lr).split_by_rand_pct(0.1, seed=42)

def get_data(bs,size):
    data = (src.label_from_func(lambda x: path_hr/x.name)
            .transform(get_transforms(max_zoom=2.), size=size, tfm_y=True)
            .databunch(bs=bs).normalize(imagenet_stats, do_y=True))
    data.c = 3
    return data

data_gen = get_data(bs,size)
data_gen.show_batch(4)
Make a U-Net. Use a model with pre-trained weights:
wd = 1e-3
y_range = (-3.,3.)
loss_gen = MSELossFlat()  # flattens out the images

def create_gen_learner():
    return unet_learner(data_gen, arch, wd=wd, blur=True, norm_type=NormType.Weight,
                        self_attention=True, y_range=y_range, loss_func=loss_gen)

learn_gen = create_gen_learner()
learn_gen.fit_one_cycle(2, pct_start=0.8)
learn_gen.unfreeze()  # unfreeze the (ResNet) downsampling part
learn_gen.fit_one_cycle(3, slice(1e-6,1e-3))
learn_gen.show_results(rows=4)
learn_gen.save('gen-pre2')
The model works but leaves some artifacts. GAN -> the loss is a discriminator/critic. Fine-tune the generator.
Create the critic. Save the generated images.
learn_gen.load('gen-pre2');
name_gen = 'image_gen'
path_gen = path/name_gen
path_gen.mkdir(exist_ok=True)

def save_preds(dl):
    i = 0
    names = dl.dataset.items
    for b in dl:
        preds = learn_gen.pred_batch(batch=b, reconstruct=True)
        for o in preds:
            o.save(path_gen/names[i].name)
            i += 1

save_preds(data_gen.fix_dl)
PIL.Image.open(path_gen.ls()[0])
Train critic
learn_gen = None  # clear up GPU memory
gc.collect()
Pretrain the critic on crappy vs not-crappy.
def get_crit_data(classes, bs, size):
    src = ImageList.from_folder(path, include=classes).split_by_rand_pct(0.1, seed=42)
    ll = src.label_from_folder(classes=classes)
    data = (ll.transform(get_transforms(max_zoom=2.), size=size)
            .databunch(bs=bs).normalize(imagenet_stats))
    data.c = 3
    return data

data_crit = get_crit_data([name_gen, 'images'], bs=bs, size=size)
data_crit.show_batch(rows=3, ds_type=DatasetType.Train, imgsize=3)
loss_critic = AdaptiveLoss(nn.BCEWithLogitsLoss())  # binary cross-entropy

def create_critic_learner(data, metrics):
    return Learner(data, gan_critic(), metrics=metrics, loss_func=loss_critic, wd=wd)

learn_critic = create_critic_learner(data_crit, accuracy_thresh_expand)
learn_critic.fit_one_cycle(6, 1e-3)
learn_critic.save('critic-pre2')
GAN
Combine those pretrained models in a GAN:
learn_crit = None
learn_gen = None
gc.collect()
data_crit = get_crit_data(['crappy', 'images'], bs=bs, size=size)
learn_crit = create_critic_learner(data_crit, metrics=None).load('critic-pre2')
learn_gen = create_gen_learner().load('gen-pre2')
switcher = partial(AdaptiveGANSwitcher, critic_thresh=0.65)
learn = GANLearner.from_learners(learn_gen, learn_crit, weights_gen=(1.,50.), show_img=False,
                                 switcher=switcher, opt_func=partial(optim.Adam, betas=(0.,0.99)), wd=wd)
learn.callback_fns.append(partial(GANDiscriminativeLR, mult_lr=5.))
lr = 1e-4
learn.fit(40,lr)
learn.save('gan-1c')
learn.data = get_data(16,192)
learn.fit(10,lr/2)
learn.show_results(rows=16)
learn.save('gan-1c')
https://github.com/fastai/course-v3/blob/master/nbs/dl1/lesson7-wgan.ipynb
%reload_ext autoreload
%autoreload 2
%matplotlib inline
from fastai.vision import *
from fastai.vision.gan import *
LSUN bedroom data: https://github.com/fyu/lsun
path = untar_data(URLs.LSUN_BEDROOMS)
Random noise of size 100 (by default) as inputs, and the images of bedrooms as targets. Pass tfm_y=True in the transforms, then apply the normalization to the ys.
def get_data(bs, size):
    return (GANItemList.from_folder(path, noise_sz=100)
            .split_none()
            .label_from_func(noop)
            .transform(tfms=[[crop_pad(size=size, row_pct=(0,1), col_pct=(0,1))], []],
                       size=size, tfm_y=True)
            .databunch(bs=bs)
            .normalize(stats=[torch.tensor([0.5,0.5,0.5]), torch.tensor([0.5,0.5,0.5])],
                       do_x=False, do_y=True))

Begin with a small size and use gradual resizing:
data = get_data(128, 64)
data.show_batch(rows=5)
Train two models at the same time: a generator and a critic. The generator will try to make new images similar to the ones in our dataset, and the critic will try to distinguish real images from the ones the generator makes. The generator returns images, the critic a single number (usually 0. for fake images and 1. for real ones).
We train them against each other in the sense that at each step (more or less), we:
1. Freeze the generator and train the critic for one step: take a batch of true images (real), generate a batch with the generator (fake), and have the critic classify each batch.
2. Freeze the critic and train the generator for one step: generate a batch of fake images and use a loss that rewards the critic thinking they are real.
Wasserstein GAN - https://arxiv.org/pdf/1701.07875.pdf
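A schematic of that alternating loop in plain PyTorch; gen, critic, the two loss functions, and the optimizers are placeholders, not fastai objects (GANLearner automates this switching):
import torch

def gan_step(real, gen, critic, opt_g, opt_c, crit_loss, gen_loss, noise_sz=100):
    # 1) freeze the generator, train the critic on real vs fake
    noise = torch.randn(real.size(0), noise_sz)
    fake = gen(noise).detach()        # detach so the generator gets no gradient
    loss_c = crit_loss(critic(real), critic(fake))
    opt_c.zero_grad(); loss_c.backward(); opt_c.step()
    # 2) freeze the critic, train the generator to fool it
    fake = gen(torch.randn(real.size(0), noise_sz))
    loss_g = gen_loss(critic(fake))   # rewards the critic thinking fakes are real
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()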
Create a generator and a critic that we pass to GANLearner.wgan. The noise size (noise_sz=100 above) is the size of the random vector from which our generator creates images.
generator = basic_generator(in_size=64, n_channels=3, n_extra_layers=1)
critic    = basic_critic   (in_size=64, n_channels=3, n_extra_layers=1)
learn = GANLearner.wgan(data, generator, critic, switch_eval=False,
                        opt_func=partial(optim.Adam, betas=(0.,0.99)), wd=0.)
learn.fit(30,2e-4)
learn.gan_trainer.switch(gen_mode=True)
learn.show_results(ds_type=DatasetType.Train, rows=16, figsize=(8,8))
Perceptual Losses for Real-Time Style Transfer and Super-Resolution
A downsampling encoder followed by an upsampling decoder.
https://github.com/fastai/course-v3/blob/master/nbs/dl1/lesson7-superres.ipynb
import fastai
from fastai.vision import *
from fastai.callbacks import *
from fastai.utils.mem import *
from torchvision.models import vgg16_bn
path = untar_data(URLs.PETS)
path_hr = path/'images'
path_lr = path/'small-96'
path_mr = path/'small-256'
il = ImageList.from_folder(path_hr)
Crappify
def resize_one(fn, i, path, size):
    dest = path/fn.relative_to(path_hr)
    dest.parent.mkdir(parents=True, exist_ok=True)
    img = PIL.Image.open(fn)
    targ_sz = resize_to(img, size, use_min=True)
    img = img.resize(targ_sz, resample=PIL.Image.BILINEAR).convert('RGB')
    img.save(dest, quality=60)

# create smaller image sets the first time this nb is run
sets = [(path_lr, 96), (path_mr, 256)]
for p,size in sets:
    if not p.exists():
        print(f"resizing to {size} into {p}")
        parallel(partial(resize_one, path=p, size=size), il.items)

bs,size = 32,128
arch = models.resnet34
src = ImageImageList.from_folder(path_lr).split_by_rand_pct(0.1, seed=42)

def get_data(bs,size):
    data = (src.label_from_func(lambda x: path_hr/x.name)
            .transform(get_transforms(max_zoom=2.), size=size, tfm_y=True)
            .databunch(bs=bs).normalize(imagenet_stats, do_y=True))
    data.c = 3
    return data

data = get_data(bs,size)
data.show_batch(ds_type=DatasetType.Valid, rows=2, figsize=(9,9))
Feature loss
t = data.valid_ds[0][1].data
t = torch.stack([t,t])

def gram_matrix(x):
    n,c,h,w = x.size()
    x = x.view(n, c, -1)
    return (x @ x.transpose(1,2))/(c*h*w)

gram_matrix(t)
MAE loss:
base_loss = F.l1_loss
.features gives the convolutional part of VGG. Eval mode, as we're not training it. Turn off requires_grad, as we're not updating its weights.
vgg_m = vgg16_bn(True).features.cuda().eval()
requires_grad(vgg_m, False)
Find the layers just before the max-pool layers (the ReLUs):
blocks = [i-1 for i,o in enumerate(children(vgg_m)) if isinstance(o,nn.MaxPool2d)]
blocks, [vgg_m[i] for i in blocks]

class FeatureLoss(nn.Module):
    def __init__(self, m_feat, layer_ids, layer_wgts):
        super().__init__()
        self.m_feat = m_feat
        self.loss_features = [self.m_feat[i] for i in layer_ids]
        self.hooks = hook_outputs(self.loss_features, detach=False)
        self.wgts = layer_wgts
        self.metric_names = ['pixel',] + [f'feat_{i}' for i in range(len(layer_ids))
              ] + [f'gram_{i}' for i in range(len(layer_ids))]

    def make_features(self, x, clone=False):
        self.m_feat(x)
        return [(o.clone() if clone else o) for o in self.hooks.stored]

    def forward(self, input, target):
        out_feat = self.make_features(target, clone=True)
        in_feat = self.make_features(input)
        self.feat_losses = [base_loss(input,target)]
        self.feat_losses += [base_loss(f_in, f_out)*w
                             for f_in, f_out, w in zip(in_feat, out_feat, self.wgts)]
        self.feat_losses += [base_loss(gram_matrix(f_in), gram_matrix(f_out))*w**2 * 5e3
                             for f_in, f_out, w in zip(in_feat, out_feat, self.wgts)]
        self.metrics = dict(zip(self.metric_names, self.feat_losses))
        return sum(self.feat_losses)

    def __del__(self): self.hooks.remove()

feat_loss = FeatureLoss(vgg_m, blocks[2:5], [5,15,2])
Train
wd = 1e-3
learn = unet_learner(data, arch, wd=wd, loss_func=feat_loss, callback_fns=LossMetrics,
                     blur=True, norm_type=NormType.Weight)
gc.collect();
learn.lr_find()
learn.recorder.plot()
lr = 1e-3

def do_fit(save_name, lrs=slice(lr), pct_start=0.9):
    learn.fit_one_cycle(10, lrs, pct_start=pct_start)
    learn.save(save_name)
    learn.show_results(rows=1, imgsize=5)

do_fit('1a', slice(lr*10))  # quicker than a GAN
learn.unfreeze()
do_fit('1b', slice(1e-5,lr))
data = get_data(12,size*2)
learn.data = data
learn.freeze()
gc.collect()
learn.load('1b');
do_fit('2a')
learn.unfreeze()
do_fit('2b', slice(1e-6,1e-4), pct_start=0.3)
Test
learn = None
gc.collect();
256/320*1024
256/320*1600
free = gpu_mem_get_free_no_cache()
# the max size of the test image depends on the available GPU RAM
if free > 8000: size=(1280, 1600) # > 8GB RAM
else:           size=( 820, 1024) # <= 8GB RAM
print(f"using size={size}, have {free}MB of GPU RAM free")
learn = unet_learner(data, arch, loss_func=F.l1_loss, blur=True, norm_type=NormType.Weight)
data_mr = (ImageImageList.from_folder(path_mr).split_by_rand_pct(0.1, seed=42)
           .label_from_func(lambda x: path_hr/x.name)
           .transform(get_transforms(), size=size, tfm_y=True)
           .databunch(bs=1).normalize(imagenet_stats, do_y=True))
data_mr.c = 3
learn.load('2b');
learn.data = data_mr
fn = data_mr.valid_ds.x.items[0]; fn
img = open_image(fn); img.shape
p,img_hr,b = learn.predict(img)
show_image(img, figsize=(18,15), interpolation='nearest');
Image(img_hr).show(figsize=(18,15))
https://github.com/fastai/course-v3/blob/master/nbs/dl1/lesson7-human-numbers.ipynb
from fastai.text import *
bs = 64
path = untar_data(URLs.HUMAN_NUMBERS)
path.ls()

def readnums(d): return [', '.join(o.strip() for o in open(path/d).readlines())]

train_txt = readnums('train.txt'); train_txt[0][:80]
valid_txt = readnums('valid.txt'); valid_txt[0][-80:]
train = TextList(train_txt, path=path)
valid = TextList(valid_txt, path=path)
src = ItemLists(path=path, train=train, valid=valid).label_for_lm()
data = src.databunch(bs=bs)
train[0].text[:80]  # one document, so index 0
# returns xxbos: xx marks a special token, bos is beginning of stream
len(data.valid_ds[0][0].data)
data.bptt, len(data.valid_dl)  # bptt is back-prop through time
https://github.com/fastai/fastai/blob/f93a5f028e2cf73448dda188682d437c610424c3/fastai/text/learner.py#L248
The 13,017 validation tokens are split into bs=64 rows, read bptt~70 tokens at a time, giving 3 batches:
13017/70/bs
it = iter(data.valid_dl)
x1,y1 = next(it)
x2,y2 = next(it)
x3,y3 = next(it)
it.close()
x1.numel()+x2.numel()+x3.numel()
x1.shape,y1.shape
x2.shape,y2.shape
x1[:,0]
y1[:,0]
Grab the vocab. Each mini-batch joins up with the next mini-batch:
v = data.valid_ds.vocab
v.textify(x1[0])
v.textify(y1[0])
v.textify(x2[0])
v.textify(x3[0])
v.textify(x1[1])
v.textify(x2[1])
v.textify(x3[1])
v.textify(x3[-1])
data.show_batch(ds_type=DatasetType.Valid)
Single fully connected model
data = src.databunch(bs=bs, bptt=3)
x,y = data.one_batch()
x.shape,y.shape
nv = len(v.itos); nv
nh = 64

def loss4(input,target): return F.cross_entropy(input, target[:,-1])
def acc4 (input,target): return accuracy(input, target[:,-1])

class Model0(nn.Module):
    def __init__(self):
        super().__init__()
        self.i_h = nn.Embedding(nv,nh)  # green arrow
        self.h_h = nn.Linear(nh,nh)     # brown arrow
        self.h_o = nn.Linear(nh,nv)     # blue arrow
        self.bn = nn.BatchNorm1d(nh)

    def forward(self, x):
        h = self.bn(F.relu(self.h_h(self.i_h(x[:,0]))))
        if x.shape[1]>1:
            h = h + self.i_h(x[:,1])
            h = self.bn(F.relu(self.h_h(h)))
        if x.shape[1]>2:
            h = h + self.i_h(x[:,2])
            h = self.bn(F.relu(self.h_h(h)))
        return self.h_o(h)

learn = Learner(data, Model0(), loss_func=loss4, metrics=acc4)
learn.fit_one_cycle(6, 1e-4)
Same thing with a loop
class Model1(nn.Module):
    def __init__(self):
        super().__init__()
        self.i_h = nn.Embedding(nv,nh)  # green arrow
        self.h_h = nn.Linear(nh,nh)     # brown arrow
        self.h_o = nn.Linear(nh,nv)     # blue arrow
        self.bn = nn.BatchNorm1d(nh)

    def forward(self, x):
        h = torch.zeros(x.shape[0], nh).to(device=x.device)
        for i in range(x.shape[1]):
            h = h + self.i_h(x[:,i])
            h = self.bn(F.relu(self.h_h(h)))
        return self.h_o(h)

learn = Learner(data, Model1(), loss_func=loss4, metrics=acc4)
learn.fit_one_cycle(6, 1e-4)
Multi fully connected model
Use a bptt of 20 (use 20 words to predict the 21st). Predict after every word, so the output is an array of predictions.
data = src.databunch(bs=bs, bptt=20)
x,y = data.one_batch()
x.shape,y.shape

class Model2(nn.Module):
    def __init__(self):
        super().__init__()
        self.i_h = nn.Embedding(nv,nh)
        self.h_h = nn.Linear(nh,nh)
        self.h_o = nn.Linear(nh,nv)
        self.bn = nn.BatchNorm1d(nh)

    def forward(self, x):
        h = torch.zeros(x.shape[0], nh).to(device=x.device)
        res = []
        for i in range(x.shape[1]):
            h = h + self.i_h(x[:,i])
            h = F.relu(self.h_h(h))
            res.append(self.h_o(self.bn(h)))
        return torch.stack(res, dim=1)

learn = Learner(data, Model2(), metrics=accuracy)
Maintain state
class Model3(nn.Module):
    def __init__(self):
        super().__init__()
        self.i_h = nn.Embedding(nv,nh)
        self.h_h = nn.Linear(nh,nh)
        self.h_o = nn.Linear(nh,nv)
        self.bn = nn.BatchNorm1d(nh)
        self.h = torch.zeros(bs, nh).cuda()

    def forward(self, x):
        res = []
        h = self.h
        for i in range(x.shape[1]):
            h = h + self.i_h(x[:,i])
            h = F.relu(self.h_h(h))
            res.append(self.bn(h))
        self.h = h.detach()
        res = torch.stack(res, dim=1)
        res = self.h_o(res)
        return res

learn = Learner(data, Model3(), metrics=accuracy)
learn.fit_one_cycle(20, 3e-3)
Stack RNNs
class Model4(nn.Module):
    def __init__(self):
        super().__init__()
        self.i_h = nn.Embedding(nv,nh)
        self.rnn = nn.RNN(nh,nh, batch_first=True)
        self.h_o = nn.Linear(nh,nv)
        self.bn = BatchNorm1dFlat(nh)
        self.h = torch.zeros(1, bs, nh).cuda()

    def forward(self, x):
        res,h = self.rnn(self.i_h(x), self.h)
        self.h = h.detach()
        return self.h_o(self.bn(res))

learn = Learner(data, Model4(), metrics=accuracy)
learn.fit_one_cycle(20, 3e-3)
GRU/LSTM
Gated RNN cells: the gates learn how much of the hidden state to keep and how much of the new input to let in at each step (and stacked versions support dropout between layers).
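A single GRU cell written out, to show the gating (one common formulation; nn.GRU below implements this internally):
import torch
import torch.nn as nn

class GRUCellSketch(nn.Module):
    def __init__(self, ni, nh):
        super().__init__()
        self.z = nn.Linear(ni + nh, nh)  # update gate: how much old state to keep
        self.r = nn.Linear(ni + nh, nh)  # reset gate: how much old state feeds the candidate
        self.h = nn.Linear(ni + nh, nh)  # candidate new state

    def forward(self, x, h):
        z = torch.sigmoid(self.z(torch.cat([x, h], dim=1)))
        r = torch.sigmoid(self.r(torch.cat([x, h], dim=1)))
        h_tilde = torch.tanh(self.h(torch.cat([x, r * h], dim=1)))
        return (1 - z) * h + z * h_tilde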
class Model5(nn.Module):
    def __init__(self):
        super().__init__()
        self.i_h = nn.Embedding(nv,nh)
        self.rnn = nn.GRU(nh, nh, 2, batch_first=True)
        self.h_o = nn.Linear(nh,nv)
        self.bn = BatchNorm1dFlat(nh)
        self.h = torch.zeros(2, bs, nh).cuda()

    def forward(self, x):
        res,h = self.rnn(self.i_h(x), self.h)
        self.h = h.detach()
        return self.h_o(self.bn(res))

learn = Learner(data, Model5(), metrics=accuracy)
learn.fit_one_cycle(10, 1e-2)
Can also be used for sequence labeling.
Document and test code
https://forums.fast.ai/t/dev-projects-index/29849
Visualizing the Loss Landscape of Neural Nets
https://github.com/vdumoulin/conv_arithmetic
Perceptual Losses for Real-Time Style Transfer and Super-Resolution