PyTorch Training CNN on MNIST Dataset

PyTorch is a middle ground between Keras and TensorFlow: it offers high-level commands that let you easily construct basic neural network structures. At the same time, it lets you work directly with tensors and perform advanced customization of neural network architecture and hyperparameters.

To create a CNN model in PyTorch, you subclass nn.Module and build the network from the torch.nn package, which contains a complete neural network toolkit, including convolutional, pooling and fully connected layers. PyTorch lets you define parameters at every stage: dataset loading, CNN layer construction, training, forward pass, backpropagation, and model testing.


Your First Convolutional Neural Network in PyTorch


This brief tutorial shows how to load the MNIST dataset into PyTorch, train and run a CNN model on it. As mentioned above, MNIST is a standard deep learning dataset containing 70,000 handwritten digits from 0-9. Our discussion is based on the great tutorial by Andy Thomas.

Follow these steps to train a CNN on MNIST and generate predictions:

1. Set hyperparameters—these are safe to start with.

# Imports used throughout this tutorial
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader

num_epochs = 5
num_classes = 10
batch_size = 100
learning_rate = 0.001


2. Specify local drive folders to store the MNIST dataset, and a location for the trained model.

DATA_PATH = 'C:\\...\PycharmProjects\MNISTData'
MODEL_STORE_PATH = 'C:\\...\PycharmProjects\pytorch_models\\'

3. The MNIST dataset is available through the torchvision package, via torchvision.datasets.MNIST. Transform the dataset using transforms.Compose(), as follows:

  • Convert the input data set to a PyTorch tensor.

  • Normalize the data, supplying the mean (0.1307) and standard deviation (0.3081) of the MNIST dataset. You need to do this for every channel in the dataset, but because MNIST is grayscale, there is only one channel and one mean/STD pair.


trans = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))])
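To make the effect of these two transforms concrete, here is a minimal sketch that reproduces them by hand, assuming only torch is installed. The random 28 x 28 array is a hypothetical stand-in for a real MNIST digit, and the arithmetic mirrors what ToTensor (scale to [0, 1], add a channel dimension) and Normalize (subtract the mean, divide by the standard deviation) are documented to do.

```python
import torch

MNIST_MEAN, MNIST_STD = 0.1307, 0.3081

# Hypothetical stand-in for one grayscale image: 28 x 28 integers in 0..255.
torch.manual_seed(0)
fake_image = torch.randint(0, 256, (28, 28), dtype=torch.uint8)

# What ToTensor does for a uint8 image: scale to [0, 1] and add a channel axis.
as_tensor = fake_image.float().div(255).unsqueeze(0)  # shape (1, 28, 28)

# What Normalize does: (x - mean) / std, applied to the single channel.
normalized = (as_tensor - MNIST_MEAN) / MNIST_STD

# Bounds of the normalized range: pixel 0 and pixel 1 after normalization.
lowest = (0.0 - MNIST_MEAN) / MNIST_STD
highest = (1.0 - MNIST_MEAN) / MNIST_STD

print(as_tensor.shape)
print(normalized.min().item(), normalized.max().item())
```

After normalization, a black pixel (0) maps to roughly -0.42 and a white pixel (1) to roughly 2.82, so the inputs are centered near zero rather than confined to [0, 1].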

4. Create the train_dataset and test_dataset objects, providing the following arguments:

  • root: the folder where the MNIST data files are stored, or downloaded to if they are missing

  • train: whether to load the training set (True) or the test set (False)

  • transform: the transform object created earlier

  • download: whether the MNIST data should be downloaded from source if it is not already present in root

train_dataset = torchvision.datasets.MNIST(root=DATA_PATH, train=True, transform=trans, download=True)

test_dataset = torchvision.datasets.MNIST(root=DATA_PATH, train=False, transform=trans)

5. Load the train and test datasets into data loaders. A data loader is iterable: to extract batches, loop over it with a standard Python construct such as enumerate. Here we pass three arguments to DataLoader:

  • The data set you wish to load

  • The batch size

  • Whether you wish to randomly shuffle the data

train_loader = DataLoader(dataset=train_dataset, batch_size=batch_size, shuffle=True)

test_loader = DataLoader(dataset=test_dataset, batch_size=batch_size, shuffle=False)
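The batching behaviour can be tried without downloading MNIST. The sketch below is an illustration under stated assumptions: fake_dataset is a hypothetical TensorDataset of random tensors shaped like MNIST images, and 250 samples are used so that the last batch is visibly smaller than batch_size.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in for MNIST: 250 random "images" with random labels.
fake_images = torch.randn(250, 1, 28, 28)
fake_labels = torch.randint(0, 10, (250,))
fake_dataset = TensorDataset(fake_images, fake_labels)

demo_loader = DataLoader(dataset=fake_dataset, batch_size=100, shuffle=True)

# len() of a loader is the number of batches, not the number of samples.
num_batches = len(demo_loader)

# Each iteration yields one (images, labels) batch.
batch_shapes = []
for i, (images, labels) in enumerate(demo_loader):
    batch_shapes.append((tuple(images.shape), tuple(labels.shape)))
    print(i, images.shape, labels.shape)
```

With 250 samples and a batch size of 100, the loader yields two full batches and a final batch of 50. The same reasoning gives 600 batches per epoch for the 60,000 MNIST training images.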

6. Create the CNN model by subclassing nn.Module. This is the PyTorch base class for neural networks: it registers the layers you assign to it, provides recursive operations over them, offers ways of parallelizing work and moving it to a GPU or back to a CPU, and more. We’ll create the following neural layers:

  • layer1: using the nn.Sequential object, we create a compound layer that includes a 2D convolutional layer, a ReLU activation function and a 2D MaxPool layer.

    • The nn.Conv2d constructor lets you define, in this order: number of input channels (1 because MNIST images are grayscale), number of output channels, the size of the convolutional filter (you can supply a tuple for non-square shapes), stride and padding.

    • Stride and padding for the convolutional layer follow from this equation: W_out = (W_in - F + 2P) / S + 1, where W_out = width of output, W_in = width of input, F = filter size, S = stride, P = padding. The same formula applies to the height, since the images and filters are square. Because we want the output size to be the same as the input (28), we set stride to 1 and padding to 2: (28 - 5 + 2 x 2) / 1 + 1 = 28, so the output width and height are both 28.

    • For the pooling layer, we set stride to 2 and padding to zero, to down-sample and reduce images by a factor of 2.

    • Thus, the output from layer1 will be 32 channels of 14 x 14 pixel images.

  • layer2: like layer1, except that it has 32 input channels, because it receives the output of the first layer, and 64 output channels. Its output is 64 channels of 7 x 7 pixel images.

  • drop_out: a dropout layer, to avoid over-fitting in the model.

  • fc1 and fc2: two fully connected layers, created using nn.Linear. The first argument of nn.Linear is the number of input nodes and the second is the number of nodes in the next layer, so fc1 maps 7 x 7 x 64 = 3136 inputs to 1000 nodes, and fc2 maps those 1000 nodes to the 10 output classes.

class ConvNet(nn.Module):
    def __init__(self):
        super(ConvNet, self).__init__()
        self.layer1 = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=5, stride=1, padding=2),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2))
        self.layer2 = nn.Sequential(
            nn.Conv2d(32, 64, kernel_size=5, stride=1, padding=2),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2))
        self.drop_out = nn.Dropout()
        self.fc1 = nn.Linear(7 * 7 * 64, 1000)
        self.fc2 = nn.Linear(1000, 10)
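The size arithmetic behind these layers can be checked with a few lines of plain Python. This is a minimal sketch: conv_output_size is a hypothetical helper written for this illustration, not a PyTorch function. It uses integer division, which matches how PyTorch rounds down by default when the stride does not divide evenly.

```python
def conv_output_size(w_in, kernel, stride=1, padding=0):
    """Output width (or height) of a convolution or pooling layer."""
    return (w_in - kernel + 2 * padding) // stride + 1


sizes = [28]  # MNIST images are 28 x 28

# layer1: 5 x 5 convolution (stride 1, padding 2), then 2 x 2 max pooling (stride 2)
sizes.append(conv_output_size(sizes[-1], kernel=5, stride=1, padding=2))
sizes.append(conv_output_size(sizes[-1], kernel=2, stride=2, padding=0))

# layer2: same convolution and pooling settings
sizes.append(conv_output_size(sizes[-1], kernel=5, stride=1, padding=2))
sizes.append(conv_output_size(sizes[-1], kernel=2, stride=2, padding=0))

# Number of inputs to fc1: 64 channels of the final width x height
flat_features = sizes[-1] * sizes[-1] * 64

print(sizes)
print(flat_features)
```

The widths run 28, 28, 14, 14, 7, which is where the 7 * 7 * 64 argument to the first nn.Linear layer comes from.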


7. Define the forward pass. Name this method forward so that it overrides the base forward method in nn.Module. Its second argument, x, is one batch of data, which is fed into the first layer (layer1), then to the next layer, and so on. After layer2, we reshape the data, flattening each sample from 64 x 7 x 7 into a vector of 3136 values (7 x 7 x 64 = 3136).

    # This method belongs inside the ConvNet class defined in step 6
    def forward(self, x):
        out = self.layer1(x)
        out = self.layer2(out)
        out = out.reshape(out.size(0), -1)
        out = self.drop_out(out)
        out = self.fc1(out)
        out = self.fc2(out)
        return out
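One way to confirm the shapes described in this step is to push a dummy batch through the two convolutional blocks and inspect the result. The sketch below rebuilds the same layer definitions as standalone variables; the batch of 4 random tensors is a hypothetical stand-in for real MNIST images.

```python
import torch
import torch.nn as nn

# Same settings as layer1 and layer2 in the ConvNet class, rebuilt standalone.
layer1 = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=5, stride=1, padding=2),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2))
layer2 = nn.Sequential(
    nn.Conv2d(32, 64, kernel_size=5, stride=1, padding=2),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2))

# Hypothetical batch of 4 random "images" in place of real data.
x = torch.randn(4, 1, 28, 28)

with torch.no_grad():
    out1 = layer1(x)                         # (4, 32, 14, 14)
    out2 = layer2(out1)                      # (4, 64, 7, 7)
    flat = out2.reshape(out2.size(0), -1)    # (4, 3136)

print(out1.shape, out2.shape, flat.shape)
```

The reshape keeps the batch dimension and collapses everything else, so each sample becomes a row of 3136 values, matching the input size expected by fc1.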


8. Define training parameters by creating a ConvNet object and defining:

  • criterion: the loss function. We use the PyTorch CrossEntropyLoss class, which combines a log-softmax and a negative log-likelihood loss. This is why the model returns raw scores and has no softmax layer of its own.

  • optimizer: we use the Adam optimizer, passing all the parameters from the CNN model we defined earlier, and a learning rate.


model = ConvNet()

criterion = nn.CrossEntropyLoss()

optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
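The claim that CrossEntropyLoss already includes the softmax step can be verified numerically. This is a small sketch with made-up scores and labels (the logits and targets tensors are illustrative, not MNIST outputs), comparing the loss against an explicit log-softmax followed by a negative log-likelihood loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(3, 10)           # hypothetical raw scores: 3 samples, 10 classes
targets = torch.tensor([2, 7, 0])     # hypothetical true labels

# The loss used in this tutorial, applied directly to raw scores.
loss_ce = nn.CrossEntropyLoss()(logits, targets)

# The same computation written out in two steps.
log_probs = F.log_softmax(logits, dim=1)
loss_manual = F.nll_loss(log_probs, targets)

print(loss_ce.item(), loss_manual.item())
```

Both values agree, so adding a softmax layer to the end of the network before this loss would apply the normalization twice.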

9. Train the model, by running two loops:

  • Loop over the number of epochs.

  • Within this loop, iterate over train_loader using enumerate, and do the following:

    • Perform a forward pass, by passing a batch of normalized MNIST images from train_loader to the model object you defined earlier. There is no need to call the forward function explicitly; PyTorch runs it automatically when you call the model object.

    • Pass the outputs and the true image labels to the loss function.

    • Append the loss to a list, which you can use later to plot training progress.

    • In preparation for backpropagation, set gradients to zero by calling zero_grad() on the optimizer.

    • Perform backpropagation using the backward() method of the loss object. Gradients are calculated.

    • Call optimizer.step() to update the model weights using the Adam update rule.


total_step = len(train_loader)
loss_list = []
acc_list = []
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        # Forward pass
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss_list.append(loss.item())

        # Backpropagation and optimizer step
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
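The reason for calling zero_grad() before each backward pass is that PyTorch adds new gradients to any that are already stored. The sketch below illustrates this on a single hypothetical weight; it uses plain SGD instead of Adam so that the update arithmetic is easy to follow by hand.

```python
import torch

w = torch.tensor([1.0], requires_grad=True)   # a single hypothetical weight
optimizer = torch.optim.SGD([w], lr=0.1)      # SGD here, not Adam, for simple arithmetic

# First backward pass: d(3w)/dw = 3
(3 * w).sum().backward()
grad_first = w.grad.clone()

# Second backward pass without zero_grad(): the new gradient is added on top.
(3 * w).sum().backward()
grad_accumulated = w.grad.clone()

# Resetting first gives the gradient of the current pass only.
optimizer.zero_grad()
(3 * w).sum().backward()
grad_after_reset = w.grad.clone()

# SGD update: w <- w - lr * grad = 1.0 - 0.1 * 3.0
optimizer.step()

print(grad_first.item(), grad_accumulated.item(), grad_after_reset.item(), w.item())
```

Without the reset, the second pass reports a gradient of 6 instead of 3, which in a real training loop would mix gradients from earlier batches into each update.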

10. Track accuracy (within the same loop) by:

  • Running the torch.max() function. When given a dimension, it returns both the maximum values along that dimension and their indices.

  • In the first argument of the max function, pass a tensor of outputs from the model, which should be of size (batch_size, 10). For each sample in the batch, this finds the maximum value over the 10 output nodes, each representing one of the digits 0-9. For example, output 2 corresponds to digit “2”. The node with the highest output value is the model’s prediction.

  • In the second argument of the max() function, pass 1. This instructs the max function to reduce over the output node axis (axis 0 corresponds to the batch_size dimension).

  • max() returns a tuple of (maximum values, indices). The indices are the predicted digits, so we keep them and discard the values.

  • Compare predictions with the true labels (predicted == labels) and sum the matches to get the number of correct predictions. Divide by the batch size to obtain the accuracy.

  • Print progress after every 100 iterations.


        # Track accuracy (still inside the inner loop from step 9)
        total = labels.size(0)
        _, predicted = torch.max(outputs.data, 1)
        correct = (predicted == labels).sum().item()
        acc_list.append(correct / total)

        if (i + 1) % 100 == 0:
            print('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}, Accuracy: {:.2f}%'
                  .format(epoch + 1, num_epochs, i + 1, total_step, loss.item(),
                          (correct / total) * 100))

# output will look like this:
# Epoch [1/5], Step [100/600], Loss: 0.2183, Accuracy: 95.00%
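The accuracy calculation is easier to follow on a tiny hand-written example. In this sketch the outputs and labels tensors are made up for illustration, with 3 samples and 4 classes instead of a real batch with 10 classes.

```python
import torch

# Hypothetical model outputs for a batch of 3 samples and 4 classes
outputs = torch.tensor([[0.1, 2.0, 0.3, 0.0],
                        [1.5, 0.2, 0.1, 0.9],
                        [0.0, 0.1, 0.2, 3.0]])
labels = torch.tensor([1, 0, 2])

# Reduce over dimension 1 (the class axis): one (value, index) pair per sample.
values, predicted = torch.max(outputs, 1)

# predicted == labels is a boolean tensor; summing it counts the matches.
correct = (predicted == labels).sum().item()
accuracy = correct / labels.size(0)

print(values.tolist())      # largest score in each row
print(predicted.tolist())   # index of that score, i.e. the predicted class
print(correct, accuracy)
```

Here the predicted classes are 1, 0 and 3, so two of the three predictions match the labels and the batch accuracy is 2/3.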

11. To test the model, do the following:

  • Run model.eval(): this switches the model to evaluation mode, which disables dropout and makes any batch normalization layers use their running statistics.

  • torch.no_grad() disables autograd, PyTorch’s mechanism for tracking operations and calculating gradients for backpropagation, which is not needed in model testing.

  • The remaining code is the same as in the accuracy calculation above, except you are iterating through test_loader and not train_loader.

  • Output the test accuracy to the console, and save the trained model’s parameters using torch.save(). The loss_list and acc_list values collected during training can then be graphed using a plotting library.
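The effect of evaluation mode and of torch.no_grad() can be seen in isolation. This is a minimal sketch on a standalone nn.Dropout layer and a hypothetical weight tensor, not on the ConvNet model itself.

```python
import torch
import torch.nn as nn

drop = nn.Dropout()          # default drop probability is 0.5
x = torch.ones(1000)

# Training mode: entries are randomly zeroed, survivors are scaled by 1 / (1 - 0.5).
drop.train()
train_out = drop(x)

# Evaluation mode: dropout passes its input through unchanged.
drop.eval()
eval_out = drop(x)

# Inside no_grad(), results are not tracked for backpropagation.
w = torch.ones(3, requires_grad=True)   # hypothetical weight tensor
with torch.no_grad():
    y = (w * 2).sum()

print(train_out.unique().tolist())
print(torch.equal(eval_out, x))
print(y.requires_grad)
```

In training mode the output contains only zeros and twos, while in evaluation mode it equals the input, which is why test accuracy should be measured after calling model.eval().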


model.eval()
with torch.no_grad():
    correct = 0
    total = 0
    for images, labels in test_loader:
        outputs = model(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

    print('Test Accuracy of the model on the 10000 test images: {} %'.format((correct / total) * 100))

# Save the trained model parameters
torch.save(model.state_dict(), MODEL_STORE_PATH + 'conv_net_model.ckpt')
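A file saved this way holds only the parameters, so loading it back requires a model with the same architecture. The sketch below shows the round trip under stated assumptions: a small nn.Linear layer stands in for ConvNet, and the temporary folder and the file name demo_model.ckpt are hypothetical.

```python
import os
import tempfile

import torch
import torch.nn as nn

model = nn.Linear(4, 2)   # hypothetical stand-in for the trained ConvNet

with tempfile.TemporaryDirectory() as tmp_dir:
    ckpt_path = os.path.join(tmp_dir, 'demo_model.ckpt')   # hypothetical file name

    # Save only the parameters (the state_dict), as in the step above.
    torch.save(model.state_dict(), ckpt_path)

    # Rebuild the same architecture, then load the saved parameters into it.
    restored = nn.Linear(4, 2)
    restored.load_state_dict(torch.load(ckpt_path))
    restored.eval()

# Every restored tensor matches the original.
same = all(
    torch.equal(a, b)
    for a, b in zip(model.state_dict().values(), restored.state_dict().values())
)
print(same)
```

For the tutorial model, the equivalent would be to create a new ConvNet() and call load_state_dict on it with the saved file.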