Your First Convolutional Neural Network in Keras


Keras is a high-level deep learning framework that runs on top of TensorFlow, Microsoft Cognitive Toolkit, or Theano. It lets you build standard neural network structures with only a few lines of code. To customize and create your own deep learning algorithms, you’ll need to work directly with TensorFlow or another lower-level deep learning framework.

A CNN in Keras is based on the Sequential model: you create a model object, add convolutional layers to it one at a time, and define each layer’s parameters as you go.


Training a CNN on the MNIST Dataset in Keras—a Brief Tutorial


This tutorial will show you how to load the MNIST dataset, a benchmark deep learning dataset containing 70,000 images of handwritten digits from 0 to 9, and how to build a convolutional neural network that classifies the handwritten digits. Our discussion is based on the excellent tutorial by Elijaz Allibhai.

Follow these steps to train CNN on MNIST and generate predictions:

1. Load the MNIST dataset and split it into train and test sets, with X_train and X_test containing the training and testing images, and y_train and y_test containing the “ground truth” digits represented in the images. In the MNIST dataset, 60,000 images are used for training and 10,000 for testing/validation (learn more about neural network bias and variance in our neural network guide).

from keras.datasets import mnist

(X_train, y_train), (X_test, y_test) = mnist.load_data()

2. View one of the images using the plt.imshow command, and check its size using the .shape attribute, to understand what the dataset looks like. Try this on several images. As you will see, all the MNIST images are uniformly 28 x 28 pixels in size and contain handwritten digits.

import matplotlib.pyplot as plt

plt.imshow(X_train[0])

X_train[0].shape

3. Reshape the two sets of images, X_train and X_test, to the shape expected by the CNN model. The reshape function takes four arguments: the number of images, the image width and height (28 x 28), and the image depth; use 1 to indicate a grayscale image.

X_train = X_train.reshape(60000,28,28,1)

X_test = X_test.reshape(10000,28,28,1)

4. Next, you’ll need to ‘one-hot-encode’ the target variable: create a column for each classification category, with each column containing a binary value indicating whether the current image belongs to that category. Because we are classifying digits, there will be 10 columns for the digits 0-9; for each image, the column for its true digit (e.g., the column for the digit 3) will contain a 1 and the rest will be 0.

Note: If you train on the raw class labels (the integers 0-9), you introduce an unintended ordering: the model will treat higher-value digits (e.g., 9) as “greater than” lower-value digits (e.g., 1), even though the classes have no numeric relationship. One-hot encoding causes the model to treat all digits as equivalent categories.


from keras.utils import to_categorical

y_train = to_categorical(y_train)

y_test = to_categorical(y_test)

y_train[0]
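The last line prints the encoded label of the first training image. In the standard MNIST ordering the first training label is the digit 5, so the output should be a 10-element vector with a 1 at index 5, along the lines of:

array([0., 0., 0., 0., 0., 1., 0., 0., 0., 0.], dtype=float32)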

5. Create a model using the Sequential model type, which lets you build a model by adding one layer at a time.

from keras.models import Sequential

from keras.layers import Dense, Conv2D, Flatten

model = Sequential()

6. Add model layers: the first two layers are Conv2D, 2-dimensional convolutional layers that process the input images, which are treated as 2-dimensional matrices. The Conv2D function takes the following parameters:

  • Number of filters in each layer (each filter learns to detect one type of feature in the image). We will use 64 for the first convolutional layer and 32 for the second.

  • kernel_size defines the filter size, i.e., the area in pixels the model uses to “scan” the image at each step. A kernel size of 3 means the model looks at a square of 3×3 pixels at a time.

  • activation is the type of activation function (click to learn more in our neural network guide) we use after each convolutional layer. For CNNs the typical activation function is ReLU.

  • input_shape is the pixel size of the images and the image depth, again with 1 indicating grayscale. Only the first layer needs this parameter; later layers infer their input shape from the previous layer.


model.add(Conv2D(64, kernel_size=3, activation='relu', input_shape=(28,28,1)))

model.add(Conv2D(32, kernel_size=3, activation='relu'))

Note: Because these convolution layers use no padding, each one reduces the width and height of its feature maps, while the depth becomes the number of filters. Since this model has no separate pooling layers, this shrinking plays the role of the pooling/downsampling stage in the CNN. The formula for calculating the output size for any given conv layer is:

O = ((W - K + 2P) / S) + 1

where O is the output height/width, W is the input height/width, K is the filter size, P is the padding, and S is the stride. A worked example follows below.
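To make the formula concrete, here is a small helper (not part of the original tutorial) that applies it to this model: with W=28, K=3, P=0 and S=1, the first convolutional layer outputs 26 x 26 feature maps and the second 24 x 24.

def conv_output_size(w, k, p=0, s=1):
    # O = ((W - K + 2P) / S) + 1
    return (w - k + 2 * p) // s + 1

first = conv_output_size(28, 3)  # 26: 28 x 28 input, 3 x 3 kernel, no padding
second = conv_output_size(first, 3)  # 24: the feature maps shrink again
print(first, second)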

7. Add a ‘Flatten’ layer, which takes the output of the two convolution layers and flattens it into a one-dimensional vector that can be fed to the final, densely connected neural layer.

model.add(Flatten())

8. Add the final layer of type ‘Dense’, a densely-connected neural layer which will generate the final prediction. The Dense function takes two arguments:

  • Number of output nodes—10 in our case because we need to generate predictions for digits between 0-9.

  • Type of activation function for the output layer. We use softmax, the typical activation function for classification output layers. Softmax takes the Dense layer output and converts it to meaningful probabilities for each of the digits, which sum to 1; the prediction is the digit with the highest probability (see the sketch after the code below).

model.add(Dense(10, activation='softmax'))
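For intuition about the softmax step, here is a minimal NumPy sketch (not part of the original tutorial) showing how it turns a vector of raw layer outputs into probabilities:

import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))  # probabilities that sum to 1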

9. Compile the model. The compile function takes three parameters:

  • optimizer controls the learning rate, which defines how fast optimal weights for the model are calculated (learn more about hyperparameters in our neural network guide). We will use ‘adam’, an optimizer that adapts the learning rate during training.

  • loss defines the loss function, which measures how far the model’s prediction is from the ground truth, the correct digits for the images (learn more about loss functions and the backpropagation process). We will use ‘categorical_crossentropy’, a loss function suitable for multi-class classification problems (see the sketch after the code below).

  • metrics defines how we evaluate model success. We’ll use the ‘accuracy’ metric to calculate an accuracy score on the testing/validation set of images.

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
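For intuition about the loss: with one-hot labels, categorical cross-entropy for a single image reduces to the negative log of the probability the model assigned to the true digit. Here is a minimal NumPy sketch with made-up probabilities (not from the tutorial):

import numpy as np

y_true = np.array([0, 0, 0, 1, 0, 0, 0, 0, 0, 0])  # one-hot: the true digit is 3
y_pred = np.array([0.05, 0.05, 0.05, 0.60, 0.05, 0.05, 0.05, 0.04, 0.03, 0.03])  # model output
loss = -np.sum(y_true * np.log(y_pred))  # equals -log(0.60), roughly 0.51
print(loss)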

10. Train the model using the fit function, which takes the following parameters:

  • training data (X_train)

  • target data (y_train)

  • validation data: here, the test images and labels, so accuracy is reported on images the model has not trained on

  • number of epochs—number of times the backpropagation process will be run on the training images—we will set this to 3.

model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=3)

11. Watch the model train. Keras prints the loss and accuracy on the training and validation sets after each epoch; after three epochs, accuracy on the validation set is up to 97.57%. The CNN works well.

12. Now that you see the model is working, you can generate actual predictions using the predict function. The function returns an array of probabilities for each of the 10 possible results (digits 0-9), with the sum of probabilities for each image equal to 1. You can input new, unknown data to the predict function to get a prediction for this data. For now, let’s run a prediction for the first four images in the test set:


model.predict(X_test[:4])

The output will show probabilities for digits 0-9, for each of the 4 images. The model predicts 7, 2, 1 and 0 for the first four images.
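To turn the probability arrays into digit predictions yourself, take the index of the highest probability in each row, for example:

import numpy as np

probs = model.predict(X_test[:4])
print(np.argmax(probs, axis=1))  # expected: [7 2 1 0]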

13. Compare this with actual results for the first 4 images in the test set:

y_test[:4]

Because y_test was one-hot encoded in step 4, the output is four 10-element vectors; the position of the 1 in each vector shows that the ground truth for the first four images is also 7, 2, 1 and 0. The model made accurate predictions.
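The same np.argmax trick from step 12 recovers these ground-truth digits directly from the one-hot vectors:

import numpy as np

print(np.argmax(y_test[:4], axis=1))  # expected: [7 2 1 0]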
