MLPs and CNNs do not usually yield comparable results. The MNIST dataset is special because it is very clean and carefully preprocessed: every image has the same size, and each digit is centered in a 28x28 pixel grid. The task would be much harder if the digits were slightly skewed or off-center. On real-world, messy image data, CNNs truly shine over MLPs.
For some intuition as to why this is the case: to feed an image to an MLP, you must first convert the image to a vector. The MLP then treats the image as a simple vector of numbers with no special structure; it has no knowledge that these numbers were originally arranged spatially in a grid.
CNNs, in contrast, were built precisely for working with the patterns in multidimensional data. Unlike MLPs, CNNs understand that image pixels which are close together are more heavily related than pixels that are far apart.
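To make this concrete, here is a minimal sketch (assuming NumPy) of the flattening an MLP requires:

import numpy as np

image = np.zeros((28, 28))   # a 28x28 grayscale image, as in MNIST
vector = image.reshape(-1)   # the 784-entry vector an MLP actually sees
print(vector.shape)          # (784,)
# Vertically adjacent pixels, such as image[0, 0] and image[1, 0],
# end up 28 positions apart in the vector, so the grid structure is lost.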
Create a CNN in Keras by first creating a Sequential model:
from keras.models import Sequential
Import several layers, including layers that are familiar from neural networks as well as the new layers we learned about in this lesson:
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
Add layers to the network by using the .add() method:
model = Sequential()
model.add(Conv2D(filters=16, kernel_size=2, padding='same', activation='relu', input_shape=(32, 32, 3)))
model.add(MaxPooling2D(pool_size=2))
model.add(Conv2D(filters=32, kernel_size=2, padding='same', activation='relu'))
model.add(MaxPooling2D(pool_size=2))
model.add(Conv2D(filters=64, kernel_size=2, padding='same', activation='relu'))
model.add(MaxPooling2D(pool_size=2))
model.add(Flatten())
model.add(Dense(500, activation='relu'))
model.add(Dense(10, activation='softmax'))
The network begins with a sequence of three convolutional layers, followed by max pooling layers. These first six layers are designed to take the input array of image pixels and convert it to an array where all of the spatial information has been squeezed out, and only information encoding the content of the image remains. The array is then flattened to a vector in the seventh layer of the CNN. It is followed by two dense layers designed to further elucidate the content of the image. The final layer has one entry for each object class in the dataset, and has a softmax activation function, so that it returns probabilities.
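As a quick check of this description, you can call model.summary() to inspect each layer's output shape. The shapes in the comments below are my own derivation from the layer arguments, so treat them as a sketch rather than verbatim Keras output:

model.summary()
# Expected output shapes (the leading None is the batch size):
# Conv2D  -> (None, 32, 32, 16)    MaxPooling2D -> (None, 16, 16, 16)
# Conv2D  -> (None, 16, 16, 32)    MaxPooling2D -> (None, 8, 8, 32)
# Conv2D  -> (None, 8, 8, 64)      MaxPooling2D -> (None, 4, 4, 64)
# Flatten -> (None, 1024)
# Dense   -> (None, 500)           Dense        -> (None, 10)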
Just as with neural networks, we create a CNN in Keras by first creating a Sequential model. We add layers to the network by using the .add() method. Copy and paste the following code into a Python executable named conv-dims.py:
from keras.models import Sequential
from keras.layers import Conv2D
model = Sequential()
model.add(Conv2D(filters=16, kernel_size=2, strides=2, padding='valid',
activation='relu', input_shape=(200, 200, 1)))
model.summary()
We will not train this CNN; instead, we'll use the executable to study how the dimensionality of the convolutional layer changes as a function of the supplied arguments.
When you run the above block, the output should appear as follows (layer names may vary with your Keras version):
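_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_1 (Conv2D)            (None, 100, 100, 16)      80
=================================================================
Total params: 80
Trainable params: 80
Non-trainable params: 0
_________________________________________________________________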
Do the dimensions of the convolutional layer line up with your expectations? Feel free to change the values assigned to the arguments (filters, kernel_size, etc.).
Take note of how the number of parameters in the convolutional layer changes. This corresponds to the value under Param # in the printed output. In the output above, the convolutional layer has 80 parameters.
Also notice how the shape of the convolutional layer changes. This corresponds to the value under Output Shape in the printed output. In the output above, None corresponds to the batch size, and the convolutional layer has a height of 100, width of 100, and depth of 16.
The number of parameters in a convolutional layer depends on the supplied values of filters, kernel_size, and input_shape. Let's define a few variables:
K - the number of filters in the convolutional layer
F - the height and width of the convolutional filters
D_in - the depth of the previous layer
Notice that K = filters and F = kernel_size. Likewise, D_in is the last value in the input_shape tuple.
Since there are F*F*D_in weights per filter, and the convolutional layer is composed of K filters, the total number of weights in the convolutional layer is K*F*F*D_in. Since there is one bias term per filter, the convolutional layer has K biases. Thus, the number of parameters in the convolutional layer is given by K*F*F*D_in + K.
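As a sketch of this arithmetic applied to conv-dims.py (conv_params is a hypothetical helper name, not a Keras function):

def conv_params(K, F, D_in):
    # K*F*F*D_in weights plus one bias per filter
    return K * F * F * D_in + K

# conv-dims.py: filters=16, kernel_size=2, input_shape=(200, 200, 1)
print(conv_params(16, 2, 1))  # 80, matching the Param # above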
The shape of a convolutional layer depends on the supplied values of kernel_size, input_shape, padding, and strides. Let's define a few variables:
K - the number of filters in the convolutional layer
F - the height and width of the convolutional filters
S - the stride of the convolution
H_in - the height of the previous layer
W_in - the width of the previous layer
Notice that K = filters, F = kernel_size, and S = strides. Likewise, H_in and W_in are the first and second value of the input_shape tuple, respectively.
The depth of the convolutional layer will always equal the number of filters K.
If padding = 'same', then the spatial dimensions of the convolutional layer are the following:
height = ceil(float(H_in) / float(S))
width = ceil(float(W_in) / float(S))
If padding = 'valid', then the spatial dimensions of the convolutional layer are the following:
height = ceil(float(H_in - F + 1) / float(S))
width = ceil(float(W_in - F + 1) / float(S))
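To check these formulas against conv-dims.py, here is a minimal sketch (conv_output_dim is a hypothetical helper, not part of Keras):

from math import ceil

def conv_output_dim(dim_in, F, S, padding):
    # spatial size of a Conv2D output along one axis
    if padding == 'same':
        return ceil(float(dim_in) / float(S))
    return ceil(float(dim_in - F + 1) / float(S))

# conv-dims.py: H_in = W_in = 200, F = 2, S = 2, padding='valid'
print(conv_output_dim(200, 2, 2, 'valid'))  # 100, matching the Output Shape above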