MLPs and CNNs do not usually yield comparable results. The MNIST dataset is special because it is very clean and carefully preprocessed: every image has the same size, and each digit is centered in a 28x28 pixel grid. The task would be much harder if the digits were slightly skewed or off-center. On real-world, messy image data, CNNs truly shine over MLPs.
For some intuition as to why this is the case: to feed an image to an MLP, you must first convert the image to a vector. The MLP then treats the image as a simple vector of numbers with no special structure; it has no knowledge that these numbers were originally arranged spatially in a grid.
CNNs, in contrast, were built precisely for working with the patterns in multidimensional data. Unlike MLPs, CNNs understand that image pixels which are close together are more heavily related than pixels that are far apart.
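To make this concrete, here is a minimal sketch (assuming NumPy) of the flattening an MLP requires:

import numpy as np

image = np.zeros((28, 28))   # a 28x28 grayscale image, as in MNIST
vector = image.reshape(-1)   # the 784-entry vector an MLP actually sees
print(vector.shape)          # (784,)
# Vertically adjacent pixels, such as image[0, 0] and image[1, 0],
# end up 28 positions apart in the vector, so the grid structure is lost.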
Create a CNN in Keras by first creating a Sequential model:
from keras.models import Sequential
Import several layers, including layers that are familiar from neural networks as well as the new layers we learned about in this lesson:
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
Add layers to the network by using the .add() method:
model = Sequential()
model.add(Conv2D(filters=16, kernel_size=2, padding='same', activation='relu', input_shape=(32, 32, 3)))
model.add(MaxPooling2D(pool_size=2))
model.add(Conv2D(filters=32, kernel_size=2, padding='same', activation='relu'))
model.add(MaxPooling2D(pool_size=2))
model.add(Conv2D(filters=64, kernel_size=2, padding='same', activation='relu'))
model.add(MaxPooling2D(pool_size=2))
model.add(Flatten())
model.add(Dense(500, activation='relu'))
model.add(Dense(10, activation='softmax'))
The network begins with a sequence of three convolutional layers, followed by max pooling layers. These first six layers are designed to take the input array of image pixels and convert it to an array where all of the spatial information has been squeezed out, and only information encoding the content of the image remains. The array is then flattened to a vector in the seventh layer of the CNN. It is followed by two dense layers designed to further elucidate the content of the image. The final layer has one entry for each object class in the dataset, and has a softmax activation function, so that it returns probabilities.
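As a quick check of this description, you can call model.summary() to inspect each layer's output shape. The shapes in the comments below are my own derivation from the layer arguments, so treat them as a sketch rather than verbatim Keras output:

model.summary()
# Expected output shapes (the leading None is the batch size):
# Conv2D  -> (None, 32, 32, 16)    MaxPooling2D -> (None, 16, 16, 16)
# Conv2D  -> (None, 16, 16, 32)    MaxPooling2D -> (None, 8, 8, 32)
# Conv2D  -> (None, 8, 8, 64)      MaxPooling2D -> (None, 4, 4, 64)
# Flatten -> (None, 1024)
# Dense   -> (None, 500)           Dense        -> (None, 10)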
Just as with neural networks, we create a CNN in Keras by first creating a Sequential model. We add layers to the network by using the .add() method. Copy and paste the following code into a Python executable named conv-dims.py:
from keras.models import Sequential
from keras.layers import Conv2D
model = Sequential()
model.add(Conv2D(filters=16, kernel_size=2, strides=2, padding='valid',
activation='relu', input_shape=(200, 200, 1)))
model.summary()
We will not train this CNN; instead, we'll use the executable to study how the dimensionality of the convolutional layer changes as a function of the supplied arguments.
When you run the above block, the output should appear as follows (layer names may vary with your Keras version):
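_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_1 (Conv2D)            (None, 100, 100, 16)      80
=================================================================
Total params: 80
Trainable params: 80
Non-trainable params: 0
_________________________________________________________________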
Do the dimensions of the convolutional layer line up with your expectations? Feel free to change the values assigned to the arguments (filters, kernel_size, etc.).
Take note of how the number of parameters in the convolutional layer changes. This corresponds to the value under Param # in the printed output. In the output above, the convolutional layer has 80 parameters.
Also notice how the shape of the convolutional layer changes. This corresponds to the value under Output Shape in the printed output. In the output above, None corresponds to the batch size, and the convolutional layer has a height of 100, width of 100, and depth of 16.
The number of parameters in a convolutional layer depends on the supplied values of filters, kernel_size, and input_shape. Let's define a few variables:
K - the number of filters in the convolutional layer
F - the height and width of the convolutional filters
D_in - the depth of the previous layer
Notice that K = filters and F = kernel_size. Likewise, D_in is the last value in the input_shape tuple.
Since there are F*F*D_in weights per filter, and the convolutional layer is composed of K filters, the total number of weights in the convolutional layer is K*F*F*D_in. Since there is one bias term per filter, the convolutional layer has K biases. Thus, the number of parameters in the convolutional layer is given by K*F*F*D_in + K.
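As a sketch of this arithmetic applied to conv-dims.py (conv_params is a hypothetical helper name, not a Keras function):

def conv_params(K, F, D_in):
    # K*F*F*D_in weights plus one bias per filter
    return K * F * F * D_in + K

# conv-dims.py: filters=16, kernel_size=2, input_shape=(200, 200, 1)
print(conv_params(16, 2, 1))  # 80, matching the Param # above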
The shape of a convolutional layer depends on the supplied values of kernel_size, input_shape, padding, and strides. Let's define a few variables:
K - the number of filters in the convolutional layer
F - the height and width of the convolutional filters
S - the stride of the convolution
H_in - the height of the previous layer
W_in - the width of the previous layer
Notice that K = filters, F = kernel_size, and S = strides. Likewise, H_in and W_in are the first and second value of the input_shape tuple, respectively.
The depth of the convolutional layer will always equal the number of filters K.
If padding = 'same', then the spatial dimensions of the convolutional layer are the following:
height = ceil(float(H_in) / float(S))
width = ceil(float(W_in) / float(S))
If padding = 'valid', then the spatial dimensions of the convolutional layer are the following:
height = ceil(float(H_in - F + 1) / float(S))
width = ceil(float(W_in - F + 1) / float(S))
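To check these formulas against conv-dims.py, here is a minimal sketch (conv_output_dim is a hypothetical helper, not part of Keras):

from math import ceil

def conv_output_dim(dim_in, F, S, padding):
    # spatial size of a Conv2D output along one axis
    if padding == 'same':
        return ceil(float(dim_in) / float(S))
    return ceil(float(dim_in - F + 1) / float(S))

# conv-dims.py: H_in = W_in = 200, F = 2, S = 2, padding='valid'
print(conv_output_dim(200, 2, 2, 'valid'))  # 100, matching the Output Shape above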