Figure 1: VGG-16 CNN Architecture [1]
The network we implemented is called VGG-16. It has 16 weight layers and more than 134 million parameters.
The convolutional layers extract features from the images, while the fully connected layers classify the images based on the extracted features.
Every convolutional layer performs a 2D convolution. With a 3x3 convolution kernel, VGG-16 looks for small details in the image, such as edges, in the first few convolutional layers, while the final few convolutional layers look for global features. Since the convolution kernels are learnable parameters, the model tries to find the features that yield the highest classification accuracy.
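To make this concrete, below is a minimal PyTorch sketch of one VGG-style 3x3 convolution block. The channel counts (3 input, 64 output) mirror the first block of VGG-16 and are shown here only for illustration.

```python
import torch
import torch.nn as nn

# One VGG-style convolutional block: 3x3 kernel, padding=1 keeps the spatial
# size unchanged, ReLU adds non-linearity. Channel counts are illustrative.
conv_block = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
)

x = torch.randn(1, 3, 224, 224)   # one RGB image, 224x224
print(conv_block(x).shape)        # torch.Size([1, 64, 224, 224])
```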
Figure 2: Normalized Image vs. Original Image
The mean and standard deviation of each of the red, green, and blue channels were calculated over the entire training set.
This step is necessary because it improves the accuracy of the CNN by reducing the influence of outliers, such as overly bright or overly dark images, in the dataset.
After calculating the mean and standard deviation, a Z-score normalization transform was applied.
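A minimal sketch of this preprocessing step is shown below, assuming the training images are already loaded as a single tensor; the numeric mean and standard deviation passed to the transform are placeholders, not the statistics of our dataset.

```python
import torch
from torchvision import transforms

# train_images: float tensor of shape (N, 3, H, W) with pixel values in [0, 1].
# How the training set is loaded is omitted; this only shows the statistics step.
def channel_stats(train_images: torch.Tensor):
    mean = train_images.mean(dim=(0, 2, 3))  # per-channel mean over the whole set
    std = train_images.std(dim=(0, 2, 3))    # per-channel standard deviation
    return mean, std

# Z-score normalization: (x - mean) / std applied channel-wise.
# The numbers below are placeholders rather than our dataset's statistics.
normalize = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.25, 0.25, 0.25]),
])
```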
Before the training phase, the training dataset is randomly divided into mini-batches. During training, the model iterates through the mini-batches and tries to predict the label of each image in the batch. After comparing the predicted labels with the target labels, the network improves its predictions by minimizing the cross entropy loss (equation shown below).
Figure 3: Cross Entropy Loss From PyTorch [3]
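In PyTorch this loss is available as nn.CrossEntropyLoss, which combines the softmax and the negative log-likelihood in one call. The sketch below assumes a mini-batch of 16 images and 10 art-style classes purely for illustration.

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()

logits = torch.randn(16, 10, requires_grad=True)  # raw scores from the network
targets = torch.randint(0, 10, (16,))             # ground-truth class indices

loss = criterion(logits, targets)  # softmax + negative log-likelihood in one call
loss.backward()                    # gradients flow back toward the model weights
```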
To reach the minimum of the loss function quickly, the model performs stochastic gradient descent. Essentially, stochastic gradient descent finds the direction that decreases the loss function and nudges the model toward the correct output labels. We also added a small amount of momentum to dampen oscillation, which helps the model reach the minimum in less time.
Figure 4: Stochastic gradient descent [4]
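Below is a hedged sketch of one SGD-with-momentum training step; the placeholder model, learning rate, and momentum value are assumptions for illustration, not the exact setup we used.

```python
import torch
import torch.nn as nn

model = nn.Linear(3 * 32 * 32, 10)  # placeholder model; stands in for VGG-16
criterion = nn.CrossEntropyLoss()

# SGD with a small momentum term; lr and momentum here are illustrative only.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

def train_step(images, labels):
    """One mini-batch update. images must match the placeholder model's input."""
    optimizer.zero_grad()                    # clear gradients from the last step
    loss = criterion(model(images), labels)  # forward pass + cross entropy loss
    loss.backward()                          # backpropagate the gradients
    optimizer.step()                         # nudge the weights downhill
    return loss.item()
```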
A small VGG-style neural network (mini-VGG)
3 conv layers + max pool, 1 avg pool, 2 fully connected layers, 3.7 million parameters (a sketch follows after these specifications)
Trained for 3 epochs
Validated on the test set with an accuracy of 40%
VGG-16
13 conv layers + max pool, 1 avg pool, 2 fully connected layers, 134.3 million parameters
Trained for 50 epochs with batch size = 16
Best test set accuracy at 19th epoch, 53.9058%
Trained for 30 epochs with batch size = 32
Best test set accuracy at 30th epoch, 52.3289%
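As a rough point of reference, here is a sketch of what the mini-VGG described above could look like in PyTorch; the channel widths and the hidden size are assumptions, so the parameter count will not match the 3.7 million figure exactly.

```python
import torch.nn as nn

# Sketch of a mini-VGG: three 3x3 conv layers, each followed by max pooling,
# one adaptive average pool, and two fully connected layers. Channel widths
# and the hidden size are guesses made for illustration.
mini_vgg = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1), nn.ReLU(inplace=True),
    nn.MaxPool2d(2),
    nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(inplace=True),
    nn.MaxPool2d(2),
    nn.Conv2d(128, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True),
    nn.MaxPool2d(2),
    nn.AdaptiveAvgPool2d((4, 4)),
    nn.Flatten(),
    nn.Linear(256 * 4 * 4, 512), nn.ReLU(inplace=True),
    nn.Linear(512, 10),  # assuming 10 art-style classes
)
```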
Mini-VGG performed remarkably well given that it was only trained for 3 epochs. The following graph plots the moving average of batch accuracy over a window of 100 batches (red line) against the individual batch accuracies (blue dots). Since the model was trained with a batch size of 32, the batch accuracies take on quantized values. In addition, the density of the dots shows how often a given batch accuracy occurs over time.
Figure 5: Batch Accuracy of mini-VGG
The batch accuracy vector is convolved with a window function to obtain the moving average of the training accuracy.
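A short sketch of that moving-average computation, assuming the per-batch accuracies are collected in a NumPy array:

```python
import numpy as np

def moving_average(batch_acc: np.ndarray, window: int = 100) -> np.ndarray:
    # Rectangular window of length `window`, normalized so the output is a mean.
    kernel = np.ones(window) / window
    # 'valid' mode avoids edge effects at the start and end of training.
    return np.convolve(batch_acc, kernel, mode="valid")
```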
With deeper convolutional layers, VGG-16 outperforms mini-VGG in accuracy. The following graphs plot the batch accuracy moving average (red line), batch accuracy (blue dots), and test evaluation accuracy (yellow line). With both batch size 32 and batch size 16, we observe overfitting: the test evaluation accuracy diverges from the batch accuracy moving average. Overfitting can happen in the later epochs, when the model starts to memorize the training set instead of learning features that generalize to unseen data.
Figure 6: Batch Accuracy of VGG-16 Batch Size of 32
Figure 7: Batch Accuracy of VGG-16 Batch Size of 16
VGG-16 consistently beats mini-VGG during training. After 3 epochs, VGG-16 (batch size = 32) has a test accuracy of 45%, compared to 40% for mini-VGG. The following graph compares the batch accuracy moving averages of VGG-16 (red line) and mini-VGG (blue line).
Figure 8: Batch Accuracy of mini-VGG vs. VGG-16
From our observations, a larger batch size reaches higher accuracy in less time. However, a larger batch size is also more susceptible to overfitting. Although the test evaluation accuracy for batch size 32 is higher in the first few epochs, batch size 16 eventually overtakes it and stays ahead for all the later epochs. Moreover, the best test accuracy for batch size 32 is more than 1.5% lower than that for batch size 16. The graph below shows the batch accuracy moving averages for batch size 32 (red line) and batch size 16 (blue line), and the test evaluation accuracies for batch size 32 (purple line) and batch size 16 (yellow line).
Figure 9: VGG-16 Batch Size Comparison
CNNs are significantly (~10%) better than SVMs at classifying art styles. However, this improvement in accuracy comes at a cost in training time: each epoch of VGG-16 takes around 15 minutes to train, while the SVM takes 16 minutes to train in total. Another advantage of CNNs over SVMs is that CNNs do not require manual feature selection, which makes them friendlier to developers.
The confusion matrices for mini-VGG (left) and VGG-16 with batch size 16 (right) show the actual label in each row and the predicted label in each column. The most confused categories are impressionism and post-impressionism.
Figure 10: Confusion Matrix of mini-VGG
Figure 11: Confusion Matrix of VGG-16 Batch Size 16
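For reference, a confusion matrix with this row/column layout can be computed with scikit-learn as sketched below; the array arguments are placeholders for the labels collected during test evaluation.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def style_confusion(y_true: np.ndarray, y_pred: np.ndarray) -> np.ndarray:
    """Row i, column j counts images whose actual style is i but were
    predicted as style j, matching the layout described above."""
    cm = confusion_matrix(y_true, y_pred)
    # Normalize each row so entries read as "fraction of style i predicted as j",
    # which makes the impressionism / post-impressionism confusion easy to spot.
    return cm / cm.sum(axis=1, keepdims=True)
```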
So what is the difference between impressionism and post-impressionism? As the name suggests, post-impressionism came after impressionism. Both use oil paint as the main medium and depict the world in realistic lighting. Because of the similarities, even art experts have trouble differentiating between the two styles -- "post-impressionist paintings are sometimes hard to recognize without knowledge of the specific year and location in which they were done" [5]. Therefore, there is no intrinsic stylistic difference between the two; rather, the distinction is defined by the cultural context in which the artworks were created. Below are some random samples of each from ArtBench; can you spot the difference?
Figure 12: Four Randomly Selected Impressionism Paintings
Figure 13: Four Randomly Selected Post-Impressionism Paintings
Page written by: Joshua Ning