Progress Report 4/5/2023
SVM
Image Pre-Processing: extract important features
From RGB to HSV:
Hue, Saturation, Value (HSV) is another way to represent an image. rgb2hsv() returns the HSV representation of the image.
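A minimal sketch of the conversion, using scikit-image's `rgb2hsv` (which mirrors the MATLAB function of the same name); the tiny synthetic image here stands in for a real art image loaded from disk:

```python
import numpy as np
from skimage.color import rgb2hsv

# A tiny synthetic RGB image with values in [0, 1]; a real art image
# would be loaded with e.g. skimage.io.imread and scaled to [0, 1].
rgb = np.zeros((2, 2, 3))
rgb[0, 0] = [1.0, 0.0, 0.0]  # pure red
rgb[0, 1] = [0.0, 1.0, 0.0]  # pure green

hsv = rgb2hsv(rgb)  # same shape; channels are now Hue, Saturation, Value
print(hsv[0, 0])    # red -> hue 0.0, saturation 1.0, value 1.0
```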
Extract Dominant Colors
Edge Length
Edges are detected using Canny Edge Detection. The Canny method gives the number of relatively sharp points (lines) in the image, which we use as the edge-length feature.
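A sketch of this feature using scikit-image's `canny` (the report does not name a specific library); the synthetic step edge stands in for a painting:

```python
import numpy as np
from skimage.feature import canny

# Synthetic grayscale image with one sharp vertical step edge.
img = np.zeros((64, 64))
img[:, 32:] = 1.0

edges = canny(img, sigma=1.0)  # boolean edge map
edge_length = edges.sum()      # number of edge pixels, used as a feature
print(edge_length)
```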
Sharpness
To determine how sharp an image is, we estimate its sharpness from the image gradient. Below is a comparison between a sharper image and a blurry image.
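The report does not give the exact estimator, so this sketch uses one common choice, the mean squared gradient magnitude, which is larger for a sharp step edge than for the same edge spread over several columns:

```python
import numpy as np

def sharpness(gray):
    # Mean squared gradient magnitude; larger values mean sharper edges.
    gy, gx = np.gradient(gray.astype(float))
    return np.mean(gx**2 + gy**2)

# A sharp step edge vs. the same transition spread over eight columns.
sharp = np.zeros((32, 32))
sharp[:, 16:] = 1.0
ramp = np.clip((np.arange(32) - 12) / 8.0, 0.0, 1.0)
blurry = np.tile(ramp, (32, 1))

print(sharpness(sharp) > sharpness(blurry))  # True
```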
2D-DFT
The 2D-DFT is computed after converting the images to grayscale. Originally, as an RGB matrix, an image has shape (256, 256, 3); after grayscale conversion the shape becomes (256, 256), and fft2() is then applied. However, significant lines appear in the 2D-DFT only when there are sinusoidal oscillations in the grayscale image. In our case, the art images do not show much oscillation, as shown below.
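A small sketch with NumPy's `fft2` illustrating the point: a pure sinusoid produces sharp spectral peaks, while images without such oscillation (like most paintings) do not:

```python
import numpy as np

n = 256
x = np.arange(n)
# A grayscale image whose rows are a pure sinusoid (8 cycles across the image).
gray = np.tile(np.sin(2 * np.pi * 8 * x / n), (n, 1))

spectrum = np.abs(np.fft.fft2(gray))
# Energy concentrates in bins 8 and n - 8 along the x-axis; a painting
# without such oscillation produces no comparable peaks.
print(spectrum[0, 8], spectrum[0, 5])
```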
Histograms
Color Histogram
Gray Histogram
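A sketch of both histogram features with `np.histogram` (the exact bin count used in the project is not stated; 256 bins per channel is assumed here):

```python
import numpy as np

rng = np.random.default_rng(0)

# Gray histogram: bin counts over an 8-bit grayscale image.
gray = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
gray_hist, _ = np.histogram(gray, bins=256, range=(0, 256))

# Color histogram: one histogram per RGB channel, concatenated.
rgb = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
color_hist = np.concatenate(
    [np.histogram(rgb[..., c], bins=256, range=(0, 256))[0] for c in range(3)]
)
print(gray_hist.shape, color_hist.shape)
```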
SVM: fitting the data and making predictions
The model was trained on 50,000 images and tested on 10,000 images. The confusion matrix is shown below.
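A minimal sketch of the fit/predict/confusion-matrix workflow with scikit-learn's `SVC`; the random features below are a toy stand-in for the extracted features (dominant colors, edge length, sharpness, histograms) and the real 50,000/10,000 split:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix

# Toy two-class data: the label depends mostly on the first feature.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
y = (X[:, 0] + 0.1 * rng.normal(size=200) > 0).astype(int)

clf = SVC(kernel="rbf").fit(X[:150], y[:150])   # fit on the training split
pred = clf.predict(X[150:])                     # predict on the test split
cm = confusion_matrix(y[150:], pred)
print(cm)
```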
Pre-processing
Image Normalization
The images are normalized with the mean and standard deviation for the entire dataset to improve the accuracy of the CNN and to prevent overflow.
The Neural Net Model
VGG-BN
A version of the Visual Geometry Group network with Batch Normalization (VGG-BN) is implemented.
The current challenge is making sure that there are no errors in the algorithms before training the neural network. Since training will take a long time, we want to verify that everything works as expected beforehand. This challenge can be resolved by holding design reviews with the professor and the GSI.
Current Progress
The CNN model was trained for 3 epochs, reaching an accuracy of 40% on the testing data.
On the left is the confusion matrix, showing the probability of correct output given each category.
On the right is the running accuracy during the training phase. The accuracy seems to be asymptotically approaching 40-50%. This could be addressed by adding deeper convolutional layers, in the hope of picking up more details in the images.
Since the running accuracy is recalculated every epoch, severe oscillation in the accuracy is observed at the beginning of each epoch. In addition, the accuracy improvement of the VGG toward the end of each epoch is better reflected at the beginning of the next epoch.
We can take a similar approach to compute the Gram matrix. Again, we start by extracting the feature maps of the desired image from the CNN. Since the feature maps have three dimensions (height, width, and channel), we can simplify later operations by first flattening the height and width into a single dimension. We then perform matrix multiplication with the transpose of the flattened feature map to measure the similarity between its rows.
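The steps above can be sketched in PyTorch (the random tensor stands in for a feature map extracted from the CNN):

```python
import torch

def gram_matrix(fmap):
    """fmap: feature maps of shape (C, H, W). Flatten the spatial
    dimensions, then multiply by the transpose to get channel-wise
    similarities."""
    c, h, w = fmap.shape
    flat = fmap.reshape(c, h * w)  # (C, H*W)
    return flat @ flat.t()         # (C, C)

fmap = torch.rand(64, 32, 32)
g = gram_matrix(fmap)
print(g.shape)
```

In practice the Gram matrix is often divided by C*H*W so the loss scale is independent of the feature-map size.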
If we are given two feature maps, one from the content image and the other from the generated image, we can use the equation on the right to calculate the content loss.
This equation resembles a Mean Square Error (MSE) function that calculates the squared ℓ2 distance between two convolutional feature maps.
If we then perform gradient descent to minimize the content loss, we can achieve the goal of preserving the maximum amount of content from the input image.
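A sketch of the content loss as described, using PyTorch's MSE (the random tensors stand in for feature maps from the content and generated images):

```python
import torch
import torch.nn.functional as F

# Stand-ins for convolutional feature maps of the content image and the
# generated image; in the real pipeline both come from the same CNN layer.
content_fmap = torch.rand(64, 32, 32)
generated_fmap = content_fmap + 0.1 * torch.randn(64, 32, 32)

# Squared l2 distance between the two feature maps, averaged (MSE).
content_loss = F.mse_loss(generated_fmap, content_fmap)
print(content_loss.item())
```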
Gram Matrix Visualized
The style loss is very similar to the content loss in that it also uses a Mean Square Error to calculate a distance, but this time between the style image's Gram matrix and the generated image's Gram matrix. We can also add a weight parameter to easily tune the amount of style we want to preserve in the final generated image.
We can then perform gradient descent on the style loss to preserve as much style as possible.
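The style loss described above can be sketched as follows; the normalization by C*H*W and the particular `style_weight` value are conventional choices, not taken from the report:

```python
import torch
import torch.nn.functional as F

def gram(fmap):
    c, h, w = fmap.shape
    flat = fmap.reshape(c, h * w)
    return flat @ flat.t() / (c * h * w)  # normalized Gram matrix

style_weight = 1e4                        # tunable amount of style to keep
style_fmap = torch.rand(8, 16, 16)        # from the style image
generated_fmap = torch.rand(8, 16, 16)    # from the generated image

style_loss = style_weight * F.mse_loss(gram(generated_fmap), gram(style_fmap))
print(style_loss.item())
```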
Style Transfer
The team has very little experience with Generative Adversarial Networks (GAN). We started by reading over slides from EECS 504 and obtaining more knowledge through research papers and documents. So far, our group has developed a general understanding of concepts like the Gram matrix, Content Loss, and Style Loss. Based on our understanding of how GAN works, we chose a few images from ArtBench with distinct styles for the training of our GAN. We obtained simple starter code from our GSI, Javier, and started implementing a few functions.
The challenge that we are facing right now is the inability to determine the correctness of the developed algorithms. We will check with Javier to make sure that our current understanding of GAN is accurate before implementing other functions.
What we've learned so far
We learned basic ideas about SVM, GAN, and CNN, and how to implement them in our program. We applied pre-processing techniques with the DSP tools that we learned in class, such as convolution, matrix representation, and change of basis. We gained experience using machine learning libraries such as PyTorch.
Plans for the next step
Design review CNN with the GSI and professor, then start training a deeper version of VGG-BN with more convolutional layers.
GAN: After consulting with the GSI, we will revise our implementation of the essential functions, such as the Gram matrix, Content Loss, and Style Loss, to make them more efficient to run, and then start training immediately.
Refine the input of the SVM and analyze the confusion matrix for possible areas of improvement. We will temporarily remove the art styles that often confuse the SVM model and see how the model performs on the remaining images; for the removed art styles, we can develop additional SVM models separately. We should further investigate how different features affect the classification results.