Neural Style Transfer

Figure 1: Starry Night Style Transfer [1]

Goals For the Project

The goal of this project is to transform an image from the ArtBench-10 dataset into another style category of the dataset. We want to preserve as much of the original image's content as possible while changing its style.

Our project is based on the article "A Neural Algorithm of Artistic Style", and we were provided with a starter code file from "Problem Set 6: Image Synthesis" of EECS 442/504.

System Overview

Style transfer can be achieved by extracting features and styles from the layers of a Convolutional Neural Network (CNN). To generate the synthesized image, we start with a white noise image and perform gradient descent on it to find an image that best matches the features of the content image and the style of the style image. The key components are the content loss, the Gram matrix, and the style loss.

The CNN model used here is a pre-trained version of SqueezeNet.

Figure 2: Neural style transfer system diagram, v7Labs [2]

Figure 3: Feature Map Demonstration [3]

Feature Map

Convolution is performed in every layer of a CNN. The first few layers respond to small local features such as edges, while the deeper layers capture more global features such as object parts.
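As a rough illustration of how a single convolution filter produces a feature map, the sketch below slides a hand-picked 3x3 vertical-edge filter over a toy 6x6 image using NumPy. The helper conv2d_valid, the filter values, and the image are all made up for this example; in the actual network the filters are learned and belong to the pre-trained SqueezeNet.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the responses.

    Like CNN layers, this computes cross-correlation: the kernel is not flipped.
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Dot product between the kernel and the image patch under it.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Toy image (illustrative only): dark left half, bright right half.
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Hand-picked filter that responds to a dark-to-bright step from left to right.
vertical_edge = np.array([[-1.0, 0.0, 1.0],
                          [-1.0, 0.0, 1.0],
                          [-1.0, 0.0, 1.0]])

feature_map = conv2d_valid(image, vertical_edge)
print(feature_map)
```

The response is nonzero only where the filter window straddles the dark-to-bright boundary, which is the sense in which an early-layer filter "detects" an edge.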

If we strategically choose which layer's feature map to use, we can obtain information about the subject (content) of the image.

Content Loss

If we are given two feature maps, one from the content image and the other from the generated image, we can use the equation on the right to calculate the content loss.

This equation resembles a mean squared error (MSE) function: it calculates the squared ℓ2 distance between the two convolutional feature maps.
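A minimal NumPy sketch of such a loss is shown here, assuming the feature maps are arrays of identical shape. The function name content_loss and the content_weight parameter are illustrative choices, not taken from the starter code.

```python
import numpy as np

def content_loss(generated_features, content_features, content_weight=1.0):
    """Squared l2 distance between two feature maps of the same shape.

    `content_weight` is a hypothetical scaling factor added for illustration.
    Dividing the sum by the number of elements would give the MSE instead.
    """
    diff = generated_features - content_features
    return content_weight * np.sum(diff ** 2)

# Toy feature maps with 2 channels of size 2x2 (values are made up).
content_features = np.zeros((2, 2, 2))
generated_features = np.ones((2, 2, 2))

loss_same = content_loss(content_features, content_features)
loss_diff = content_loss(generated_features, content_features)
print(loss_same, loss_diff)
```

Identical feature maps give a loss of zero, and the loss grows with the squared difference at every position of every channel.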

If we then perform gradient descent on the generated image to minimize the content loss, we preserve as much of the input image's content as possible.
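The optimization loop can be sketched in a few lines. This toy version makes a strong simplifying assumption: the feature extractor is replaced by the identity, so the "features" are just pixel values and the gradient can be written by hand. In the real system the gradient flows back through the CNN via automatic differentiation. The learning rate and step count are arbitrary illustrative values.

```python
import numpy as np

rng = np.random.default_rng(0)

# ASSUMPTION: identity feature extractor, so these pixel values stand in
# for the content image's feature map.
content = rng.uniform(0.0, 1.0, size=(8, 8))

# Start from a white-noise image.
generated = rng.normal(0.0, 1.0, size=(8, 8))

def content_loss(generated, content):
    """Squared l2 distance between the two (stand-in) feature maps."""
    return np.sum((generated - content) ** 2)

learning_rate = 0.1  # hypothetical step size
initial_loss = content_loss(generated, content)

for step in range(50):
    # Gradient of sum((x - c)^2) with respect to x is 2 * (x - c).
    grad = 2.0 * (generated - content)
    # Update the image itself; no network weights are changed.
    generated = generated - learning_rate * grad

final_loss = content_loss(generated, content)
print(initial_loss, final_loss)
```

Note that the quantity being updated is the image, not the network: the CNN stays fixed and only the pixels of the generated image move.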

Example: Too much content loss

The example below shows the effect of a content loss that is too high. In the final transferred image, too little of the original content is left for a viewer to tell what the content image showed.

Figure 4: Gram Matrix Visualized [4]

Gram Matrix

We can also analyze the style of an image using the feature maps of a CNN. Think of the "texture" of an image as how often some specific colors and edges appear together.

How can we find these correlations? As we learned in class, the dot product measures the similarity between two vectors. If two features often appear together, their dot product should be relatively large.

The Gram matrix applies the same idea to all of the features at once. Again, we start by extracting the feature maps of the desired image from the CNN. Since the feature maps have three dimensions (height, width, and channel), we simplify later operations by first flattening the two spatial dimensions (height and width) into one, so that each row holds one channel. We then multiply this flattened matrix by its transpose, which takes the dot product of every pair of rows and so measures the similarity between the rows of the flattened feature map.
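The flatten-then-multiply step can be sketched in NumPy as follows. The sketch assumes a channel-first layout (C, H, W), which is an assumption about how the features are stored; the function name gram_matrix is illustrative. Many implementations also divide the result by the number of elements, which is omitted here.

```python
import numpy as np

def gram_matrix(features):
    """Return the (C, C) Gram matrix of a feature map.

    ASSUMPTION: `features` has shape (C, H, W), i.e. channels first.
    """
    c, h, w = features.shape
    # Flatten height and width into one axis: each row is one channel.
    flat = features.reshape(c, h * w)
    # Entry (i, j) is the dot product between channel i and channel j.
    return flat @ flat.T

# Toy feature map: 2 channels, each 1x2 (values are made up).
features = np.array([[[1.0, 2.0]],
                     [[3.0, 4.0]]])

gram = gram_matrix(features)
print(gram)
```

For this toy input, the off-diagonal entry is 1*3 + 2*4 = 11, the dot product between the two channels, and the diagonal entries are each channel's dot product with itself.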

Example: Gram Matrix is Symmetric (DSP TOOL USED HERE)

As we learned in class, a matrix multiplied by its transpose is a symmetric matrix. We have plotted some Gram matrices, and it is apparent that each matrix is symmetric about its main diagonal.
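The symmetry follows from the transpose rule (A B)^T = B^T A^T, which gives (F F^T)^T = F F^T. A quick numerical check, using random values as a stand-in for real feature maps and an assumed channel-first layout, might look like this:

```python
import numpy as np

def gram_matrix(features):
    """(C, H, W) feature map -> (C, C) Gram matrix (channel-first layout assumed)."""
    c, h, w = features.shape
    flat = features.reshape(c, h * w)
    return flat @ flat.T

rng = np.random.default_rng(0)
# Random stand-in for a feature map with 4 channels of size 5x5.
features = rng.normal(size=(4, 5, 5))

gram = gram_matrix(features)

# Symmetric: entry (i, j) equals entry (j, i).
is_symmetric = np.allclose(gram, gram.T)
# Diagonal entries are squared norms of each channel, so never negative.
diagonal_nonnegative = bool(np.all(np.diag(gram) >= 0.0))
print(is_symmetric, diagonal_nonnegative)
```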

Style Loss

The style loss is very similar to the content loss in that it uses mean squared error to calculate the distance between the generated image and the input style image. This time, the difference is measured between the style image's Gram matrix and the generated image's Gram matrix. We can also add a weight parameter to help us easily tune the amount of style we want to preserve in the final generated image.
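A hedged sketch of a weighted style loss for a single layer is given here. The names style_loss and style_weight are illustrative, the layout is assumed to be channel-first, and the loss is written as a sum of squared differences rather than a mean.

```python
import numpy as np

def gram_matrix(features):
    """(C, H, W) feature map -> (C, C) Gram matrix (channel-first layout assumed)."""
    c, h, w = features.shape
    flat = features.reshape(c, h * w)
    return flat @ flat.T

def style_loss(generated_features, style_features, style_weight=1.0):
    """Weighted squared distance between the two Gram matrices.

    `style_weight` is a hypothetical tuning knob: larger values push the
    optimization to match the style more closely.
    """
    diff = gram_matrix(generated_features) - gram_matrix(style_features)
    return style_weight * np.sum(diff ** 2)

# Toy feature maps (values are made up): 2 channels of size 3x4.
style_features = np.arange(24, dtype=float).reshape(2, 3, 4)
flipped = style_features[:, :, ::-1]   # same values, mirrored left to right
other = np.ones((2, 3, 4))

loss_flipped = style_loss(flipped, style_features)
loss_other = style_loss(other, style_features)
loss_other_weighted = style_loss(other, style_features, style_weight=2.0)
print(loss_flipped, loss_other, loss_other_weighted)
```

The mirrored feature map has a style loss of zero even though its spatial layout differs, which shows why the Gram matrix captures texture rather than the arrangement of content.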

As with the content loss, we can perform gradient descent to minimize the style loss, so that the generated image preserves as much of the style as possible.