Methods
Our Approach to Implement the Project
For the first part of our project, we implement the Neural Algorithm with a CNN, which transfers an artistic style onto our input image.
One important finding of the Neural Algorithm is that the representations of content and style are separable in a CNN. As a result, we manipulate the two representations independently to produce an image of high perceptual quality by minimizing a loss function. Specifically, the loss is a weighted combination of the content loss with respect to the input image and the style loss with respect to the reference image.
A CNN extracts image features: its layers capture properties such as color, edges, and corners at increasing levels of abstraction, so we use only the higher layers as input features. After extracting features from the input and output images, we minimize the loss function, a weighted combination of the content loss of the input image and the style loss of the reference image. The two parts of the loss function are defined as follows:
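As a rough sketch of the two loss terms, the snippet below computes a content loss as the squared error between feature maps and a style loss as the normalized squared error between their Gram matrices. The function names and the NumPy formulation are illustrative; in the actual pipeline the feature maps come from VGG-19 activations.

```python
import numpy as np

def gram_matrix(features):
    """Gram matrix of a feature map of shape (channels, height, width)."""
    c, h, w = features.shape
    flat = features.reshape(c, h * w)
    return flat @ flat.T

def content_loss(gen_feat, content_feat):
    """Squared-error content loss between feature maps at one layer."""
    return 0.5 * np.sum((gen_feat - content_feat) ** 2)

def style_loss(gen_feat, style_feat):
    """Style loss at one layer: squared error between Gram matrices,
    normalized by the number of channels and spatial positions."""
    c, h, w = gen_feat.shape
    g_gen = gram_matrix(gen_feat)
    g_style = gram_matrix(style_feat)
    return np.sum((g_gen - g_style) ** 2) / (4.0 * c**2 * (h * w) ** 2)
```

The Gram matrix discards spatial layout and keeps only feature correlations, which is what lets style be matched independently of content.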
The very first step of our algorithm is to initialize the output image as a copy of the content image before the first iteration. Then, to generate an image that mixes the style representation with the input image's content, we minimize the difference of the output image from the content representation in one layer and from the style representation of the artwork in the chosen layers of the CNN.
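The optimization loop can be sketched as follows. This is a deliberately minimal toy: the "feature extractor" is the identity map and the image is a flat (channels, positions) array, whereas a real run would backpropagate through VGG-19. The function name and weights are illustrative assumptions.

```python
import numpy as np

def gram(F):
    """Gram matrix of a flattened feature map of shape (C, N)."""
    return F @ F.T

def style_transfer_toy(content, style, alpha=1.0, beta=1e-4, lr=0.01, steps=200):
    """Gradient descent on alpha * content_loss + beta * style_loss.

    content, style: arrays of shape (C, N) standing in for flattened
    feature maps; here the 'extractor' is the identity map.
    """
    C, N = content.shape
    x = content.copy()          # initialize the output from the content image
    A = gram(style)             # target style Gram matrix
    for _ in range(steps):
        g_content = x - content                    # grad of 0.5*||x - content||^2
        G = gram(x)
        g_style = (G - A) @ x / (C**2 * N**2)      # grad of the Gram-matching loss
        x -= lr * (alpha * g_content + beta * g_style)
    return x
```

Starting from the content image means the content gradient is zero at the first step, so early iterations are driven purely by the style term.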
For the second part of our project, we transfer our images using a photorealistic reference.
One limitation of the Neural Algorithm is that it only transfers style successfully from an artistic reference image. As a result, we modify our algorithm to preserve photorealism in the output.
We still use the same CNN model for feature extraction, but we modify the loss function by adding a photorealism regularization term. The simple style loss is replaced by an augmented style loss with semantic segmentation, while the content loss is the same as before. The three parts of the loss function are defined as follows:
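The augmented style term can be sketched like this: instead of one Gram matrix per layer, a Gram matrix is computed per semantic segmentation channel and the per-channel losses are summed, so that style statistics are matched only between corresponding regions. The function name, shapes, and masking scheme are illustrative assumptions.

```python
import numpy as np

def gram(F):
    """Gram matrix of a flattened feature map of shape (C, N)."""
    return F @ F.T

def augmented_style_loss(gen_feat, style_feat, gen_masks, style_masks):
    """Augmented style loss with semantic segmentation.

    gen_feat, style_feat: (C, N) flattened feature maps at one layer.
    gen_masks, style_masks: (K, N) per-class masks, downsampled to
    the feature-map resolution, one channel per semantic class.
    """
    C, N = gen_feat.shape
    loss = 0.0
    for k in range(gen_masks.shape[0]):
        Fg = gen_feat * gen_masks[k]       # keep only this class's region
        Fs = style_feat * style_masks[k]
        loss += np.sum((gram(Fg) - gram(Fs)) ** 2) / (4.0 * C**2 * N**2)
    return loss
```

Masking before the Gram computation prevents, for example, sky style statistics from being transferred onto buildings.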
This part is similar to the first part but with a different loss function, and we choose the layers of the CNN accordingly.
We generate semantic segmentation masks for the input and reference images to separate the content into regions for different objects. We do this manually with Photoshop to obtain the most accurate segmentation, but image segmentation can also be done with thresholding, the KNN method, or a CNN model. A reference for using a CNN to generate semantic segmentations is listed in the references.
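As a minimal illustration of the thresholding alternative, the snippet below splits a grayscale image into two class masks by intensity. The function name and the fixed threshold are assumptions; a real segmenter would use more classes and a learned or adaptive threshold.

```python
import numpy as np

def threshold_masks(gray, thresh=0.5):
    """Two-class segmentation by intensity thresholding.

    gray: 2-D array with values in [0, 1]. Returns masks of shape
    (2, H, W): channel 0 = foreground (>= thresh), channel 1 = background.
    """
    fg = (gray >= thresh).astype(float)
    return np.stack([fg, 1.0 - fg])
```

The masks partition the image, so every pixel contributes to exactly one per-class Gram matrix in the augmented style loss.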
Part I configuration:
VGG-19 pre-trained CNN model as feature extractor
Content Layer: conv4_2
Weight: 1
Style Layers: relu1_1, relu2_1, relu3_1, relu4_1, relu5_1
Weight: 0.2, 0.2, 0.2, 0.2, 0.2
Part II configuration:
VGG-19 pre-trained CNN model as feature extractor
Content Layer: conv4_2
Weight: 1
Style Layers: conv1_1, conv2_1, conv3_1, conv4_1, conv5_1
Weight: 0.2, 0.2, 0.2, 0.2, 0.2
Γ = 10^2, λ = 10^4
We generated results with different numbers of iterations and selected the most compelling one. Here are examples from 100 iterations to 2200 iterations.
Our results are generated in a CPU-only environment. The average running time of Part I is 60 minutes, and that of Part II is 100 minutes.