Photo-realistic image stylization transfers the style of a reference image onto a given content image. Imagine visiting London after a long semester, only to find the weather cloudy. One would like to be able to render photos of the destination under any preferred conditions: sun, rain, or snow. Another application is real-time image stylization, i.e. filters. This is a fairly well-studied domain. The major challenge is to transfer the style while producing an image that still looks as though it was taken by a camera. Rendering the semantic content of an image in different styles is an interesting and difficult computer vision task with many applications, which motivates this project.
Given a content image and a style image, the objective is to transfer the style to the content image, maintaining a photo-realistic appearance.
Images from (Luan et al, 2017)
A number of algorithms exist to transfer the style of one image onto the content of another. Many of these approaches function primarily as artistic tools: they distort fine lines for artistic effect and often output images with unrealistic artifacts. As such, they are not suitable for producing photo-realistic images (Li et al, 2018). The neural style transfer algorithm was proposed for artistic stylization, such as converting photos to the appearance of a painting (Gatys et al, 2016). This approach was not designed to preserve photorealism. It also has no closed-form solution; instead, it solves an optimization problem for each output image, which makes it relatively slow. Several techniques have been developed to improve stylization quality and speed, such as adding a loss term to the optimization objective to better preserve local structure in the content photo. However, this approach is still computationally expensive and generates inconsistent stylization with noticeable artifacts. Traditional methods based on local color or tone matching, meanwhile, are limited to specific scenarios, such as seasons or portraits (Li et al, 2018). Li et al proposed a more universal style transfer approach in the form of an autoencoder with whitening and coloring feature transforms (WCT) (Li et al, 2017). This method is effective for artistic stylization but, like earlier approaches, suffers from structural artifacts when applied to photo-realistic images.
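To make the whitening and coloring idea concrete, the following is a minimal sketch in pure Python. It assumes features are given as a list of channels, each a flat list of values, and it uses Cholesky factors in place of the eigendecomposition used in the WCT paper, so it illustrates the principle (remove the content covariance, then impose the style covariance) rather than reproducing the authors' implementation. All function names are hypothetical.

```python
import math

def center(features):
    """Subtract each channel's mean; features is a list of channels (lists of values)."""
    means = [sum(ch) / len(ch) for ch in features]
    centered = [[v - m for v in ch] for ch, m in zip(features, means)]
    return centered, means

def covariance(centered, eps=1e-5):
    """Channel covariance of centered features, plus a small ridge for stability."""
    c, n = len(centered), len(centered[0])
    cov = [[sum(a * b for a, b in zip(centered[i], centered[j])) / (n - 1)
            for j in range(c)] for i in range(c)]
    for i in range(c):
        cov[i][i] += eps
    return cov

def cholesky(a):
    """Lower-triangular L with L L^T = a, for a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low

def solve_lower(low, rhs):
    """Solve low @ x = rhs by forward substitution; rhs holds one row per channel."""
    n, m = len(low), len(rhs[0])
    x = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for p in range(m):
            s = rhs[i][p] - sum(low[i][k] * x[k][p] for k in range(i))
            x[i][p] = s / low[i][i]
    return x

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def whiten_and_color(content, style):
    """Give the content features the style's channel mean and covariance."""
    fc, _ = center(content)
    fs, style_means = center(style)
    lc = cholesky(covariance(fc))
    ls = cholesky(covariance(fs))
    whitened = solve_lower(lc, fc)   # covariance is approximately the identity
    colored = matmul(ls, whitened)   # covariance is approximately the style's
    return [[v + m for v in ch] for ch, m in zip(colored, style_means)]
```

In a WCT-style autoencoder, a transform of this kind is applied to encoder feature maps before decoding. The content and style inputs need the same number of channels but may have different spatial sizes.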
To address photo-realistic images, PhotoWCT was proposed as a modification to the WCT algorithm (Li et al, 2018). PhotoWCT includes an unpooling layer in the autoencoder to preserve spatial information in the output image, and adds a smoothing step to remove artifacts. It produces photo-realistic images more effectively than the original WCT approach; however, the extra smoothing step causes over-smoothing in some cases and requires additional time.
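As an illustration of why unpooling helps preserve spatial information, the sketch below pairs 2x2 max pooling with an unpooling step that puts each maximum back at the position it came from. It assumes a single-channel image with even height and width, stored as a list of rows. PhotoWCT's actual layers operate on multi-channel feature maps inside a network, and the function names here are hypothetical.

```python
def max_pool_2x2(image):
    """2x2 max pooling that also records the (row, col) of each maximum."""
    pooled, switches = [], []
    for r in range(0, len(image), 2):
        pooled_row, switch_row = [], []
        for c in range(0, len(image[0]), 2):
            window = [(image[r + dr][c + dc], (r + dr, c + dc))
                      for dr in (0, 1) for dc in (0, 1)]
            value, position = max(window, key=lambda item: item[0])
            pooled_row.append(value)
            switch_row.append(position)
        pooled.append(pooled_row)
        switches.append(switch_row)
    return pooled, switches

def unpool_2x2(pooled, switches, height, width):
    """Place each pooled value at its recorded position; all other cells stay 0."""
    out = [[0.0] * width for _ in range(height)]
    for pooled_row, switch_row in zip(pooled, switches):
        for value, (r, c) in zip(pooled_row, switch_row):
            out[r][c] = value
    return out
```

Plain upsampling would spread each pooled value over its whole 2x2 block, whereas the recorded positions let the decoder restore each maximum to the exact pixel it came from.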
PhotoNet (An et al 1, 2019) eliminates the need for an additional smoothing step with further modifications to the PhotoWCT algorithm. This method uses a WCT autoencoder, with normalized skip connections from encoder layers to decoder layers, making it possible to preserve details in style transfer, while eliminating the need for an additional smoothing step. The efficiency of this design was improved in later work with PhotoNas (An et al 2, 2019), a network architecture designed through an automated pruning of the PhotoNet network. Both PhotoNas and PhotoNet fail where style images contain several distinct styles, as well as in cases of noisy input images.
WCT2 takes another approach to improving on WCT (Yoo et al, 2019), using Haar wavelet pooling layers and a single progressive stylization autoencoder rather than the multi-level autoencoder favored by other variations of WCT. The Haar wavelet pooling layers allow for complete image reconstruction, while progressive stylization improves efficiency by passing images through only one autoencoder rather than five levels of autoencoders. However, this method still relies on accurate segmentation maps to produce the desired output images.
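The claim that Haar wavelet pooling allows complete reconstruction can be checked on a small example. The sketch below assumes a single-channel image with even height and width, stored as a list of rows, and splits each 2x2 block into one low-frequency and three high-frequency coefficients. The function names and the sub-band labels are illustrative choices (naming conventions vary), not taken from the WCT2 code.

```python
def haar_pool(image):
    """Split an even-sized image into four half-resolution Haar sub-bands."""
    ll, lh, hl, hh = [], [], [], []
    for r in range(0, len(image), 2):
        ll_row, lh_row, hl_row, hh_row = [], [], [], []
        for col in range(0, len(image[0]), 2):
            a, b = image[r][col], image[r][col + 1]
            c, d = image[r + 1][col], image[r + 1][col + 1]
            ll_row.append((a + b + c + d) / 2)    # local average (low frequency)
            lh_row.append((-a + b - c + d) / 2)   # left-to-right difference
            hl_row.append((-a - b + c + d) / 2)   # top-to-bottom difference
            hh_row.append((a - b - c + d) / 2)    # diagonal difference
        ll.append(ll_row)
        lh.append(lh_row)
        hl.append(hl_row)
        hh.append(hh_row)
    return ll, lh, hl, hh

def haar_unpool(ll, lh, hl, hh):
    """Invert haar_pool exactly by recombining the four sub-bands."""
    image = []
    for ll_row, lh_row, hl_row, hh_row in zip(ll, lh, hl, hh):
        top, bottom = [], []
        for s, x, y, z in zip(ll_row, lh_row, hl_row, hh_row):
            top.extend([(s - x - y + z) / 2, (s + x - y - z) / 2])
            bottom.extend([(s - x + y - z) / 2, (s + x + y + z) / 2])
        image.append(top)
        image.append(bottom)
    return image
```

Because the four sub-bands together hold as many coefficients as the input, nothing is discarded. Max pooling, by contrast, keeps only one value per 2x2 block.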
While all of these approaches are effective under many conditions, each has its weaknesses and failure cases, in the form of image artifacts, over-smoothing, or failure to preserve aspects of style. To the best of our knowledge, they also have not been evaluated under broad conditions, such as cases in which style and content images contain significantly different content.
WCT
PhotoWCT
WCT2
Evaluate existing image stylization approaches
Identify areas where existing techniques can be improved upon
Modify existing approaches, and design a hybrid approach to improve upon existing techniques
Evaluate our approaches compared to existing techniques