Update: results from our new losses.
(Note: for results with FaceGAN, scroll down)
In the GIFs below, you can see the results of pose and appearance transfer (Right) from the Source image (Left) to the Target Video (Middle).
I am working on this research with my supervisor, Dr. Koteswar Rao. For a brief overview of what I have achieved so far, please look at the images below.
The inputs are the source image and the target pose semantic segmentation mask (not shown); the output is the generated image. In the face enhancement module, we improve face generation by incorporating facial keypoints.
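To make the pipeline concrete, here is a minimal PyTorch-style sketch of the two stages described above: a generator conditioned on the target pose mask, and a face enhancement module guided by keypoint heatmaps. The class names, layer choices, channel counts, and concatenation-based conditioning are all illustrative assumptions, not the actual architecture used in this work.

```python
# Illustrative sketch only: the real generator and face enhancer are deeper
# networks; only the input/output interface reflects the description above.
import torch
import torch.nn as nn

class PoseAppearanceGenerator(nn.Module):
    """Maps (source image, target pose segmentation mask) -> generated image."""
    def __init__(self, img_channels=3, seg_channels=20):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(img_channels + seg_channels, 64, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, img_channels, 3, padding=1),
            nn.Tanh(),
        )

    def forward(self, source_img, target_pose_seg):
        # Condition the generator on the target pose by channel-wise concatenation.
        x = torch.cat([source_img, target_pose_seg], dim=1)
        return self.net(x)

class FaceEnhancer(nn.Module):
    """Refines the face region using facial keypoint heatmaps as extra guidance."""
    def __init__(self, img_channels=3, num_keypoints=68):
        super().__init__()
        self.refine = nn.Sequential(
            nn.Conv2d(img_channels + num_keypoints, 32, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, img_channels, 3, padding=1),
        )

    def forward(self, face_crop, keypoint_heatmaps):
        # Predict a residual correction for the cropped face region.
        residual = self.refine(torch.cat([face_crop, keypoint_heatmaps], dim=1))
        return face_crop + residual
```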
The biggest challenge in achieving pose and appearance transfer simultaneously is that creating input-output pairs for the appearance transfer module is very hard. One could dress individual subjects in different clothing and use that dataset for training, but no such elaborate dataset exists. For this reason, previous works performed either pose transfer or appearance transfer, but not both. We instead make use of existing datasets without such input-output pairs, and to achieve that we have formulated two different methods:
The first method is pose and appearance transfer using an exemplar semantic image. First, we segment the body parts into discrete, simple shapes, allowing the user to easily manipulate a body part by stretching, rotating, warping, or changing its colors, and see the same transformations in the real image of the person. To facilitate the appearance transfer, we design the loss function so that the model learns only the colors present in the semantic image, while retaining features such as the patterns* on the clothing from the original image. You can read more about this loss function in our paper (coming soon!).
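The following is a hedged sketch of the kind of loss this describes: a color term that only constrains the per-segment average color (taken from the exemplar semantic image), plus a separate term that keeps high-frequency detail such as clothing patterns tied to the original image. The exact formulation in the paper may differ; the tensor shapes, the per-segment mean-color matching, and the feature-space pattern term are assumptions.

```python
# Sketch under assumed shapes; not the paper's actual loss.
import torch
import torch.nn.functional as F

def segment_color_loss(generated, exemplar_colors, segment_masks):
    """Match the mean color of each body-part segment to the exemplar color.

    generated:        (B, 3, H, W) generator output
    exemplar_colors:  (B, S, 3) target RGB color per segment, from the exemplar
    segment_masks:    (B, S, H, W) soft/binary masks for the S body-part segments
    """
    loss = 0.0
    num_segments = segment_masks.shape[1]
    for s in range(num_segments):
        mask = segment_masks[:, s:s + 1]                         # (B, 1, H, W)
        area = mask.sum(dim=(2, 3)).clamp(min=1.0)               # (B, 1)
        mean_color = (generated * mask).sum(dim=(2, 3)) / area   # (B, 3)
        loss = loss + F.l1_loss(mean_color, exemplar_colors[:, s])
    return loss / num_segments

def pattern_retention_loss(generated, original, feature_extractor):
    """Keep textures/patterns close to the original image in a feature space
    (e.g. pretrained CNN features), so only color is taken from the exemplar."""
    return F.l1_loss(feature_extractor(generated), feature_extractor(original))
```

Because the color term only sees segment-wise averages, the gradient cannot push the network toward copying the exemplar's texture, which is what lets the original patterns survive.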
The second method is pose and appearance transfer using a cycle-consistency loss. Along with the loss described above, we also use a cycle-consistency loss [Zhu et al.], which allows for unsupervised training, in our case for the appearance transfer, where there are no input-output pairs. This yields much better reconstruction and retention of the identity of the person in the original image, while changing the appearance according to the input exemplar.
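Below is a minimal sketch of the cycle-consistency idea from Zhu et al. as it applies here: transfer the appearance according to the exemplar, map the result back using the source's own appearance, and penalize the difference from the original. The generator names and their conditioning arguments are placeholders for the actual models.

```python
# Cycle-consistency sketch; G_forward / G_backward stand in for the real generators.
import torch.nn.functional as F

def cycle_consistency_loss(source_img, exemplar, source_exemplar,
                           G_forward, G_backward):
    # Forward pass: re-dress the source person according to the exemplar.
    transferred = G_forward(source_img, exemplar)
    # Backward pass: map the result back using the source's own appearance.
    reconstructed = G_backward(transferred, source_exemplar)
    # Penalize deviation from the original; this preserves identity without
    # needing paired input-output data.
    return F.l1_loss(reconstructed, source_img)
```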
*The figure below demonstrates how the appearance is manipulated while retaining the original identity of the person. We use the DeepFashion dataset by Liu et al. for testing. Note how the model doesn't simply change the hue/saturation of the whole image, but actually learns to recolor the specified parts of the body while preserving the person's identity.
For the semantic exemplar image, we use the DensePose segments; samples are shown below. We can easily switch or change the colors of any segment and expect to see the same change in the realistic output image.
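As a rough illustration of how such an edit looks in practice, the snippet below recolors one DensePose segment in the exemplar before it is fed back through the generator. The segment IDs, the helper name, and the generator call are hypothetical, shown only to make the editing workflow concrete.

```python
# Hypothetical helper: paint one body-part segment of the exemplar a new color.
import numpy as np

def recolor_segment(exemplar_rgb, segment_ids, target_id, new_color):
    """exemplar_rgb: (H, W, 3) uint8 semantic exemplar image
    segment_ids:  (H, W) integer DensePose part labels
    target_id:    the body-part label to recolor (e.g. torso)
    new_color:    (3,) RGB color to paint that segment with"""
    edited = exemplar_rgb.copy()
    edited[segment_ids == target_id] = np.asarray(new_color, dtype=np.uint8)
    return edited

# e.g. paint one segment red, then run the generator on the edited exemplar:
# edited = recolor_segment(exemplar, parts, target_id=2, new_color=(255, 0, 0))
# output = generator(source_image, edited)
```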