In [1, 2] the focus is mainly on domain adaptation. In order to shift to another domain, we need to learn a new Cycle GAN each time. Furthermore, [3, 4] do not perform very well for multi-label classification as they do not learn any relationships between any of the labels. They can only deal with a single class at a time. In addition, there is no disentanglement in the latent space, which makes changing specific parts of the image very tricky. Finally, they do not account for context in the image. All images of cats cannot be considered to be the same.
A group of cats
A group of three cats
An angry cat
Cat on a table
Figure 1: Different contexts in cat images