In the past, colorization was performed with the help of domain experts who manually selected the color scheme for a grayscale image. With improvements in technology, efforts were made to automate this process, such as those presented in [8], [12], [13] and [14].
In the method introduced by Levin et al. [12], the user specifies the coloring of a region by scribbling over it with a particular color, and the model then colors the entire region with the scribbled color. Welsh et al. [13] proposed a colorization technique in which the source image is colored using the correlation of luminance values with a reference image. This method worked well but suffered from spatial inconsistency, so human intervention was required to identify 'swatches' in both the source and reference images and indicate how color should be transferred between them. Irony et al. [14] used a supervised learning method that classifies a feature space to colorize a source grayscale image.
Despite promising results, these techniques are time- and resource-consuming due to the manual intervention involved.
Owing to the ongoing research in deep learning over the past decade, many fully automated methods for Image Colorization have been proposed. Most of these methods train a CNN by minimizing a pixel-wise loss between the generated and ground-truth images. It has been observed that using an L2 loss leads to dimmer images, as the network is discouraged from making bold color choices: because L2 penalizes large errors heavily, the network hedges toward the average of the plausible colors. A pixel-wise L1 loss shows significant improvement by allowing bolder choices.
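A toy calculation (not the paper's training code) makes this concrete. For an ambiguous pixel whose ground-truth color varies across training examples, the constant prediction that minimizes L2 is the mean (a washed-out average), while L1 is minimized by the median, which permits a bold, saturated choice. All values below are made up for illustration:

```python
import numpy as np

# Hypothetical ab-channel values a given pixel takes across training images:
# the "true" color is +1.0 in four images and -1.0 in two.
targets = np.array([1.0, 1.0, 1.0, 1.0, -1.0, -1.0])

# Find the constant prediction c that minimizes each loss.
candidates = np.linspace(-1.0, 1.0, 201)
l2_best = candidates[np.argmin([np.mean((targets - c) ** 2) for c in candidates])]
l1_best = candidates[np.argmin([np.mean(np.abs(targets - c)) for c in candidates])]

print(l2_best)  # ~1/3: the dim, desaturated average of the two colors
print(l1_best)  # 1.0: the bold majority color (the median)
```

The gap between the two optima is exactly the "dimness" effect: L2 blends conflicting colors, while L1 commits to one of them.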
Johnson et al. [15] propose using a perceptual loss for the tasks of style transfer and super-resolution. The perceptual loss compares features extracted by a pre-trained network (VGG16) from the ground-truth and generated images. We explore this loss, along with other losses, in our project.
The state-of-the-art techniques utilize CNNs for the task of Image Colorization. The approach involves extracting and combining high-, medium- and low-level features in the encoding phase and then decoding these to generate the final colorized image. Methods based on this approach have been explored in various works such as [1], [2], [6] and [7].
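A shape-level sketch (not any specific paper's network) illustrates the fusion step: low-level features keep spatial detail, while mid- and high-level features carry semantics at coarser resolutions, so the coarse maps are upsampled before the decoder consumes their combination. All layer sizes here are invented for illustration:

```python
import numpy as np

H, W = 64, 64
low  = np.zeros((H,      W,      64))   # fine detail: edges and texture
mid  = np.zeros((H // 2, W // 2, 128))  # object parts and patterns
high = np.zeros((H // 4, W // 4, 256))  # scene-level semantics

def upsample(x, factor):
    """Nearest-neighbor upsampling so coarse features match the fine grid."""
    return x.repeat(factor, axis=0).repeat(factor, axis=1)

# Fuse all three levels along the channel axis to form the decoder input.
fused = np.concatenate([low, upsample(mid, 2), upsample(high, 4)], axis=-1)
print(fused.shape)  # (64, 64, 448)
```

The decoder then maps this fused tensor down to the two chrominance channels at full resolution.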
In addition to the above, there have been other works, such as [10], in which the color mapping itself is learned: [10] uses a LEARCH framework to train a quadratic objective function over the chromaticity maps. Zhang et al. [16] propose a deep-learning-based approach for user-guided image colorization, in which the network combines user hints with learned high-level features to colorize the image.
Nazeri et al. [2] propose a GAN-based approach for colorizing images. GANs, introduced by Goodfellow et al. [5], have been instrumental in tasks related to data generation. They operate by training a generator and a discriminator simultaneously, with each trying to outwit the other. The problem of image-to-image translation is analogous to the minimax game played by a GAN, in that we want the output generated by our model to fool an observer into believing it is real. We use a GAN as the base model in all our experiments.
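The minimax objective can be sketched numerically. Below, the discriminator is reduced to the probability it assigns to "real"; the probabilities are made-up placeholders, and in colorization the generator would map a grayscale input to color channels while the discriminator judges colorized results:

```python
import numpy as np

def bce(p, label):
    """Binary cross-entropy for a single predicted probability."""
    eps = 1e-12  # guard against log(0)
    return -(label * np.log(p + eps) + (1 - label) * np.log(1 - p + eps))

# Hypothetical discriminator outputs (probability the input is real):
d_real = 0.9  # on a ground-truth color image
d_fake = 0.2  # on a generated colorization

# Discriminator objective: classify real images as 1 and fakes as 0.
d_loss = bce(d_real, 1.0) + bce(d_fake, 0.0)

# Generator objective: push the discriminator to call its fakes real.
g_loss = bce(d_fake, 1.0)

print(d_loss, g_loss)
```

The generator's loss is large here because the discriminator confidently rejects the fake; driving `g_loss` down is exactly what pushes generated colorizations toward photorealism.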
These GAN-based methods are currently considered state of the art for image colorization.