For a deeper understanding of our approach and results, please visit the 'Problem Statement' and 'Our Approach' sections. A summary of our findings, along with a discussion, is given in the 'Conclusion' section.
In this section, we present:
A comparison of different models across different learning settings
Images colorized by our best model
A comparison of our best model's closeness score with state-of-the-art methods
Application of our model to a short video
Note: a lower closeness score indicates better performance
Comparison of models in different learning settings
The vanilla Unet performs best among all the models under the same settings
Adding high-level extracted features to the bottleneck layer does not improve performance
A moderately complex network is needed to generalize well on the test set
Low values of lambda lead to degraded performance
Comparison of different loss functions used for training
Using perceptual loss together with per-pixel loss improves performance in all tested cases
Per-pixel loss computed with L1 performs better than with L2
The optimal training loss is therefore (L1 per-pixel loss + perceptual loss)
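The combined objective above can be sketched in a framework-agnostic way. Here the feature maps that would come from a fixed pretrained network (e.g. VGG) are passed in as plain arrays, and `lam` stands for the lambda weight on the perceptual term; all names are illustrative, not the project's actual code.

```python
import numpy as np

def combined_loss(pred, target, feat_pred, feat_target, lam=1.0):
    """L1 per-pixel loss plus an L1 perceptual term on feature maps.

    feat_pred / feat_target stand in for activations of a fixed
    pretrained network (e.g. VGG) evaluated on pred and target;
    lam weights the perceptual term.
    """
    pixel = np.abs(pred - target).mean()                 # per-pixel L1
    perceptual = np.abs(feat_pred - feat_target).mean()  # feature-space L1
    return pixel + lam * perceptual
```

With lam = 0 this reduces to plain per-pixel L1; raising lam pushes the model toward matching high-level features rather than exact pixel values, which matches the observation above that low lambda degrades performance.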
Fractional Strided Convolution vs Upsampling
Fractional strided convolution performs better than fixed upsampling
Fractional strided convolution learns its upsampling parameters during training and can therefore generate higher-quality results
Fixed upsampling is faster than fractional strided convolution because it uses a pre-defined interpolation strategy
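The contrast can be illustrated with a minimal sketch of the fixed-strategy side: nearest-neighbor upsampling simply replicates values by a pre-defined rule, whereas a fractionally strided (transposed) convolution would achieve the same resolution increase with weights learned during training. The function below is illustrative, not the project's implementation.

```python
import numpy as np

def nearest_upsample(x, factor=2):
    """Pre-defined (non-learned) upsampling: each pixel is replicated
    factor x factor times. A fractionally strided convolution would
    instead learn its interpolation weights from data."""
    return x.repeat(factor, axis=0).repeat(factor, axis=1)

x = np.array([[1, 2],
              [3, 4]])
y = nearest_upsample(x)  # 4x4 array; each input pixel becomes a 2x2 block
```

Because the rule is fixed, this runs with no parameters and no training cost, which is why fixed upsampling is faster but produces lower-quality results than its learned counterpart.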
Deep Unet vs Shallow Unet
With a deeper Unet architecture, we obtained much better results than with its shallow counterpart
The deeper network's advantage is even larger on a big dataset
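One way to see why depth helps: each Unet encoder block halves the spatial resolution, so a deeper encoder summarizes a wider image context at the bottleneck. A small sketch of this arithmetic (the input size and depths are illustrative, not the project's exact configurations):

```python
def bottleneck_size(input_size, depth):
    """Spatial side length at the Unet bottleneck, assuming each
    encoder block halves resolution (stride-2 downsampling)."""
    size = input_size
    for _ in range(depth):
        size //= 2
    return size

# For a 256x256 input, a shallow 3-level encoder keeps a 32x32
# bottleneck, while a deeper 5-level encoder compresses to 8x8,
# forcing the bottleneck to encode more global, context-aware features.
```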
Images from the test set of the 102Flowers dataset
Images from the test set of the Doraemon cartoon series
Images from the test set of the Peppa Pig cartoon series
The method of Iizuka et al. [6] minimizes the loss by choosing dull colors whenever the model is unsure about the object or texture
The method of Zhang et al. [7] produces bright, colorful images. This boldness is a gamble, however: whenever the bright choice is not reflected in the ground-truth image, the closeness score (where lower is better) rises.
Unlike the other two methods, our model was trained on the 17 Flowers dataset, and the final scores for all methods were computed on the 17 Flowers test set; for this reason the results are slightly biased in our favor.
Images produced by our network
Images colorized by Iizuka et al. [6]
Images colorized by Zhang et al. [7]
To colorize a video, we extracted all frames from the target grayscale video, applied the colorization model to each frame, and finally re-encoded the colorized frames into a video.