Automatic Photo Adjustment Using Deep Neural Networks

Zhicheng Yan, Hao Zhang, Baoyuan Wang, Sylvain Paris, Yizhou Yu

University of Illinois at Urbana-Champaign
Carnegie Mellon University
Microsoft Research
Adobe Research
The University of Hong Kong and University of Illinois at Urbana-Champaign

(a) Input image. (b) Adjusted result by our method. (c) Ground truth adjusted by the photographer.

Photo retouching enables photographers to evoke dramatic visual impressions by artistically enhancing their photos through stylistic color and tone adjustments. However, it is also a time-consuming and challenging task that requires advanced skills beyond the abilities of casual photographers. An automated algorithm is an appealing alternative to manual work, but such an algorithm faces many hurdles. Many photographic styles rely on subtle adjustments that depend on the image content and even its semantics. Further, these adjustments are often spatially varying. Because of these characteristics, existing automatic algorithms are still limited and address only a subset of these challenges. Recently, deep machine learning has shown unique abilities to address hard problems that had long resisted machine algorithms. This motivated us to explore the use of deep learning in the context of photo editing. In this paper, we explain how to formulate the automatic photo adjustment problem in a way suitable for this approach. We also introduce an image descriptor that accounts for the local semantics of an image. Our experiments demonstrate that our deep learning formulation, applied using these descriptors, successfully captures sophisticated photographic styles. In particular, and unlike previous techniques, it can model local adjustments that depend on the image semantics. We show on several examples that this yields results that are qualitatively and quantitatively better than previous work.

Deep Feed-Forward Neural Network

The architecture of our DNN. The neurons above the dashed line indicate how we compute the cost function in Equation (3). Note that the weights for the connections between the blue neurons and the yellow neurons are simply the elements of the quadratic color basis, and the activation function in the yellow and purple neurons is the identity function. During training, error backpropagation starts from the output layer, because the connection weights above the dashed line are already fixed.
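One way to read this caption is that the network's output layer supplies the coefficients of a per-pixel color transform, which is then applied to a fixed quadratic basis of the input color to produce the predicted color. The following is a minimal sketch of that reading, not the authors' implementation. It assumes a 10-term quadratic basis over three color channels (the term ordering is arbitrary), a 3x10 coefficient matrix per pixel, and a sum-of-squared-errors cost. Equation (3) is not reproduced on this page, so the exact cost may differ. All function names are hypothetical.

```python
import numpy as np

def quadratic_color_basis(colors):
    """Map (N, 3) colors to an (N, 10) quadratic basis (assumed term order)."""
    colors = np.asarray(colors, dtype=np.float64)
    x, y, z = colors[:, 0], colors[:, 1], colors[:, 2]
    ones = np.ones_like(x)
    # Second-order terms, cross terms, linear terms, constant.
    return np.stack([x * x, y * y, z * z, x * y, x * z, y * z, x, y, z, ones], axis=1)

def apply_color_transform(coeffs, colors):
    """Apply per-pixel (N, 3, 10) coefficients to (N, 3) colors; returns (N, 3)."""
    basis = quadratic_color_basis(colors)
    # The basis plays the role of fixed weights; no nonlinearity is applied.
    return np.einsum("nij,nj->ni", coeffs, basis)

def squared_error_cost(coeffs, colors, targets):
    """Assumed cost: sum of squared differences to the ground-truth colors."""
    predicted = apply_color_transform(coeffs, colors)
    return float(np.sum((predicted - np.asarray(targets, dtype=np.float64)) ** 2))

def identity_coeffs(n):
    """Coefficients that leave every color unchanged (useful as a sanity check)."""
    coeffs = np.zeros((n, 3, 10))
    coeffs[:, 0, 6] = 1.0  # first output channel copies x
    coeffs[:, 1, 7] = 1.0  # second output channel copies y
    coeffs[:, 2, 8] = 1.0  # third output channel copies z
    return coeffs
```

Because the basis is fixed, gradients of this cost with respect to the coefficients are linear in the basis values, which is consistent with backpropagation starting at the output layer.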

Adjustment On Novel Images

Top: an example of the Watercolor effect. Bottom: an example of the Local Xpro effect. In each example, (a): input image. (b): our enhanced result. (c): ground truth. (d): the training images most similar to the input image.

Multiscale Contextual Feature

Our multiscale spatial pooling scheme. In each pooling region, we compute a histogram of semantic categories. The three-scale scheme shown has 9*2+1=19 pooling regions. In our experiments, we use a four-scale scheme with 9*3+1=28 pooling regions.
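The region counts in this caption follow 9*(S-1)+1 for S scales. A minimal sketch of one layout that reproduces those counts is given below. It is an assumption, not the authors' exact geometry: it uses a single square region at the finest scale and a 3x3 grid of cells at each coarser scale, with the cell size doubling per scale. The `base` size, the doubling factor, and all function names are hypothetical. The input is a per-pixel map of non-negative integer semantic labels smaller than `num_classes`.

```python
import numpy as np

def pooling_regions(cx, cy, base, num_scales):
    """Return (x0, y0, x1, y1) boxes centered on pixel (cx, cy)."""
    half = base // 2
    # Finest scale: a single region around the pixel.
    regions = [(cx - half, cy - half, cx + half + 1, cy + half + 1)]
    for s in range(1, num_scales):
        cell = base * 2 ** s          # assumed growth of the cell size
        ox = cx - (3 * cell) // 2     # left edge of the 3x3 grid
        oy = cy - (3 * cell) // 2     # top edge of the 3x3 grid
        for j in range(3):
            for i in range(3):
                regions.append((ox + i * cell, oy + j * cell,
                                ox + (i + 1) * cell, oy + (j + 1) * cell))
    return regions

def region_histogram(labels, box, num_classes):
    """Normalized histogram of semantic labels inside a box clipped to the image."""
    h, w = labels.shape
    x0, y0, x1, y1 = box
    x0, x1 = max(x0, 0), min(x1, w)
    y0, y1 = max(y0, 0), min(y1, h)
    if x0 >= x1 or y0 >= y1:
        return np.zeros(num_classes)  # region lies entirely outside the image
    counts = np.bincount(labels[y0:y1, x0:x1].ravel(), minlength=num_classes)
    return counts.astype(np.float64) / counts.sum()

def contextual_feature(labels, cx, cy, num_classes, base=5, num_scales=4):
    """Concatenate the label histograms of all pooling regions for one pixel."""
    boxes = pooling_regions(cx, cy, base, num_scales)
    return np.concatenate([region_histogram(labels, b, num_classes) for b in boxes])
```

With four scales and K semantic categories, this layout yields a descriptor of length 28*K per pixel.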

Three Stylistic Local Effects

Row (a): input images. Rows (b) and (c): our enhanced results and the ground truth for the Foreground Pop-Out effect. Rows (d) and (e): our enhanced results and the ground truth for the Local Xpro effect. Rows (f) and (g): our enhanced results and the ground truth for the Watercolor effect.