Attention-aware Multi-stroke Style Transfer

CVPR 2019

Yuan Yao, Jianqiang Ren, Xuansong Xie, Weidong Liu, Yong-jin Liu, Jun Wang

Architecture of the proposed AAMS network

Abstract

Neural style transfer has drawn considerable attention from both academic and industrial fields. Although visual effects and efficiency have been significantly improved, existing methods are unable to coordinate the spatial distribution of visual attention between the content image and the stylized image, or to render diverse levels of detail via different brush strokes.

In this paper, we tackle these limitations by developing an attention-aware multi-stroke style transfer model. We first propose to integrate a self-attention mechanism into a style-agnostic reconstruction autoencoder framework, from which the attention map of a content image can be derived. By performing a multi-scale style swap on content features and style features, we produce multiple feature maps reflecting different stroke patterns. A flexible fusion strategy is further presented to incorporate the salient characteristics of the attention map, which allows multiple stroke patterns to be integrated harmoniously into different spatial regions of the output image. We demonstrate the effectiveness of our method and generate stylized images with multiple stroke patterns that compare favorably against state-of-the-art methods.
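As a rough illustration of how a spatial attention map can be read out of a self-attention block, the sketch below computes, for each spatial position of a feature map, how much attention it receives from all other positions. The random 1x1 projections stand in for the learned layers of the trained autoencoder, and all names and dimensions are illustrative assumptions, not the paper's actual model.

```python
import numpy as np

def self_attention_map(feat, d=8, seed=0):
    """Derive an (H, W) saliency map from a (C, H, W) feature map using a
    self-attention block. Random query/key projections are a stand-in for
    learned 1x1 convolutions (illustrative sketch, not the trained model)."""
    C, H, W = feat.shape
    x = feat.reshape(C, H * W)                    # flatten spatial dims: (C, N)
    rng = np.random.default_rng(seed)
    Wq = rng.standard_normal((d, C)) / np.sqrt(C)  # hypothetical query projection
    Wk = rng.standard_normal((d, C)) / np.sqrt(C)  # hypothetical key projection
    q, k = Wq @ x, Wk @ x                         # (d, N) each
    logits = q.T @ k                              # (N, N): query i attends to key j
    logits -= logits.max(axis=1, keepdims=True)   # numerically stable softmax
    attn = np.exp(logits)
    attn /= attn.sum(axis=1, keepdims=True)
    # Total attention each position receives, normalized to [0, 1].
    received = attn.sum(axis=0).reshape(H, W)
    return (received - received.min()) / (np.ptp(received) + 1e-8)
```

Positions that many other positions attend to score high, giving a content-driven saliency map of the kind used to guide the stroke fusion.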

Paper: arXiv:1901.05127, 2019

Code: GitHub Page

Citation:

Yuan Yao, Jianqiang Ren, Xuansong Xie, Weidong Liu, Yong-jin Liu, Jun Wang. "Attention-aware Multi-stroke Style Transfer," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.


Stylized Result via Different Stroke Sizes

Our multi-scale style swap enables continuous and discriminative stylized patterns by changing the scale coefficient, and efficiently generates integrated results via different combinations of stroke sizes.
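The core style-swap operation (in the sense of Chen and Schmidt's patch-based style swap) replaces each content feature patch with its best-matching style patch under normalized cross-correlation; running it with features resized by different scale coefficients yields coarser or finer stroke patterns. A minimal NumPy sketch, with patch size and averaging of overlaps as illustrative choices:

```python
import numpy as np

def style_swap(content_feat, style_feat, patch_size=3):
    """Replace each content patch with its closest style patch under
    normalized cross-correlation. Inputs are (C, H, W) feature maps.
    Brute-force sketch; real implementations use convolutions."""
    C, H, W = content_feat.shape
    p = patch_size
    # Collect all style patches and L2-normalize them.
    style_patches = []
    for i in range(style_feat.shape[1] - p + 1):
        for j in range(style_feat.shape[2] - p + 1):
            style_patches.append(style_feat[:, i:i+p, j:j+p].ravel())
    style_patches = np.stack(style_patches)                    # (N, C*p*p)
    norms = np.linalg.norm(style_patches, axis=1, keepdims=True) + 1e-8
    normalized = style_patches / norms

    out = np.zeros_like(content_feat)
    count = np.zeros((H, W))
    for i in range(H - p + 1):
        for j in range(W - p + 1):
            c_patch = content_feat[:, i:i+p, j:j+p].ravel()
            # Nearest style patch (argmax is scale-invariant in c_patch).
            k = np.argmax(normalized @ c_patch)
            out[:, i:i+p, j:j+p] += style_patches[k].reshape(C, p, p)
            count[i:i+p, j:j+p] += 1
    return out / np.maximum(count, 1)  # average overlapping patches
```

In the multi-scale variant, the same swap is applied to features extracted at several scales, producing one stroke feature map per scale coefficient.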

Multi-stroke Fusion

In the attention histogram, we mark the attention values of the clustering centers obtained by applying k-means to the attention map Âc; higher attention values are assigned to finer stroke patterns. The integrated feature map is then generated seamlessly as the weighted sum of these stroke feature maps according to our proposed fusion strategy.
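The fusion step above can be sketched as follows: run 1D k-means on the attention values, associate each cluster center with one stroke feature map (finer strokes for higher-attention centers), and blend the maps per pixel. The softmax-over-negative-distance weighting and the temperature parameter here are illustrative assumptions, not the paper's exact formula.

```python
import numpy as np

def multi_stroke_fusion(stroke_feats, attention, temperature=10.0):
    """Blend K stroke feature maps into one, weighting each pixel by how
    close its attention value is to each stroke's cluster center.

    stroke_feats: (K, C, H, W), ordered from coarse to fine strokes.
    attention:    (H, W), values in [0, 1].
    """
    K = stroke_feats.shape[0]
    # 1D k-means on attention values (a few Lloyd iterations).
    centers = np.linspace(attention.min(), attention.max(), K)
    flat = attention.ravel()
    for _ in range(10):
        labels = np.argmin(np.abs(flat[:, None] - centers[None, :]), axis=1)
        for k in range(K):
            if np.any(labels == k):
                centers[k] = flat[labels == k].mean()
    centers = np.sort(centers)  # higher attention -> finer stroke pattern

    # Soft per-pixel weights: pixels near a center favor that stroke map.
    dist = np.abs(attention[None, :, :] - centers[:, None, None])   # (K, H, W)
    w = np.exp(-temperature * dist)
    w /= w.sum(axis=0, keepdims=True)

    # Weighted sum over the K stroke feature maps.
    return (w[:, None, :, :] * stroke_feats).sum(axis=0)            # (C, H, W)
```

The soft weights vary smoothly across the image, which is what lets the stroke patterns blend without visible seams at region boundaries.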

Comparison with prior methods

The attention map extracted from the content image enables seamless synthesis across multiple stroke sizes, while demonstrating superior spatial consistency of visual attention between the content image and the stylized image.