Divide & Bind Your Attention for Improved Generative Semantic Nursing

Yumeng Li1,2       Margret Keuper2,3    Dan Zhang1,4    Anna Khoreva1,4

1Bosch Center for AI  2Siegen University  3MPI for Informatics  4Tübingen University

BMVC 2023 Oral

[Paper]       [Code]

Our Divide & Bind can significantly improve a pretrained text-to-image model, faithfully generating multiple objects from a detailed textual description. Compared to Attend & Excite, the prior state-of-the-art semantic nursing technique for text-to-image synthesis, our approach exhibits superior alignment with the input prompt while maintaining a higher level of realism.

Method Overview

We perform latent optimization on the fly based on the cross-attention maps, without fine-tuning the pretrained text-to-image model. We propose two novel loss terms: (1) a total variation (TV) based attendance loss and (2) a Jensen–Shannon divergence (JSD) based binding loss.
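Schematically, one denoising step with this latent optimization can be sketched as follows (pseudocode; the step size alpha_t, the binding weight lambda, and all function names are our notation, not necessarily the paper's):

```
# at denoising step t, with the UNet weights frozen
A        = cross_attention_maps(unet, z_t, prompt_tokens)          # per-token spatial maps
L_attend = attendance_loss(A[object_tokens])                       # TV-based
L_bind   = binding_loss(A[object_tokens], A[attribute_tokens])     # JSD-based
z_t      = z_t - alpha_t * grad(L_attend + lambda * L_bind, z_t)   # update latent only
z_prev   = denoise_step(unet, z_t)                                 # continue sampling
```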

More Results

(Best viewed on a laptop)

Cross Attention Visualization

Divide for Attendance

With more complex prompts, the competition between tokens becomes more severe. We propose to maximize the total variation (TV) of the object tokens' attention maps to encourage multiple excitation peaks, which reduces the risk of conflicts with the other tokens.
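For intuition, the quantity being maximized can be illustrated with the anisotropic total variation of a 2D attention map. A minimal NumPy sketch (a simplified stand-in, not the paper's exact loss formulation):

```python
import numpy as np

def total_variation(attn: np.ndarray) -> float:
    """Anisotropic TV of a 2D attention map: the sum of absolute
    differences between neighboring entries along both spatial axes."""
    dh = np.abs(np.diff(attn, axis=0)).sum()
    dw = np.abs(np.diff(attn, axis=1)).sum()
    return float(dh + dw)

# A flat map has zero TV, while a map with several sharp excitation peaks
# has high TV. Maximizing TV therefore encourages multiple peaks rather
# than one diffuse blob of attention.
flat = np.full((8, 8), 1.0 / 64)
peaks = np.zeros((8, 8))
peaks[2, 2] = peaks[5, 5] = 0.5
```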

"A dog and a turtle on the street, snowy scene"

Stable Diffusion

Attend & Excite

Divide & Bind (Ours)

"A pineapple and two oranges"

Stable Diffusion

Attend & Excite

Divide & Bind (Ours)

Attribute Binding Regularization

We explicitly minimize the JS divergence between the attention maps of an object token and its attribute token. With our binding loss applied, the attribute attention map becomes more localized.
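As an illustration, the Jensen–Shannon divergence between two attention maps can be computed by treating each map as a normalized distribution over spatial locations. A minimal NumPy sketch (a simplified stand-in for the paper's exact binding loss; the epsilon is only for numerical stability):

```python
import numpy as np

def js_divergence(p: np.ndarray, q: np.ndarray, eps: float = 1e-12) -> float:
    """Jensen-Shannon divergence between two attention maps, each
    normalized to a probability distribution over spatial locations."""
    p = p.ravel() / p.sum()
    q = q.ravel() / q.sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: float(np.sum(a * np.log((a + eps) / (b + eps))))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Identical maps give zero divergence. Minimizing the JSD between an object
# token's map and its attribute token's map pulls the attribute's attention
# toward the spatial locations the object occupies.
obj = np.zeros((8, 8)); obj[3, 3] = 1.0   # object attends to one region
attr = np.ones((8, 8))                    # attribute attends everywhere
```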

Combined with ControlNet

We combine Divide & Bind with a conditional text-to-image model, i.e., ControlNet, which introduces an additional condition, e.g., a semantic label map. With our optimization applied, the generated images follow the conditional inputs more closely.

BibTeX

@inproceedings{Li2023divide,

  title={Divide \& Bind Your Attention for Improved Generative Semantic Nursing},

  author={Li, Yumeng and Keuper, Margret and Zhang, Dan and Khoreva, Anna},

  booktitle={34th British Machine Vision Conference 2023, {BMVC} 2023},

  year={2023}
}