We evaluated seven SOTA IC systems (Microsoft Azure AI Vision API, GIT, BLIP, BLIP2, ViT-GPT2, OFA, and VinVL) and compared OLAR with MetaIC and ROME.
We randomly selected 200 images and their corresponding annotations from MSCOCO to form our primary dataset. For MetaIC, we used a 30% overlap ratio as the default value for PO; the PO and NO modes each generated 2,906 samples. For ROME, we used the default parameter settings, which yielded 362 image pairs. For OLAR, we generated 10 independent layout reconstructions per seed image and synthesized five new images from each refined semantic map, for a total of 10,000 test cases (200 seed images × 10 layouts × 5 images). All generated samples can be found here.
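As a minimal sketch of how the OLAR test-case count arises, the following Python snippet samples 200 seed images and enumerates one test case per (seed image, layout reconstruction, synthesized image) combination. The candidate ID pool, the random seed, and the helper names `sample_seed_ids` and `enumerate_test_cases` are illustrative assumptions and not part of the actual pipeline; only the three counts are taken from the setup described above.

```python
import random
from itertools import product

NUM_SEEDS = 200          # seed images sampled from MSCOCO (from the text)
LAYOUTS_PER_SEED = 10    # independent layout reconstructions per seed image
IMAGES_PER_LAYOUT = 5    # images synthesized per refined semantic map


def sample_seed_ids(all_image_ids, k=NUM_SEEDS, rng_seed=0):
    """Draw k distinct image IDs uniformly at random.

    rng_seed is an assumption added here for reproducibility.
    """
    rng = random.Random(rng_seed)
    return rng.sample(list(all_image_ids), k)


def enumerate_test_cases(seed_ids):
    """Return one (seed_id, layout_index, image_index) triple per test case."""
    return list(
        product(seed_ids, range(LAYOUTS_PER_SEED), range(IMAGES_PER_LAYOUT))
    )


# Hypothetical pool of candidate image IDs standing in for the MSCOCO split.
candidate_ids = range(1, 5001)
seed_ids = sample_seed_ids(candidate_ids)
test_cases = enumerate_test_cases(seed_ids)
print(len(test_cases))  # 200 * 10 * 5 = 10000
```

Fixing the random seed makes the sampled subset reproducible, which matters when the same 200 images must be fed to every compared method.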
Figure 1: User survey results.
Figure 2: Visualization of image features.
Table 1: Precision of OLAR and baselines.
Table 2: Results of the ablation experiment.