Grounded Image Editing Request (GIER)

A language-driven image editing dataset that supports local and global editing with unconstrained images and free-form language.


Dataset Overview

An overview of the GIER dataset. Each data sample is a triplet of a source image, a target image, and a language request. We provide annotations of the possible editing operations, the operation type (global or local), and the masks corresponding to each local operation. We also provide additional language requests collected from Photoshop experts and amateurs. The data triplets are crawled from Zhopped and Reddit; the Photoshop experts were hired through Upwork and the amateurs through AMT.


The following figures show the distribution of the different editing operations.

  • Unique image pairs: 6179

  • Average number of requests per image: 4.83

  • Average number of operations per edit: 3.21


GIER.json: contains all image URLs, requests, and operation annotations.
split.json: contains the train/val/test split.
Separate downloads contain all the images, all the masks, and all the features.
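
Because the internal layout of GIER.json and split.json is not described here, a reasonable first step after downloading is to inspect their top-level structure before writing any loading code. The following is a minimal sketch in Python that makes no assumption about the schema: the helper name describe_json is hypothetical, and the script assumes only that both files are valid JSON and sit in the current directory.

```python
import json
from pathlib import Path


def describe_json(path):
    """Summarize the top-level structure of a JSON file without assuming a schema."""
    with Path(path).open("r", encoding="utf-8") as f:
        data = json.load(f)

    if isinstance(data, dict):
        # Top-level object: report how many keys it has and a few of them.
        return {"type": "dict", "size": len(data), "sample_keys": sorted(data)[:5]}

    if isinstance(data, list):
        # Top-level array: if its first element is an object, report that object's keys.
        first = data[0] if data else None
        keys = sorted(first)[:5] if isinstance(first, dict) else []
        return {"type": "list", "size": len(data), "sample_keys": keys}

    # Any other JSON value (string, number, boolean, null).
    return {"type": type(data).__name__, "size": 1, "sample_keys": []}


if __name__ == "__main__":
    # File names are taken from the list above; their location is an assumption.
    for name in ("GIER.json", "split.json"):
        if Path(name).exists():
            print(name, describe_json(name))
        else:
            print(name, "not found in the current directory")
```

The printed keys indicate which fields hold the image URLs, requests, operation annotations, and split assignments; the repository remains the authoritative reference for the actual format.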

For specific usage, please refer to our repository.


@inproceedings{shi2020benchmark,
  title={A Benchmark and Baseline for Language-Driven Image Editing},
  author={Shi, Jing and Xu, Ning and Bui, Trung and Dernoncourt, Franck and Wen, Zheng and Xu, Chenliang},
  booktitle={Proceedings of the Asian Conference on Computer Vision},
  year={2020}
}


For further questions, please contact Jing Shi.