Workshop on Graphic Design Understanding and Generation 2025
Oct 19, 2025, 1:00 pm - 5:00 pm, Room 328
GDUG Workshop in conjunction with ICCV2025
The Workshop on Graphic Design Understanding and Generation (GDUG) aims to bring together researchers, creators, and practitioners to discuss the key concepts, technical perspectives, limitations, and ethical considerations surrounding recognition and generative approaches to graphic design and documents. While recent advances in generative AI have made impressive strides in creative domains, there is a disconnect between research efforts and real-world graphic design workflows, such as the creation of websites, posters, online advertisements, social media posts, infographics, or presentation slides: creators do not paint pixels but instead work with structured documents built from layered object representations, stylistic attributes, and typography. In addition, despite the richness of what humans perceive in visual presentation, there is no universal metric for evaluating the quality of graphic design.
We welcome topics including, but not limited to, the following:
Multi-modal document understanding and generation
Font and typography analysis and generation
Layout analysis and generation
Attribute-based styling and colorization
Differentiable rasterization and its applications
Graphic design datasets and perceptual evaluation metrics
AI-assisted design authoring tools and CAD applications
Technologies for copyright protection
The GDUG workshop has two tracks, each with different deadlines. Carefully read through the following instructions.
Submissions to the proceedings track must be in the full paper format and will be peer-reviewed in a single-blind fashion. Papers must be 5-8 pages, excluding references, in the ICCV 2025 format. Accepted papers will be included in the ICCV workshop proceedings and presented as posters at the workshop. We welcome both novel work and work in progress that has not been published elsewhere, but authors should be aware that papers longer than four pages can conflict with the dual-submission policies of other venues, such as the ICCV main conference.
https://openreview.net/group?id=thecvf.com/ICCV/2025/Workshop/GDUG
When submitting, be sure to set up your OpenReview profile well before the deadline. New OpenReview profiles created without an institutional email go through a moderation process that can take up to two weeks.
All deadlines are 11:59 PM, HST, unless otherwise noted.
Paper submission: July 2
Notification to authors: July 11
Camera-ready submission: August 18 (11:59 PM, PDT)
Workshop: October 19
Submissions to the non-proceedings track can be either extended abstracts (fewer than four pages) or full papers, excluding references, in the ICCV 2025 format. Papers will not be peer-reviewed; instead, a jury of organizers will select them based on topical fit and a minimum quality bar for the workshop. Accepted papers will be presented as posters at the workshop. We welcome novel work, work in progress, or work recently published in another venue (e.g., the ICCV main conference) that is relevant to the workshop.
Although we accept published work for presentation, authors should be aware that papers longer than four pages can conflict with the dual-submission policy of other venues. Check the submission policy of the other venue if in doubt.
https://forms.gle/qu7eeC4JyyEi4BMc6
All deadlines are 11:59 PM, HST.
Paper submission: August 22 (extended from August 18)
Notification to authors: August 25
Workshop: October 19
The workshop will take place on October 19 in the afternoon, in Room 328. The program is as follows.
1:00 pm Opening [slides]
1:10 pm Invited talk 1
Typeface Analytics [slides]
Seiichi Uchida (Kyushu University)
Typefaces are remarkably diverse, and their numbers are rapidly increasing, now further accelerated by generative AI. An interesting observation is that particular typeface styles tend to recur in particular contexts (genres, media, words, objects). The reasons for these pairings remain underexplored; possible factors include perceived impression or mood, tradition, and legibility. This talk sketches a research agenda for investigating real-world typeface use. Leveraging today's richer corpora and use cases, we analyze patterns of style usage and present several illustrative examples as a basis for discussing directions for future research.
1:35 pm Invited talk 2
Towards Multimodal Systems that See, Think, and Design
Sai Rajeswar Mudumba (ServiceNow)
While scaling laws and large language models (LLMs) have unlocked impressive automation, enabling decision-making and design from pixels alone remains a challenge. Such systems must master complex multimodal perception: interpreting infographics, reasoning over document screenshots, and generating coherent graphics code. Building this capability requires advances in multimodal representation learning, grounding, and reasoning. In this talk, I present our coordinated effort spanning the creation of permissively licensed multimodal datasets, architectural innovation, evaluation, and reinforcement learning for adaptive thinking. Together, these contributions lay the foundation for systems that can seamlessly understand and reason over visual and textual content, bridging the gap between visual comprehension and creative synthesis. Despite recent progress, we believe that interpreting pixels in open-ended, ambiguous contexts remains a grand challenge, demanding breakthrough innovations to realize the promise of multimodal intelligence.
2:00 pm Short break
2:10 pm Invited talk 3
Evaluating the Quality of Generated Designs: From Rubrics to Automation
Gökhan Yildirim (Canva)
Generative AI is rapidly transforming the creative process, with new design generation papers and workflows emerging every day. This shift highlights the need for reliable design evaluation tools to track progress, compare results, and identify the most effective creative workflows. To address this need, we developed evaluation rubrics in collaboration with professional graphic designers, which evolved into an Elo-based arena for comparing design generation workflows across multiple dimensions. I will discuss how we then automated these evaluations using external APIs such as ChatGPT alongside our custom Visual Quality Model (VQM), achieving over 80% alignment with human judgments. I will conclude by highlighting applications of VQM beyond evaluation, such as providing feedback and guidance for generative models.
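For background, Elo-style arenas aggregate pairwise judgments into per-workflow ratings. Below is a minimal Python sketch of the standard Elo update for a single comparison; the K-factor of 32 and the starting rating of 1000 are illustrative assumptions, not details of the speaker's system.

    # Standard Elo update for one pairwise comparison between two design
    # workflows. Illustrative sketch only; not the actual arena implementation.
    def expected_score(r_a: float, r_b: float) -> float:
        """Probability that A beats B under the Elo model."""
        return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

    def elo_update(r_a: float, r_b: float, a_won: bool, k: float = 32.0):
        """Return updated ratings after one judgment (K-factor assumed)."""
        e_a = expected_score(r_a, r_b)
        s_a = 1.0 if a_won else 0.0
        return r_a + k * (s_a - e_a), r_b + k * ((1.0 - s_a) - (1.0 - e_a))

    # Example: both workflows start at 1000; A wins one comparison.
    ra, rb = elo_update(1000.0, 1000.0, a_won=True)
    print(ra, rb)  # 1016.0 984.0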
2:35 pm Invited talk 4
Editing documents using natural language instructions
Vlad Morariu (Adobe Research)
Recent advances in multi-modal modeling have enabled AI-assisted document editing, where users provide natural language instructions describing their intent and a system automatically carries out the desired edits on their behalf. I will describe our efforts to formalize the document editing task through the DocEdit dataset and to develop a model that carries out the edits, as well as follow-up work addressing the challenges we encountered along the way. These improvements explore the use of LLMs, agents, and improved multi-modal grounding, as well as techniques for automatically evaluating document quality and generating editing instructions to further automate document editing. I conclude by identifying trends and unresolved challenges for future work.
3:00 pm Coffee break
3:30 pm Invited talk 5 (Canceled)
3:45 pm Invited talk 6
From Flat to Layered: Advancing Graphic Design Generation with Layer Decomposition
Jingye Chen (HKUST)
The field of graphic design generation has seen remarkable progress, driven by advances in text rendering, sophisticated architectural design, and the scaling of data. Yet most existing approaches generate flat, single-layer outputs, which is at odds with the inherently layered nature of real design processes. In practice, graphic designs are composed of multiple native layers, such as text, foreground elements, and backgrounds. In this talk, I will first review recent methods for layered graphic design generation. I will then present our work, Accordion, a pipeline that performs extraction planning and decomposes designs into layers using tools such as SAM and inpainting. Finally, I will highlight several open challenges and research opportunities in this emerging area.
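To make the layered-decomposition idea concrete, here is a minimal, hypothetical Python sketch of the peel-and-inpaint loop that such pipelines build on; segment_top_layer and inpaint are placeholder callables (e.g., a SAM-based segmenter and an inpainting model), not the actual Accordion API.

    # Hypothetical sketch of iterative layer decomposition: repeatedly peel
    # off the topmost element, then inpaint the region it occupied so the
    # layer underneath can be recovered. Not the actual Accordion pipeline.
    def decompose(image, segment_top_layer, inpaint, max_layers=10):
        layers = []
        canvas = image
        for _ in range(max_layers):
            mask = segment_top_layer(canvas)  # e.g., SAM-based segmentation
            if mask is None:                  # only the background remains
                break
            layers.append((canvas, mask))     # record the extracted layer
            canvas = inpaint(canvas, mask)    # reconstruct occluded content
        layers.append((canvas, None))         # remaining canvas = background
        return layers                         # top-to-bottom layer stack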
4:10 pm Short break
4:20 pm Poster session (Exhibit Hall II, poster boards #129 - #135)
MG-Gen: Single Image to Motion Graphics Generation [arXiv] #129
Takahiro Shirakawa, Tomoyuki Suzuki, Takuto Narumoto, Daichi Haraguchi
Embedding Font Impression Word Tags Based on Co-occurrence [arXiv] #130
Yugo Kubota, Seiichi Uchida
LayerD: Decomposing Raster Graphic Designs into Layers [ICCV 2025] #131
Tomoyuki Suzuki (CyberAgent), Kang-Jun Liu (Tohoku University), Naoto Inoue (CyberAgent), Kota Yamaguchi (CyberAgent)
ChartGen: Scaling Chart Understanding Via Code-Guided Synthetic Chart Generation [arXiv] #132
Jovana Kondic (MIT), Pengyuan Li (IBM Research), Dhiraj Joshi (IBM Research), Zexue He (MIT-IBM Watson AI Labs), Shafiq Abedin (IBM Research), Jennifer Sun (MIT), Ben Wiesel (IBM Research), Eli Schwartz (IBM Research), Ahmed Nassar (IBM Research), Bo Wu (MIT-IBM Watson AI Labs, IBM Research), Assaf Arbelle (IBM Research), Aude Oliva (MIT, MIT-IBM Watson AI Labs), Dan Gutfreund (MIT-IBM Watson AI Labs, IBM Research), Leonid Karlinsky (MIT-IBM Watson AI Labs, IBM Research), Rogerio Feris (MIT-IBM Watson AI Labs, IBM Research)
RouteExtract: A Modular Pipeline for Extracting Routes from Paper Maps [arXiv] #133
Bjoern Kremser (Technical University of Munich, The University of Tokyo), Yusuke Matsui (The University of Tokyo)
MUSE: A Training-free Multimodal Unified Semantic Embedder for Structure-Aware Retrieval of Scalable Vector Graphics and Images #134
Kyeong Seon Kim (KAIST), Baek Seong-Eun (POSTECH), Lee Jung-Mok (POSTECH), Tae-Hyun Oh (KAIST)
FASTER: A Font-Agnostic Scene Text Editing and Rendering framework [WACV 2025] #135
Aloy Das (Indian Statistical Institute), Sanket Biswas (Universitat Autònoma de Barcelona), Prasun Roy (University of Technology Sydney), Subhankar Ghosh (University of Technology Sydney), Umapada Pal (Indian Statistical Institute), Michael Blumenstein (University of Technology Sydney), Josep Lladós (Universitat Autònoma de Barcelona), Saumik Bhattacharya (Indian Institute of Technology, Kharagpur)
Contact: gdug2025@googlegroups.com