The rapid expansion of scientific publications and visually rich document collections poses challenges for researchers and practitioners across many fields. Staying up to date with the latest findings and identifying emerging challenges is increasingly difficult, which makes technologies that streamline document understanding essential. The Workshop on Document Understanding and Intelligence: From Textual Content to Visually-Rich Structure (W25) aims to provide a forum for researchers to exchange ideas and explore cutting-edge methodologies and resources that enable a comprehensive understanding of scholarly and visually structured documents. The workshop brings together researchers from diverse disciplines to discuss state-of-the-art technologies and their impact on fields ranging from scientific research to business, law, and medicine.
Building on the foundations of last year’s Scientific Document Understanding (SDU) workshop, the 2025 workshop broadens its scope to incorporate Visually Rich Document (VRD) understanding. The morning session will focus on scientific document processing, information extraction, question answering, summarization, and domain-specific applications of large language models (LLMs) and generative AI systems. The afternoon session will explore VRD understanding, with topics covering document structure comprehension, layout parsing, and semantic extraction from complex reports and forms. Through research presentations, invited talks, and a panel discussion, the workshop aims to bridge the gap between textual and visual document processing and to foster interdisciplinary collaboration.
The workshop invites original research contributions on all aspects of document understanding and intelligence. Topics of interest include, but are not limited to:
Scientific Document Understanding (SDU)
Information extraction and retrieval for scientific literature
Question answering and generation for scholarly texts
Disambiguation, acronym identification, and definition extraction
Developing LLMs and generative AI models tailored for scientific domains
Instruction tuning, in-context learning, and other adaptive strategies for scientific documents
Document summarization, topic classification, and machine reading comprehension
Multi-modal and multi-lingual scholarly text processing
Knowledge graph construction, representation, and reasoning for scholarly resources
Survey papers on SDU advancements and unsolved challenges in different scientific domains
Visually Rich Document Understanding (VRD)
Semantic extraction and structural parsing for visually rich documents
Table and form understanding, document layout analysis, and diagram comprehension
Multi-modal integration of textual, visual, and tabular data
VRD applications in business, legal, and medical documents
VRD challenges in handling diverse layouts and domain-specific formats
New methodologies and benchmarks for VRD tasks in real-world scenarios
Cross-Domain and Interdisciplinary Topics
Leveraging large language models and generative AI for both textual and visual document processing
AI-based frameworks for document-level analysis and representation learning
Data integration and knowledge management in hybrid text and visual document systems
Factuality, data verification, and anti-science detection in complex document contexts
Resource and tool development, including new datasets, models, and evaluation frameworks for SDU and VRD tasks
The workshop will be a one-day event with an expected attendance of approximately 50 to 60 participants. It will open with brief remarks, followed by research paper presentations on SDU in the morning session. The afternoon session will spotlight recent developments in VRD, featuring a research track and a leaderboard track dedicated to structural understanding in industrial reports. The workshop will conclude with a panel discussion that brings together researchers from academia and industry to identify future directions and research gaps in document understanding.
We are excited to include a leaderboard track for the VRD tasks this year, offering participants a chance to showcase their methods. More detailed information about the competition can be found on the workshop page.
Submission Guidelines
We welcome submissions of unpublished, original research that presents novel findings and perspectives. Submissions should be in English and follow the AAAI style template. Authors may also submit supplementary materials, including technical appendices, source code, datasets, or multimedia appendices. All submissions will undergo double-blind peer review, and accepted papers will be presented as oral or poster presentations at the workshop. At least one author of each accepted paper must register for and attend the workshop to present their work.
We encourage two types of submissions:
Long Technical Papers: Recommended length of up to 8 pages + references.
Short Papers: Recommended length between 3 and 5 pages + references.
Submissions should be made electronically in PDF format via the [Microsoft CMT](https://cmt3.research.microsoft.com/DOCUIAAAI2025/Submission/Index) system. The submission link, deadline and other important dates will be announced on the workshop’s official webpage.
Morning Session (Scientific Document Understanding)
Mihir Parmar, Arizona State University, USA
Thien Huu Nguyen, University of Oregon, USA
Chien Van Nguyen, University of Oregon, USA
Ryan A. Rossi, Adobe Research, USA
Franck Dernoncourt, Adobe Research, USA
Afternoon Session (Visually-Rich Document Understanding)
Caren Han, The University of Melbourne, Australia
Yihao Ding, The University of Melbourne, Australia
Josiah Poon, The University of Sydney, Australia
Anita de Waard, Elsevier, Netherlands
Eduard Hovy, The University of Melbourne, Australia
December 10, 2024 (extended from December 4, 2024): Workshop Submission Deadline
December 16, 2024 (extended from December 14, 2024): Notifications Sent to Authors
December 19, 2024: Early Registration Deadline
March 3-4, 2025: AAAI-25 Workshop Program