Document understanding is essential in areas such as invoice extraction, medical record analysis, and legal document processing. While many workshops focus on pure vision-based tasks (OCR, layout analysis) or pure NLP tasks, VINALDO emphasizes the synergistic integration of computer vision and natural language processing for structured information extraction and semantic understanding of documents.
This third edition of VINALDO highlights structured knowledge extraction from documents using multimodal approaches, with a focus on:
Knowledge Graphs (KGs) built from visual and textual cues
Integration of Large Language Models (LLMs) with visual document understanding
Multimodal representation learning for semantic retrieval
Our goal is to move beyond traditional document analysis by exploring how vision and language jointly enable structured, relational understanding, particularly in complex documents such as invoices, forms, and reports.
Novelty for this edition:
Following the success of VINALDO 2023 and VINALDO 2024, this third edition of the VINALDO workshop encourages descriptions of novel problems or applications for document analysis in the area of information retrieval that have emerged in recent years. In the previous edition, VINALDO 2024, we highlighted a particular topic, namely “Knowledge Graphs and Multimodal approaches”.
In this new edition, we aim to encourage novel and recent research on document analysis, including, but not limited to, approaches that intersect with areas such as Large Language Models (LLMs), Knowledge Graphs (KGs), and Natural Language Processing (NLP). The VINALDO workshop focuses on the joint exploitation of visual and textual information for document understanding, while remaining open to a wide range of methods and perspectives.
In particular, we highlight the growing importance of structured representations such as Knowledge Graphs extracted from document context, which are still underexplored despite their relevance across many application domains. We therefore welcome contributions that explore the combination of computer vision, NLP, and structured knowledge representations, as well as works that integrate NLP and vision techniques in innovative ways.
We also encourage submissions that introduce new datasets, benchmarks, or real-world applications related to document analysis. Overall, the VINALDO workshop aims to bring together researchers and practitioners from academia, industry, and applied research to exchange ideas, share experiences, and discuss ongoing challenges and advances in document analysis at the intersection of Computer Vision and NLP.