VisionDocs @ ICCV2025

Schedule

Workshop Date: October 20, 2025, afternoon (1:00-5:00pm)

Workshop Location: Honolulu Convention Center, 308 A

1:00-1:15pm Opening Remarks 🤗
1:15-1:45pm Invited Talk: Simone Marinai - "Will Large Language Models Make Document Analysis a Solved Problem? Insights from Table and Graphics Understanding"
1:45-2:15pm Invited Talk: Andreas Fischer - "Explainable Document Analysis for Domain Experts: Combining Deep Learning with Rule-Based Methods"
2:15-2:45pm Invited Talk: Marcus Liwicki - "Does our research make ourselves obsolete? Or are we wanted more than ever before?"
2:45-3:00pm Oral Presentation and Best Paper Award 🏆
3:00-4:00pm Coffee Break and Poster Session (Exhibit Hall II) ☕
4:00-4:30pm Invited Talk: Christopher Tensmeyer - "Document Analysis and Document Generation: Two Sides of the Same Coin"
4:30-5:00pm Invited Talk: Naoto Inoue - "Graphic Design Generation Through Document Analysis"
5:00pm Closing Remarks 👋

Accepted Papers

(listed in random order)

Full Papers (in conference proceedings):

TAP-VL: Text Layout Aware Pretraining for Enriched Vision-Language Models

Jonathan Fhima, Elad Ben Avraham, Oren Nuriel, Yair Kittenplon, Roy Ganz, Aviad Aberdam, Ron Litman

A Survey on Reading Order, Table of Contents, and Structure Extraction in Document Analysis

Simone Giovannini

Structure-aware Contrastive Learning for Diagram Understanding of Multimodal Models

Hiroshi Sasaki

Improved Information Extraction by Leveraging Multi-Hypothesis OCR at Inference Time

Arthur Hemmer, Mickael Coustaty, Nicola Bartolo, Jean-Marc Ogier

DocSemi: Efficient Document Layout Analysis with Guided Queries

Tahira Shehzadi, Ifza Ifza, Didier Stricker, Muhammad Zeshan Afzal

🏆 DIVE-Doc: Downscaling foundational Image Visual Encoder into hierarchical architecture for DocVQA

Rayane Bencharef, Abderrahmane Rahiche, Mohamed Cheriet

PRISM: Pruning for Rank-adaptive Interpretable Segmentation Model with Application to Historical Document Multiband Images

Kilian Declercq, Abderrahmane Rahiche, Mohamed Cheriet

Deep Learning-Based Intrusion Detection Systems for Phishing Email Detection: A Short Survey

Axel De Nardin, Silvia Zottin, Claudio Piciarelli, Gian Luca Foresti

Scanned documents forensics: detecting inserted characters through noise and chromatic artifacts

Marina Gardella, Julieta Umpierrez, Antoine Tadros, Seginus Mowlavi, Natalia Bottaioli, Diego Belzarena, Gabriele Facciolo, Roy He, Jean-Michel Morel, Rafael Grompone von Gioi

Text Image Generation for Low-Resource Languages with Dual Translation Learning

Chihiro Noguchi, Shun Fukuda, Shoichiro Mihara, Masao Yamanaka

ZOD: Zero-shot and Out-of-Distribution Detection Dataset for Document Images

Sheikh Talha Uddin, Sankalp Sinha, Shino Sam, Didier Stricker, Muhammad Zeshan Afzal

CTC Transcription Alignment of the Bullinger Letters: Automatic Improvement of Annotation Quality

Marco Peer, Anna Scius-Bertrand, Andreas Fischer

Describe Anything Model for Visual Question Answering on Text-rich Images

Yen-Linh Vu, Duong-Dinh-Thang, Truong-Binh Duong, Anh-Khoi Nguyen, Thanh-Huy Nguyen, Le Thien Phuc Nguyen, Jianhua Xing, Xingjian Li, Tianyang Wang, Ulas Bagci, Min Xu

ChemMiner: A Large Language Model Agent System for Chemical Literature Data Mining

Kexin Chen, Yuyang Du, Junyou Li, Hanqun Cao, Menghao Guo, Xilin Dang, Lanqing Li, Jiezhong Qiu, Guangyong Chen, Pheng-Ann Heng

Quo Vadis Handwritten Text Generation for Handwritten Text Recognition?

Vittorio Pippi, Konstantina Nikolaidou, Silvia Cascianelli, George Retsinas, Giorgos Sfikas, Rita Cucchiara, Marcus Liwicki

CoSMo: A Multimodal Transformer for Page Stream Segmentation in Comic Books

Marc Serra Ortega, Emanuele Vivoli, Artemis Llabrés, Dimosthenis Karatzas

Short Papers:

Towards Reliable and Interpretable Document Question Answering via VLMs

Alessio Chen, Simone Giovannini, Andrea Gemelli, Fabio Coppini, Simone Marinai

Copy-move detection in scanned documents

Marina Gardella, Julieta Umpierrez, Pablo Muse, Rafael Grompone von Gioi, Jean-Michel Morel

EmptyT: Empty Template Recovery from Handwritten-Filled Forms

Natalia Bottaioli, Yanhao Li, Gabriele Facciolo, Jean-Michel Morel

Evaluating Medical School Personal Statements with SCRIBE: A Fine-Tuned Transformer System for Structured Feedback

Cole Krudwig, Faith Kurtyka, Sara Avila, Sean Dore, George (Guy) McHendry, Steven Fernandes

Theatre Chapbooks At Scale: A Statistical Comparative Analysis of Typography

Seginus Mowlavi, Diego Belzarena, Paula Casariego Castinera, Alejandra Ulla Lorenzo, Gregory Randall, Jean-Michel Morel

Demos:

Interactive Document Image Forensics in IPOL

Marina Gardella, Julieta Umpierrez, Antoine Tadros, Seginus Mowlavi, Natalia Bottaioli, Diego Belzarena, Gabriele Facciolo, Roy Y. He, Jean-Michel Morel, Rafael Grompone von Gioi

Evaluating Medical School Personal Statements with SCRIBE: A Fine-Tuned Transformer System for Structured Feedback

Cole Krudwig, Faith Kurtyka, Sara Avila, Sean Dore, George (Guy) McHendry, Steven Fernandes
