Ninth International Workshop on SCIentific DOCument Analysis (SCIDOCA 2025)

May 26-27, 2025, Osaka International Convention Center, Osaka, Japan and Online

with JSAI International Symposia on AI

Aims and Scope

The recent proliferation of scientific papers and technical documents has become an obstacle to the efficient acquisition of new information in various fields. It is almost impossible for individual researchers to check and read all related documents, and even retrieving relevant documents is becoming increasingly difficult. This workshop gathers researchers and experts who work on scientific document analysis from various perspectives, and invites technical paper presentations and system demonstrations that cover any aspect of scientific document analysis.

Topics

Relevant topics include, but are not limited to, the following:

text analysis
document structure analysis
logical structure analysis
figure and table analysis
citation analysis of scientific and technical documents
scientific information assimilation
summarization and visualization
knowledge discovery/mining from scientific papers and data
similar document retrieval
entity and relation linking between documents and knowledge base
survey generation
resources for scientific document analysis
document understanding in general
NLP systems for scientific documents, including tagging, parsing, coreference resolution, etc.

SCIDOCA 2025 Shared Task

The SCIDOCA 2025 Shared Task aims to push the boundaries of citation analysis through multiple subtasks, encouraging the development of innovative models for citation prediction, masked citation generation, and citation placement.

https://sites.google.com/view/scidoca/2025/shared-task

Important Dates

Call for Workshop Papers: Nov 8, 2024

Submission deadline: Mar 2, 2025 (firm; extended from Jan 26 and Feb 16)

Notification: Mar 14, 2025

Camera-ready: Mar 25, 2025

Workshop days: May 26-27, 2025 (updated)

Registration

Registration link: https://isai2025-ai-gakkai.peatix.com/

Program (updated on May 25)

May 26 Day 1

14:15 Workshop opening

14:20 Unsupervised Retrieval-Based Pipeline for Cancer Domain Question Answering

Trung Vo, An Trieu, Son Luu and Le-Minh Nguyen

14:50 Enhancing Entity Aware Machine Translation with Multi-task Learning

An Trieu, Phuong Nguyen and Minh Le Nguyen

15:20 Entity-Based Synthetic Data Generation for Named Entity Recognition in Low-resource Domains

An Dao, Hiroki Teranishi, Yuji Matsumoto and Akiko Aizawa

15:50 Zero-shot Entity Recognition for Polymer Biodegradability Information: GPT-4o on PolyBD

Shanshan Liu, Masashi Ishii and Yuji Matsumoto

coffee break

16:30 Automated Extraction of Polymer Property Information from Scientific Literature Using Advanced Event Argument Extraction Techniques

Thuy Phi and Yuji Matsumoto

17:00 AuthNet: A Framework for Research Expert Discovery and Network Visualization Based on Topic-Specific Queries

Dieu-Hien Nguyen, Nguyen-Khang Le and Le-Minh Nguyen

17:30 Analyzing Logical Fallacies in Large Language Models: A Study on Hallucination in Mathematical Reasoning

Hoang Anh Dang, Vu Tran and Le Minh Nguyen

18:00 Day 1 closing

May 27 Day 2

10:00 Keynote (general for all JSAI-isAI workshops)

Prof. Chenghua Lin, The University of Manchester, Department of Computer Science

Chenghua Lin is a Full Professor and Chair in Natural Language Processing in the Department of Computer Science at The University of Manchester. His research focuses on integrating machine learning and NLP for language generation and understanding. He currently serves as the Chair of the ACL SIGGEN Board, a member of the IEEE Speech and Language Processing Technical Committee, and is a founding advisor of the Multimodal Art Projection community. He has received several prizes and awards for his research, including the CIKM Test-of-Time Award and the INLG Best Paper Runner-up Award. He has also held numerous program and chairing roles for *ACL conferences, including Documentation Chair for ACL’25, Publication Chair for ACL’23, Workshop Chair for AACL-IJCNLP’22, Program Chair for INLG’19, and Senior Area Chair for NAACL’25, IJCNLP-AACL’25, ACL’23, EACL’23, ACL’22, and EMNLP’20.

"On the Rigour of Scientific Writing and the Robustness of NLG Evaluation"

Scientific rigour hinges on both the clarity and credibility with which findings are communicated, and the robustness of evaluation methods. In this talk, I present two research efforts that address these challenges from a computational perspective—one focused on the rigour of scientific writing, and the other on the robust evaluation of natural language generation (NLG) systems. First, I introduce a bottom-up, data-driven framework to automatically identify and define rigour criteria and assess their relevance in scientific writing. Despite its fundamental importance, rigour remains underexplored from a computational standpoint, and there is limited analysis of whether existing criteria effectively measure it in practice. We demonstrate the effectiveness of our framework using datasets from two high-impact venues (i.e. ACL and ICLR) and analyse the linguistic patterns associated with scientific rigour. Second, evaluating NLG systems is inherently challenging due to the diversity of valid outputs. While human evaluation remains the gold standard, it suffers from inconsistencies, lack of standardisation, and demographic biases, limiting reproducibility. LLM-based evaluation offers a scalable alternative but is highly sensitive to prompt design, where small variations can lead to significant discrepancies. To address this, we propose an inversion learning method that learns effective reverse mappings from model outputs back to their input instructions, enabling the automatic generation of highly effective, model-specific evaluation prompts. Together, these works contribute towards more rigorous and trustworthy research.

11:00 lunch break

14:00 Keynote

Dr. Nguyen Tien Huy, University of Science, Vietnam National University, Ho Chi Minh City

Dr. Nguyen Tien Huy is an AI Research Associate Manager at MoMo and a lecturer in the Department of Computer Science, Faculty of Information Technology, University of Science, VNU-HCM. He earned his Ph.D. in Information Science from the Japan Advanced Institute of Science and Technology (JAIST) in 2019. His research focuses on Natural Language Processing and Deep Learning, with particular interest in intelligent systems and multimodal data analysis. As an educator, Dr. Huy teaches courses in AI, smart data analytics, large language models, and data science.

"Will I Be Replaced by AI?"

As artificial intelligence (AI) continues to evolve at an unprecedented pace, it is reshaping the global labor market and redefining the future of work. This talk, titled "Will I Be Replaced by AI?", explores the driving forces behind the AI boom, projections for its growth, and the essential skills expected by 2030. By examining macroeconomic and technological trends, the presentation highlights both the opportunities and challenges that AI integration brings. Emphasizing that the future is not about replacement but reinvention, the talk underscores the enduring value of creativity, critical thinking, and emotional intelligence. Attendees will gain practical insights into how to adapt, thrive, and collaborate effectively with intelligent systems in the coming decades.

coffee break

15:15 Enhancing Visual Question Answering with Generative AI: A Study on Synthetic Data Augmentation

Tung Le, Phu-Thinh Nguyen-Huynh and Huy Tien Nguyen

15:45 Shared Task Overview

An Dao, Vu Tran, Le Minh Nguyen, Yuji Matsumoto

16:15 Team LA at SCIDOCA shared task 2025: Citation Discovery via relation-based zero-shot retrieval

An Hoang Trieu, Long Hoang Nguyen and Minh Le Nguyen

16:45 Embedding-Based Retrieval Approaches for Automated Citation Prediction

Dat Le and Son Nguyen

17:15 An Attention-Driven Framework for Citation Discovery and Recommendation

Long Do Ngoc, Hieu Phi Minh, Toan Tran Tien and Anh Phan Viet

17:45 - 18:00 Workshop closing

Submissions

There are two classes of submissions:

Long paper on original and completed work, including concrete evaluation and analysis wherever appropriate; and
Short paper on a small, focused contribution, work in progress, a negative result, or an opinion piece.

The page limits are 14 pages including references for long papers, and 7 pages including references for short papers. (Reviewers will be told that there is no penalty for writing a shorter submission.)

All submissions should be written in English and submitted as a PDF formatted according to the Springer-Verlag LNCS style; the LNCS templates can be obtained from the Springer website. The paper should be anonymized. If you use Word, please follow the format instructions, then convert the file to PDF and submit it at the paper submission page.

For both classes, in addition to original unpublished work, we also accept papers that have already been published or presented at other venues. Such submissions should also be anonymized and will be reviewed by the program committee.

You can submit your paper at https://easychair.org/conferences/?conf=scidoca2025 . If you cannot submit your paper through EasyChair for any reason, please send an email to "nguyenml[at]jaist.ac.jp".

If a paper is accepted, at least one author of the paper must register for the workshop and present the paper. Please register at the registration page.

Workshop Chairs

Le-Minh Nguyen, Japan Advanced Institute of Science and Technology

Yuji Matsumoto, RIKEN Center for Advanced Intelligence Project (Advisor)

Vu Tran, Japan Advanced Institute of Science and Technology (Co-Chair)

Shared Task Technical Team

Vu Tran, Japan Advanced Institute of Science and Technology

An Dao, RIKEN Center for Advanced Intelligence Project

Program Committee Members

Le-Minh Nguyen, Japan Advanced Institute of Science and Technology

Yuji Matsumoto, RIKEN Center for Advanced Intelligence Project

Vu Tran, Japan Advanced Institute of Science and Technology

Noriki Nishida, RIKEN Center for Advanced Intelligence Project

Yusuke Miyao, The University of Tokyo

Yoshinobu Kano, Shizuoka University

Akiko Aizawa, National Institute of Informatics

Ken Satoh, National Institute of Informatics and Sokendai

Junichiro Mori, The University of Tokyo

Kentaro Inui, Tohoku University

Nguyen Ha Thanh, National Institute of Informatics

Nguyen Minh Phuong, Japan Advanced Institute of Science and Technology

An Dao, RIKEN Center for Advanced Intelligence Project

May Myo Zin, Center for Juris-informatics