Ninth International Workshop on SCIentific DOCument Analysis (SCIDOCA 2025)
May 26-27, 2025 Osaka International Convention Center, Osaka, Japan and Online
The recent proliferation of scientific papers and technical documents has become an obstacle to efficiently acquiring new information in various fields. It is almost impossible for individual researchers to check and read all related documents. Even retrieving relevant documents is becoming harder and harder. This workshop gathers researchers and experts who work on scientific document analysis from various perspectives, and invites technical paper presentations and system demonstrations that cover any aspect of scientific document analysis.
Relevant topics include, but are not limited to, the following:
text analysis
document structure analysis
logical structure analysis
figure and table analysis
citation analysis of scientific and technical documents
scientific information assimilation
summarization and visualization
knowledge discovery/mining from scientific papers and data
similar document retrieval
entity and relation linking between documents and knowledge base
survey generation
resources for scientific document analysis
document understanding in general
NLP systems for scientific documents, including tagging, parsing, coreference, etc.
The SCIDOCA 2025 Shared Task aims to push the boundaries of citation analysis through multiple subtasks, encouraging the development of innovative models for citation prediction, masked citation generation, and citation placement.
Call for Workshop Papers: Nov 8, 2024
Submission deadline: Mar 2, 2025 (firm; extended from Jan 26 and Feb 16)
Notification: Mar 14, 2025
Camera-ready: Mar 25, 2025
Workshop days: May 26-27, 2025 (updated)
Registration link: https://isai2025-ai-gakkai.peatix.com/
14:15 Workshop opening
14:20 Unsupervised Retrieval-Based Pipeline for Cancer Domain Question Answering
Trung Vo, An Trieu, Son Luu and Le-Minh Nguyen
14:50 Enhancing Entity Aware Machine Translation with Multi-task Learning
An Trieu, Phuong Nguyen and Minh Le Nguyen
15:20 Entity-Based Synthetic Data Generation for Named Entity Recognition in Low-resource Domains
An Dao, Hiroki Teranishi, Yuji Matsumoto and Akiko Aizawa
15:50 Zero-shot Entity Recognition for Polymer Biodegradability Information: GPT-4o on PolyBD
Shanshan Liu, Masashi Ishii and Yuji Matsumoto
coffee break
16:30 Automated Extraction of Polymer Property Information from Scientific Literature Using Advanced Event Argument Extraction Techniques
Thuy Phi and Yuji Matsumoto
17:00 AuthNet: A Framework for Research Expert Discovery and Network Visualization Based on Topic-Specific Queries
Dieu-Hien Nguyen, Nguyen-Khang Le and Le-Minh Nguyen
17:30 Analyzing Logical Fallacies in Large Language Models: A Study on Hallucination in Mathematical Reasoning
Hoang Anh Dang, Vu Tran and Le Minh Nguyen
18:00 Day 1 closing
10:00 Keynote (general for all JSAI-isAI workshops)
Prof. Chenghua Lin, The University of Manchester, Department of Computer Science
Chenghua Lin is a Full Professor and Chair in Natural Language Processing in the Department of Computer Science at The University of Manchester. His research focuses on integrating machine learning and NLP for language generation and understanding. He currently serves as the Chair of the ACL SIGGEN Board, a member of the IEEE Speech and Language Processing Technical Committee, and is a founding advisor of the Multimodal Art Projection community. He has received several prizes and awards for his research, including the CIKM Test-of-Time Award and the INLG Best Paper Runner-up Award. He has also held numerous program and chairing roles for *ACL conferences, including Documentation Chair for ACL’25, Publication Chair for ACL’23, Workshop Chair for AACL-IJCNLP’22, Program Chair for INLG’19, and Senior Area Chair for NAACL’25, IJCNLP-AACL’25, ACL’23, EACL’23, ACL’22, and EMNLP’20.
"On the Rigour of Scientific Writing and the Robustness of NLG Evaluation"
Scientific rigour hinges on both the clarity and credibility with which findings are communicated, and the robustness of evaluation methods. In this talk, I present two research efforts that address these challenges from a computational perspective—one focused on the rigour of scientific writing, and the other on the robust evaluation of natural language generation (NLG) systems. First, I introduce a bottom-up, data-driven framework to automatically identify and define rigour criteria and assess their relevance in scientific writing. Despite its fundamental importance, rigour remains underexplored from a computational standpoint, and there is limited analysis of whether existing criteria effectively measure it in practice. We demonstrate the effectiveness of our framework using datasets from two high-impact venues (i.e. ACL and ICLR) and analyse the linguistic patterns associated with scientific rigour. Second, evaluating NLG systems is inherently challenging due to the diversity of valid outputs. While human evaluation remains the gold standard, it suffers from inconsistencies, lack of standardisation, and demographic biases, limiting reproducibility. LLM-based evaluation offers a scalable alternative but is highly sensitive to prompt design, where small variations can lead to significant discrepancies. To address this, we propose an inversion learning method that learns effective reverse mappings from model outputs back to their input instructions, enabling the automatic generation of highly effective, model-specific evaluation prompts. Together, these works contribute towards more rigorous and trustworthy research.
11:00 lunch break
14:00 Keynote
Dr. Nguyen Tien Huy, University of Science, Vietnam National University, Ho Chi Minh City
Dr. Nguyen Tien Huy is an AI Research Associate Manager at MoMo and a lecturer in the Department of Computer Science, Faculty of Information Technology, University of Science, VNU-HCM. He earned his Ph.D. in Information Science from the Japan Advanced Institute of Science and Technology (JAIST) in 2019. His research focuses on Natural Language Processing and Deep Learning, with particular interest in intelligent systems and multimodal data analysis. As an educator, Dr. Huy teaches courses in AI, smart data analytics, large language models, and data science.
"Will I Be Replaced by AI?"
As artificial intelligence (AI) continues to evolve at an unprecedented pace, it is reshaping the global labor market and redefining the future of work. This talk, titled "Will I Be Replaced by AI?", explores the driving forces behind the AI boom, projections for its growth, and the essential skills expected by 2030. By examining macroeconomic and technological trends, the presentation highlights both the opportunities and challenges that AI integration brings. Emphasizing that the future is not about replacement but reinvention, the talk underscores the enduring value of creativity, critical thinking, and emotional intelligence. Attendees will gain practical insights into how to adapt, thrive, and collaborate effectively with intelligent systems in the coming decades.
coffee break
15:15 Enhancing Visual Question Answering with Generative AI: A Study on Synthetic Data Augmentation
Tung Le, Phu-Thinh Nguyen-Huynh and Huy Tien Nguyen
15:45 Shared Task Overview
An Dao, Vu Tran, Le Minh Nguyen, Yuji Matsumoto
16:15 Team LA at SCIDOCA shared task 2025: Citation Discovery via relation-based zero-shot retrieval
An Hoang Trieu, Long Hoang Nguyen and Minh Le Nguyen
16:45 Embedding-Based Retrieval Approaches for Automated Citation Prediction
Dat Le and Son Nguyen
17:15 An Attention-Driven Framework for Citation Discovery and Recommendation
Long Do Ngoc, Hieu Phi Minh, Toan Tran Tien and Anh Phan Viet
17:45 - 18:00 Workshop closing
There are two classes of submissions:
Long paper on original and completed work, including concrete evaluation and analysis wherever appropriate; and
Short paper on a small, focused contribution, work in progress, a negative result, or an opinion piece.
The page limits are up to 14 pages including references for long papers, and up to 7 pages including references for short papers. (Reviewers will be told that there is no penalty for writing a shorter submission.)
All submissions should be written in English, formatted according to the Springer Verlag LNCS style, which can be obtained from here, and submitted in PDF form. The paper should be anonymized. If you use a Word file, please follow the formatting instructions, then convert the file to PDF and submit it at the paper submission page.
For both classes, in addition to original unpublished work, we also accept papers that have already been published or presented at other venues. Such submissions should also be anonymized, and will be reviewed by the program committee.
You can submit your paper at https://easychair.org/conferences/?conf=scidoca2025 . If you are unable to submit your paper through the EasyChair system, please send an email to "nguyenml[at]jaist.ac.jp".
If a paper is accepted, at least one author of the paper must register for the workshop and present it. Please register for the workshop at the registration page.
Le-Minh Nguyen, Japan Advanced Institute of Science and Technology
Yuji Matsumoto, RIKEN Center for Advanced Intelligence Project (Advisor)
Vu Tran, Japan Advanced Institute of Science and Technology (Co-Chair)
Vu Tran, Japan Advanced Institute of Science and Technology
An Dao, RIKEN Center for Advanced Intelligence Project
Le-Minh Nguyen, Japan Advanced Institute of Science and Technology
Yuji Matsumoto, RIKEN Center for Advanced Intelligence Project
Vu Tran, Japan Advanced Institute of Science and Technology
Noriki Nishida, RIKEN Center for Advanced Intelligence Project
Yusuke Miyao, The University of Tokyo
Yoshinobu Kano, Shizuoka University
Akiko Aizawa, National Institute of Informatics
Ken Satoh, National Institute of Informatics and Sokendai
Junichiro Mori, The University of Tokyo
Kentaro Inui, Tohoku University
Nguyen Ha Thanh, National Institute of Informatics
Nguyen Minh Phuong, Japan Advanced Institute of Science and Technology
An Dao, RIKEN Center for Advanced Intelligence Project
May Myo Zin, Center for Juris-informatics