Tenth International Workshop on SCIentific DOCument Analysis (SCIDOCA 2026)

June 8-9, 2026, G MESSE GUNMA (GUNMA Convention Center), Gunma, Japan and Online

associated with JSAI International Symposia on AI

Aims and Scope

The recent proliferation of scientific papers and technical documents has become an obstacle to the efficient acquisition of new information in various fields. It is almost impossible for individual researchers to check and read all related documents. Even retrieving relevant documents is becoming harder and harder. This workshop gathers researchers and experts who work on scientific document analysis from various perspectives, and invites technical paper presentations and system demonstrations that cover any aspect of scientific document analysis.

This workshop is associated with the JSAI International Symposia on AI, an event that hosts multiple international workshops together and is held in conjunction with the JSAI annual conference.

Topics

Relevant topics include, but are not limited to, the following:

text analysis
document structure analysis
logical structure analysis
figure and table analysis
citation analysis of scientific and technical documents
scientific information assimilation
summarization and visualization
knowledge discovery/mining from scientific papers and data
similar document retrieval
entity and relation linking between documents and knowledge base
survey generation
resources for scientific document analysis
document understanding in general
NLP systems targeting scientific documents, including tagging, parsing, coreference, etc.

Important Dates

Submission deadline: March 7, 2026 (extended from February 14, 2026)

Notification: March 18, 2026

Camera-ready: March 29, 2026

Workshop days: June 8-9, 2026

Registration

Registration link: https://isai2026-ai-gakkai.peatix.com/

More information regarding registration can be found at https://www.ai-gakkai.or.jp/isai/registration-2026.

Program

June 8 Day 1

14:25 Workshop opening

14:30 Invited talk

Prof. Fei Cheng, Kyoto University, Japan

"Building Japanese Medical LLMs: Domain Pretraining, Structured Summarization, and Knowledge-enhanced RAG"

Abstract: Medical LLMs require not only strong linguistic capabilities but also reliable adaptation to domain-specific knowledge, clinical document structures, and evidence-grounded reasoning. This talk presents our recent efforts toward building Japanese medical LLMs from these three complementary perspectives. We first share practical insights into how medical corpus design for LLM training, particularly the balance between Japanese and multilingual medical data, affects the performance of downstream medical tasks. We then present an LLM-based causal tree extraction framework, designed to help clinicians quickly grasp complex clinical narratives by shaping salient clinical information into concise, tree-structured summaries. We further explore knowledge-enhanced RAG approaches that leverage a curated medical knowledge base to improve factual consistency and clinical reliability of reasoning. In summary, the talk highlights key challenges in developing Japanese medical LLMs, including limited Japanese corpora, the need for expert-crafted tasks, and the demand for clinically grounded reasoning.

Bio: Fei Cheng is currently a Program-specific Senior Lecturer/Junior Associate Professor at Kyoto University. He received his Ph.D. in Engineering from the Nara Institute of Science and Technology in 2018. His research interests include information extraction, mathematical reasoning, large language models, and a broad range of natural language processing topics. He has made substantial contributions to the development of the domestic Japanese large language model LLM-jp and the medicine-specialized SIP model.

coffee break

15:45 Journal-Level Citation Impact of Articles with Dataset Links in Abstracts Identified Using a Generative AI Ensemble

Hiroyuki Tsunoda, Yuan Sun, Masaki Nishizawa, Xiaomin Liu and Kou Amano

16:15 An Approach for Improving Entity-Aware Machine Translation via Reinforcement Learning

An Trieu, Vu Tran and Le-Minh Nguyen

16:45 CiteData: A Large-Scale Dataset for Citation Discovery, Prediction, and Placement

An Dao, An Trieu, Vu Tran, Le Minh Nguyen, Akiko Aizawa and Yuji Matsumoto

17:15 Day 1 closing

June 9 Day 2

10:00 Invited talk

Dr. Van-Khanh Tran, GenAI Center, FPT Smart Cloud, and Thai Nguyen University of Information and Communication Technology, Vietnam

"From Personalized Tutors to Pedagogical Classrooms: Scaffolded Multi-Agent LLMs for Collaborative Learning"

Abstract: Large Language Models are rapidly reshaping AI for Education, evolving from one-to-one tutoring assistants toward multi-agent systems that simulate entire classrooms. Yet effective collaborative learning needs more than dynamic interaction - it requires a well-structured pedagogical plan. In this talk, I share lessons from building Vietnamese AI tutoring systems, from large-scale Vietnamese LLMs to TutorAI for high-school Math, and then introduce SAGE (Scaffolded Agent-Guided Education), our AAAI 2026 framework. SAGE adopts a two-phase compositional design: an offline planning team - Planner, Evaluator, Optimizer, and Analyst - produces an Optimized Pedagogical Scenario that then configures real-time classmate agents through a proactive, self-selecting turn-taking mechanism grounded in Pólya’s problem-solving model. Across simulations and a human-in-the-loop study with Vietnamese 12th-grade students, SAGE achieves a 72.13% win rate over a next-speaker baseline, demonstrates strong role adherence, and produces a measurable scaffolding effect aligned with Vygotsky’s “I do / We do / You do” gradual-release model.

Bio: Van-Khanh Tran is an AI Research Scientist of the Foundation Models Team at the GenAI Center, FPT Smart Cloud, and Deputy Director of the Institute of Artificial Intelligence (IAI-ICTU) at Thai Nguyen University of Information and Communication Technology, Vietnam. He received his Ph.D. in Natural Language Processing from the Japan Advanced Institute of Science and Technology (JAIST) in 2018, with a thesis on deep learning for natural language generation in spoken dialogue systems. He was previously an Associate Research Fellow at Deakin University’s Applied Artificial Intelligence Institute in Australia, and an NLP Researcher at VinBigdata, where he contributed to Vivi, the virtual assistant deployed on VinFast electric vehicles. At FPT, he has led the development of Vietnamese Large Language Models from 7B to 70B parameters trained on over 80B tokens, and architected the legal AI knowledge base and reasoning LLM for Vietnam’s Ministry of Justice. His current research focuses on LLMs, multiagent systems, and pedagogical reasoning, with applications across education, healthcare, and legal AI. He has published at AAAI, COLING, CoNLL, SIGDIAL, and Computer Speech & Language, and received the Best Student Paper Award at KSE 2017.

11:00 lunch break

14:00 VietPS-Hallu: A Vietnamese Dataset for Hallucination Detection in Large Language Models within the Public Services

Dinh Bao Bui, Tien Nhat Nguyen, Tung Le and Huy Tien Nguyen

14:30 HoAstBench: A Method for evaluating LLMs in Smart Homes

Dong Peizhe and Vu Tran

15:00 Toward Efficient Entity-Focus RAG for Biomedical Question Answering

Vu Tran, Trung Vo and Le-Minh Nguyen

15:30 Advanced Legal Case Retrieval: Evaluating Generative LLM-Based Feature Extraction and Hybrid Reranking

Merouane Taleb, Amine-Samy Hedroug, Vu Tran and Minh Le Nguyen

coffee break

16:15 Organization Session

17:15 Workshop closing

Submissions

There are two classes of submissions:

Long paper on original and completed work, including concrete evaluation and analysis wherever appropriate; and
Short paper on a small, focused contribution, work in progress, a negative result, or an opinion piece.

The page limits are up to 14 pages including references for long papers, and up to 7 pages including references for short papers. (Reviewers will be told that there is no penalty for writing a shorter submission.)

All submissions should be written in English and submitted in PDF format, following the Springer-Verlag LNCS style, which can be obtained from https://www.springer.com/gp/computer-science/lncs/conference-proceedings-guidelines. Papers should be anonymized. If you use a Word file, please follow the formatting instructions, then convert the file to PDF and submit it at the paper submission page.

For both classes, in addition to original unpublished work, we also accept papers that have already been published or presented at other venues. Such submissions should also be anonymized and will be reviewed by the program committee.

You can submit your paper at https://easychair.org/conferences/?conf=scidoca2026 . If you are unable to submit through the EasyChair system for any reason, please send an email to "nguyenml[at]jaist.ac.jp".

If a paper is accepted, at least one author of the paper must register for the workshop and present the paper. Please register for the workshop at the registration page.

Workshop Chairs

Le-Minh Nguyen, Japan Advanced Institute of Science and Technology

Yuji Matsumoto, RIKEN Center for Advanced Intelligence Project (Advisor)

Vu Tran, Japan Advanced Institute of Science and Technology (Co-Chair)

Program Committee Members

Le-Minh Nguyen, Japan Advanced Institute of Science and Technology

Yuji Matsumoto, RIKEN Center for Advanced Intelligence Project

Vu Tran, Japan Advanced Institute of Science and Technology

Noriki Nishida, RIKEN Center for Advanced Intelligence Project

Yusuke Miyao, The University of Tokyo

Yoshinobu Kano, Shizuoka University

Akiko Aizawa, National Institute of Informatics

Ken Satoh, Center for Juris-Informatics, ROIS

Junichiro Mori, The University of Tokyo

Kentaro Inui, Tohoku University

Nguyen Ha Thanh, National Institute of Informatics

Nguyen Minh Phuong, Japan Advanced Institute of Science and Technology

An Dao, RIKEN Center for Advanced Intelligence Project

May Myo Zin, Center for Juris-Informatics

Danilo Carvalho, University of Manchester

Hai-Long Trieu, University of Cambridge