Tenth International Workshop on SCIentific DOCument Analysis (SCIDOCA 2026)
June 8-9, 2026 G MESSE GUNMA (GUNMA Convention Center), Gunma, Japan and Online
associated with JSAI International Symposia on AI
The recent proliferation of scientific papers and technical documents has become an obstacle to the efficient acquisition of new information in various fields. It is almost impossible for individual researchers to check and read all related documents, and even retrieving relevant documents is becoming increasingly difficult. This workshop gathers researchers and experts who approach scientific document analysis from various perspectives, and invites technical paper presentations and system demonstrations that cover any aspect of scientific document analysis.
This workshop is associated with the JSAI International Symposia on AI, an event that hosts multiple international workshops together and is held in conjunction with the JSAI annual conference.
Relevant topics include, but are not limited to, the following:
text analysis
document structure analysis
logical structure analysis
figure and table analysis
citation analysis of scientific and technical documents
scientific information assimilation
summarization and visualization
knowledge discovery/mining from scientific papers and data
similar document retrieval
entity and relation linking between documents and knowledge base
survey generation
resources for scientific document analysis
document understanding in general
NLP systems targeting scientific documents, including tagging, parsing, coreference, etc.
Submission deadline: March 7, 2026 (extended from February 14, 2026)
Notification: March 18, 2026
Camera-ready: March 29, 2026
Workshop days: June 8-9, 2026
Registration link: https://isai2026-ai-gakkai.peatix.com/
More information regarding registration can be found at https://www.ai-gakkai.or.jp/isai/registration-2026.
14:25 Workshop opening
14:30 Invited talk
Prof. Fei Cheng Kyoto University, Japan
"Building Japanese Medical LLMs: Domain Pretraining, Structured Summarization, and Knowledge-enhanced RAG"
Abstract: Medical LLMs require not only strong linguistic capabilities but also reliable adaptation to domain-specific knowledge, clinical document structures, and evidence-grounded reasoning. This talk presents our recent efforts toward building Japanese medical LLMs from these three complementary perspectives. We first share the practical insights into how medical corpus design for LLM training, particularly the balance between Japanese and multilingual medical data, affects the performance of downstream medical tasks. We then present an LLM-based causal tree extraction framework, designed to help clinicians quickly grasp complex clinical narratives by shaping salient clinical information into concise, tree-structured summaries. We further explore the knowledge-enhanced RAG approaches that leverage a curated medical knowledge base to improve factual consistency and clinical reliability of reasoning. In summary, the talk highlights key challenges in developing Japanese medical LLMs, including limited Japanese corpora, the need for expert-crafted tasks, and the demand for clinically grounded reasoning.
Bio: Fei Cheng is currently a Program-specific Senior Lecturer/Junior Associate Professor at Kyoto University. He received his Ph.D. in Engineering from the Nara Institute of Science and Technology in 2018. His research interests include information extraction, mathematical reasoning, large language models, and a broad range of natural language processing topics. He has made substantial contributions to the development of the domestic Japanese large language model LLM-jp and the medicine-specialized SIP model.
coffee break
15:45 Journal-Level Citation Impact of Articles with Dataset Links in Abstracts Identified Using a Generative AI Ensemble
Hiroyuki Tsunoda, Yuan Sun, Masaki Nishizawa, Xiaomin Liu and Kou Amano
16:15 An Approach for Improving Entity-Aware Machine Translation via Reinforcement Learning
An Trieu, Vu Tran and Le-Minh Nguyen
16:45 Advanced Legal Case Retrieval: Evaluating Generative LLM-Based Feature Extraction and Hybrid Reranking
Merouane Taleb, Amine-Samy Hedroug, Vu Tran and Minh Le Nguyen
17:15 Day 1 closing
10:00 Invited talk
Dr. Van-Khanh Tran GenAI Center, FPT Smart Cloud, and Thai Nguyen University of Information and Communication Technology, Vietnam
"From Personalized Tutors to Pedagogical Classrooms: Scaffolded Multi-Agent LLMs for Collaborative Learning"
Abstract: Large Language Models are rapidly reshaping AI for Education, evolving from one-to-one tutoring assistants toward multi-agent systems that simulate entire classrooms. Yet effective collaborative learning needs more than dynamic interaction - it requires a well-structured pedagogical plan. In this talk, I share lessons from building Vietnamese AI tutoring systems, from large-scale Vietnamese LLMs to TutorAI for high-school Math, and then introduce SAGE (Scaffolded Agent-Guided Education), our AAAI 2026 framework. SAGE adopts a two-phase compositional design: an offline planning team - Planner, Evaluator, Optimizer, and Analyst - produces an Optimized Pedagogical Scenario that then configures real-time classmate agents through a proactive, self-selecting turn-taking mechanism grounded in Pólya’s problem-solving model. Across simulations and a human-in-the-loop study with Vietnamese 12th-grade students, SAGE achieves a 72.13% win rate over a next-speaker baseline, demonstrates strong role adherence, and produces a measurable scaffolding effect aligned with Vygotsky’s “I do / We do / You do” gradual-release model.
Bio: Van-Khanh Tran is an AI Research Scientist of the Foundation Models Team at the GenAI Center, FPT Smart Cloud, and Deputy Director of the Institute of Artificial Intelligence (IAI-ICTU) at Thai Nguyen University of Information and Communication Technology, Vietnam. He received his Ph.D. in Natural Language Processing from the Japan Advanced Institute of Science and Technology (JAIST) in 2018, with a thesis on deep learning for natural language generation in spoken dialogue systems. He was previously an Associate Research Fellow at Deakin University’s Applied Artificial Intelligence Institute in Australia, and an NLP Researcher at VinBigdata, where he contributed to Vivi, the virtual assistant deployed on VinFast electric vehicles. At FPT, he has led the development of Vietnamese Large Language Models from 7B to 70B parameters trained on over 80B tokens, and architected the legal AI knowledge base and reasoning LLM for Vietnam’s Ministry of Justice. His current research focuses on LLMs, multiagent systems, and pedagogical reasoning, with applications across education, healthcare, and legal AI. He has published at AAAI, COLING, CoNLL, SIGDIAL, and Computer Speech & Language, and received the Best Student Paper Award at KSE 2017.
11:00 lunch break
14:00 VietPS-Hallu: A Vietnamese Dataset for Hallucination Detection in Large Language Models within the Public Services
Dinh Bao Bui, Tien Nhat Nguyen, Tung Le and Huy Tien Nguyen
14:30 CiteData: A Large-Scale Dataset for Citation Discovery, Prediction, and Placement
An Dao, An Trieu, Vu Tran, Le Minh Nguyen, Akiko Aizawa and Yuji Matsumoto
15:00 Toward Efficient Entity-Focus RAG for Biomedical Question Answering
Vu Tran, Trung Vo and Le-Minh Nguyen
15:30 HoAstBench: A Method for evaluating LLMs in Smart Homes
Dong Peizhe and Vu Tran
coffee break
16:15 Organization Session
17:15 Workshop closing
There are two classes of submissions:
Long paper on original and completed work, including concrete evaluation and analysis wherever appropriate; and
Short paper on a small, focused contribution, work in progress, a negative result, or an opinion piece.
The page limit is 14 pages including references for long papers, and 7 pages including references for short papers. (Reviewers will be told that there is no penalty for writing a shorter submission.)
All submissions should be written in English and submitted as PDF files formatted according to the Springer LNCS style, which can be obtained from https://www.springer.com/gp/computer-science/lncs/conference-proceedings-guidelines. Papers should be anonymized. If you use a Word file, please follow the formatting instructions, then convert the file to PDF and submit it at the paper submission page.
For both classes, in addition to original unpublished work, we also accept papers that have already been published or presented at other venues. Such submissions should also be anonymized, and will be reviewed by the program committee.
You can submit your paper at https://easychair.org/conferences/?conf=scidoca2026 . If you are unable to submit through the EasyChair system for any reason, please send an email to "nguyenml[at]jaist.ac.jp".
If a paper is accepted, at least one author of the paper must register for the workshop and present it. Please register for the workshop at the registration page.
Le-Minh Nguyen, Japan Advanced Institute of Science and Technology
Yuji Matsumoto, RIKEN Center for Advanced Intelligence Project (Advisor)
Vu Tran, Japan Advanced Institute of Science and Technology (Co-Chair)
Le-Minh Nguyen, Japan Advanced Institute of Science and Technology
Yuji Matsumoto, RIKEN Center for Advanced Intelligence Project
Vu Tran, Japan Advanced Institute of Science and Technology
Noriki Nishida, RIKEN Center for Advanced Intelligence Project
Yusuke Miyao, The University of Tokyo
Yoshinobu Kano, Shizuoka University
Akiko Aizawa, National Institute of Informatics
Ken Satoh, Center for Juris-Informatics, ROIS
Junichiro Mori, The University of Tokyo
Kentaro Inui, Tohoku University
Nguyen Ha Thanh, National Institute of Informatics
Nguyen Minh Phuong, Japan Advanced Institute of Science and Technology
An Dao, RIKEN Center for Advanced Intelligence Project
May Myo Zin, Center for Juris-Informatics
Danilo Carvalho, University of Manchester
Hai-Long Trieu, University of Cambridge