MedNLPDoc = Medical Natural Language Processing for Clinical Document

As electronic records become more common in clinical settings, technology for analyzing this electronically provided information is becoming increasingly important. Our goal is to promote and support the creation of practical tools and systems applicable to the medical industry. In NTCIR-10, we evaluated a very basic task: named entity recognition (NER) in clinical records. In NTCIR-11, we evaluated term normalization technology. In NTCIR-12, we take on a new, more advanced and practical task: inferring disease names (represented by International Classification of Diseases codes; ICD) from the medical records provided. A system developed for this task could directly support real applications in daily clinical services and in many areas of clinical study.

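In recent years, as electronic medical records have spread, information processing in the medical field has become increasingly important. In response, the NTCIR MedNLP task series aims to produce practical systems that support medical care. This time, the core task is to assign appropriate diagnosis names and ICD codes to Japanese clinical text data. Because this task handles clinical records that can carry multiple disease codes, it can be framed as a multi-labeling problem over documents, which is why we named the task MedNLPDoc. The results obtained in this task can be applied almost directly to real applications in daily clinical practice and research.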


April 15, 2016: Online registration for the conference at NII is now open.


March 1, 2016: Draft participant paper submission due.

Feb. 5, 2016: Run submission has closed. The draft task overview has been released.

Two Types of Tasks

Participants are expected to extract information from medical reports written by physicians. We plan to hold the following two types of tasks:


<data id="168" sex="MALE" age="26">




<text type="現病歴">


1時間経過しても症状回復しなかったため、救急外来を受診し胸部X線上で自然気胸を認め、chest tubeを挿入し、手術目的で入院した。


<text type="手術">

thoracoscopic bullectomyを施行。


water seal testの結果leak(-)。

第5肋間よりchest tube挿入。


<text type="入院後経過">


術後経過良好で、術後3日目にchest tube抜去。



<icd code="J931"></icd>

<icd code="Z720"></icd>


Task 1: Phenotyping task

Participants are expected to assign one or more standard disease names to each given medical record. This task corresponds to the phenotyping task in medical research.

In the sample record, the ICD code(s) in the `<icd>` elements (highlighted in red in the original page) are the output that participants should estimate.
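As a rough illustration of the input and output of Task 1, the following Python sketch reads an abbreviated copy of the sample record and pulls out the patient attributes, the section types, and the ICD codes to be estimated. The function names `parse_record` and `set_f1` are hypothetical, and the use of regular expressions is an assumption made because the sample is shown without closing tags. The set-based F1 score is only one common way to compare a predicted code set with the gold code set in a multi-label setting; it is not stated here to be the official evaluation measure.

```python
import re

# Abbreviated copy of the sample record shown earlier (one section kept).
# Closing tags are omitted, as in the sample, so regular expressions are used
# instead of a strict XML parser. This is an assumption about the format.
RECORD = """<data id="168" sex="MALE" age="26">
<text type="手術">
thoracoscopic bullectomyを施行。
<icd code="J931"></icd>
<icd code="Z720"></icd>
"""


def parse_record(record):
    """Return patient attributes, section types, and ICD codes of one record."""
    header = re.search(r"<data\s+([^>]*)>", record)
    attrs = dict(re.findall(r'(\w+)="([^"]*)"', header.group(1))) if header else {}
    sections = re.findall(r'<text\s+type="([^"]*)">', record)
    codes = re.findall(r'<icd\s+code="([^"]*)">', record)
    return {"attrs": attrs, "sections": sections, "icd": codes}


def set_f1(gold, predicted):
    """Hypothetical set-based F1 between gold and predicted ICD code sets."""
    true_positives = len(gold & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    parsed = parse_record(RECORD)
    gold = set(parsed["icd"])
    predicted = {"J931"}  # placeholder output of an imaginary system
    print(parsed["attrs"], parsed["sections"], sorted(gold))
    print(round(set_f1(gold, predicted), 3))
```

Because a record can carry several codes, the comparison is made between sets of codes rather than between single labels.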


Task 2: Creative task

We welcome creative ideas that will help put the resulting products to use in the real world. In particular, we are looking for a new task plan or annotation scheme for the next MedNLPDoc-2.


Expected results

MedNLP-1 and MedNLP-2 evaluated the core elemental technologies. This task is an advanced and practical one that requires all of those technologies, such as NER, term-level coding, and event time recognition. They will be combined to solve the proposed NTCIR-12 task, which is close to a practical application. With this task, we expect to advance the development of state-of-the-art automatic diagnosis applications, and we hope its products will soon be applicable in real clinical settings.



• Jan 21, 2016: Test set distribution

• Feb 1, 2016: Early draft task overview release

• Feb 4, 2016: Run submission due

• March 1, 2016: Draft participant paper submission due

• June 7-10, 2016: NTCIR-12 Conference & EVIA 2016 at NII, Tokyo

Possibility to collaborate with participants from foreign countries

Estimating disease names from medical records is attracting worldwide attention because it can identify patients affected by a given disease so that they can be recruited into clinical studies or trials targeting that disease. In global clinical studies and trials, which have been increasing recently, this kind of patient identification technology must be implemented equally in each language. It is therefore desirable to attempt this task in languages other than Japanese as well. Several shared tasks targeting medical records in English have been held in recent years (e.g., i2b2, TREC Medical Records Track, and CLEFeHealth), and we are planning to collaborate with one of them. We think such a collaboration is feasible if it focuses on specific target diseases, such as diabetes, hypertension, and breast cancer.

We also welcome other languages if suitable foreign collaborators or local organizers can be found.

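Technology for estimating diagnosis names from clinical data is attracting worldwide attention because, in clinical research, it can identify patients who have a particular disease. When conducting international clinical research, this kind of patient identification technology is essential in every language. For this reason, it is desirable to attempt this task in languages other than Japanese as well. Shared tasks targeting clinical data in English have been held in recent years (e.g., i2b2, TREC Medical Records Track, and CLEFeHealth), and we are considering collaborating with them. For example, we think this technology can be applied to diseases such as diabetes, hypertension, and breast cancer. If such overseas collaborators or local organizers are available, we also welcome participation from outside the Japanese-speaking world.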

Paper submission guideline for Participants

The paper should describe your system architecture and resource usage in sufficient detail. We will ask you to improve your manuscript if the description is insufficient to reproduce your system design technically. Although this is not a peer review, the organizers may decline to include a submission without a sufficient description in the proceedings and presentations.




ARAMAKI Eiji*, Ph.D. (Kyoto University, JST PRESTO) /荒牧英治(奈良先端技術大学院大学、JST さきがけ)

MORITA Mizuki*, Ph.D. (The University of Okayama)/森田瑞樹(岡山大学)

KANO Yoshinobu*, Ph.D. (Shizuoka University, JST PRESTO)/ 狩野芳伸(静岡大学、JST さきがけ)

OHKUMA Tomoko*, Ph.D. (Fuji Xerox Co. Ltd.)/大熊智子(富士ゼロックス)


MASUICHI Hiroshi†, Ph.D. (Fuji Xerox Co. Ltd.)/ 増市博 (富士ゼロックス)


Fuji Xerox Co. Ltd./富士ゼロックス株式会社

IR-Advanced Linguistic Technologies Inc./株式会社 アイアール・アルト