Task Descriptions


Task 1 - Named entity recognition and normalization of disorders

The clinical narrative is abundant in mentions of clinical conditions, anatomical sites, medications, and procedures. This is in stark contrast to the newswire domain, where text is dominated by mentions of countries, locations, and people. In clinical text, many different surface forms can represent the same concept. Unlike the general domain, biomedicine has rich lexical and ontological resources that can be leveraged when building applications. The Unified Medical Language System (UMLS, https://uts.nlm.nih.gov/home.html) integrates over 130 lexicons/thesauri with terms from a variety of languages. The UMLS Metathesaurus integrates resources used worldwide in clinical care, public health, and epidemiology, including SNOMED CT, ICD-9, and RxNorm. In addition, every concept in the Metathesaurus is identified by its Concept Unique Identifier (CUI) and is assigned a semantic type from the UMLS Semantic Network (Bodenreider and McCray, 2003).

Because the recognition and normalization of named entity mentions is a fundamental task, it will be the focus of Task 1. Task 1 includes recognizing mentions of concepts that belong to the UMLS semantic group Disorders and mapping each mention to a unique UMLS CUI. Here are a few examples; more are provided in the annotation guidelines and on the Datasets page.

Examples of the Task 1 Annotations:

(1) The rhythm appears to be atrial fibrillation.
    “atrial fibrillation” is a mention of type Disorders with CUI C0004238 (UMLS preferred term is “atrial fibrillation”)
(2) The left atrium is moderately dilated.
    “left atrium ... dilated” is a mention of type Disorders with CUI C0344720 (UMLS preferred term is “left atrial dilatation”)
(3) 53 year old man s/p fall from ladder.
    “fall from ladder” is a mention of type Disorders with CUI C0337212 (UMLS preferred term is “accidental fall from ladder”)

Example (1) represents the easiest case: the mention matches the UMLS preferred term exactly. Example (2) represents a disjoint mention, whose words are not contiguous in the text. Example (3) is a synonym of the UMLS preferred term.
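Because a disjoint mention such as example (2) cannot be described by a single start/end pair, one way to hold these annotations is a record that carries a list of character spans plus a CUI. The sketch below is a minimal illustration only; the `Mention` class, its fields, and the `span_of` helper are hypothetical and are not the official annotation format, which is described in the annotation guidelines.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Mention:
    """Hypothetical record for a disorder mention: one or more character spans plus a UMLS CUI."""
    spans: Tuple[Tuple[int, int], ...]  # (start, end) offsets; end is exclusive
    cui: str

    def is_disjoint(self) -> bool:
        # A mention is disjoint when its words are split across more than one span
        return len(self.spans) > 1

    def surface(self, text: str) -> str:
        # Recover the covered text; disjoint pieces are joined with " ... "
        return " ... ".join(text[start:end] for start, end in self.spans)


def span_of(text: str, phrase: str) -> Tuple[int, int]:
    """Return the (start, end) offsets of the first occurrence of phrase in text."""
    start = text.index(phrase)
    return (start, start + len(phrase))


sentence = "The left atrium is moderately dilated."
mention = Mention(
    spans=(span_of(sentence, "left atrium"), span_of(sentence, "dilated")),
    cui="C0344720",
)
print(mention.spans)              # ((4, 15), (30, 37))
print(mention.surface(sentence))  # left atrium ... dilated
print(mention.is_disjoint())      # True
```

Storing spans as a tuple keeps contiguous mentions, like examples (1) and (3), as the one-span special case of the same structure.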

Task 1 consists of (a) discovering the mention boundaries and (b) mapping each mention to a UMLS CUI. The language is English. Normalization/mapping will be limited to UMLS CUIs that correspond to SNOMED CT codes. Participants are free to use any UMLS resources as well as other supplemental content such as WordNet, Wikipedia, etc. Systems that use annotations in addition to those provided in the shared task are allowed but will be evaluated separately.
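Steps (a) and (b) can be illustrated with a simple dictionary-lookup baseline that scans the text for the longest known term at each position and emits its offsets together with a CUI. This is a minimal sketch, not a recommended system: the `LEXICON` contents and the `tag` function are hypothetical (the entries only reuse CUIs from the examples above), and the approach finds only contiguous, exactly matching mentions, so a disjoint mention like example (2) would be missed.

```python
import re

# Hypothetical mini-lexicon: lower-cased term -> CUI. The entries reuse the CUIs from
# the examples above; a real system would draw terms from the UMLS Metathesaurus.
LEXICON = {
    "atrial fibrillation": "C0004238",
    "fall from ladder": "C0337212",
    "accidental fall from ladder": "C0337212",
}
MAX_TERM_TOKENS = max(len(term.split()) for term in LEXICON)


def tag(text):
    """(a) find mention boundaries and (b) assign a CUI by greedy longest-match lookup.

    Returns a list of (start, end, cui) tuples. Only contiguous mentions can be found.
    """
    tokens = [(m.start(), m.end()) for m in re.finditer(r"\w+", text)]
    mentions, i = [], 0
    while i < len(tokens):
        # Try the longest candidate phrase starting at token i first
        for n in range(min(MAX_TERM_TOKENS, len(tokens) - i), 0, -1):
            phrase = " ".join(text[s:e].lower() for s, e in tokens[i:i + n])
            cui = LEXICON.get(phrase)
            if cui is not None:
                mentions.append((tokens[i][0], tokens[i + n - 1][1], cui))
                i += n
                break
        else:
            i += 1  # no lexicon entry starts at this token
    return mentions


print(tag("The rhythm appears to be atrial fibrillation."))  # [(25, 44, 'C0004238')]
```

Because punctuation is dropped during tokenization, "s/p fall from ladder." still yields a match on "fall from ladder".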

Reference:

Bodenreider, O. and McCray, A. Exploring semantic groups through visual approaches. Journal of Biomedical Informatics, 2003. 36(6): pp. 414-432. http://semanticnetwork.nlm.nih.gov/SemGroups/Papers/2003-medinfo-atm.pdf


Task 2 - Normalization of acronyms/abbreviations

Many of the terms found in clinical documents are acronyms or abbreviations that patients can find difficult to understand. Task 2 focuses on mapping acronyms and abbreviations to UMLS CUIs, which provide an expansion and a definition of the term. Task 2 includes acronyms/abbreviations of all semantic types but excludes mentions that occur in lists of medications or lab results. Not all acronyms and abbreviations map to a UMLS CUI; those that do not are annotated as “CUI-less”. In Task 2, we will provide annotations of the acronyms/abbreviations, and participants will map them to UMLS CUIs. Here are a few examples; more are provided in the annotation guidelines and on the Datasets page.

Examples of Task 2 Annotations:

(1) He was given Vanco.
    “Vanco” is a mention of type Acronym/Abbreviation with CUI C0042313 (UMLS preferred term is “Vancomycin”)
(2) Patient has breast ca.
    “ca” is a mention of type Acronym/Abbreviation with CUI C0006826 (UMLS preferred term is “Malignant Neoplasms”)
(3) Mitral Valve: Trivial MR
    “MR” is a mention of type Acronym/Abbreviation with CUI C0026266 (UMLS preferred term is “Mitral Valve Insufficiency”)

Task 2 consists of mapping pre-annotated acronym/abbreviation mentions to UMLS CUIs. The language is English. As in Task 1, participants are free to use any UMLS resources as well as other supplemental content such as WordNet, Wikipedia, etc. Systems that use annotations in addition to those provided in the shared task are allowed but will be evaluated separately.
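Since the mention boundaries are given in Task 2, the core operation is a lookup from each annotated span to a CUI, with “CUI-less” as the fallback when no concept applies. The sketch below assumes a hypothetical `ABBREVIATIONS` dictionary (reusing the CUIs from the examples above) and hypothetical `normalize`/`normalize_document` helpers; it is an illustration of the input/output shape, not a competitive approach.

```python
CUI_LESS = "CUI-less"

# Hypothetical abbreviation dictionary (lower-cased surface form -> CUI);
# the entries reuse the CUIs from the Task 2 examples above.
ABBREVIATIONS = {
    "vanco": "C0042313",
    "ca": "C0006826",
    "mr": "C0026266",
}


def normalize(mention):
    """Map one pre-annotated acronym/abbreviation to a CUI, or to CUI-less if unknown."""
    return ABBREVIATIONS.get(mention.strip().lower(), CUI_LESS)


def normalize_document(text, spans):
    """Normalize every pre-annotated (start, end) span in a document."""
    return [(start, end, normalize(text[start:end])) for start, end in spans]


print(normalize_document("He was given Vanco.", [(13, 18)]))  # [(13, 18, 'C0042313')]
print(normalize("XYZ"))  # CUI-less (not in this toy dictionary)
```

A one-sense-per-abbreviation table ignores context: “ca”, for instance, can also abbreviate calcium in a lab context, so a practical system would need to use the surrounding text to choose among candidate CUIs.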

Task 3 - Information retrieval to address questions patients may have when reading clinical reports

Task 3 will be a standard TREC-style information retrieval (IR) task using (a) a 2012 crawl of approximately one million medical documents made available by the EU-FP7 Khresmoi project (http://www.khresmoi.eu/) in plain text form and (b) queries that members of the general public may realistically pose based on the content of their discharge summaries. Queries will be generated from discharge summaries used in Tasks 1 and 2. The goal of Task 3 is to retrieve the documents relevant to each user query.
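The task description does not prescribe a retrieval model. As one common starting point for ranking a document collection against a query, the sketch below implements Okapi BM25 over a small in-memory collection; the `BM25` class, the `k1`/`b` defaults, and the toy documents are assumptions chosen for illustration, and a run over the full one-million-document crawl would use an inverted index rather than scoring every document.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"\w+", text.lower())


class BM25:
    """Okapi BM25 over a small in-memory collection (illustrative baseline only)."""

    def __init__(self, docs, k1=1.2, b=0.75):
        self.k1, self.b = k1, b
        self.tfs = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
        self.lengths = {doc_id: sum(tf.values()) for doc_id, tf in self.tfs.items()}
        self.avgdl = sum(self.lengths.values()) / len(docs)
        n = len(docs)
        df = Counter(term for tf in self.tfs.values() for term in tf)
        # Robertson-Sparck Jones idf; the +1 inside the log keeps weights positive
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query, doc_id):
        tf, dl = self.tfs[doc_id], self.lengths[doc_id]
        total = 0.0
        for term in tokenize(query):
            f = tf.get(term, 0)
            if f == 0:
                continue
            # Length normalization: longer-than-average documents are penalized
            norm = self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[term] * f * (self.k1 + 1) / (f + norm)
        return total

    def search(self, query, k=10):
        """Return up to k doc_ids with a positive score, ranked by descending BM25 score."""
        scores = {doc_id: self.score(query, doc_id) for doc_id in self.tfs}
        ranked = sorted((d for d in scores if scores[d] > 0), key=scores.get, reverse=True)
        return ranked[:k]


docs = {
    "d1": "Atrial fibrillation is an irregular heart rhythm.",
    "d2": "A fall from a ladder can cause fractures.",
    "d3": "Vancomycin is an antibiotic.",
}
index = BM25(docs)
print(index.search("what is atrial fibrillation"))  # ['d1', 'd3']
```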

The task will operate by distributing the test collection (document set, sample development queries, and result set) to registered task participants. Participants will have one month to explore the collection and develop retrieval techniques, after which test queries for the task will be released. Post-submission relevance assessment will be conducted using a pool of the submitted runs. Result sets for the task and performance measures will be distributed to participants.
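In TREC-style pooling, assessors judge only the union of the top-ranked documents from every submitted run, and runs are then scored against those judgments. The sketch below shows that idea together with precision at k as one example measure; the pool depth of 10, the choice of precision at k, and the `build_pool`/`precision_at_k` helpers are assumptions for illustration (the actual pool depth and measures are given on the Evaluation tab).

```python
from collections import defaultdict


def build_pool(runs, depth=10):
    """Union of the top-`depth` documents per query across all submitted runs.

    runs: list of runs, each a dict mapping query_id -> ranked list of doc_ids.
    Returns query_id -> sorted list of doc_ids to be judged by assessors.
    """
    pool = defaultdict(set)
    for run in runs:
        for query_id, ranking in run.items():
            pool[query_id].update(ranking[:depth])
    return {query_id: sorted(doc_ids) for query_id, doc_ids in pool.items()}


def precision_at_k(ranking, relevant, k=10):
    """Fraction of the top-k ranked documents that were judged relevant.

    Documents outside the pool were never judged and are treated as non-relevant.
    """
    return sum(1 for doc_id in ranking[:k] if doc_id in relevant) / k


run_a = {"q1": ["d1", "d2", "d3"]}
run_b = {"q1": ["d3", "d4", "d5"]}
print(build_pool([run_a, run_b], depth=2))  # {'q1': ['d1', 'd2', 'd3', 'd4']}
print(precision_at_k(run_a["q1"], {"d1", "d3"}, k=2))  # 0.5
```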

See the Datasets and Evaluation tabs for details on the task.
