Details & Materials


Content details

  1. Introduction to NLP and information extraction, as applied to clinical text [30 minutes]: Introduces participants to the characteristics of information extraction (IE) and its application to clinical text. The history of information extraction, its related domains, levels of analysis, and main applications will be discussed. The characteristics of clinical text, and how it differs from biomedical text (i.e., scientific publications) and general English text, will also be discussed and demonstrated. Slides
  2. Existing NLP tools for clinical information extraction [40 minutes]: Presents several existing NLP and IE applications for clinical (and biomedical) text, as well as useful resources for clinical IE. These resources include NLP modules for specific tasks, terminologies, and web-based applications. Slides
  3. Existing corpora and annotations for clinical information extraction [40 minutes]: Presents currently available clinical text corpora and their associated annotations. This section includes text annotation exercises (CLAMP annotation video). Slides
  4. Main methods for clinical information extraction [60 minutes]: Presents and explains the main methods and algorithms used for information extraction, from pattern matching and rule-based methods to statistical and machine learning methods. The analysis of the local context of extracted information, as well as the extraction of relations between information elements, is also presented. Rule-writing and machine learning exercises conclude this section (CLAMP for smoking status video). Slides
  5. Clinical information extraction evaluation [40 minutes]: Introduces participants to the creation of reference standards for IE and their use in accuracy evaluation, and explains the common metrics used for clinical IE evaluation. Also includes exercises in visualizing the system output and the reference standard, comparing them, and computing accuracy metrics (CLAMP NER pipeline and evaluation video). Slides
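As a rough illustration of the pattern-matching and local-context ideas in section 4, the sketch below assigns a coarse smoking status to a single sentence: a regular expression finds a smoking mention, and a short look-back window of preceding words is checked for negation or past-smoking cues. The cue lists, the window size, the function name classify_smoking_status, and the status labels are hypothetical choices made for this example. It is not how CLAMP or any published context algorithm implements the task, only a minimal sketch of the rule-based approach.

```python
import re

# Hypothetical, deliberately small cue lists for illustration only;
# real systems rely on much larger, curated lexicons and scope rules.
MENTION = re.compile(r"\b(smok(?:er|es|ing|ed)?|tobacco|cigarettes?)\b", re.IGNORECASE)
NEGATION_CUES = re.compile(r"\b(denies|denied|no|never|non|negative for)\b", re.IGNORECASE)
PAST_CUES = re.compile(r"\b(former|quit|stopped|ex)\b", re.IGNORECASE)


def classify_smoking_status(sentence: str, window: int = 5) -> str:
    """Return 'non-smoker', 'past smoker', 'current smoker' or 'unknown'."""
    # Pattern matching: locate the first smoking-related mention.
    match = MENTION.search(sentence)
    if match is None:
        return "unknown"
    # Local context: only the `window` words immediately before the mention.
    context = " ".join(sentence[: match.start()].split()[-window:])
    if NEGATION_CUES.search(context):
        return "non-smoker"
    if PAST_CUES.search(context):
        return "past smoker"
    return "current smoker"


if __name__ == "__main__":
    for text in [
        "Patient denies smoking.",
        "He quit smoking in 2010.",
        "She smokes one pack per day.",
        "Patient is a smoker, denies alcohol.",
        "No known drug allergies.",
    ]:
        print(text, "->", classify_smoking_status(text))
```

Restricting the cue search to a small window before the mention is what keeps "denies alcohol" from negating "smoker" in the fourth example. Context algorithms used in practice apply richer scope rules, including cues that follow the mention, and a machine learning classifier could replace the hand-written rules entirely.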
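The accuracy metrics discussed in section 5 (precision, recall, and F1) can be sketched in a few lines. This example assumes each annotation is a hypothetical (start, end, label) tuple and uses strict matching, where a system annotation counts as correct only if its boundaries and label exactly match one in the reference standard. The function name evaluate and this tuple representation are assumptions for illustration, not the format used by CLAMP or any particular evaluation tool.

```python
def evaluate(reference, system):
    """Strict-match precision, recall and F1 for entity annotations.

    Both arguments are iterables of (start, end, label) tuples. A system
    annotation is a true positive only if the identical tuple appears in
    the reference standard.
    """
    ref_set, sys_set = set(reference), set(system)
    tp = len(ref_set & sys_set)   # found and correct
    fp = len(sys_set - ref_set)   # found but not in the reference
    fn = len(ref_set - sys_set)   # in the reference but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    reference = [(0, 8, "problem"), (15, 24, "drug"), (30, 38, "test"), (50, 60, "problem")]
    system = [(0, 8, "problem"), (15, 24, "drug"), (31, 38, "test")]
    scores = evaluate(reference, system)
    print(f"P={scores['precision']:.3f} R={scores['recall']:.3f} F1={scores['f1']:.3f}")
    # prints "P=0.667 R=0.500 F1=0.571"
```

Under strict matching, the system annotation (31, 38, "test") differs from the reference (30, 38, "test") by one character and is therefore counted as both a false positive and a false negative. Lenient (overlap-based) matching, which evaluations often report alongside strict scores, would accept it.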

Prior knowledge required by attendees

Attendees should be able to install and run software on their personal computers. An understanding of biomedical terminological resources (e.g., the UMLS Metathesaurus) and of text processing will be helpful. No programming knowledge is required; familiarity with the Java programming language is helpful but not necessary. Code samples and step-by-step instructions will be provided when needed.

Tutorial material

Slides: Slides will be used to present an introduction to NLP and information extraction, the characteristics of clinical text, and existing NLP resources for clinical text processing. They can be downloaded from the links above.

Clinical text corpus: A collection of synthetic clinical notes from MTSamples will be made available to registered participants, along with annotations.

NLP tools: Several NLP tools will be used for the hands-on exercises during the tutorial, with CLAMP as the main one. Preference was given to open-source tools.

For active participation in the exercises, participants should bring a laptop computer. Flash drives with the software will be available for distribution.

Handout: A handout combining all slides presented during the tutorial with references for the NLP resources presented and used in the hands-on activities will be distributed at the tutorial and made available to registered participants. To avoid printing large quantities of paper, all of this material is also available through the links above.