Chemotherapy Treatment Timelines Extraction
from the Clinical Narrative:
Data
The data are provided by the University of Pittsburgh/UPMC and constitute a limited data set. Specifically, the notes contain dates but are otherwise de-identified. All types of notes from the patient’s EHR are included, e.g., radiology reports, pathology notes, clinical notes, oncology notes, discharge summaries, and progress reports.
Note that the data will be distributed through a Data Use Agreement (DUA) with the University of Pittsburgh. Please visit the Registration and Data Access page for instructions on how to register and request a DUA.
Background
An EVENT is anything that is relevant on the clinical timeline. Each EVENT has a temporal relation with the document creation time (DocTimeRel), one of BEFORE, BEFORE-OVERLAP, OVERLAP, or AFTER. Temporal expressions (TIMEX3) are discrete references to time. Temporal relations (TLINKs) link two EVENTs or an EVENT and a TIMEX3. The set of temporal relations is CONTAINS, CONTAINS-SUBEVENT, BEFORE, OVERLAP, BEGINS-ON, ENDS-ON, and NOTED-ON. An EVENT that CONTAINS another EVENT is referred to as a narrative container. CONTAINS-1 is the inverse of CONTAINS.
Definitions are given in the THYME annotation guidelines for pairwise temporal relations (2014) and in Styler et al., 2014. The addition of the NOTED-ON temporal relation and the refinement of the 2014 annotation guidelines are described in Lin, Wright-Bettner et al., 2020. We used these definitions to build our system for extracting the pairwise temporal relations from which we derive the patient-level timelines. See Resources -> Organizers' System.
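To make the terminology concrete, the following is a minimal Python sketch of how EVENTs, TIMEX3s, and TLINKs might be represented in memory. The class names, field names, the `invert_contains` helper, and the example mention and date are all hypothetical and chosen for illustration only; they are not the organizers' data model.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Union


class DocTimeRel(Enum):
    """Relation of an EVENT to the document creation time."""
    BEFORE = "BEFORE"
    BEFORE_OVERLAP = "BEFORE-OVERLAP"
    OVERLAP = "OVERLAP"
    AFTER = "AFTER"


# The TLINK relation labels listed in the text.
TLINK_TYPES = {
    "CONTAINS", "CONTAINS-SUBEVENT", "BEFORE", "OVERLAP",
    "BEGINS-ON", "ENDS-ON", "NOTED-ON",
}


@dataclass(frozen=True)
class Event:
    text: str
    doc_time_rel: DocTimeRel


@dataclass(frozen=True)
class Timex3:
    text: str


@dataclass(frozen=True)
class Tlink:
    source: Union[Event, Timex3]
    relation: str  # one of TLINK_TYPES, or "CONTAINS-1" for the inverse
    target: Union[Event, Timex3]


def invert_contains(link: Tlink) -> Tlink:
    """Rewrite 'A CONTAINS B' as the equivalent 'B CONTAINS-1 A'."""
    if link.relation != "CONTAINS":
        raise ValueError("only CONTAINS links are handled in this sketch")
    return Tlink(source=link.target, relation="CONTAINS-1", target=link.source)


# Hypothetical example: a date acting as the container of a chemotherapy mention.
chemo = Event("carboplatin", DocTimeRel.BEFORE)
date = Timex3("March 3, 2012")
link = Tlink(source=date, relation="CONTAINS", target=chemo)
inverse = invert_contains(link)
```

Here `link` reads "the date CONTAINS the carboplatin EVENT", while `inverse` expresses the same fact from the EVENT's side using CONTAINS-1.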
Using pairwise temporal relations to derive the chemotherapy timelines is one way of casting the task. There are other methods and the participants are free to use any of them.
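As one illustration of this casting, the sketch below aggregates instance-level relations into a patient-level timeline by deduplicating and sorting them. It assumes each relation has already been reduced to a (patient id, chemotherapy mention, relation label, ISO-formatted date) tuple; that tuple layout, the `build_timelines` name, and the example values are assumptions made for illustration, not the official output format or the organizers' method.

```python
from collections import defaultdict


def build_timelines(relations):
    """Group pairwise relations by patient, drop duplicates, sort by date.

    `relations` is an iterable of (patient_id, chemo, relation, iso_date)
    tuples, where iso_date is a YYYY-MM-DD string (an assumed normalization).
    """
    per_patient = defaultdict(set)
    for patient_id, chemo, relation, iso_date in relations:
        # Lowercasing merges mentions that differ only in capitalization.
        per_patient[patient_id].add((chemo.lower(), relation, iso_date))
    return {
        pid: sorted(triples, key=lambda t: (t[2], t[0], t[1]))
        for pid, triples in per_patient.items()
    }


# Hypothetical relations, as if extracted from several notes of one patient.
relations = [
    ("patient01", "Carboplatin", "CONTAINS-1", "2012-03-03"),
    ("patient01", "carboplatin", "CONTAINS-1", "2012-03-03"),
    ("patient01", "taxol", "BEGINS-ON", "2012-01-15"),
]
timelines = build_timelines(relations)
```

A real system would also need to resolve relative time expressions and reconcile conflicting relations across notes, which this sketch does not attempt.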
Data Description
Two types of data are offered to the participants:
Unlabeled data: Unlabeled EHR notes for ~62,000 patients with breast cancer and ovarian cancer and ~16,000 patients with melanoma.
Labeled data:
Gold annotations – timelines at the patient level (training and development sets): Timelines annotated by experts are provided; the distribution across the training and development sets is in Table 1 (numbers indicate the number of patients). The EHR notes are provided.
Gold annotations – instance level (training and development sets): Gold annotations for EVENTs, TIMEX3s, and pairwise temporal relations between an EVENT and a TIMEX3 are provided. These annotations provide instance-level evidence, which is one way to derive the patient-level timelines. The gold annotations are in the Anafora XML format (https://github.com/weitechen/anafora). We will provide a description of the format. The matching EHR notes are provided. Table 2 shows the distribution of EVENTs, TIMEX3s, and TLINKs across the training and development splits.
Please note that Table 1 and Table 2 do not include the distributions for the test data. They are comparable to the Dev Set distributions.
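For readers who want to start working with the instance-level annotations, here is a rough Python sketch of reading an Anafora-style XML file with the standard library. The embedded sample document, its element names (`entity`, `relation`, `span`, `properties`, `Source`, `Type`, `Target`), and the identifiers are assumptions modeled on typical Anafora output and should be checked against the format description provided by the organizers.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; real files may differ in element names and properties.
SAMPLE = """
<data>
  <annotations>
    <entity>
      <id>1@e@doc@gold</id>
      <span>10,21</span>
      <type>EVENT</type>
      <properties><DocTimeRel>BEFORE</DocTimeRel></properties>
    </entity>
    <entity>
      <id>2@e@doc@gold</id>
      <span>25,38</span>
      <type>TIMEX3</type>
      <properties><Class>DATE</Class></properties>
    </entity>
    <relation>
      <id>1@r@doc@gold</id>
      <type>TLINK</type>
      <properties>
        <Source>2@e@doc@gold</Source>
        <Type>CONTAINS</Type>
        <Target>1@e@doc@gold</Target>
      </properties>
    </relation>
  </annotations>
</data>
"""


def parse_span(span_text):
    """Turn 'start,end' (or 'a,b;c,d' for split spans) into integer pairs."""
    return [tuple(int(x) for x in part.split(",")) for part in span_text.split(";")]


def read_properties(node):
    """Collect the children of <properties> as a tag -> text dictionary."""
    props = node.find("properties")
    return {} if props is None else {p.tag: p.text for p in props}


def read_anafora(xml_text):
    """Return (entities keyed by id, list of relation property dicts)."""
    root = ET.fromstring(xml_text)
    entities = {}
    for ent in root.iter("entity"):
        entities[ent.findtext("id")] = {
            "type": ent.findtext("type"),
            "spans": parse_span(ent.findtext("span")),
            "properties": read_properties(ent),
        }
    relations = [read_properties(rel) for rel in root.iter("relation")]
    return entities, relations


entities, relations = read_anafora(SAMPLE)
```

The character offsets in `spans` would index into the matching EHR note text, so an entity's surface string can be recovered by slicing the note with those offsets.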
TEST SET:
You can find the test data for each task in Globus, in the collections subtask1 and subtask2, respectively.
The test collections (subtask1 and subtask2) are accessible to all registered participants with approved DUAs.
To upload your system output, each team has been assigned its own collection. Please do NOT upload system output to the test collections. See the instructions on the page "Submissions of Test Output".