SpRadIE

Information Extraction from Spanish radiology reports

Within the Information Extraction Task of CLEF eHealth 2021, we present SpRadIE, an information extraction challenge on Spanish clinical text. The task targets Named Entity Recognition in radiology reports, more specifically ultrasound reports. To our knowledge, this is the first public challenge for Spanish that deals with actual medical records, that is, the working notes produced by physicians during their clinical practice.

The task targets the detection of seven entity types as well as hedge cues. The targeted entities include Anatomical Entities, Findings (which describe a pathological or abnormal event), and indicators of probability or future outcomes. Identifying these entities makes it possible to automate accurate information retrieval and extraction from large repositories of radiology reports. The entities can also serve as labels for training automated image processing systems. Such systems could assist radiologists in identifying abnormalities or support image-to-text generation.

The annotated corpus consists of 513 ultrasonography reports provided by a major pediatric hospital in Buenos Aires, with over 17,000 annotated named entities.

Challenges

SpRadIE poses several challenges intended to encourage creative solutions. Examples include integrating background knowledge from external resources or using additional datasets, possibly cross-lingual ones, to supplement the provided training data. Some resources are listed in the Data section. Some of the challenges are:

Domain-specific language: Radiology reports tend to be written in haste and in a telegraphic style, with errors and high variability, such as typos and inconsistencies. Moreover, resources for clinical text are scarce, particularly for languages other than English.

Semantic Split: The training, development and test sets cover different semantic fields (e.g. heart-related or liver-related reports). As a result, some topics and their corresponding entities that occur in the test set are not seen in the training set.

Small data: To approximate realistic deployment conditions, only a small number of annotated reports will be available for training. The rest will be used for evaluation.

Complex entities: The linguistic form of the entities presents some particular difficulties. These include long entities with internal structure, embedded (nested) entities and discontinuous entities. Examples can be found in the Task section.


To participate

1. Register at CLEF eHealth for Task 1: Information Extraction from Noisy Text.

2. Fill in this form.