Inquiries about the test dataset: 24856504@konyang.ac.kr
Description:
Participants receive real radiology reports and patient records. The AI model must generate a “Causal Explanation” for each finding (i.e., why a particular imaging feature is concerning).
Task 1 Dataset:
Input (text): MIMIC-CXR radiology reports
Input (optional, image): MIMIC-CXR chest X-ray images
Input (optional): additional data, where available
Output (text): causal exploration report
Task definition
The objective of this task is to develop a model that maps the input data—a radiology report, optionally accompanied by a chest X-ray image—to output data containing a causality exploration section. This section is not typically written in the original radiology report; it has been recovered and verified by radiology experts using a structured diagnosis-confirmation checklist.
The input data for this task is sourced from the MIMIC database, a large, publicly available repository of healthcare information. Participants must individually obtain the licensing and permissions needed to access MIMIC data. For participants who hold a valid MIMIC license, the task organizers will provide instructions for accessing the relevant data, so that the input data can be retrieved and used in line with MIMIC's licensing requirements.
Participants will use the provided training set, which includes paired examples of input data (radiology report and optional X-ray image) and output data (causality exploration section), to train a learning module. This module should capture the underlying patterns and infer causal information that experts derive from both the report content and the diagnosis confirmation process.
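As a concrete illustration of this training setup, below is a minimal fine-tuning sketch treating the report-to-causality pairs as a standard sequence-to-sequence problem. It assumes a HuggingFace seq2seq baseline; the model name ("google/flan-t5-base"), the field names, and the single paired example are illustrative assumptions, not part of the task specification.

```python
# Minimal fine-tuning sketch (illustrative only): pairs of report text and
# expert-written causality exploration sections, trained as seq2seq.
from datasets import Dataset
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

MODEL_NAME = "google/flan-t5-base"  # placeholder baseline, not mandated by the task
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)

# Hypothetical paired example; real pairs come from the provided training set.
pairs = [
    {
        "report": "Findings: small left pleural effusion. History of heart failure.",
        "causal": "Causal Exploration: The pleural effusion is plausibly "
                  "secondary to the documented heart failure.",
    },
]

def preprocess(example):
    enc = tokenizer(example["report"], truncation=True, max_length=512)
    enc["labels"] = tokenizer(
        text_target=example["causal"], truncation=True, max_length=256
    )["input_ids"]
    return enc

train_ds = Dataset.from_list(pairs).map(preprocess, remove_columns=["report", "causal"])

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(output_dir="ckpt", num_train_epochs=3),
    train_dataset=train_ds,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()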
Participants are required to build a running module (also referred to as the inference module) that takes any new input (a radiology report, with or without an X-ray image) and generates the corresponding causality exploration section from the learned mapping. This module will be deployed on our evaluation server and tested via API, assessing its accuracy and effectiveness on inputs that were not included in the training set.
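Below is a minimal serving sketch for such an inference module, assuming the evaluation server POSTs JSON over HTTP. The route name ("/generate"), the field names, and the `run_model` stub are hypothetical; the actual API contract will come from the organizers.

```python
# Illustrative serving sketch for the inference module.
from typing import Optional

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class GenerationRequest(BaseModel):
    report: str                       # radiology report text
    image_path: Optional[str] = None  # optional chest X-ray reference

def run_model(report: str, image_path: Optional[str]) -> str:
    # Stand-in for the trained model; a real submission would call
    # model.generate (or equivalent) here.
    return "analysis derived from the submitted report."

@app.post("/generate")
def generate(req: GenerationRequest) -> dict:
    text = run_model(req.report, req.image_path)
    # Every response carries the fixed heading required by the task.
    return {"output": "Causal Exploration: " + text}
```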
Output
The output of Task 1 is a causality exploration report. This report should provide a structured analysis of the radiology findings, highlighting potential causative relationships that could lead to a better understanding of the patient's condition. The report should reflect the diagnostic reasoning process by documenting how various symptoms and findings may be interlinked. For example, a finding of "pleural effusion" may be linked causally to "heart failure" if observed in the patient's medical history.
The report must begin with the fixed heading "Causal Exploration:" followed by the causality analysis text that reflects the diagnostic flow and reasoning. This structured format is required for consistency: the output should clearly delineate identified causal links and any inferred reasoning steps that mirror a radiologist's analytical process.
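To make the required format concrete, here is a tiny illustrative check together with a sample output. The validator function is a convenience sketch, not part of the official tooling; only the fixed heading itself comes from the task specification.

```python
# Every submitted output must start with the fixed heading below.
REQUIRED_HEADING = "Causal Exploration:"

def is_valid_output(text: str) -> bool:
    # Leading whitespace is tolerated here; the heading itself is mandatory.
    return text.lstrip().startswith(REQUIRED_HEADING)

example = ("Causal Exploration: The left pleural effusion is most plausibly "
           "explained by the patient's documented heart failure, which raises "
           "pulmonary venous pressure and drives fluid into the pleural space.")
assert is_valid_output(example)
```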
Task 2 Description:
Participants receive a set of multiple-choice questions. Each question presents a diagnostic statement (e.g., “The patient shows a small nodule in the left lower lobe that has grown since prior imaging. Why is this suspicious for malignancy?”) and four candidate explanations. The AI model must select the correct explanation.
Input
The input data for Task 2 consists of radiologists' responses to a structured questionnaire on the assessment of chest X-ray images. The questionnaire is a series of ordered questions designed to capture essential diagnostic information, structured as follows (a minimal data-structure sketch follows the list):
A1. First Impressions: Initial observations based on the X-ray, identifying general findings.
A2. Anatomical Location Identification: The anatomical region(s) of interest within the chest X-ray image where abnormalities may be present.
A3. Thoracic Spine Level Localization: Specification of the location in relation to the thoracic spine, aiding in precise abnormality localization.
A4. Final Impression: A revised conclusion based on the first impressions and location information, confirming or revising initial findings.
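As referenced above, one way to represent a single questionnaire record covering A1-A4 is sketched below. The class and field names are assumptions chosen for illustration, not a mandated schema.

```python
# Hypothetical representation of one questionnaire record (A1-A4).
from dataclasses import dataclass
from typing import List

@dataclass
class QuestionnaireRecord:
    first_impressions: str            # A1: initial observations
    anatomical_locations: List[str]   # A2: regions of interest
    spine_levels: List[str]           # A3: thoracic spine levels
    final_impression: str             # A4: revised conclusion

record = QuestionnaireRecord(
    first_impressions="Possible opacity in the right lung field.",
    anatomical_locations=["right lower lobe"],
    spine_levels=["T8-T10"],
    final_impression="Right lower lobe consolidation, consistent with pneumonia.",
)
```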
Training set output
The output is a report generated from 28 structured checklists targeting specific abnormalities and their locations. These checklist questions validate whether the detected abnormalities substantiate the final impression agreed upon by radiologists. The checklist answers are likewise collected through crowdsourced annotation by radiologists.
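For orientation only, here is a purely hypothetical shape for one of the 28 checklist items; the actual checklist content, fields, and schema come from the organizers' training data.

```python
# Hypothetical checklist item; every field name here is an assumption.
checklist_item = {
    "id": 7,
    "question": "Is the detected opacity located in the region named in A2?",
    "target_abnormality": "consolidation",
    "answer": "yes",                     # crowdsourced radiologist response
    "supports_final_impression": True,   # does this answer back the final impression?
}
```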
Output
The output is a report that captures the causative relationships within the radiologists' diagnostic reasoning process. It should interpret and connect the data points (A1-A4) in a way that mirrors the diagnostic thought process, revealing the causal relationships embedded in the medical observations. The report must begin with the fixed heading "Causal Exploration:" followed by the causality analysis text that reflects the diagnostic flow and reasoning. This structured format is required for consistency.
Process
Utilize Diagnostic Flow Data: Use the diagnostic flow data (A1-A4) to reconstruct the reasoning path of a radiologist, simulating the process they might follow when examining similar cases.
Generate Report Using Custom Model: Develop your own method to integrate A1 through A4 into a coherent diagnostic report.
Format the Report: Structure the causality analysis into a clear format. Create a section titled "Causal Exploration" containing the analyzed causality based on the diagnostic flow data; the section must begin with the fixed heading "Causal Exploration:" followed by the causality analysis text, and this structure is mandatory for consistency across submissions. The section should include all identified causal links and inferred reasoning derived from the input data (A1-A4). Submit only this "Causal Exploration" section, not the full report. (A naive template sketch follows this list.)
Validation and Case Matching: Match each report against the ground-truth data for the "Causal Exploration" section to validate the accuracy and completeness of your causality reasoning.
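As referenced in the list above, here is a naive template baseline that chains A1 through A4 into the required section. It is a sketch only: a real submission would replace the hand-written template with a learned model, and the function name is an assumption.

```python
# Naive template baseline: chain A1 -> A2/A3 -> A4 into the required section.
def build_causal_exploration(a1: str, a2: str, a3: str, a4: str) -> str:
    body = (f"The initial impression ({a1}) directed attention to {a2} "
            f"at the level of {a3}; localizing the finding there supports "
            f"the final impression of {a4}.")
    return "Causal Exploration: " + body

print(build_causal_exploration(
    "possible right-sided opacity",
    "the right lower lobe",
    "T8-T10",
    "right lower lobe consolidation",
))
```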
Evaluation Criteria:
Grammatical Correctness: Is the explanation written fluently and accurately?
Domain Relevance: Does the explanation correctly reference clinical or anatomical knowledge?
Depth of Causal Explanation: Does it clearly articulate “why” the finding is significant?
Scoring:
Human Expert Score (1–5): Assigned by board-certified radiologists.
Automatic Scores (an illustrative similarity sketch follows this list):
GPT-White (contextual similarity to expert explanations)
GPT-Black (contextual diversity penalty)
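The internals of GPT-White and GPT-Black are not specified in this description. As a rough, purely illustrative stand-in for "contextual similarity to expert explanations", the sketch below scores a candidate against a reference using sentence embeddings; it is not the official metric, and the embedding model name is an assumption.

```python
# Illustrative contextual-similarity sketch; NOT the official GPT-White metric.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

reference = ("Causal Exploration: Interval growth of the nodule raises "
             "suspicion for malignancy because benign nodules are typically "
             "stable over time.")
candidate = ("Causal Exploration: The nodule has enlarged since prior "
             "imaging, and growth over time is a classic marker of "
             "malignant behavior.")

ref_emb = model.encode(reference, convert_to_tensor=True)
cand_emb = model.encode(candidate, convert_to_tensor=True)
print(float(util.cos_sim(cand_emb, ref_emb)))  # higher = more similar
```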