Hidden-RAD2 consists of two main subtasks. Participants can choose to participate in one or both tasks.
Subtask 1: Causal Explanation Generation
Background: Radiologists synthesize multiple findings to reach a final impression, but not all of the underlying causal reasoning is written explicitly in their reports. This task asks an AI system to infer these "hidden" causal relationships and articulate them in an explicit explanation.
Input:
A radiology report from the MIMIC-CXR or IU-Xray datasets (including the findings and impression sections)
(Optional) The corresponding chest X-ray image
(Optional) Any other knowledge resources
Output: A causally-grounded explanation report that connects the findings to each impression. The report must logically describe why a specific diagnosis was made.
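As a concrete illustration, the sketch below shows one plausible way a Subtask 1 input and output could be represented. The field names (report, findings, impression, explanation) and the example text are illustrative assumptions, not the official Hidden-RAD2 data or submission format.

```python
# Illustrative sketch only: field names and example text are assumptions,
# not the official Hidden-RAD2 data or submission format.
subtask1_example = {
    "input": {
        "report": {
            "findings": "Enlarged cardiac silhouette. Bilateral pleural effusions.",
            "impression": "Findings consistent with congestive heart failure.",
        },
        "image": None,  # optional chest X-ray; omitted in this sketch
    },
    "output": {
        # A causally grounded explanation linking each finding to the impression.
        "explanation": (
            "The enlarged cardiac silhouette indicates cardiomegaly, and the "
            "bilateral pleural effusions suggest fluid overload; together these "
            "findings support the impression of congestive heart failure."
        ),
    },
}

print(subtask1_example["output"]["explanation"])
```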
Subtask 2: Error Detection and Correction
Background: While Large Language Models (LLMs) generate fluent text, they can produce "hallucinations" that are disconnected from the facts. In the medical domain, such errors can be critical. This task evaluates an AI system's ability to self-verify the reliability of its generated text and correct errors. This "critical self-review" capability is essential for ensuring the safety and transparency of AI systems.
Input:
The original report and, optionally, the corresponding image
An AI-generated explanation that contains seeded errors.
Output:
Error Detection: Identify the location and type of errors within the explanation (e.g., flawed causality, factual inconsistency).
Error Correction: Correct the identified errors with accurate information.
Confidence Score: Provide a confidence score (from 0 to 1) for the overall correctness of the explanation.
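To make the expected output concrete, the sketch below shows one possible structure for a Subtask 2 prediction covering all three components: error detection (location and type), correction, and a confidence score. The schema, the character-offset convention, and the example values are assumptions for illustration, not the official submission format.

```python
# Illustrative sketch only: this schema is an assumption about how a
# Subtask 2 prediction might be structured, not the official format.
subtask2_prediction = {
    "errors": [
        {
            # Character offsets locating the erroneous span in the explanation.
            "start": 42,
            "end": 78,
            "type": "factual inconsistency",
            "correction": "bilateral pleural effusions are present",
        },
        {
            "start": 120,
            "end": 165,
            "type": "flawed causality",
            "correction": "cardiomegaly supports, rather than rules out, heart failure",
        },
    ],
    # Overall confidence (0 to 1) that the corrected explanation is accurate.
    "confidence": 0.85,
}

assert 0.0 <= subtask2_prediction["confidence"] <= 1.0
```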