The Abductive Event Reasoning (AER) task is framed as a multiple-choice question answering problem, aiming to evaluate large language models' ability to identify the most plausible direct cause of a real-world event based on textual evidence.
Data Format
Each instance consists of:
Event: A short description of an observed real-world event.
Context: A set of retrieved documents related to the event, some of which are distractors.
Options (A–D): Four candidate explanations for the event, written as natural language sentences.
Among the four options:
One or more may be correct.
One option is always: “The information provided is insufficient to determine the cause.”
The model must output the correct option(s) (e.g., A,B) based on reasoning over the input context.
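To make the input/output contract concrete, here is a minimal sketch (in Python) of how an instance and its retrieved documents might be turned into a prompt, and how a model's reply might be normalized into option letters. The document field names "title" and "text" are assumptions, not part of the official schema; use whatever keys actually appear in docs.jsonl.

import re

def build_prompt(event: str, options: dict[str, str], docs: list[dict[str, str]]) -> str:
    # Concatenate the retrieved documents; "title"/"text" are assumed key names.
    context = "\n\n".join(
        f"[{i}] {d.get('title', '')}\n{d.get('text', '')}"
        for i, d in enumerate(docs, start=1)
    )
    option_block = "\n".join(f"{k}. {v}" for k, v in sorted(options.items()))
    return (
        f"Context:\n{context}\n\n"
        f"Event: {event}\n\n"
        f"Options:\n{option_block}\n\n"
        "Answer with the letter(s) of the most plausible direct cause(s), e.g. A,B."
    )

def parse_answer(reply: str) -> str:
    # Extract selected option letters and normalize to the answer format, e.g. "A,B".
    letters = sorted(set(re.findall(r"\b[ABCD]\b", reply.upper())))
    return ",".join(letters)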
Evaluation Metric
Each instance is scored as follows:
✅ Full match with correct answers → 1 point
⚠️ Partial match → 0.5 points
❌ Wrong or invalid selection → 0 points
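The exact definition of a partial match is not spelled out above, so the following scoring function is only a sketch: it assumes a partial match means the prediction is a non-empty strict subset of the gold options with no wrong option selected, and that anything else scores 0.

def score_instance(predicted: str, gold: str) -> float:
    # `predicted` and `gold` are comma-separated option letters, e.g. "A,B".
    pred = {p.strip() for p in predicted.split(",") if p.strip()}
    ref = {g.strip() for g in gold.split(",") if g.strip()}
    if not pred or not pred <= {"A", "B", "C", "D"}:
        return 0.0   # invalid selection
    if pred == ref:
        return 1.0   # full match
    if pred < ref:
        return 0.5   # partial match (assumed rule: strict subset, no wrong options)
    return 0.0       # at least one wrong option selected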
An Example
# question.jsonl
{
    "topic_id": <topic_id>,   # int
    "question": <question>,   # string
    "option_A": <option_A>,   # string
    "option_B": <option_B>,   # string
    "option_C": <option_C>,   # string
    "option_D": <option_D>,   # string
    "answer": <answer>        # string, e.g. "A,B"
}

# docs.jsonl
{
    "topic_id": <topic_id>,   # int
    "docs": <docs>            # list[Dict[string, string]]
}
The official task dataset (including sample, training, dev, and evaluation data) is hosted on GitHub at https://github.com/sooo66/semeval2026-task12-dataset. Extensive documentation is provided in the repository to help participants become familiar with the data.