The Abductive Event Reasoning (AER) task is framed as a multiple-choice question answering problem, aiming to evaluate large language models' ability to identify the most plausible direct cause of a real-world event based on textual evidence.
Data Format
Each instance consists of:
Event: A short description of an observed real-world event.
Context: A set of retrieved documents related to the event, some of which are distractors.
Options (A–D): Four candidate explanations for the event, written as natural language sentences.
Among the four options:
One or more may be correct.
One option is always: “The information provided is insufficient to determine the cause.”
The model must output the correct option(s) (e.g., A,B) based on reasoning over the input context.
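To make the input/output contract concrete, here is a minimal sketch (in Python) of how an instance and its retrieved documents might be turned into a prompt, and how a model's reply might be normalized into option letters. The document field names "title" and "text" are assumptions, not part of the official schema; use whatever keys actually appear in docs.jsonl.

import re

def build_prompt(event: str, options: dict[str, str], docs: list[dict[str, str]]) -> str:
    # Concatenate the retrieved documents; "title"/"text" are assumed key names.
    context = "\n\n".join(
        f"[{i}] {d.get('title', '')}\n{d.get('text', '')}"
        for i, d in enumerate(docs, start=1)
    )
    option_block = "\n".join(f"{k}. {v}" for k, v in sorted(options.items()))
    return (
        f"Context:\n{context}\n\n"
        f"Event: {event}\n\n"
        f"Options:\n{option_block}\n\n"
        "Answer with the letter(s) of the most plausible direct cause(s), e.g. A,B."
    )

def parse_answer(reply: str) -> str:
    # Extract selected option letters and normalize to the answer format, e.g. "A,B".
    letters = sorted(set(re.findall(r"\b[ABCD]\b", reply.upper())))
    return ",".join(letters)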
Evaluation Metric
Each instance is scored as follows:
✅ Full match with correct answers → 1 point
⚠️ Partial match → 0.5 points
❌ Wrong or invalid selection → 0 points
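The exact definition of a partial match is not spelled out above, so the following scoring function is only a sketch: it assumes a partial match means the prediction is a non-empty strict subset of the gold options with no wrong option selected, and that anything else scores 0.

def score_instance(predicted: str, gold: str) -> float:
    # `predicted` and `gold` are comma-separated option letters, e.g. "A,B".
    pred = {p.strip() for p in predicted.split(",") if p.strip()}
    ref = {g.strip() for g in gold.split(",") if g.strip()}
    if not pred or not pred <= {"A", "B", "C", "D"}:
        return 0.0   # invalid selection
    if pred == ref:
        return 1.0   # full match
    if pred < ref:
        return 0.5   # partial match (assumed rule: strict subset, no wrong options)
    return 0.0       # at least one wrong option selected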
An Example
# question.jsonl
{
    "topic_id": <topic_id>,   # int
    "question": <question>,   # string
    "option_A": <option_A>,   # string
    "option_B": <option_B>,   # string
    "option_C": <option_C>,   # string
    "option_D": <option_D>,   # string
    "answer": <answer>        # string, e.g. "A,B"
}

# docs.jsonl
{
    "topic_id": <topic_id>,   # int
    "docs": <docs>            # list[Dict[string, string]]
}
The official task dataset (including sample, training, dev, and evaluation data) is hosted on GitHub at https://github.com/sooo66/semeval2026-task12-dataset. Extensive documentation is provided in the repository to help participants become familiar with the data.