Participants are provided with a claim and a pool of candidate evidence (manually curated and/or LLM-generated). The objective is to retrieve the most relevant evidence and determine whether the claim is supported or refuted based on the retrieved evidence. This setting isolates retrieval performance while still requiring evidence-grounded reasoning.
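As an illustration of the retrieval step in this setting, the following minimal sketch ranks a candidate pool against a claim using bag-of-words cosine similarity. The function names (`tokenize`, `rank_evidence`), the `top_k` parameter, and the choice of similarity measure are assumptions made for this example; the task does not prescribe a retrieval method, and a real system would likely use a stronger retriever.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_evidence(claim, pool, top_k=3):
    """Return the top_k (score, evidence) pairs from the pool.

    Hypothetical helper: scores each candidate by lexical overlap
    with the claim and sorts from most to least similar.
    """
    claim_vec = Counter(tokenize(claim))
    scored = [(cosine(claim_vec, Counter(tokenize(e))), e) for e in pool]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    claim = "The Eiffel Tower is in Paris"
    pool = [
        "The Eiffel Tower is located in Paris, France.",
        "Bananas are rich in potassium.",
        "Paris is the capital of France.",
    ]
    for score, evidence in rank_evidence(claim, pool, top_k=2):
        print(f"{score:.2f}  {evidence}")
```

The retrieved passages would then be passed to a separate verification step that decides whether the claim is supported or refuted.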
Participants are given only a claim and must retrieve evidence from open sources such as Wikipedia or web search results. Systems are expected to produce:
A structured and concise evidence summary (100-150 words, up to 200 in exceptional cases)
A veracity label (SUPPORTED / REFUTED)
A justification for supporting or refuting the claim (100-120 words)
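The three expected outputs listed above can be sketched as a small record with length and label checks. The field names (`evidence_summary`, `label`, `justification`) and the `validate` helper are hypothetical; the task description does not specify a submission format, so this is only one way a participant might sanity-check a system's output before submitting.

```python
from dataclasses import dataclass

# Labels and word limits taken from the task description.
ALLOWED_LABELS = {"SUPPORTED", "REFUTED"}
SUMMARY_MIN, SUMMARY_MAX = 100, 200  # 151-200 is the "exceptional" range
JUSTIFICATION_MIN, JUSTIFICATION_MAX = 100, 120


@dataclass
class Submission:
    """Hypothetical container for one system output."""
    evidence_summary: str
    label: str
    justification: str


def word_count(text):
    """Count whitespace-separated words."""
    return len(text.split())


def validate(sub):
    """Return a list of problems; an empty list means all checks passed."""
    problems = []
    n_summary = word_count(sub.evidence_summary)
    if not SUMMARY_MIN <= n_summary <= SUMMARY_MAX:
        problems.append(f"evidence summary has {n_summary} words")
    if sub.label not in ALLOWED_LABELS:
        problems.append(f"unknown label: {sub.label!r}")
    n_just = word_count(sub.justification)
    if not JUSTIFICATION_MIN <= n_just <= JUSTIFICATION_MAX:
        problems.append(f"justification has {n_just} words")
    return problems


if __name__ == "__main__":
    draft = Submission("too short", "SUPPORTED", "also too short")
    print(validate(draft))
```

Word counts here use simple whitespace splitting, which is an assumption; the organizers may count words differently.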
This subtask reflects real-world fact-checking pipelines and emphasizes the integration of retrieval, reasoning, and generation, often implemented through retrieval-augmented generation (RAG) frameworks. Note that direct LLM-generated outputs are not acceptable: participants should collect evidence from any open-source data pool (dump) or from web-search results and use it to verify the claim.