Evaluation

Evaluation Scheme

The evaluation scheme assesses retrieval effectiveness and the accuracy of veracity predictions separately.

Subtask 1 Evaluation:
- Evidence retrieval will be evaluated using Success@3 and nDCG@3. Success@3 measures whether at least one relevant piece of evidence appears among the top 3 retrieved results. nDCG@3 measures how well the retrieval system orders the retrieved items, rewarding rankings that place highly relevant results at the top.
- Veracity prediction will be evaluated using macro F1-score over the "SUPPORTS" and "REFUTES" classes.
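
The two retrieval metrics above can be sketched as follows. This is a minimal illustration and not the official scorer: the function names, the dictionary of relevance judgments, and the linear-gain form of DCG are assumptions (some implementations use an exponential gain, which gives the same result when relevance is binary).

```python
import math
from typing import Dict, List, Sequence


def success_at_k(ranked_ids: Sequence[str], relevance: Dict[str, float], k: int = 3) -> float:
    """Return 1.0 if at least one of the top-k retrieved items is relevant, else 0.0."""
    return 1.0 if any(relevance.get(doc_id, 0) > 0 for doc_id in ranked_ids[:k]) else 0.0


def dcg_at_k(gains: Sequence[float], k: int) -> float:
    """Discounted cumulative gain with linear gain (an assumption) and log2 discount."""
    return sum(gain / math.log2(rank + 2) for rank, gain in enumerate(gains[:k]))


def ndcg_at_k(ranked_ids: Sequence[str], relevance: Dict[str, float], k: int = 3) -> float:
    """DCG of the system ranking divided by the DCG of the ideal ranking."""
    gains: List[float] = [relevance.get(doc_id, 0) for doc_id in ranked_ids]
    ideal_gains = sorted(relevance.values(), reverse=True)
    ideal = dcg_at_k(ideal_gains, k)
    return dcg_at_k(gains, k) / ideal if ideal > 0 else 0.0


# Hypothetical example: two relevant documents exist, one is retrieved at rank 2.
ranked = ["d7", "d2", "d9"]
judgments = {"d2": 1, "d4": 1}
success = success_at_k(ranked, judgments, k=3)
ndcg = ndcg_at_k(ranked, judgments, k=3)
```

In this example Success@3 is satisfied because one relevant document is in the top 3, while nDCG@3 is below 1 because that document is not ranked first and the second relevant document is missing.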

Subtask 2 Evaluation:
- The primary metric will be the macro F1-score for veracity prediction.
- Generated evidence summaries and justifications will be manually evaluated.
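
As a rough illustration of the macro F1-score used for veracity prediction, the sketch below computes per-class F1 and averages it with equal weight per class. The function name and the example labels are hypothetical, and the default label set ("SUPPORTS", "REFUTES") is taken from Subtask 1 and only assumed to apply here; the official scorer may differ.

```python
from typing import Sequence


def macro_f1(gold: Sequence[str], pred: Sequence[str],
             labels: Sequence[str] = ("SUPPORTS", "REFUTES")) -> float:
    """Unweighted mean of per-class F1 over the given labels."""
    scores = []
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical gold and predicted labels for four claims.
gold_labels = ["SUPPORTS", "SUPPORTS", "REFUTES", "REFUTES"]
pred_labels = ["SUPPORTS", "REFUTES", "REFUTES", "REFUTES"]
score = macro_f1(gold_labels, pred_labels)
```

Because each class contributes equally to the average, a system that performs poorly on the less frequent class is penalized more than it would be under plain accuracy.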
