Subtask 1: Full-Report Compliance Matching
Objective: Automatically identify the pages within a full ESG report that correspond to each SASB metric, and verify whether each disclosure uses the category and unit of measure that the SASB guidelines specify for that metric.
Task Description: Each system is provided with a full ESG report (which may exceed 200 pages) and the complete list of SASB disclosure requirements. Each metric describes a specific type of expected disclosure (e.g., total greenhouse gas emissions, energy usage, governance-related discussions). The system must:
Locate the pages in the report that are relevant to each SASB metric;
Determine whether the content on those pages fulfills the disclosure requirement;
Check whether the disclosure uses the correct category (e.g., Quantitative, Discussion and Analysis) and unit of measure (e.g., metric tonnes CO₂e, percentage).
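The first step above, locating candidate pages, can be illustrated with a minimal text-only baseline. This is a sketch under stated assumptions, not the method the task prescribes: the function names, the keyword-overlap scoring, and the 0.5 threshold are all hypothetical, and it ignores charts, tables, and non-English text.

```python
import re


def tokens(text):
    """Lower-case a string and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def candidate_pages(pages, metric_description, min_overlap=0.5):
    """Return 1-based page numbers whose text shares enough words with the metric.

    pages: list of page texts in report order (assumed already extracted).
    metric_description: free-text description of one SASB metric.
    min_overlap: hypothetical threshold on the fraction of metric words
        that must appear on a page for it to count as a candidate.
    """
    query = tokens(metric_description)
    if not query:
        return []
    hits = []
    for number, text in enumerate(pages, start=1):
        overlap = len(query & tokens(text)) / len(query)
        if overlap >= min_overlap:
            hits.append(number)
    return hits


report = [
    "Letter from the CEO about our strategy.",
    "Total greenhouse gas emissions were reported in metric tonnes.",
    "Board governance and oversight.",
]
print(candidate_pages(report, "total greenhouse gas emissions"))
```

Pages returned by a filter like this would still need the fulfilment, category, and unit checks described in the remaining steps.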
Challenges:
Handling long, complex documents with varied layouts.
Extracting information from both text and visual elements like charts and tables.
Dealing with multilingual content and differing ESG reporting practices across regions and industries.
Subtask 2: Single-Page Metric Verification
Objective: Given a single SASB metric and a single page from an ESG report, determine whether that page contains relevant information and, if so, whether it complies with the specified category and unit of measure.
Task Description: Each system receives a single SASB metric and a corresponding page from an ESG report. The system must:
Decide whether the metric is addressed on the given page;
If it is, verify whether the information aligns with the correct category and unit of measure specified in the SASB standard.
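The two-stage decision above can be sketched as a small function. Everything here is illustrative: the class names, the label strings ("not_addressed", "compliant", "non_compliant"), and the placeholder metric code are assumptions, since the task description does not define a label set, and the page findings are assumed to come from some upstream extraction step.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SasbMetric:
    code: str                # placeholder identifier, not a real SASB code
    category: str            # e.g. "Quantitative" or "Discussion and Analysis"
    unit: Optional[str]      # None when the metric has no unit of measure


@dataclass(frozen=True)
class PageFinding:
    addressed: bool                  # is the metric addressed on the page?
    category: Optional[str] = None   # category observed on the page
    unit: Optional[str] = None       # unit of measure observed on the page


def normalise(value):
    """Compare labels case-insensitively and ignore extra whitespace."""
    return " ".join((value or "").lower().split())


def verify_page(metric, finding):
    """Stage 1: relevance. Stage 2: category and unit agreement."""
    if not finding.addressed:
        return "not_addressed"
    category_ok = normalise(finding.category) == normalise(metric.category)
    unit_ok = metric.unit is None or normalise(finding.unit) == normalise(metric.unit)
    return "compliant" if category_ok and unit_ok else "non_compliant"


metric = SasbMetric(code="EXAMPLE-001", category="Quantitative", unit="Percentage (%)")
print(verify_page(metric, PageFinding(True, "quantitative", "percentage  (%)")))
print(verify_page(metric, PageFinding(True, "Quantitative", "Number")))
print(verify_page(metric, PageFinding(False)))
```

Keeping relevance and compliance as separate stages makes it possible to tell retrieval errors apart from category or unit errors when analysing results.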
Challenges:
Limited context: only one page is available.
Requires fine-grained understanding of local content.
Useful for evaluating the model’s precision and error patterns at the micro level.
Both subtasks are treated as classification tasks. The primary evaluation metric is the F1 score, the harmonic mean of precision and recall. Accuracy, precision, and recall may also be reported to give a more complete picture of performance.
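For reference, the metrics can be computed as in the following sketch. It assumes binary labels with 1 as the positive class; the task description does not state whether scores are binary, micro-averaged, or macro-averaged, so the averaging scheme here is an assumption, and the function name is illustrative.

```python
def classification_scores(gold, pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for one positive class.

    gold, pred: equal-length sequences of labels.
    positive: the label treated as the positive class (assumed binary setup).
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = sum(1 for g, p in pairs if g == p) / len(pairs) if pairs else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


scores = classification_scores(gold=[1, 1, 0, 0, 1], pred=[1, 0, 0, 1, 1])
print(scores)
```

In this toy example there are two true positives, one false positive, and one false negative, so precision, recall, and F1 all equal 2/3 while accuracy is 3/5.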