View the knowledge-based dataset here
Overview
We developed a structured evaluation dataset to assess the performance of our AI-powered visual question answering (VQA) system across varying levels of complexity. This dataset consists of carefully curated question-answer pairs that test the system's capabilities in information retrieval, computational analysis, and predictive reasoning.
Difficulty Classification
The dataset categorizes questions into three distinct levels based on cognitive complexity and processing requirements; a sketch of one possible entry format follows the level descriptions:
Easy Questions
Direct information retrieval from the dataset
Example: "List all days in June 2012 when fires occurred in the Bejaia region"
Tests basic data access and presentation capabilities
Medium Questions
Require data processing or simple calculations
Example: "What was the range of ISI values during July fires in Bejaia?"
Evaluates analytical processing and intermediate reasoning
Hard Questions
Predictive analysis based on multiple parameters
Example: "Determine whether a wildfire may occur given these indices..."
Assesses advanced reasoning and decision-making capabilities
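To make the three tiers concrete, the sketch below shows one possible way the entries could be represented in code. The field names and structure are illustrative assumptions, not the dataset's actual schema; the example questions are taken from the tiers above, and the answers are placeholders rather than real values.

```python
from typing import TypedDict

class EvalEntry(TypedDict):
    """One question-answer pair in the evaluation set (hypothetical schema)."""
    question: str
    answer: str        # ground-truth answer, validated against the source data
    difficulty: str    # one of "easy", "medium", "hard"

# Illustrative entries built from the example questions above; the answers
# are placeholders, not real values from the dataset.
SAMPLE_ENTRIES: list[EvalEntry] = [
    {"question": "List all days in June 2012 when fires occurred in the Bejaia region",
     "answer": "<dates taken from the ground-truth records>",
     "difficulty": "easy"},
    {"question": "What was the range of ISI values during July fires in Bejaia?",
     "answer": "<max ISI minus min ISI over July fire days>",
     "difficulty": "medium"},
    {"question": "Determine whether a wildfire may occur given these indices...",
     "answer": "<fire / not fire, per the ground-truth class label>",
     "difficulty": "hard"},
]
```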
The evaluation dataset was constructed from questions derived from actual dataset values and real-world wildfire scenarios, ensuring relevance and practical applicability. Each question-answer pair was rigorously validated against ground truth data to maintain accuracy and reliability. The dataset emphasizes diverse aspects of wildfire monitoring: temporal patterns to analyze fire occurrences over time, weather correlations to understand environmental influences, risk assessment for proactive management, and predictive modeling to forecast potential fire events.
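As one illustration of that validation step, the sketch below recomputes the medium-tier answer (the ISI range over July fires in Bejaia) directly from the source data and compares it with the stored answer. The file name, the column names, and the stored value are all assumptions about the underlying data, not the project's actual validation code.

```python
import pandas as pd

# Load the source records; "forest_fires.csv" and the column names
# ("Region", "month", "Classes", "ISI") are assumed, not confirmed.
df = pd.read_csv("forest_fires.csv")

july_fires = df[
    (df["Region"] == "Bejaia")
    & (df["month"] == 7)
    & (df["Classes"].str.strip() == "fire")
]

# Recompute the answer from ground truth and compare it with the
# value stored in the QA pair (13.5 is a hypothetical placeholder).
expected = july_fires["ISI"].max() - july_fires["ISI"].min()
stored_answer = 13.5
assert abs(expected - stored_answer) < 1e-6, "stored answer disagrees with ground truth"
```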
This tiered evaluation approach, sketched in code after the list, enables us to:
Benchmark system performance across cognitive levels
Identify strengths and weaknesses in different reasoning tasks
Validate the system's ability to handle progressively complex queries
Ensure balanced assessment of both factual recall and predictive capabilities
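A minimal sketch of how the tiered benchmark could be scored, assuming each system response has been marked correct or incorrect against its ground-truth answer (the `results` list below is hypothetical):

```python
from collections import Counter

# Each tuple pairs an entry's difficulty with whether the system's
# answer matched the ground truth; the values here are made up.
results = [
    ("easy", True), ("easy", True),
    ("medium", True), ("medium", False),
    ("hard", False), ("hard", True),
]

totals, correct = Counter(), Counter()
for difficulty, is_correct in results:
    totals[difficulty] += 1
    correct[difficulty] += is_correct

# Report accuracy per tier to expose strengths and weaknesses by level.
for level in ("easy", "medium", "hard"):
    print(f"{level}: {correct[level] / totals[level]:.0%} "
          f"({correct[level]}/{totals[level]})")
```

Per-tier accuracy, rather than a single aggregate score, is what lets the benchmark distinguish factual-recall performance from predictive-reasoning performance.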