View the knowledge-based dataset here
Overview
We developed a structured evaluation dataset to assess the performance of our AI-powered visual question answering (VQA) system across varying levels of complexity. This dataset consists of carefully curated question-answer pairs that test the system's capabilities in information retrieval, computational analysis, and predictive reasoning.
Difficulty Classification
The dataset categorizes questions into three distinct levels based on cognitive complexity and processing requirements; a sketch of one possible entry format follows the level descriptions:
Easy Questions
Direct information retrieval from the dataset
Example: "List all days in June 2012 when fires occurred in the Bejaia region"
Tests basic data access and presentation capabilities
Medium Questions
Require data processing or simple calculations
Example: "What was the range of ISI values during July fires in Bejaia?"
Evaluates analytical processing and intermediate reasoning
Hard Questions
Predictive analysis based on multiple parameters
Example: "Determine whether a wildfire may occur given these indices..."
Assesses advanced reasoning and decision-making capabilities
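To make the three tiers concrete, the sketch below shows one possible way the entries could be represented in code. The field names and structure are illustrative assumptions, not the dataset's actual schema; the example questions are taken from the tiers above, and the answers are placeholders rather than real values.

```python
from typing import TypedDict

class EvalEntry(TypedDict):
    """One question-answer pair in the evaluation set (hypothetical schema)."""
    question: str
    answer: str        # ground-truth answer, validated against the source data
    difficulty: str    # one of "easy", "medium", "hard"

# Illustrative entries built from the example questions above; the answers
# are placeholders, not real values from the dataset.
SAMPLE_ENTRIES: list[EvalEntry] = [
    {"question": "List all days in June 2012 when fires occurred in the Bejaia region",
     "answer": "<dates taken from the ground-truth records>",
     "difficulty": "easy"},
    {"question": "What was the range of ISI values during July fires in Bejaia?",
     "answer": "<max ISI minus min ISI over July fire days>",
     "difficulty": "medium"},
    {"question": "Determine whether a wildfire may occur given these indices...",
     "answer": "<fire / not fire, per the ground-truth class label>",
     "difficulty": "hard"},
]
```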
The evaluation dataset was constructed from questions derived from actual dataset values and real-world wildfire scenarios, ensuring relevance and practical applicability. Each question-answer pair was rigorously validated against ground truth data to maintain accuracy and reliability. The dataset emphasizes diverse aspects of wildfire monitoring: temporal patterns to analyze fire occurrences over time, weather correlations to understand environmental influences, risk assessment for proactive management, and predictive modeling to forecast potential fire events.
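As one illustration of that validation step, the sketch below recomputes the medium-tier answer (the ISI range over July fires in Bejaia) directly from the source data and compares it with the stored answer. The file name, the column names, and the stored value are all assumptions about the underlying data, not the project's actual validation code.

```python
import pandas as pd

# Load the source records; "forest_fires.csv" and the column names
# ("Region", "month", "Classes", "ISI") are assumed, not confirmed.
df = pd.read_csv("forest_fires.csv")

july_fires = df[
    (df["Region"] == "Bejaia")
    & (df["month"] == 7)
    & (df["Classes"].str.strip() == "fire")
]

# Recompute the answer from ground truth and compare it with the
# value stored in the QA pair (13.5 is a hypothetical placeholder).
expected = july_fires["ISI"].max() - july_fires["ISI"].min()
stored_answer = 13.5
assert abs(expected - stored_answer) < 1e-6, "stored answer disagrees with ground truth"
```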
This tiered evaluation approach, sketched in code after the list, enables us to:
Benchmark system performance across cognitive levels
Identify strengths and weaknesses in different reasoning tasks
Validate the system's ability to handle progressively complex queries
Ensure balanced assessment of both factual recall and predictive capabilities
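A minimal sketch of how the tiered benchmark could be scored, assuming each system response has been marked correct or incorrect against its ground-truth answer (the `results` list below is hypothetical):

```python
from collections import Counter

# Each tuple pairs an entry's difficulty with whether the system's
# answer matched the ground truth; the values here are made up.
results = [
    ("easy", True), ("easy", True),
    ("medium", True), ("medium", False),
    ("hard", False), ("hard", True),
]

totals, correct = Counter(), Counter()
for difficulty, is_correct in results:
    totals[difficulty] += 1
    correct[difficulty] += is_correct

# Report accuracy per tier to expose strengths and weaknesses by level.
for level in ("easy", "medium", "hard"):
    print(f"{level}: {correct[level] / totals[level]:.0%} "
          f"({correct[level]}/{totals[level]})")
```

Per-tier accuracy, rather than a single aggregate score, is what lets the benchmark distinguish factual-recall performance from predictive-reasoning performance.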