FIRE-VATIKA

Shared Task

VATIKA: Varanasi Tourism in Question Answer System

17th Meeting of Forum for Information Retrieval Evaluation (FIRE), 2025

17-20 Dec 2025

Overview

In India, tourism plays a vital role in economic development: it generates income, creates jobs, and supports local businesses. It promotes cultural exchange, preserves India's heritage, and encourages infrastructure growth. By attracting visitors, tourism boosts regional pride and global awareness, making it a key driver of sustainable development and international cooperation. Varanasi is a cultural and spiritual hub renowned for its Bhakti-Bhaav (devotional ethos). As one of the world's oldest living cities, it attracts millions of visitors seeking spiritual and cultural enrichment. Our proposed shared task on question answering (QA) systems aims to provide authentic information across various domains related to Varanasi (Kashi), which will help strengthen the tourism market of Varanasi, Uttar Pradesh. These models will provide tourists with accurate information, in an Indian language, on the domains of Ganga Aarti, Cruise, Food Court, Public Toilet, Kund, Museum, Travel Agencies, Ashram, Temple, and General queries. The main goal of the task is to provide a user-friendly question answering system that gives tourists a smoother, more enriching, and hassle-free trip to Varanasi.

In the field of NLP, a Question Answering (QA) system is developed to automatically answer users' queries based on a database or a set of documents. It tries to provide specific answers to questions posed in natural language. The proposed track (shared task), entitled "VATIKA: Varanasi Tourism in Question Answer System (Indian Language)", is designed specifically for the tourism domain. The VATIKA dataset is diverse in nature and covers the major aspects of Varanasi tourism. It assists tourists and serves as a guide for travelling across the city of Varanasi. The system will address user queries related to tourist spots, cultural events, local traditions, services, and accommodations. The uniqueness of this track lies in its focus on a specific domain, tourism, and its Indian-language aspect, supporting Hindi.

VATIKA Dataset

VATIKA: Varanasi Tourism in Question Answer System (Indian Language) is a Hindi-language QA dataset specifically designed to support machine reading comprehension (MRC) and QA applications in the tourism domain. Centered on the culturally rich city of Varanasi, the dataset reflects realistic queries that travelers and pilgrims might ask regarding locations, logistics, services, and spiritual landmarks.

VATIKA is unique in that it spans 10 tourism-relevant domains, including Ganga Aarti, Cruise, Food Court, Public Toilet, Kund, Museum, General, Ashram, Temple and Travel. Each domain includes detailed paragraph-level Hindi contexts followed by multiple question-answer pairs, simulating real-world information-seeking behavior in natural language. The questions range from factual to navigational and experiential, enhancing coverage across diverse tourist concerns. The VATIKA dataset is written entirely in Hindi, using the Devanagari script, and serves as a valuable language resource for building and evaluating QA systems. It supports both open-domain and contextual MRC-style question answering.

Dataset Statistics

Click here to register for access to the VATIKA dataset

Data Format for Train, Validation, and Test Data-I

The dataset is provided in structured JSON format, organized by domain → context → QAs. Each QA pair includes a unique ID, the question in Hindi, and its corresponding answer:

{
  "domains": [
    {
      "domain": "kund",
      "contexts": [
        {
          "context": "मणिकर्णिका चक्र पुष्करणीय कुंड लाल बहादुर शास्त्री अंतरराष्ट्रीय ह...",
          "qas": [
            {
              "id": "kund_1467",
              "question": "मणिकर्णिका चक्र पुष्करणीय कुंड लाल बहादुर शास्त्री अंतरराष्ट्रीय हवाई अड्डे (वाराणसी) से कितनी दूर है?",
              "answer": "मणिकर्णिका चक्र पुष्करणीय कुंड लाल बहादुर शास्त्री अंतरराष्ट्रीय हवाई अड्डे (वाराणसी) से 25.8 किलोमीटर दूर है।"
            },
            {
              "id": "kund_1468",
              "question": "मणिकर्णिका चक्र पुष्करणीय कुंड लाल बहादुर शास्त्री अंतरराष्ट्रीय हवाई अड्डे के पास से कैसे पहुँचा जा सकता है?",
              "answer": "मणिकर्णिका चक्र पुष्करणीय कुंड लाल बहादुर शास्त्री अंतरराष्ट्रीय हवाई अड्डे से यह दूरी टैक्सी या अन्य निजी परिवहन के माध्यम से तय की जा सकती है।"
            }
          ]
        }
      ]
    }
  ]
}
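The domain → context → QAs nesting described above can be walked with a few lines of Python. The following is a minimal sketch, not an official loader: the helper names `load_dataset` and `iter_qa_pairs` and the file path passed to `load_dataset` are assumptions, and the inline sample uses placeholder strings rather than real VATIKA data.

```python
import json


def load_dataset(path):
    """Read a VATIKA-style JSON file (the path is supplied by the caller)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def iter_qa_pairs(dataset):
    """Yield (domain, context, qa) for every QA pair in the nested structure."""
    for domain_entry in dataset["domains"]:
        for context_entry in domain_entry["contexts"]:
            for qa in context_entry["qas"]:
                yield domain_entry["domain"], context_entry["context"], qa


# Inline sample in the documented shape; all strings are placeholders.
sample = json.loads("""
{
  "domains": [
    {
      "domain": "kund",
      "contexts": [
        {
          "context": "<Hindi context paragraph>",
          "qas": [
            {"id": "kund_1467", "question": "<question 1>", "answer": "<answer 1>"},
            {"id": "kund_1468", "question": "<question 2>", "answer": "<answer 2>"}
          ]
        }
      ]
    }
  ]
}
""")

rows = list(iter_qa_pairs(sample))
for domain, context, qa in rows:
    print(domain, qa["id"], qa["question"])
```

Flattening the structure this way keeps each question attached to its context, which is usually what an MRC-style model needs as input.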

Data Format for Test Data-II

In Test Data-II, the answer field will be left blank as shown below. Participants are required to generate predictions for these questions and return the same JSON file with the answer field filled with their predicted answers.

{
  "domains": [
    {
      "domain": "kund",
      "contexts": [
        {
          "context": "पांडव कुंड में दर्शन करने के लिए कोई निर्धारित समय सीमा नहीं है। फिर भी, श्रद्धालुओं के लि...",
          "qas": [
            {
              "id": "kund_1256",
              "question": "क्या नारद कुंड में दर्शन के लिए सुबह और शाम का समय सर्वोत्तम माना जाता है?",
              "answer": " "
            }
          ]
        }
      ]
    }
  ]
}
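Filling the blank answer fields and writing the file back could look roughly like the sketch below. It is illustrative only: `fill_answers` and `dummy_predict` are hypothetical names, the stand-in predictor simply echoes the context where a real QA model would go, and the sample uses a one-word placeholder context. Passing `ensure_ascii=False` to `json.dumps` keeps Devanagari text readable instead of escaping it as `\uXXXX` sequences.

```python
import copy
import json


def fill_answers(test_data, predict):
    """Return a copy of test_data with each 'answer' set to predict(context, question)."""
    filled = copy.deepcopy(test_data)  # leave the caller's data untouched
    for domain_entry in filled["domains"]:
        for context_entry in domain_entry["contexts"]:
            for qa in context_entry["qas"]:
                qa["answer"] = predict(context_entry["context"], qa["question"])
    return filled


def dummy_predict(context, question):
    """Stand-in for a real model: echoes the context (assumption, for illustration)."""
    return context.strip()


# Placeholder input in the Test Data-II shape (not real VATIKA data).
test_data = {
    "domains": [
        {
            "domain": "kund",
            "contexts": [
                {
                    "context": "कुंड",
                    "qas": [
                        {"id": "kund_1256", "question": "<question>", "answer": " "}
                    ],
                }
            ],
        }
    ]
}

filled = fill_answers(test_data, dummy_predict)

# ensure_ascii=False writes Devanagari characters as-is.
serialized = json.dumps(filled, ensure_ascii=False, indent=2)
print(serialized)
```

To produce the submission file, the same string can be written with `open(path, "w", encoding="utf-8")`, where `path` is whatever file name the organizers request.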

Submission and Evaluation Details

Participants are required to build a QA model capable of handling diverse linguistic structures, dialectal variations, and domain-specific terminology related to tourism. Participants must register through the registration link to obtain the VATIKA dataset. The test data is divided into two parts: Test Data-I and Test Data-II. Test Data-I will be provided to participants at the initial stage, and Test Data-II will be provided before the final submission. Participants should use only open-source Large Language Models (LLMs) to develop their models. Use of closed-source or paid LLMs (e.g., GPT-4, Claude, Gemini) will lead to disqualification from the task.

The developed QA model will be evaluated on Test Data-II for its ability to accurately comprehend user questions, retrieve relevant information, and provide precise answers. Participants' submissions will be evaluated using the F1 score, BLEU score, and ROUGE-L score.
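For a rough local check before submitting, two of these metrics can be approximated as in the sketch below. This is not the organizers' evaluation script. It assumes whitespace tokenization, a SQuAD-style token-overlap F1, and ROUGE-L computed as the F-measure over the longest common subsequence; the official implementation may tokenize or normalize Hindi text differently. BLEU is omitted here and is better taken from an established library.

```python
from collections import Counter


def tokenize(text):
    """Whitespace tokenization (an assumption; the official tokenizer is not specified)."""
    return text.split()


def f_measure(matches, pred_len, ref_len):
    """Harmonic mean of precision and recall for a given match count."""
    if matches == 0:
        return 0.0
    precision = matches / pred_len
    recall = matches / ref_len
    return 2 * precision * recall / (precision + recall)


def token_f1(prediction, reference):
    """Token-overlap F1 between a predicted and a reference answer."""
    pred, ref = tokenize(prediction), tokenize(reference)
    if not pred or not ref:
        return float(pred == ref)  # both empty counts as a match
    overlap = sum((Counter(pred) & Counter(ref)).values())
    return f_measure(overlap, len(pred), len(ref))


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(prediction, reference):
    """ROUGE-L style F-measure based on the longest common subsequence."""
    pred, ref = tokenize(prediction), tokenize(reference)
    if not pred or not ref:
        return float(pred == ref)
    return f_measure(lcs_length(pred, ref), len(pred), len(ref))


print(token_f1("a b c d", "a b"))
print(rouge_l_f1("a x b", "a b"))
```

Token F1 ignores word order while ROUGE-L rewards in-order matches, so the two can disagree on answers that contain the right words in a different sequence.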

Important Dates

15th May, 2025: Open track websites and release of training data

20th June, 2025 (revised from 15th June, 2025): Test Data-I release

30th June, 2025 (revised from 25th June, 2025): Test Data-II release

05th July, 2025 (revised from 30th June, 2025): Run submission deadline

20th July, 2025 (revised from 15th July, 2025): Track results declaration

07th September, 2025 (revised from 30th August, 2025): Working notes due

30th September, 2025: Camera-ready copies of working notes and overview paper due

17th December, 2025: FIRE Conference

Organizers

Name: Dr. Praveen Gatla

Designation: Assistant Professor

Department: Department of Linguistics, Faculty of Arts, Banaras Hindu University (BHU), Varanasi, Uttar Pradesh, INDIA.

Email Address: praveengatla@bhu.ac.in

Research Interests: Computational Linguistics, Paninian Grammar, Corpus Studies, Developing Treebanks for Indian Languages, Parallel Corpora, Translation Studies.

Name: Dr. Rajesh Kumar Mundotiya

Designation: Assistant Professor

Department: Department of Computer Science and Engineering, IIT Bhilai, Chhattisgarh, INDIA.

Email: rmundotiya@iitbhilai.ac.in

Research Interests: Machine Translation, Question-Answer System, Large Language Model, Dialogue Generation, Image Captioning

Co-organizers

Jyoti Kumari (her), Research Scholar, Department of Linguistics, Faculty of Arts, BHU

Anushka (her), Junior Research Fellow, Department of Linguistics, Faculty of Arts, BHU

Nabanita Sadhukhan (her), Junior Research Fellow, Department of Computer Science and Engineering, IIT Bhilai

firevatika2025@gmail.com
