Shared Task
VATIKA: Varanasi Tourism in Question Answer System
17th Meeting of Forum for Information
Retrieval Evaluation (FIRE), 2025
17-20 Dec 2025
Overview
In India, tourism plays a vital role in economic development: it generates income, creates jobs, and supports local businesses. It also promotes cultural exchange, helps preserve India's heritage, and encourages infrastructure growth. By attracting visitors, tourism boosts regional pride and global awareness, which makes it a key driver of sustainable development and of international cooperation in travel and tourism. Varanasi (Kashi), one of the world's oldest living cities, is a cultural and spiritual hub renowned for its Bhakti-Bhaav (devotional ethos), and it attracts millions of visitors seeking spiritual and cultural enrichment. Our proposed shared task asks participants to build question answering (QA) systems that provide authentic information across various domains related to Varanasi, with the aim of strengthening the tourism market of Varanasi, Uttar Pradesh. These models should give tourists accurate answers, in an Indian language (Hindi), to queries in the domains of Ganga Aarti, Cruise, Food Court, Public Toilet, Kund, Museum, Travel Agencies, Ashram, Temple, and General. The main goal of the task is a user-friendly QA system that gives tourists a smoother, more enriching, and hassle-free trip to Varanasi.
In natural language processing (NLP), a question answering (QA) system automatically answers users' queries using a database or a set of documents, with the aim of returning specific answers to questions posed in natural language. The proposed track (shared task), "VATIKA: Varanasi Tourism in Question Answer System (Indian Language)", is designed specifically for the tourism domain. The VATIKA dataset is diverse: it covers the most important aspects of Varanasi tourism and can serve tourists as a guide for travelling across the city. Systems are expected to answer user queries about tourist spots, cultural events, local traditions, services, and accommodations. The track is distinctive in its focus on a single domain, tourism, and on an Indian language, Hindi.
VATIKA Dataset
VATIKA: Varanasi Tourism in Question Answer System (Indian Language) is a Hindi-language QA dataset designed specifically to support machine reading comprehension (MRC) and QA applications in the tourism domain. Centered on the culturally rich city of Varanasi, the dataset reflects realistic queries that travelers and pilgrims might ask about locations, logistics, services, and spiritual landmarks.
VATIKA is unique in that it spans 10 tourism-relevant domains, including Ganga Aarti, Cruise, Food Court, Public Toilet, Kund, Museum, General, Ashram, Temple and Travel. Each domain includes detailed paragraph-level Hindi contexts followed by multiple question-answer pairs, simulating real-world information-seeking behavior in natural language. The questions range from factual to navigational and experiential, enhancing coverage across diverse tourist concerns. The VATIKA dataset is written entirely in Hindi, using the Devanagari script, and serves as a valuable language resource for building and evaluating QA systems. It supports both open-domain and contextual MRC-style question answering.
Dataset Statistics
Access to the VATIKA dataset requires registration through the track's registration link.
The dataset is provided in structured JSON format, organized by domain → context → QAs. Each QA pair includes a unique ID, the question in Hindi, and its corresponding answer:
{
"domains": [
{
"domain": "kund",
"contexts": [
{
"context": "मणिकर्णिका चक्र पुष्करणीय कुंड लाल बहादुर शास्त्री अंतरराष्ट्रीय ह...",
"qas": [
{
"id": "kund_1467",
"question": "मणिकर्णिका चक्र पुष्करणीय कुंड लाल बहादुर शास्त्री अंतरराष्ट्रीय हवाई अड्डे (वाराणसी) से कितनी दूर है?",
"answer": "मणिकर्णिका चक्र पुष्करणीय कुंड लाल बहादुर शास्त्री अंतरराष्ट्रीय हवाई अड्डे (वाराणसी) से 25.8 किलोमीटर दूर है।"
},
{
"id": "kund_1468",
"question": "मणिकर्णिका चक्र पुष्करणीय कुंड लाल बहादुर शास्त्री अंतरराष्ट्रीय हवाई अड्डे के पास से कैसे पहुँचा जा सकता है?",
"answer": "मणिकर्णिका चक्र पुष्करणीय कुंड लाल बहादुर शास्त्री अंतरराष्ट्रीय हवाई अड्डे से यह दूरी टैक्सी या अन्य निजी परिवहन के माध्यम से तय की जा सकती है।"
}
]
}
]
}
]
}
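To make the domain → context → QAs nesting concrete, the following Python sketch flattens a file in the format above into one record per question. The QARecord type and the flatten_vatika and load_vatika names are hypothetical helpers introduced for illustration only. They are not part of any official VATIKA release. The sketch assumes the file is UTF-8 encoded and that an answer field may be blank, as it is in Test Data-II.

```python
import json
from dataclasses import dataclass


@dataclass
class QARecord:
    """One flattened question with its domain and context (hypothetical helper type)."""
    domain: str
    context: str
    qa_id: str
    question: str
    answer: str


def flatten_vatika(data: dict) -> list[QARecord]:
    """Walk the domain -> context -> qas nesting and return one record per QA pair."""
    records = []
    for domain_block in data["domains"]:
        for ctx in domain_block["contexts"]:
            for qa in ctx["qas"]:
                records.append(
                    QARecord(
                        domain=domain_block["domain"],
                        context=ctx["context"],
                        qa_id=qa["id"],
                        question=qa["question"],
                        # Blank answers (e.g. " " in Test Data-II) become "".
                        answer=qa.get("answer", "").strip(),
                    )
                )
    return records


def load_vatika(path: str) -> list[QARecord]:
    """Read a VATIKA-style JSON file (UTF-8 for the Devanagari text) and flatten it."""
    with open(path, encoding="utf-8") as f:
        return flatten_vatika(json.load(f))
```

For example, records = load_vatika("train.json") (the file name is hypothetical) yields a flat list in which each record pairs a passage with a question, which is the input shape most open-source extractive or generative QA models expect.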
In Test Data-II, the answer field will be left blank, as shown below. Participants are required to generate predictions for these questions and return the same JSON file with each answer field filled in with their predicted answer.
{
"domains": [
{
"domain": "kund",
"contexts": [
{
"context": "पांडव कुंड में दर्शन करने के लिए कोई निर्धारित समय सीमा नहीं है। फिर भी, श्रद्धालुओं के लि...",
"qas": [
{
"id": "kund_1256",
"question": "क्या नारद कुंड में दर्शन के लिए सुबह और शाम का समय सर्वोत्तम माना जाता है?",
"answer": " "
}
]
}
]
}
]
}
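Preparing a submission amounts to filling every answer field while leaving the rest of the structure unchanged. The Python sketch below assumes a participant-supplied predict(context, question) function that stands in for an open-source model. The names fill_predictions and write_submission are hypothetical and do not refer to any official tooling.

```python
import copy
import json
from typing import Callable


def fill_predictions(data: dict, predict: Callable[[str, str], str]) -> dict:
    """Return a copy of a Test Data-II style dict with every answer field filled.

    `predict(context, question)` is a placeholder for the participant's own
    open-source model; it is not part of any official VATIKA tooling.
    """
    filled = copy.deepcopy(data)  # leave the caller's dict untouched
    for domain_block in filled["domains"]:
        for ctx in domain_block["contexts"]:
            for qa in ctx["qas"]:
                # IDs, questions, and contexts are kept exactly as released.
                qa["answer"] = predict(ctx["context"], qa["question"])
    return filled


def write_submission(data: dict, path: str) -> None:
    """Write the filled structure back to disk as UTF-8 JSON."""
    with open(path, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps Devanagari readable instead of \u escapes.
        json.dump(data, f, ensure_ascii=False, indent=2)
```

Both escaped and unescaped Devanagari are valid JSON. The unescaped form is simply easier to inspect by hand before submission.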
Submission and Evaluation Details
Participants will be required to build a QA model capable of handling diverse linguistic structures, dialectal variations, and domain-specific terminology related to tourism. Participants must register through the registration link to obtain the VATIKA dataset. The test data will be divided into two parts, Test Data-I and Test Data-II: Test Data-I will be provided at the initial stage, and Test Data-II will be provided before the final submission. Participants must use only open-source large language models (LLMs) to develop their systems. Using closed-source or paid LLMs (e.g., GPT-4, Claude, Gemini) will lead to disqualification from the task.
The developed QA model will be evaluated on Test Data-II for its ability to comprehend user questions accurately, retrieve relevant information, and provide precise answers. Submissions will be scored using the F1, BLEU, and ROUGE-L metrics.
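The track names its metrics but not the exact tokenization or normalization the organizers apply. For local validation, the Python sketch below implements SQuAD-style token-overlap F1 and the LCS-based ROUGE-L F-measure. It relies on an assumed whitespace tokenizer that strips the Devanagari danda (।) and a few ASCII punctuation marks, so official scores may differ. BLEU is omitted here. For that metric, an established implementation such as the sacrebleu package is a safer choice than a hand-written one.

```python
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: whitespace tokenization after stripping the danda (।) and a
    # few ASCII marks; "." is kept so decimals such as 25.8 stay one token.
    for ch in "।?,!":
        text = text.replace(ch, " ")
    return text.split()


def token_f1(prediction: str, reference: str) -> float:
    """SQuAD-style token-overlap F1 between a predicted and a gold answer."""
    pred, ref = tokenize(prediction), tokenize(reference)
    if not pred or not ref:
        # Both empty counts as a match; exactly one empty scores 0.
        return float(pred == ref)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence, by dynamic programming."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]


def rouge_l_f1(prediction: str, reference: str) -> float:
    """ROUGE-L F-measure (beta = 1) from the LCS of the two token sequences."""
    pred, ref = tokenize(prediction), tokenize(reference)
    lcs = lcs_length(pred, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(pred)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)
```

VATIKA answers are often full sentences that restate the question, as in the kund example above. ROUGE-L rewards predictions that preserve the reference word order, while token F1 ignores order entirely. Reporting both during development can therefore reveal answers that contain the right words in a scrambled order.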
Important Dates
15th May, 2025: Open track websites and release of training data
15th June, 2025: Test Data-I release
25th June, 2025: Test Data-II release
30th June, 2025: Run submission deadline
15th July, 2025: Track results declaration
30th August, 2025: Working notes due
30th September, 2025: Camera-ready copies of working notes and overview paper due
17th December, 2025: FIRE Conference
Organizers
Name: Dr. Praveen Gatla
Designation: Assistant Professor
Department: Department of Linguistics, Faculty of Arts, Banaras Hindu University (BHU), Varanasi, Uttar Pradesh, INDIA.
Email Address: praveengatla@bhu.ac.in
Research Interests: Computational Linguistics, Paninian Grammar, Corpus Studies, Developing Treebanks for Indian Languages, Parallel Corpora, Translation Studies.
Name: Dr. Rajesh Kumar Mundotiya
Designation: Assistant Professor
Department: Department of Computer Science and Engineering, IIT Bhilai, Chhattisgarh, INDIA.
Email Address: rmundotiya@iitbhilai.ac.in
Research Interests: Machine Translation, Question-Answer System, Large Language Model, Dialogue Generation, Image Captioning
Co-organizers
Jyoti Kumari (her), Research Scholar, Department of Linguistics, Faculty of Arts, BHU
Anushka (her), Junior Research Fellow, Department of Linguistics, Faculty of Arts, BHU
Nabanita Sadhukhan (her), Junior Research Fellow, Department of Computer Science and Engineering, IIT Bhilai
Contact Us
firevatika2025@gmail.com