CHiPSAL: Challenges in Processing South Asian Languages
Palma, Mallorca (Spain)
11-16 May 2026
Co-located with the 15th biennial Language Resources and Evaluation Conference (LREC 2026)
CHiPSAL 2026, the second Workshop on Challenges in Processing South Asian Languages, will be held as part of the 15th biennial Language Resources and Evaluation Conference (LREC 2026), which takes place in Palma, Mallorca (Spain), from 11 to 16 May 2026. The workshop will be conducted in hybrid mode.
Overview:
South Asia, comprising Afghanistan, Bangladesh, Bhutan, India, Maldives, Nepal, Pakistan, and Sri Lanka, is one of the most populous regions in the world, with more than 2 billion people. Home to over 700 languages and around 25 major scripts, the region reflects a rich cultural and linguistic heritage. In addition, more than 50 million South Asians live abroad.
Despite this diversity, South Asian languages are underrepresented in language technology. Recent large language models (LLMs) include minimal data from the region. The challenges in processing these languages begin at the level of encoding: although most scripts are encoded in Unicode, some applications do not render them correctly because of their orthographic complexity, and language input remains a problem in the region. The linguistic intricacies of these languages, with their multiple writing systems and long literary traditions, further complicate natural language processing tasks. Dialectal and cultural variation, as well as close language contact, adds a further level of complexity.
The first edition of CHiPSAL was co-located with COLING 2025 (Sarveswaran, Thapa, Shams, Vaidya, & Bal, 2025; Thapa et al., 2025) and was well received by both academia and industry. It saw strong participation from the region and served as a much-needed platform for exchanging challenges, insights, and solutions related to South Asian languages. Building on this momentum, the workshop continues to focus on the challenges in processing South Asian languages, covering linguistic and cultural aspects, encoding and orthography, and resource constraints. By addressing these challenges, we aim to facilitate South Asian language processing, with a strong emphasis on preserving and promoting linguistic and cultural heritage.