CHiPSAL: Challenges in Processing South Asian Languages
Abu Dhabi, UAE (Virtual)
January 19, 2025
Co-located with the 31st International Conference on Computational Linguistics (COLING 2025)
CHiPSAL 2025, the First Workshop on Challenges in Processing South Asian Languages, will be held as part of the 31st International Conference on Computational Linguistics (COLING 2025), which takes place in Abu Dhabi, UAE, from January 19 to 24, 2025. The workshop will be conducted in virtual mode.
The proceedings are available on the ACL Anthology: https://aclanthology.org/volumes/2025.chipsal-1/
The guidelines for poster presenters and participants have been updated here.
The detailed CHiPSAL 2025 schedule, including all oral presentations and posters, is now available here.
Oral presentations will be conducted via Zoom, and virtual poster sessions will take place on GatherTown. Please register for the workshop to receive the links.
Overview:
South Asia, comprising Afghanistan, Bangladesh, Bhutan, India, Maldives, Nepal, Pakistan, and Sri Lanka, is one of the most populous regions in the world, with approximately 1.97 billion people. Home to over 700 languages and around 25 major scripts, the region reflects a rich cultural and linguistic heritage. Additionally, more than 50 million South Asians live abroad. Despite this diversity, South Asian languages are underrepresented in language technology, and recent large language models (LLMs) include minimal data from the region. The challenges in processing these languages begin with encoding: while most scripts are encoded in Unicode, some applications may not render them correctly because of orthographic complexities, and language input remains a problem in the region. The linguistic intricacies of these languages, with multiple writing systems and long literary traditions, further complicate natural language processing tasks. Dialectal and cultural variation, together with close language contact, adds a further layer of complexity. This workshop therefore focuses on the challenges in processing South Asian languages, covering linguistic and cultural aspects, encoding and orthography, and resource constraints. By addressing these challenges, we aim to facilitate South Asian language processing with a focus on linguistic and cultural heritage.