Monojit Choudhury,
Professor of Natural Language Processing,
Mohamed bin Zayed University of Artificial Intelligence,
Abu Dhabi, United Arab Emirates
Abstract:
The Curious Case of Honorifics in South Asian Languages and Their Treatment in LLMs
Many South Asian languages exhibit honorific distinctions in second- and third-person pronouns, along with corresponding agreement patterns. For example, Hindi encodes three levels of formality: formal (aap), neutral (tum), and familiar (tu). The appropriate choice among these forms is governed by a complex interplay of socio-cultural conventions, interpersonal relationships, speaker attitudes, and contextual factors, which vary across languages and regions. In this talk, I will present two complementary lines of inquiry: (1) how large language models (LLMs) can be used as tools to conduct large-scale analyses of honorific usage in Wikipedia data, and what such analyses reveal about underlying socio-cultural conventions; and (2) how honorific systems themselves can serve as a lens for examining the socio-cultural and pragmatic understanding of South Asian languages exhibited by LLMs. The findings reveal several universal patterns across the languages studied, alongside striking differences.
Usman Naseem,
Assistant Professor,
School of Computing at Macquarie University,
Sydney, Australia