I am a third-year PhD student at the Language Technologies Institute at Carnegie Mellon University, where I am advised by Fernando Diaz. I am broadly interested in evaluation and AI + Culture, and my research lies at the intersection of Natural Language Processing, Evaluation, Responsible AI, and Human-AI Interaction. In my PhD, I work on evaluating the cultural competence and impacts of language technologies, and on improving the sensitivity and reliability of evaluation methods.
During my PhD, I have interned at Semantic Scholar at the Allen Institute for AI, with Maria Antoniak and Tal August. Before my PhD, I was a pre-doctoral researcher at Google Research India on the Technology, AI, Society, and Culture team, working with Partha Talukdar and Vinodkumar Prabhakaran on fairness in the Indian context. Before that, I spent a year at Microsoft Research India, working with Sunayana Sitaram and Monojit Choudhury on scalable and interpretable multilingual evaluation and human-in-the-loop evaluation. I graduated from BITS Pilani with a B.E. in Computer Science in 2021.
Research Summary:
In my PhD, I have worked on two threads: AI + Culture and improving quantitative evaluation methods.
AI + Culture
I have worked on evaluating models' cultural competence, particularly in long-form and creative generation tasks. I employ quantitative measurements and qualitative methods, as necessary and often together, when developing evaluations.
Specifically, I have demonstrated that intrinsic and extrinsic evaluations of cultural competence do not correlate [EECC], studied cultural misrepresentations in model-generated stories [Lehengas is School], and analysed models' alignment to various research cultures [Research Borderlands]. My ongoing projects include developing evaluations for cultural homogenisation and stereotyping in model-generated stories across diverse demographic identities, languages, and dialects, and assessing the construct and convergent validity of cultural competence benchmarks.
These days I have been thinking a lot about the following future research directions in AI + Culture:
(a) situated evaluation, or understanding how professionals and artists use and interpret AI outputs in their workflows, how models' (sociocultural) incompetence impacts this process, and how we can use implicit and explicit user feedback to detect and analyse these patterns;
(b) intrinsic evaluation, or how cultural competence can be tracked during pre-training.
Improving Quantitative Evaluation
My quantitative evaluation work focuses on improving the sensitivity, reliability, and predictive validity of evaluation metrics used during both pre-training and post-training.
For pre-training evaluation, I am developing a preference-based intrinsic evaluation metric, an alternative to standard perplexity, with substantially improved sensitivity, sample efficiency, and predictive validity during pre-training experimentation. For post-training evaluation, I am developing a framework for eliciting granular preferences in complex rating tasks. This method substantially reduces the cognitive load on human raters and improves the reliability of auto-raters in preference elicitation, leading to more sensitive and reliable comparisons between models.
Overall, my research aims to improve evaluation methods, make language technologies more inclusive of diverse users, and understand how language, technology, and society interact and shape each other.
News:
[Oct 2025] Two papers I contributed to have been accepted to EMNLP Findings: Juhyun's position paper on intentionally cultural evaluation, and Ananaya's paper on the impact of ethnicity markers in hate-speech predictions 🥳
[May 2025] My internship work "Research Borderlands: Analysing Writing Across Research Cultures" was accepted as one of the 16 (!) plenary talks at the International Conference on Computational Social Science (IC2S2) 2025. It has also been accepted to ACL 2025 (main). See you in Sweden and Austria 🎉
[Oct 2024] Thrilled that my first PhD paper, on evaluating the cultural competence of LLMs in text-generation settings, was accepted to EMNLP Findings! See you in Miami 🎉
[Aug 2023] I started my PhD at LTI, CMU.
[Oct 2022] Two papers, on fairness in the Indian context from Google Research and on scalable and interpretable multilingual evaluation from my internship at MSR, were accepted to AACL 2022.
[Oct 2021] I started a GitHub repo to curate advice related to grad school applications and research. Please contribute!
DEI Efforts:
DEI efforts and advocacy have always been an integral part of my life and career. My volunteer work profoundly shapes my views on access to, and the impact of, technology and social opportunities.
I am an organiser at Queer in AI, where I help run our workshops and other initiatives to promote inclusion in the ACL community. Before that, I was a co-organiser of WiNLP (Widening NLP), an organisation that supports underrepresented groups in NLP. I co-organise the mentorship program at LTI and frequently serve as a mentor for initiatives aimed at introducing people to research, both within and outside CMU. In my undergrad, I worked with educational and mental-health initiatives for underprivileged kids for over three years. I am always looking for opportunities to do my bit to make the ACL, ML, and STEM communities more welcoming to everyone.
The interaction of society and technology is drastically altering how opportunities and marginalisation for underrepresented communities can be, and are being, created. The landscape of AI and NLP for societal applications has much uncharted territory. We need to understand whom our technology affects and how to ensure that we do not harm the communities we seek to benefit. Technology can only genuinely benefit society when the people for whom it is being created are included in the process. So, we need to empower and listen to diverse voices both in and outside of research communities.
Contact
Reach me at: shaily@cmu.edu.
I am particularly happy to help undergraduate students, especially women, interested in NLP/ML with exploring research and applying for research internships or graduate studies (MS/PhD). I am also open to talking about how I can help DEI efforts in the ACL and ML communities.