Welcome!
Alt: Headshot of Shaily Bhatt
I am a second-year PhD student at the Language Technologies Institute at Carnegie Mellon University where I am advised by Fernando Diaz. I am also interning at Semantic Scholar at the Allen Institute for AI, with Maria Antoniak and Tal August.
I am broadly interested in making language technologies more inclusive to users from diverse cultures and understanding how language, technology, and society interact and shape each other. My work measures the sociocultural capabilities and impacts of language technologies using both quantitative and qualitative methods.
Before my PhD, I was a Predoctoral Researcher at the NLU Group at Google Research, India where I worked with Partha Talukdar and Vinodkumar Prabhakaran. Before that, I spent a year at Microsoft Research, India working with Sunayana Sitaram and Monojit Choudhury. I graduated from BITS Pilani with a B.E. in Computer Science in 2021.
Summary of Research:
I am interested in measuring the sociocultural capabilities and impacts of language technologies. My work is anchored in the following themes:
Cultural Competence of Language Technologies:
Evaluating cultural competence of LLMs in text generation [EMNLP 2024]
Evaluating LLMs' abilities to adapt to different research cultures when used as writing assistants grounded in the expertise of interdisciplinary scholars. [WIP @ Ai2]
Societal and Ethical Impacts of AI Systems:
Quantifying the homogenization and stereotyping present in cultural adaptations made by LLMs. [WIP]
Analysing (un)reliability of LLMs' hate speech predictions in the presence of ethnicity markers [SafeGenAI Workshop @ NeurIPS 2024]
Evaluating biases in models for the Indian cultural context [Work while at Google Research, AACL 2022]
Quantitative and Qualitative Methods for Principled and Human-Centered Measurement of Systems Capabilities / Limitations / Impacts:
Quantitative approaches used across my work include developing measurements for complex constructs, curating and analysing datasets, and [WIP] improving user feedback elicitation.
Mixed-methods approaches, like using interview studies and computational corpus linguistics (text-as-data methods), to define and measure research cultures, which in turn inform the quantitative evaluation of LLMs as research writing assistants. [WIP @ Ai2]
Human-in-the-loop approaches for scalable and interpretable evaluation of multilingual models [AACL 2022] and improving production systems [Workshop on Human Evaluation of NLP Systems at EACL 2021]. [Work while at Microsoft Research]
News:
[Nov 2024] Ananaya's paper on evaluating robustness in hate-speech prediction in the presence of ethnicity markers was accepted to the Safe Generative AI Workshop at NeurIPS 2024
[Oct 2024] Thrilled that the first paper from my PhD, on evaluating the cultural competence of LLMs in text generation settings, was accepted to EMNLP Findings! See you in Miami 🎉
[May 2024] I will be interning at Semantic Scholar at the Allen Institute for AI, and I am looking forward to Seattle this summer! =D
[Aug 2023] I started my PhD at LTI, CMU.
[Oct 2022] Work on fairness in the Indian context from Google Research and on scalable and interpretable multilingual evaluation from my internship at MSR was accepted to AACL 2022.
[Oct 2021] I started a GitHub repo to curate advice related to grad school applications and research. Please contribute!
DEI Efforts:
DEI efforts and advocacy have always been an integral part of my life and career. My volunteer work profoundly shapes my views around access and the impact of technology and social opportunities.
I am an organiser at Queer in AI, where I help run our workshops and other initiatives to promote inclusion in the ACL community. Before that, I co-organized WiNLP (Widening NLP), an organisation that supports underrepresented groups in NLP. I co-organize the mentorship program at LTI and frequently serve as a mentor for initiatives aimed at introducing people to research, both within and outside CMU. During my undergrad, I worked on educational and mental health initiatives for underprivileged kids for over three years. I am always looking for opportunities to do my bit to make the ACL, ML, and STEM communities more welcoming to everyone.
The interaction of society and technology is drastically altering how opportunities and marginalisation for underrepresented communities can be, and are being, created. The landscape of AI and NLP for societal applications has a lot of uncharted territory. We need to understand who our technology affects and how to ensure that we do not harm the communities we seek to benefit. Technology can only genuinely benefit society when the people for whom it is being created are included in the process. So, we need to empower and listen to diverse voices in and outside of the research communities.
Contact
Reach me at: shaily@cmu.edu or on Twitter.
I am particularly happy to help undergraduate students, especially women, interested in NLP/ML, with exploring research, and applying for research internships or graduate studies (MS / PhD). I am open to talking about how I can help DEI efforts in the ACL and ML communities.