6th Workshop on
NLP and CSS
At NAACL, June 21, 2024
Welcome to the 6th Workshop on Natural Language Processing and Computational Social Science (NLP+CSS)!
Dallas Card (University of Michigan)
Anjalie Field (Johns Hopkins University)
Dirk Hovy (Bocconi University)
Katie Keith (Williams College)
Email to contact organizers: nlp-and-css -at- googlegroups.com
On Twitter: @NLPandCSS
Schedule
June 21, 2024
Room: Alberto 3
All times are local time in Mexico City: CST (Central Standard Time), UTC/GMT -6
9:15-9:30 - Opening Remarks
9:30-10:30 - Invited Talk #1: Naoki Egami (virtual)
10:30-11:00 - Break
11:00-12:30 - Poster Session (all accepted papers)
12:30-14:00 - Lunch
14:00-15:00 - Invited Talk #2: Helena Gómez Adorno (in-person)
15:00-15:30 - Small-Group Creative Activity (details TBA)
15:30-16:00 - Break
16:00-17:00 - Invited Talk #3: Maria Antoniak (in-person)
17:00-17:30 - Activity Wrap-up & Closing Remarks
INVITED TALKS
Invited Talk #1: Naoki Egami (virtual)
Title: Using Large Language Model Annotations for the Social Sciences: A General Framework of Using Predicted Variables in Downstream Analyses
Abstract: Social scientists use automated annotation methods, such as supervised machine learning and, more recently, large language models (LLMs), that can predict labels and generate text-based variables. While such predicted text-based variables are often analyzed as if they were observed without errors, we first show that ignoring prediction errors in the automated annotation step leads to substantial bias and invalid confidence intervals in downstream analyses, even if the accuracy of the automated annotations is high, e.g., above 90%. We propose a framework of design-based supervised learning (DSL) that can provide valid statistical estimates, even when predicted variables contain non-random prediction errors. DSL employs a doubly robust procedure to combine predicted labels and a smaller number of expert annotations. DSL allows scholars to apply advances in LLMs and natural language processing to social science research while maintaining statistical validity. We illustrate its general applicability using two applications where the outcome and independent variables are text-based.
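To make the bias point concrete, here is a minimal, illustrative sketch in Python of the general idea: even roughly 90%-accurate predicted labels can bias a simple downstream estimate, and a small expert-annotated subsample can be used to correct it. This is not the DSL estimator itself (which uses a doubly robust procedure and yields valid confidence intervals); all data, numbers, and variable names below are simulated assumptions for illustration only.

```python
# Illustrative sketch only: simulate LLM-predicted labels with ~90% accuracy,
# show that the naive prevalence estimate is biased, and correct the bias
# using a small random expert-annotated subsample. Not the actual DSL method.
import numpy as np

rng = np.random.default_rng(0)

N = 10_000                                  # documents with LLM-predicted labels
n = 500                                     # documents also annotated by experts
true_labels = rng.binomial(1, 0.30, size=N)            # unobserved ground truth
llm_labels = np.where(rng.random(N) < 0.90,             # ~90%-accurate annotator
                      true_labels, 1 - true_labels)

expert_idx = rng.choice(N, size=n, replace=False)       # random expert subsample
expert_labels = true_labels[expert_idx]                 # experts observe the truth

# Naive estimate: treat predicted labels as if observed without error.
naive = llm_labels.mean()

# Bias-corrected estimate: predictions on all N documents, plus a correction
# term (average prediction error) estimated on the expert-annotated subsample.
correction = (expert_labels - llm_labels[expert_idx]).mean()
corrected = llm_labels.mean() + correction

print(f"true prevalence   : {true_labels.mean():.3f}")
print(f"naive (LLM only)  : {naive:.3f}")
print(f"bias-corrected    : {corrected:.3f}")
```

In this toy setup the naive estimate drifts away from the true prevalence despite the annotator's high accuracy, while the corrected estimate recovers it; the talk covers how to do this properly, with non-random errors and valid inference.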
Invited Talk #2: Helena Gómez Adorno (in-person)
Title: HOMO-MEX: Hate speech detection towards the Mexican Spanish speaking LGBT+ population
Abstract: Despite global advances against discrimination, LGBT+phobia remains an issue faced by much of the LGBT+ population. In this talk, I will present the Homo-MEX project, which aims to continually provide benchmarks for evaluating the state of the art in identifying homophobic content. Homo-MEX focuses on the automatic detection and classification of LGBT+phobia in social media posts (tweets) and, more recently, on the detection of homophobic content in song lyrics. This initiative allows researchers to track progress in this area over time and encourages the development of more accurate and efficient methods.
Invited Talk #3: Maria Antoniak (in-person)
Title: Risk and Reward in Chatbot Conversations: Studying Open Prompt Datasets and User Behavior
Abstract: While most naturally occurring prompts to LLMs are hidden by industry, researchers can explore real prompts via a small set of open datasets. In this talk, I will present ongoing work that creates and examines these datasets for patterns in user tasks (query goals) and personal disclosures (sensitive and private information revealed in a user-chatbot conversation). These patterns allow us to design guidance for users and chatbot designers, as well as to build evaluation datasets. I will also discuss targeted experiments that solicit prompt data in two grounded settings. Together, these patterns and experimental insights allow us to study how people are currently using chatbots and the risks involved in those interactions.
Accepted papers
Can Large Language Model Agents Simulate Human Trust Behaviors? Chengxing Xie, Canyu Chen, Feiran Jia, Ziyu Ye, Kai Shu, Adel Bibi, Ziniu Hu, Philip Torr, Bernard Ghanem, Guohao Li.
Detecting Perspective-Getting in Wikipedia Discussions. Evgeny Vasilets, Tijs Van den Broek, Anna Wegmann, David Abadi, Dong Nguyen.
Connecting the Dots in News Analysis: Bridging the Cross-Disciplinary Disparities in Media Bias and Framing. Gisela Vallejo, Timothy Baldwin, Lea Frermann.
The Crime of Being Poor: Associations between Crime and Poverty on Social Media in Eight Countries. Georgina Curto, Svetlana Kiritchenko, Kathleen C. Fraser, Isar Nejadgholi.
Discovering Implicit Meanings of Cultural Motifs from Text. Anurag Acharya, Diego Castro Estrada, Shreeja Dahal, W. Victor H. Yarlott, Diana Gomez, Mark Finlayson.
Can Large Language Models (or Humans) Disentangle Text? Nicolas Audinet de Pieuchon, Adel Daoud, Connor Thomas Jerzak, Moa Johansson, Richard Johansson.
Retrieval Augmented Generation of Subjective Explanations for Socioeconomic Scenarios. Razvan-Gabriel Dumitru, Maria Alexeeva, Keith Alcock, Nargiza Ludgate, Cheonkam Jeong, Zara Fatima Abdurahaman, Prateek Puri, Brian Kirchhoff, Santadarshan Sadhu, Mihai Surdeanu.
Where on Earth Do Users Say They Are?: Geo-Entity Linking for Noisy Multilingual User Input. Tessa Masis, Brendan O'Connor.
News Deja Vu: Connecting Past and Present with Semantic Search. Brevin Franklin, Emily Silcock, Abhishek Arora, Tom Bryan, Melissa Dell.
Knowledge Distillation in Automated Annotation: Supervised Text Classification with LLM-Generated Training Labels. Nicholas J Pangakis, Sam Wolken.
Clustering Document Parts: Detecting and Characterizing Influence Campaigns from Documents. Zhengxiang Wang, Owen Rambow.
A First Step towards Measuring Interdisciplinary Engagement in Scientific Publications: A Case Study on NLP + CSS Research. Alexandria Leto, Shamik Roy, Alexander Hoyle, Daniel Acuna, Maria Leonor Pacheco.
Towards Fine-Grained Stance Detection: A Case Study of Shifting Opinions in an Online Conservative Community. Rupak Sarkar, Alexander Hoyle, Philip Resnik.
TOPCAT: Topic-Oriented Protocol for Content Analysis of Text – A Preliminary Study. Philip Resnik, Bolei Ma, Alexander Hoyle, Pranav Goel, Rupak Sarkar, Maeve Gearing, Carol Bruce, Anna-Carolina Haensch, Frauke Kreuter.
A Tale of Disagreement: Examining Perceptions of Storytelling in Experts and Crowds. Joel Mire, Maria Antoniak, Elliott Ash, Andrew Piper, Maarten Sap.