location: 518B
Morning Session
9.00 - 9.30 opening remarks
9.30 - 10.15 keynote 1: David M. Rothschild (Microsoft Research)
Successfully Navigating the Disruption: transparency and standards, ensuring safety and trust while efficiently accelerating learning from surveys in the age of AI
Surveys are a core methodological tool in government, industry, and academia, providing essential data for theory development and evidence-based decision-making. As artificial intelligence continues its rapid advancement, it stands to fundamentally transform the entire survey lifecycle, from design and administration to analytics and reporting. Previous transitions to new technologies, such as telephone, internet, and non-probability surveys, led to divisions within the survey research community, with real consequences for both the trajectory of research and trust in the industry. Fortunately, the survey community is better prepared this time and can take proactive steps now to avoid similar challenges with AI integration. In this talk I first outline key ways in which AI is affecting the survey research lifecycle, identifying promising research opportunities and innovations that merit further exploration. I then outline strategic recommendations for the survey research community to navigate this transition effectively, including guidelines for publication standards and research prioritization. Finally, I discuss collaborative initiatives between AI specialists and survey researchers that could yield mutual advantages. As chair of AAPOR’s Task Force on Responsible AI Integration in Survey Research, I hope the talk’s discussion (and the rest of the workshop) will be fruitful as we consider the transparency and standards that can help smooth AI’s disruption of survey research and guide an efficient acceleration of learning from surveys in the age of AI.
10.20 - 11.20 oral talk session 1 (3 × 15 min + 5 min)
Who Counts? The Potentials and Pitfalls of Using LLMs in Survey Research, Leah von der Heyde
Beyond Consensus: Use of Demographics for Datasets that Reflect Annotator Disagreement, Narjes Tahaei, Sabine Bergler
Exploring Side-by-Side LLM Evaluation Through Human Alignment and Bias Mitigation, Kseniia Titova, Darina Rustamova, Alan-Barsag Gazzaev, Maksim Polushin, Valentin Malykh, Sergey Zagoruyko
11.20 - 11.45 coffee break
11.45 - 13.15 poster presentation (in person): all non-archival accepted submissions
Social sciences and AI joining forces: towards new approaches for computational social sciences, Katharina Soemer, Daniela Grunow, Steffen Eger
Language Model Fine-Tuning on Scaled Survey Data for Predicting Distributions of Public Opinions, Joseph Suh, Erfan Jahanparast, Suhong Moon, Minwoo Kang, Serina Chang (unable to present in person)
SimBench: Benchmarking the Ability of Large Language Models to Simulate Human Behaviors, Tiancheng Hu, Joachim Baumann, Lorenzo Lupo, Nigel Collier, Dirk Hovy, Paul Röttger
Understanding Public Opinion through Social Media: Summarization, Stance Annotation, Demographic Inference, Frederick Conrad, Michael Schober, Rebecca S. Dolgin, Mao Li, Peilin Chen, Erik Zhou
Using Large Language Models to measure and classify occupations in surveys, Patrick Sturgis, Tom, Laura Feng, Caroline Roberts
Annotation Sensitivity: Training Data Collection Methods Affect Model Performance, Christoph Kern, Stephanie Eckman, Jacob Beck, Bolei Ma, Rob Chew, Frauke Kreuter
Position: Insights from Survey Methodology can Improve Training Data, Stephanie Eckman, Barbara Plank, Frauke Kreuter
Multi-Instance Learning for Social Media-Based Spatiotemporal Public Opinion Analysis, Shanshan Bai, Anna Kruspe, Xiao Xiang Zhu
Back to the basics and to the future: Evaluating silicon samples with POR standards, Yongwei Yang, Gina Walejko
Linking Survey and Social Media Data: Natural Language Processing for Bridging the Gap Between Open Access and Data Protection, Conor Gaughan, Rachel Gibson, Alexandru Cernat, Marta Cantijoch, Riza Batista-Navarro
Beyond Accuracy: A Replication Fidelity Framework for Trustworthy LLM Evaluation in Social Science Applications, Chen Peng, Samridh Aggarwal, Arnstein Aassve, Lorenzo Lupo, Nicolò Cavalli
LLM-Enhanced Survey Methodology: Validation, Automation, and Mixed Methods at Scale, Les DeBusk-Lane, Anirban Pal
More Parameters Than Populations: A Systematic Review of Large Language Models in Survey Research, Trent D Buskirk, Florian Keusch, Leah von der Heyde, Adam Eck
Evaluating the Human-Likeness of LLM-Generated Open-Ended Responses, Joshua Y. Lerner, Brandon Sepulvado, Lilian Huang, Soubhik Barari
Cross-corpora argument analysis using textual entailment, Algis Petlin, Sue-Ellen Duffy, Ankita Gupta, Brendan O'Connor
SAI What?! Ping the Bots Before You Probe the People: Testing Large Language Models for Pre-Cognitive Interviewing in Survey Research, Trent D Buskirk, Darby Steiger, Courtney Kennedy
In Your Own Words: Free-Text Descriptions of Identity Reveal Information Beyond Census Categories, Jenny Shan Wang, Emma Pierson (unable to present in person)
In-Context Learning for the Imputation of Survey Data, Tobias Holtdirk, Georg Ahnert, Anna-Carolina Haensch
AIn’t Nothing But a Survey? Using Large Language Models for Coding German Open-Ended Survey Responses on Survey Motivation, Leah von der Heyde, Anna-Carolina Haensch, Bernd Weiß, Jessica Daikeler
PrimeX: A Dataset of Worldview, Opinion, and Explanation, Rik Koncel-Kedziorski, Brihi Joshi, Tim Paek
13.15 - 14.30 lunch break
Afternoon Session
14.30 - 15.15 keynote 2: Lora Aroyo (Google DeepMind)
Beyond a Single Viewpoint: Embracing diverse human values in data collection and AI evaluation
Incorporating the wide diversity of human perspectives is crucial for building AI responsibly, particularly when evaluating the safety of generative AI models. This diversity manifests in varying interpretations of harm and offensiveness, influenced by socio-cultural factors. Traditional AI evaluation often overlooks this variation, relying on binary classifications and aggregated ratings that obscure individual perspectives. Frameworks like GRASP and CrowdTruth, which harness and analyze rater disagreement, help reveal these nuances and identify the influence of demographic factors on subjective tasks. Recognizing this inherent ambiguity and incorporating diverse viewpoints into the data used for training and evaluation is paramount for developing AI systems that are truly inclusive, reliable, and reflective of the values of all users.
15.20 - 16.00 oral talk session 2 (2 × 15 min + 5 min)
Mic Drop or Data Flop? Evaluating the Fitness for Purpose of AI Voice Interviewers for Data Collection within Quantitative & Qualitative Research Contexts, Shreyas Tirumala, Nishant Jain, Danny D. Leybzon, Trent D Buskirk
Uncovering Hidden Factions through Text-Network Representations: Unsupervised Public Opinion Mapping of Iran on Twitter in the 2022 Unrest, Sahar Omidi Shayegan, Jean-François Godbout, Reihaneh Rabbany
16.00 - 16.30 coffee break
16.30 - 17.15 panel discussion
moderator: Stephanie Eckman (University of Maryland and Amazon)
panelists:
Lora Aroyo, Research Scientist and Team Lead at Google DeepMind
Serena Booth, Professor of Computer Science at Brown University
David M. Rothschild, Research Scientist at Microsoft Research
Patrick Sturgis, Professor of Quantitative Social Science at London School of Economics
17.15 - 17.30 closing