The tutorial provides participants with a practical, hands-on experience in integrating Large Language Models (LLMs) and human annotations to streamline annotation workflows, ensuring both efficiency and statistical rigor. As LLMs revolutionize data annotation with their ability to label and analyze complex social phenomena at unprecedented scales, they also pose challenges in ensuring the reliability and validity of results.
This tutorial introduces a systematic approach to combining LLM annotations with human input, enabling researchers to optimize annotation processes while maintaining rigorous standards for statistical inference. The session begins by framing the opportunities and challenges of leveraging LLMs for Computational Social Science (CSS), then demonstrates techniques for combining LLM annotations with human annotations that yield statistically valid results while minimizing annotation costs.
Through hands-on implementation using open-source datasets and code notebooks, participants will apply these methods to popular CSS tasks, such as detecting stance, media bias, and online hate and misinformation. Additionally, the session will explore how these approaches can be adapted to other domains, such as psychology, sociology, and political science. By the end of the session, participants will have gained actionable skills for reliably leveraging LLMs for data annotation in their own research.
Organizers
Postdoctoral Scholar at Stanford University
Incoming Assistant Professor at Johns Hopkins University
Tutorial schedule
1:30pm - 3pm: Introduction and methodological overview, including Q&A
3pm - 3:30pm: Break
3:30pm - 5pm: Hands-on guided coding tutorial
Tutorial materials
Slides: https://drive.google.com/file/d/1niZBa2oII2xSskVLyw-EAbZqUpwD0P01/view?usp=sharing
Code & data: https://github.com/kristinagligoric/cdi-tutorial
Zoom link (for optional remote Q&A; password provided in-person): https://stanford.zoom.us/s/97845279014
Pre-tutorial preparation
The tutorial will be given in Python. To follow along and test the code in real time, we recommend doing the following ahead of the session:
Cloning our GitHub repository and installing the required packages (pandas, scikit-learn, etc.) as explained in the README.md in the repository.
Opening the tutorial notebook and running the cells top-to-bottom to test the environment (a quick import check like the sketch below can also help confirm the setup).
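For a quick sanity check before the session, you can verify that the core dependencies import cleanly. A minimal sketch (the package list here is illustrative; the repository's README.md is the authoritative source for requirements):

```python
# Minimal environment check; see the repository's README.md for the
# authoritative list of required packages and versions.
import pandas as pd
import sklearn

print("pandas:", pd.__version__)
print("scikit-learn:", sklearn.__version__)
```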
Optionally, if you’d like to work with your own dataset during the final part of the tutorial, please prepare a CSV file in advance. The file should have the following three columns, specifying for each text (a toy example follows the list):
(1) human labels (e.g., “polite” or “impolite”, “positive” or “negative”, etc.), with a NaN value for texts where human annotations are not available,
(2) LLM labels (same format as the human labels, present for all texts), and
(3) a binary text-based feature (0 or 1, indicating the absence or presence of the feature).
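As a toy illustration of the expected format (the column names below are hypothetical; any consistent naming works), such a file could be assembled with pandas:

```python
import numpy as np
import pandas as pd

# Hypothetical column names, for illustration only:
# - human_label: may be NaN where no human annotation is available
# - llm_label:   present for every text
# - has_feature: 0/1 indicator for the binary text-based feature
df = pd.DataFrame({
    "human_label": ["polite", np.nan, "impolite", np.nan],
    "llm_label":   ["polite", "polite", "impolite", "polite"],
    "has_feature": [1, 0, 1, 0],
})
df.to_csv("my_dataset.csv", index=False)
```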
You're also encouraged to think ahead about what statistical estimation task you’re interested in (such as estimating means, medians, regression coefficients, or prevalence), and which label value the estimation should focus on (e.g., estimating the prevalence of the “polite” class, or the effect of a feature on politeness).
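To make the estimation task concrete, here is a minimal sketch of one way such an estimate can combine the two label columns: an LLM-based prevalence estimate, debiased using the human-labeled subset, in the spirit of prediction-powered inference. This is an illustrative sketch using the hypothetical column names from the example above, not necessarily the exact estimator taught in the tutorial.

```python
import pandas as pd

def corrected_prevalence(df: pd.DataFrame, target: str = "polite") -> float:
    """Prevalence of `target`, estimated from LLM labels on all texts and
    debiased with the average human-vs-LLM discrepancy on the subset of
    texts that have human labels (a prediction-powered-style sketch)."""
    llm = (df["llm_label"] == target).astype(float)
    has_human = df["human_label"].notna()
    human = (df.loc[has_human, "human_label"] == target).astype(float)
    # LLM-only estimate over all texts + correction from the labeled subset.
    return llm.mean() + (human - llm[has_human]).mean()

# Example usage with the toy file from above:
# print(corrected_prevalence(pd.read_csv("my_dataset.csv")))
```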
Contact
For any questions, please email the organizers: gligoric@stanford.edu, tijana.zrnic@stanford.edu, cinoolee@stanford.edu