The tutorial provides participants with a practical, hands-on experience in integrating Large Language Models (LLMs) and human annotations to streamline annotation workflows, ensuring both efficiency and statistical rigor. As LLMs revolutionize data annotation with their ability to label and analyze complex social phenomena at unprecedented scales, they also pose challenges in ensuring the reliability and validity of results.
This tutorial introduces a systematic approach to combining LLM annotations with human input, enabling researchers to optimize annotation processes while maintaining rigorous standards for statistical inference. The session begins by framing the opportunities and challenges of leveraging LLMs for Computational Social Science (CSS), then demonstrates techniques for combining LLM annotations with human annotations that yield statistically valid results while minimizing annotation costs.
Through hands-on implementation using open-source datasets and code notebooks, participants will apply these methods to popular CSS tasks, such as detecting stance, media bias, and online hate and misinformation. Additionally, the session will explore how these approaches can be adapted to other domains, such as psychology, sociology, and political science. By the end of the session, participants will have gained actionable skills for reliably leveraging LLMs for data annotation in their own research.
Organizers
Postdoctoral Scholar at Stanford University
Incoming Assistant Professor at Johns Hopkins University
Tutorial schedule
1:30pm - 3pm: Introduction and methodological overview, including Q&A
3pm - 3:30pm: Break
3:30pm - 5pm: Hands-on guided coding tutorial
Tutorial materials
Slides: https://drive.google.com/file/d/1niZBa2oII2xSskVLyw-EAbZqUpwD0P01/view?usp=sharing
Code & data: https://github.com/kristinagligoric/cdi-tutorial
Zoom link (for optional remote Q&A; password provided in-person): https://stanford.zoom.us/s/97845279014
Pre-tutorial preparation
The tutorial will be given in Python. To follow along and test the code in real time, we recommend doing the following ahead of the session:
Cloning our GitHub repository and installing the required packages (pandas, scikit-learn, etc.) as explained in the README.md in the repository.
Opening the tutorial notebook and running the cells top-to-bottom to test the environment (a quick import check like the sketch below can also help confirm the setup).
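For a quick sanity check before the session, you can verify that the core dependencies import cleanly. A minimal sketch (the package list here is illustrative; the repository's README.md is the authoritative source for requirements):

```python
# Minimal environment check; see the repository's README.md for the
# authoritative list of required packages and versions.
import pandas as pd
import sklearn

print("pandas:", pd.__version__)
print("scikit-learn:", sklearn.__version__)
```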
Optionally, if you’d like to work with your own dataset during the final part of the tutorial, please prepare a CSV file in advance. The file should have the following three columns, specifying for each text (a toy example follows the list):
(1) human labels (e.g., “polite” or “impolite”, “positive” or “negative”, etc.), with a NaN value for texts where human annotations are not available,
(2) LLM labels (same format as the human labels, present for all texts), and
(3) a binary text-based feature (0 or 1, indicating the absence or presence of the feature).
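As a toy illustration of the expected format (the column names below are hypothetical; any consistent naming works), such a file could be assembled with pandas:

```python
import numpy as np
import pandas as pd

# Hypothetical column names, for illustration only:
# - human_label: may be NaN where no human annotation is available
# - llm_label:   present for every text
# - has_feature: 0/1 indicator for the binary text-based feature
df = pd.DataFrame({
    "human_label": ["polite", np.nan, "impolite", np.nan],
    "llm_label":   ["polite", "polite", "impolite", "polite"],
    "has_feature": [1, 0, 1, 0],
})
df.to_csv("my_dataset.csv", index=False)
```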
You're also encouraged to think ahead about what statistical estimation task you’re interested in (such as estimating means, medians, regression coefficients, or prevalence), and which label value the estimation should focus on (e.g., estimating the prevalence of the “polite” class, or the effect of a feature on politeness).
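To make the estimation task concrete, here is a minimal sketch of one way such an estimate can combine the two label columns: an LLM-based prevalence estimate, debiased using the human-labeled subset, in the spirit of prediction-powered inference. This is an illustrative sketch using the hypothetical column names from the example above, not necessarily the exact estimator taught in the tutorial.

```python
import pandas as pd

def corrected_prevalence(df: pd.DataFrame, target: str = "polite") -> float:
    """Prevalence of `target`, estimated from LLM labels on all texts and
    debiased with the average human-vs-LLM discrepancy on the subset of
    texts that have human labels (a prediction-powered-style sketch)."""
    llm = (df["llm_label"] == target).astype(float)
    has_human = df["human_label"].notna()
    human = (df.loc[has_human, "human_label"] == target).astype(float)
    # LLM-only estimate over all texts + correction from the labeled subset.
    return llm.mean() + (human - llm[has_human]).mean()

# Example usage with the toy file from above:
# print(corrected_prevalence(pd.read_csv("my_dataset.csv")))
```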
Contact
For any questions, please email the organizers: gligoric@stanford.edu, tijana.zrnic@stanford.edu, cinoolee@stanford.edu