Stanford Social NLP Reading Group
Purpose
Welcome! The Social NLP reading group focuses on reading papers at the intersection of the social sciences and NLP. Ideally, the papers will expose us to
(1) Significant innovations in NLP methods, models, or design paradigms applied to social problems, or
(2) New theories and concepts from the social sciences and how they are studied with NLP approaches.
Logistics
We hold weekly meetings where we discuss one or more papers related to a theme of interest.
How to get involved
Join us for our weekly meeting: Wednesdays, 10:00-11:00 am, in Gates 304 and on Zoom
Join our Slack channel: #social-nlp
Organizers
Funding
We are grateful to Stanford HAI for supporting us through the Student Affinity Groups program.
Schedule Spring Quarter 2023/24
Week 1, Apr 17th
Topic: LLM agents
Papers:
Wu, Qingyun, et al. "AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation Framework." arXiv preprint arXiv:2308.08155 (2023). https://arxiv.org/abs/2308.08155
Discussion lead: Chi Wang (guest)
Week 2, Apr 24th
Topic: Prediction-powered inference and LLM evaluation
Papers:
Angelopoulos, A.N., Bates, S., Fannjiang, C., Jordan, M.I. and Zrnic, T., 2023. Prediction-powered inference. Science, 382(6671), pp.669-674. https://www.science.org/doi/full/10.1126/science.adi6000
Zrnic, T. and Candès, E.J., 2024. Active Statistical Inference. arXiv preprint arXiv:2403.03208. https://arxiv.org/abs/2403.03208
Boyeau, P., Angelopoulos, A.N., Yosef, N., Malik, J. and Jordan, M.I., 2024. AutoEval Done Right: Using Synthetic Data for Model Evaluation. arXiv preprint arXiv:2403.07008. https://arxiv.org/abs/2403.07008
Discussion lead: Tijana Zrnić (guest)
Week 3, May 1st
Topic: Automated data testing in Australian politics and Canadian journalism
Papers:
Digitization of the Australian Parliamentary Debates, 1998–2022 https://www.nature.com/articles/s41597-023-02464-w
Digitization of the Canadian Parliamentary Debates https://www.jstor.org/stable/26858546
Implementing Automated Data Validation for Canadian Political Datasets https://arxiv.org/abs/2309.12886
Evaluating the Decency and Consistency of Data Validation Tests Generated by LLMs https://arxiv.org/abs/2310.01402
Discussion lead: Lindsay Katz (guest)
Week 4, May 8th
Topic: Concept-aware LLMs
Papers:
Shani, Chen, Jilles Vreeken, and Dafna Shahaf. "Towards Concept-Aware Large Language Models." https://aclanthology.org/2023.findings-emnlp.877/
Optional:
Jiang, Yang, et al. "Expert feature-engineering vs. deep neural networks: which is better for sensor-free affect detection?." Artificial Intelligence in Education: 19th International Conference, AIED 2018, London, UK, June 27–30, 2018, Proceedings, Part I 19. Springer International Publishing, 2018. https://learninganalytics.upenn.edu/ryanbaker/jiang-aied2018.pdf
Rytting, Christopher, and David Wingate. "Leveraging the inductive bias of large language models for abstract textual reasoning." Advances in Neural Information Processing Systems 34 (2021): 17111-17122. https://arxiv.org/pdf/2110.02370
Si, Chenglei, et al. "Measuring inductive biases of in-context learning with underspecified demonstrations." arXiv preprint arXiv:2305.13299 (2023). https://aclanthology.org/2023.acl-long.632.pdf
Papadimitriou, Isabel, and Dan Jurafsky. "Injecting structural hints: Using language models to study inductive biases in language learning." Findings of the Association for Computational Linguistics: EMNLP 2023. 2023. https://aclanthology.org/2023.findings-emnlp.563.pdf
Cazenavette, George, and Simon Lucey. "On the bias against inductive biases." arXiv preprint arXiv:2105.14077 (2021). https://arxiv.org/pdf/2105.14077
Discussion lead: Chen Shani
Schedule Winter Quarter 2023/24
Week 1, Jan 24th
Topic: Probing for making consistent measurements with LLMs
Papers:
Unsupervised Contrast-Consistent Ranking with Language Models (Stoehr et al. 2023)
Characterizing Mechanisms for Factual Recall in Language Models (Yu et al., 2023)
Optional:
Measurement in the Age of LLMs: An Application to Ideological Scaling (O’Hagan and Schein, 2023)
Discovering Latent Knowledge in Language Models Without Supervision (Burns et al., 2023)
Linear Representations of Sentiment in Large Language Models (Tigges et al., 2023)
Discussion lead: Niklas Stoehr (guest)
Week 2, Jan 31st
Topic: Persuasion in the era of LLMs
Papers:
Discussion lead: Weiyan Shi
Week 3, Feb 14th
Topic: NLP & Policing
Papers:
Rho, Eugenia H., et al. "Escalated police stops of Black men are linguistically and psychologically distinct in their earliest moments." Proceedings of the National Academy of Sciences 120.23 (2023): e2216162120. https://www.pnas.org/doi/full/10.1073/pnas.2216162120
Optional:
Pierson, E., Simoiu, C., Overgoor, J., Corbett-Davies, S., Jenson, D., Shoemaker, A., ... & Goel, S. (2020). A large-scale analysis of racial disparities in police stops across the United States. Nature human behaviour, 4(7), 736-745. https://www.nature.com/articles/s41562-020-0858-1
Weisburd, D., Telep, C. W., Vovak, H., Zastrow, T., Braga, A. A., & Turchan, B. (2022). Reforming the police through procedural justice training: A multicity randomized trial at crime hot spots. Proceedings of the National Academy of Sciences, 119(14), e2118780119. https://www.pnas.org/doi/full/10.1073/pnas.2118780119
Prowse, G., Weaver, V. M., & Meares, T. L. (2020). The state from below: Distorted responsiveness in policed communities. Urban Affairs Review, 56(5), 1423-1471. https://journals.sagepub.com/doi/full/10.1177/1078087419844831
Discussion lead: Maggie Harrington
Week 4, Feb 21st
Topic: Values in LLMs
Papers:
Value Kaleidoscope: Engaging AI with Pluralistic Human Values, Rights, and Duties https://arxiv.org/abs/2309.00779
A Roadmap to Pluralistic Alignment https://arxiv.org/abs/2402.05070
LLMs grasp morality in concept https://arxiv.org/abs/2311.02294
Optional:
Can Machines Learn Morality? The Delphi Experiment. https://arxiv.org/abs/2110.07574
Aligning AI With Shared Human Values https://arxiv.org/abs/2008.02275
Discussion lead: Jared Moore
Week 5, Feb 28th
Topic: Reasoning about networks with LLMs
Papers:
Papachristou, Marios, and Yuan Yuan. "Network Formation and Dynamics Among Multi-LLMs." arXiv preprint arXiv:2402.10659 (2024). https://arxiv.org/pdf/2402.10659.pdf
Fatemi, Bahare, Jonathan Halcrow, and Bryan Perozzi. "Talk like a graph: Encoding graphs for large language models." arXiv preprint arXiv:2310.04560 (2023). https://arxiv.org/pdf/2310.04560.pdf
Discussion lead: Alicja Chaszczewicz
Week 6, Mar 6th
Topic: Beyond Memorization: Stronger Language Models and Their Privacy Implications
Papers:
Can LLMs Keep a Secret? Testing Privacy Implications of Language Models via Contextual Integrity Theory https://arxiv.org/abs/2310.17884
What Does it Mean for a Language Model to Preserve Privacy? https://arxiv.org/abs/2202.05520
Optional:
R-Judge: Benchmarking Safety Risk Awareness for LLM Agents https://arxiv.org/abs/2401.10019
Discussion lead: Yijia Shao
Schedule Fall Quarter 2023/24
Week 1, Oct 16th
Topic: Language, Identity, and Language Model Interaction
Papers:
Schlesinger, Ari, W. Keith Edwards, and Rebecca E. Grinter. "Intersectional HCI: Engaging identity through gender, race, and class." Proceedings of the 2017 CHI conference on human factors in computing systems. 2017. https://dl.acm.org/doi/10.1145/3025453.3025766
Zamfirescu-Pereira, J. D., et al. "Why Johnny can’t prompt: how non-AI experts try (and fail) to design LLM prompts." Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems. 2023. https://dl.acm.org/doi/abs/10.1145/3544548.3581388
Optional:
Zheng, Lianmin, et al. "LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset." arXiv preprint arXiv:2309.11998 (2023). https://arxiv.org/pdf/2309.11998v3.pdf
Volkova, Svitlana, et al. "Inferring latent user properties from texts published in social media." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 29. No. 1. 2015. https://ojs.aaai.org/index.php/AAAI/article/view/9271
Discussion lead: Will Held
Week 2, Oct 23rd
Topic: From Question Answering to Question Asking
Papers:
Large Language Models for Automated Open-domain Scientific Hypotheses Discovery (arXiv preprint, https://arxiv.org/abs/2309.02726)
Socially situated artificial intelligence enables learning from human interaction (PNAS Vol. 119 | No. 39, https://www.pnas.org/doi/full/10.1073/pnas.2115730119 )
Don’t Just Tell Me, Ask Me: AI Systems that Intelligently Frame Explanations as Questions Improve Human Logical Discernment Accuracy over Causal AI explanations (CHI 2023, https://dl.acm.org/doi/10.1145/3544548.3580672)
Discussion lead: Yijia Shao
Week 3, Oct 30th
Topic: Anthropomorphism
Papers:
See the Slack channel.
Optional:
Talking About Large Language Models https://arxiv.org/abs/2212.03551
Computational analysis of 140 years of US political speeches reveals more positive but increasingly polarized framing of immigration https://www.pnas.org/doi/epdf/10.1073/pnas.2120510119
Discussion lead: Myra Cheng
Week 4, Nov 6th
Topic: Guiding Design of LLM-powered chatbots with Expert Feedback
Papers:
Petridis, Savvas, et al. "ConstitutionMaker: Interactively Critiquing Large Language Models by Converting Feedback into Principles." https://arxiv.org/abs/2310.15428
Chen, Siyuan, et al. "LLM-empowered Chatbots for Psychiatrist and Patient Simulation: Application and Evaluation." https://arxiv.org/abs/2305.13614
Discussion lead: Ryan Louie
Week 5, Nov 13th
Topic: Representation of global culture and values in LLMs
Papers:
Towards Measuring the Representation of Subjective Global Opinions in Language Models https://arxiv.org/pdf/2306.16388.pdf
Having Beer after Prayer? Measuring Cultural Bias in Large Language Models https://arxiv.org/pdf/2305.14456.pdf
(only Section 5) BHASA: A Holistic Southeast Asian Linguistic and Cultural Evaluation Suite for Large Language Models https://arxiv.org/vc/arxiv/papers/2309/2309.06085v1.pdf
Discussion lead: Yifan Mai
Week 6, Nov 27th
Topic: Conversational grounding with LLMs
Papers:
Grounding or Guesswork? Large Language Models are Presumptive Grounders https://arxiv.org/abs/2311.09144
Principles of Mixed-Initiative Interaction https://dl.acm.org/doi/pdf/10.1145/302979.303030
Zero-Shot Goal-Directed Dialogue via RL on Imagined Conversations https://arxiv.org/abs/2311.05584
Optional:
Co-constructing intersubjectivity with artificial conversational agents: People are more likely to initiate repairs of misunderstandings with agents represented as human https://www.sciencedirect.com/science/article/pii/S0747563215303101
Generative Theories of Interaction https://hal.science/hal-03434142/file/GenTheory%20authorversion.pdf
Discussion lead: Omar Shaikh
Week 7, Dec 4th
Topic: Eliciting human preferences with Language models
Papers:
Eliciting Human Preferences with Language Models. Belinda Z. Li, Alex Tamkin, Noah Goodman, Jacob Andreas. https://arxiv.org/abs/2310.11589
Discussion lead: Alex Tamkin
Schedule Spring Quarter 2022/23
Week 1, Apr 6th
Topic: Overconfidence and uncertainty
Papers:
Prince, Ellen F. (Linguistics), Joel Frader (Pediatrics), and Charles Bosk (Sociology). "On Hedging in Physician-Physician Discourse." (1980). http://www.cs.columbia.edu/~prokofieva/CandidacyPapers/Prince_Hedging.pdf
Zhou, Kaitlyn, Dan Jurafsky, and Tatsunori Hashimoto. "Navigating the Grey Area: Expressions of Overconfidence and Uncertainty in Language Models." arXiv preprint arXiv:2302.13439 (2023). https://arxiv.org/abs/2302.13439
Mielke, Sabrina J., et al. "Reducing conversational agents’ overconfidence through linguistic calibration." Transactions of the Association for Computational Linguistics 10 (2022): 857-872. https://arxiv.org/abs/2012.14983
Discussion lead: Kaitlyn Zhou
Week 2, Apr 13th
Topic: Large language models as simulated humans
Papers:
Generative Agents: Interactive Simulacra of Human Behavior https://arxiv.org/abs/2304.03442
Whose Opinions Do Language Models Reflect? https://arxiv.org/abs/2303.17548
(Optional) Large Language Models as Simulated Economic Agents: What Can We Learn from Homo Silicus? https://john-joseph-horton.com/papers/llm_ask.pdf
Discussion lead: Joon Park
Week 3, Apr 20th
Topic: Culture and pragmatics
Thomas, Jenny. "Cross-cultural pragmatic failure." Applied linguistics 4.2 (1983): 91-112.
Discussion lead: Jing Huang
Week 4, Apr 27th
Topic: Reflection and self-correction
The Capacity for Moral Self-Correction in Large Language Models. https://arxiv.org/abs/2302.07459
Reflexion: an autonomous agent with dynamic memory and self-reflection. https://arxiv.org/abs/2303.11366
Discussion lead: Tiziano Piccardi
Week 5, May 4th
Topic: Polarization and mitigations
Discussion lead: Martin Saveski
Week 6, May 11th
Topic: Causality and common sense
Causal Reasoning and Large Language Models: Opening a New Frontier for Causality. Emre Kıcıman, Robert Ness, Amit Sharma, Chenhao Tan. https://arxiv.org/abs/2305.00050
Neural Theory-of-Mind? On the Limits of Social Intelligence in Large LMs. Maarten Sap, Ronan Le Bras, Daniel Fried, and Yejin Choi https://arxiv.org/pdf/2210.13312
Discussion lead: Kristina Gligoric
Week 7, May 18th
Topic: Measuring stereotypes in LLMs and in simulation scenarios
Marked Personas: Using Natural Language Prompts to Measure Stereotypes in Language Models. Cheng et al. (pdf in the Slack channel)
Discussion lead: Myra Cheng
Week 8, May 25th
Topic: Large language models and linguistic theory
What’s the deal with Chomsky? And other commonly asked questions about linguistics
Chomsky’s The False Promise of ChatGPT
Modern language models refute Chomsky’s approach to language
Optional:
Modern Language Models Refute Nothing (response on Twitter)
The Impact of Large Language Models on Linguistic Theory and Generative Grammar: A Critical Analysis
Discussion lead: Omar Shaikh
Week 9, Jun 1st
Topic: AI and persuasion
Generative Language Models and Automated Influence Operations: Emerging Threats and Potential Mitigations. Goldstein et al. https://arxiv.org/pdf/2301.04246.pdf
Quantifying the potential persuasive returns to political microtargeting. Tappin et al. https://psyarxiv.com/dhg6k/
Working with AI to persuade: Examining a large language model’s ability to generate pro-vaccination messages. Karinshak et al. https://dl.acm.org/doi/abs/10.1145/3579592
Exploiting Programmatic Behavior of LLMs: Dual-Use Through Standard Security Attacks. Kang et al. https://arxiv.org/pdf/2302.05733.pdf
Discussion lead: Rylan Schaeffer
Week 10, Jun 8th
Brainstorming session
Schedule Winter Quarter 2022/23
Week 1, Jan 19th
Discussing the logistics
Week 2, Jan 26th
Topic: Causality
Papers:
Feder, Amir, et al. "Causal inference in natural language processing: Estimation, prediction, interpretation and beyond." Transactions of the Association for Computational Linguistics 10 (2022): 1138-1158. https://arxiv.org/pdf/2109.00725.pdf
Keith, Katherine A., Douglas Rice, and Brendan O'Connor. "Text as Causal Mediators: Research Design for Causal Estimates of Differential Treatment of Social Groups via Language Aspects." arXiv preprint arXiv:2109.07542 (2021). https://arxiv.org/pdf/2109.07542.pdf
Keidar, Daphna, et al. "Slangvolution: A Causal Analysis of Semantic Change and Frequency Dynamics in Slang." arXiv preprint arXiv:2203.04651 (2022). https://arxiv.org/pdf/2203.04651.pdf
Discussion lead: Kristina Gligoric
Week 3, Feb 2nd
Topic: Reinforcement learning
Papers:
Pyatkin, Valentina, et al. "Reinforced Clarification Question Generation with Defeasibility Rewards for Disambiguating Social and Moral Situations." arXiv preprint arXiv:2212.10409 (2022). https://arxiv.org/abs/2212.10409
Ramamurthy, Rajkumar, et al. "Is Reinforcement Learning (Not) for Natural Language Processing?: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization." arXiv preprint arXiv:2210.01241 (2022). https://arxiv.org/abs/2210.01241
Discussion lead: Caleb Ziems
Week 4, Feb 9th
Topic: Prediction, explanation, and integrating the two in NLP
Papers:
Shmueli. To Explain or to predict? https://projecteuclid.org/journals/statistical-science/volume-25/issue-3/To-Explain-or-to-Predict/10.1214/10-STS330.full
Schäfer, Mike S., and Valerie Hase. "Computational methods for the analysis of climate change communication: Towards an integrative and reflexive approach." Wiley Interdisciplinary Reviews: Climate Change (2022): e806. https://wires.onlinelibrary.wiley.com/doi/epdf/10.1002/wcc.806
Hofman, Jake M., et al. "Integrating explanation and prediction in computational social science." Nature 595.7866 (2021): 181-188. https://www.nature.com/articles/s41586-021-03659-0
Agrawal, Mayank, et al. "Scaling up psychology via scientific regret minimization." Proceedings of the National Academy of Sciences 117, no. 16 (2020): 8825-8835. https://www.pnas.org/doi/10.1073/pnas.1915841117
Discussion lead: Myra Cheng
Week 5, Feb 16th
Topic: Common ground and communication
Papers:
Herbert Clark’s “Using Language,” Chapter 4 (Chapter 4.pdf)
Contributing to Discourse. https://onlinelibrary.wiley.com/doi/pdfdirect/10.1207/s15516709cog1302_7
Grounding as a Collaborative Process. https://aclanthology.org/2021.eacl-main.41.pdf
Optional reading:
Communities, Commonalities, and Communication https://www.sol.lu.se/media/utbildning/dokument/kurser/LINB30/20082/Clark.pdf
Shared common ground influences information density in microblog texts
Discussion lead: Omar Shaikh
Week 6, Feb 23rd
Topic: Human-centered data and language models: Privacy, data as labor, and licensing
Papers:
Contractor, Danish, et al. "Behavioral use licensing for responsible AI." 2022 ACM Conference on Fairness, Accountability, and Transparency. 2022. https://dl.acm.org/doi/abs/10.1145/3531146.3533143
The Paradox of Reuse, Language Models Edition. https://nmvg.mataroa.blog/blog/the-paradox-of-reuse-language-models-edition/
ChatGPT Stole Your Work. So What Are You Going to Do? https://www.wired.com/story/chatgpt-generative-artificial-intelligence-regulation/
Guest: Nick Vincent
Week 7, Mar 2nd
Topic: Agreeability among humans
Papers:
Bakker, Michiel A., et al. "Fine-tuning language models to find agreement among humans with diverse preferences." https://www.deepmind.com/publications/fine-tuning-language-models-to-find-agreement-among-humans-with-diverse-preferences
Argyle, Lisa P., et al. "AI Chat Assistants can Improve Conversations about Divisive Topics." https://arxiv.org/pdf/2302.07268.pdf
Discussion lead: Nicole Meister
Week 8, Mar 9th
Topic: Simulating humans with LLMs: Challenges and opportunities for social science research
Papers:
Argyle, Lisa P., et al. "Out of one, many: Using language models to simulate human samples." Political Analysis (2022): 1-15. https://arxiv.org/pdf/2209.06899.pdf
Aher, Gati, Rosa I. Arriaga, and Adam Tauman Kalai. "Using Large Language Models to Simulate Multiple Humans." https://arxiv.org/pdf/2208.10264.pdf
I strongly feel that this is an insult to life itself, Kevin Munger. https://kevinmunger.substack.com/p/i-strongly-feel-that-this-is-an-insult
Optional: Kosinski, Michal. "Theory of mind may have spontaneously emerged in large language models." arXiv preprint arXiv:2302.02083 (2023). https://arxiv.org/pdf/2302.02083.pdf
Discussion lead: Tiziano Piccardi
Week 9, Mar 16th
Topic: Race and racism
Papers:
Voigt, Rob, et al. "Language from police body camera footage shows racial disparities in officer respect." Proceedings of the National Academy of Sciences 114.25 (2017): 6521-6526. https://www.pnas.org/doi/10.1073/pnas.1702413114
Prabhakaran, Vinodkumar, et al. "Detecting institutional dialog acts in police traffic stops." Transactions of the Association for Computational Linguistics 6 (2018): 467-481. https://transacl.org/ojs/index.php/tacl/article/view/1349
Camp, Nicholas P., et al. "The thin blue waveform: Racial disparities in officer prosody undermine institutional trust in the police." Journal of personality and social psychology 121.6 (2021): 1157. https://www.apa.org/pubs/journals/releases/psp-pspa0000270.pdf
Epp, Charles R., Steven Maynard‐Moody, and Donald Haider-Markel. "Beyond profiling: The institutional sources of racial disparities in policing." Public Administration Review 77.2 (2017): 168-178. https://onlinelibrary.wiley.com/doi/full/10.1111/puar.12702
Discussion lead: Anjalie Field