Mitigating Societal Harms of Language Models
Tutorial @ The WebConf 2022 | Online | Date and Time: April 26, 2022, 3pm CEST
Abstract
Recent advances in large language models have led to remarkable improvements in the capabilities of Natural Language Processing (NLP) models and to increasing adoption of language technologies in user-facing products and critical applications. However, several recent studies have highlighted the potential harms these models pose to people and society and have proposed solutions to mitigate them. With the growing deployment of model-driven NLP tools, there is an urgent need to equip researchers and practitioners with knowledge of such societal harms and with methods and techniques to keep models from adversely impacting people. However, this body of work so far lacks a common framework and methodology. This tutorial aims to fill this gap.
Tentative Outline:
Brief Introduction to Language Models (5 mins)
Possible Harms of Language Technologies (15 mins)
Fairness/Bias - Research on human-like biases in NLP.
Toxicity - Research on toxic text generated by NLP models and biases propagated in efforts to correct them.
Misinformation and Factual Inconsistencies - Research on factual errors in generated text.
Privacy - Models generating sensitive, identifying information about individuals, such as addresses and SSNs \cite{carlini2020extracting, inan2021training}
Evaluation and Detection (35 mins)
Detection of problematic usage - Toxic text detection, fact-checking, hallucination detection, and bias detection (a code sketch follows this subsection).
Visualizations of outputs - Visualizing machine-generated text and gender bias in news.
Analysis methods - Analyzing biased language, analyzing factual errors in generation, and probing for toxic language.
Human-in-the-loop methods - Frameworks for including humans in identifying harmful content \cite{fanton2021human}
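To make the detection portion concrete, here is a minimal sketch of the kind of pipeline discussed above: scoring texts with an off-the-shelf toxicity classifier via the Hugging Face transformers library. The model name unitary/toxic-bert is one publicly available example, not an artifact of this tutorial.

    # Minimal sketch: flag potentially toxic model outputs with an off-the-shelf classifier.
    # Assumes the Hugging Face `transformers` library is installed; the model name below is
    # an illustrative choice, not a tutorial-specific artifact.
    from transformers import pipeline

    toxicity_clf = pipeline("text-classification", model="unitary/toxic-bert")

    generations = [
        "Thanks for the detailed and helpful answer!",
        "You are an idiot and nobody wants you here.",
    ]

    # The pipeline returns one {label, score} dict per input text.
    for text, result in zip(generations, toxicity_clf(generations)):
        print(f"{result['label']:>8} ({result['score']:.2f}): {text}")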
Mitigating Social Harms (35 mins)
Data-level interventions - Methods to filter harmful data.
Model-level interventions - Model constraints for eliminating harmful outputs.
Post-processing interventions - Methods to fix problematic outputs after prediction (a toy sketch follows this outline).
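As a toy illustration of the post-processing family of interventions, the sketch below filters generated candidates using a block-list and a pluggable harm-scoring function. All names here (filter_candidates, BLOCK_LIST, harm_score) are hypothetical placeholders rather than methods presented in the tutorial.

    # Toy post-processing filter: drop generated candidates that match a block-list or
    # exceed a harm score from any detector (e.g., the toxicity classifier sketched above).
    # All names are illustrative placeholders.
    from typing import Callable, List

    BLOCK_LIST = {"ssn", "social security number"}  # hypothetical sensitive terms

    def filter_candidates(
        candidates: List[str],
        harm_score: Callable[[str], float],
        threshold: float = 0.5,
    ) -> List[str]:
        """Keep only candidates that avoid blocked terms and score below the threshold."""
        kept = []
        for text in candidates:
            lowered = text.lower()
            if any(term in lowered for term in BLOCK_LIST):
                continue
            if harm_score(text) >= threshold:
                continue
            kept.append(text)
        return kept

    # Example usage with a dummy scorer; a real system would plug in a trained detector.
    print(filter_candidates(["My SSN is 000-00-0000.", "Here is a helpful summary."],
                            harm_score=lambda t: 0.0))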
Speakers:
Sachin Kumar (Ph.D. Student at Carnegie Mellon University)
Sachin is a fifth-year Ph.D. student at the Language Technologies Institute, School of Computer Science at CMU, working at the intersection of machine learning and language technologies. Sachin's research tackles critical technical problems in core language generation with deep learning, such as open-vocabulary generation, detection and demotion of spurious confounders, and controllable generation.
Vidhisha Balachandran (Ph.D. Student at Carnegie Mellon University)
Vidhisha is a third-year Ph.D. student at the Language Technologies Institute, School of Computer Science at Carnegie Mellon University. Her current research centers on building interpretable and reliable NLP models, with a focus on summarization, factuality, and KB-based reasoning.
Yulia Tsvetkov (Assistant Professor at University of Washington, Seattle)
Yulia is an assistant professor at the Paul G. Allen School of Computer Science & Engineering at the University of Washington. Her research group works on trustworthy and ethical NLP, multilingual NLP, and language generation. These projects are motivated by a unified goal: to extend the capabilities of human language technology beyond individual populations and across language boundaries, thereby enabling NLP for diverse and disadvantaged users, the users who need it most. Yulia has co-organized several workshops and tutorials, including the Workshop on Multilingual and Cross-lingual Methods in NLP in 2016, the Workshop on Subword and Character Level Models in NLP in 2018, SafeConvAI @ SIGDial 2021: A Special Session on Safety for E2E Conversational AI, and tutorials on Socially Responsible NLP at NAACL 2018 and TheWebConf 2019. Prior to joining UW, Yulia was an assistant professor at Carnegie Mellon University and a postdoc at Stanford. Yulia is a recipient of the Okawa Research Award, an Amazon Machine Learning Research Award, a Google Faculty Research Award, and multiple NSF awards.