Selected Research
My academic work has focused on taxonomising ethical and social harms from generative AI, and on finding valid and tractable ways to assess risks of harm ahead of time. Some highlights are listed below; for a full list of my publications, see Google Scholar.
I've also contributed to DeepMind's flagship models Gopher, Sparrow, YouTutor, and Gemini.
Taxonomising Harm from AI
Ethical and social risks of harm from Language Models (pre-print, 2021).
Peer-reviewed, abbreviated version: Weidinger, L., Uesato, J., Rauh, M., Griffin, C., Huang, P.S., Mellor, J., Glaese, A., Cheng, M., Balle, B., Kasirzadeh, A., Biles, C., Brown, S., Kenton, Z., Hawkins, W., Stepleton, T., Birhane, A., Hendricks, L.A., Rimell, L., Isaac, W., Haas, J., Legassick, S., Irving, G., and Gabriel, I. Taxonomy of Risks posed by Language Models (FAccT 2022).
Evaluating Risks of Harm
Sociotechnical Safety Evaluation of Generative AI Systems (pre-print, 2023)
Peer-reviewed, abbreviated version: Rauh, M., Marchal, N., Manzini, A., Hendricks, L.A., Mateos-Garcia, J., Bergman, S., Kay, J., Griffin, C., Bariach, B., Gabriel, I., and Weidinger, L. Gaps in the Safety Evaluation of Generative AI (AIES 2024).
Holistic Safety and Responsibility Evaluations of Advanced AI Models (pre-print, 2024)
An overview of what a comprehensive approach to AI safety evaluation could look like.
Red Teaming
STAR: SocioTechnical Approach to Red Teaming Language Models (pre-print, 2024)
Peer-reviewed, abbreviated version: STAR: SocioTechnical Approach to Red Teaming Language Models (EMNLP 2024)
Alignment
Weidinger, L., McKee, K.R., Everett, R., Huang, S., Zhu, T.O., Chadwick, M.J., Summerfield, C., and Gabriel, I. Using the Veil of Ignorance to align AI systems with principles of justice. Proceedings of the National Academy of Sciences, 120(18), e2213709120 (2023).