Selected Research
My academic work has focused on taxonomising ethical and social harms from generative AI, and on finding valid and tractable ways to assess risks of harm ahead of time. Some highlights are listed below; for a full list of my publications, see Google Scholar.
I've also contributed to DeepMind's flagship models Gopher, Sparrow, YouTutor, and Gemini.
Taxonomising Harm from AI
Ethical and social risks of harm from Language Models (pre-print, 2021).
Peer-reviewed, abbreviated version: Weidinger, L., Uesato, J., Rauh, M., Griffin, C., Huang, P.S., Mellor, J., Glaese, A., Cheng, M., Balle, B., Kasirzadeh, A., Biles, C., Brown, S., Kenton, Z., Hawkins, W., Stepleton, T., Birhane, A., Hendricks, L.A., Rimell, L., Isaac, W., Haas, J., Legassick, S., Irving, G., and Gabriel, I. Taxonomy of Risks posed by Language Models (FAccT 2022).
Evaluating Risks of Harm
Sociotechnical Safety Evaluation of Generative AI Systems (pre-print, 2023)
Peer-reviewed, abbreviated version: Rauh, M., Marchal, N., Manzini, A., Hendricks, L.A., Mateos-Garcia, J., Bergman, S., Kay, J., Griffin, C., Bariach, B., Gabriel, I., and Weidinger, L. Gaps in the Safety Evaluation of Generative AI (AIES 2024).
Holistic Safety and Responsibility Evaluations of Advanced AI Models (pre-print, 2024)
An overview of what a comprehensive approach to AI safety evaluation could look like.
Red Teaming
STAR: SocioTechnical Approach to Red Teaming Language Models (pre-print, 2024)
Peer-reviewed, abbreviated version: STAR: SocioTechnical Approach to Red Teaming Language Models (EMNLP 2024)
Alignment
Weidinger, L., McKee, K.R., Everett, R., Huang, S., Zhu, T.O., Chadwick, M.J., Summerfield, C., and Gabriel, I. Using the Veil of Ignorance to align AI systems with principles of justice. Proceedings of the National Academy of Sciences, 120(18), e2213709120 (2023).