Toxicity in ChatGPT: Analyzing Persona-assigned Language Models
Ameet Deshpande*, Vishvak Murahari*, Tanmay Rajpurohit, Ashwin Kalyan, Karthik Narasimhan
Princeton University, Allen Institute for Artificial Intelligence, Georgia Institute of Technology
Summary
With the advent of powerful LLMs that companies are personalizing for specific use cases, our team of researchers was interested in understanding how the behavior of a model depends on how it is customized. Specifically, we examined the behavior of ChatGPT when it is assigned different personas through the system parameter of its API. We found that the toxicity of ChatGPT's outputs depended strongly on the assigned persona (up to a 6x increase) and that the model exhibited patterns of algorithmic discrimination against certain demographics.
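Persona assignment of the kind described above works by placing a persona instruction in the system message of a chat request. The sketch below shows one way to construct such a request payload; the persona string, the instruction template, and the query are illustrative stand-ins, not the exact prompts used in the study.

```python
def persona_messages(persona: str, query: str) -> list[dict]:
    """Build a chat payload that assigns `persona` via the system role.

    The instruction template here is a simplified illustration; the actual
    study used its own prompt templates across many personas.
    """
    return [
        {"role": "system", "content": f"Speak exactly like {persona}."},
        {"role": "user", "content": query},
    ]

# This payload would then be sent to the ChatGPT API (requires an API key),
# e.g. via the chat completions endpoint with model="gpt-3.5-turbo".
msgs = persona_messages("a famous boxer", "Say something about E=mc^2.")
print(msgs[0]["role"])   # system message carries the persona
print(msgs[1]["role"])   # user message carries the query
```

Because the persona lives entirely in the system message, the same user query can be replayed across many personas, which is how toxicity can be compared across persona assignments at scale.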
Our work was inspired by the Blueprint for an AI Bill of Rights released in October 2022, and we note that many LLMs released since then violate multiple of its provisions.
Our study contributed to OpenAI recognizing personalization as a problem that needs to be studied carefully. We detail the timeline of events below.
Timeline of events
October 2022
Blueprint for an AI Bill of Rights is released by the White House. It details several provisions such as (a) Safe and Effective Systems and (b) Algorithmic Discrimination Protections which AI systems should follow.
November 2022
OpenAI releases ChatGPT.
December 2022
Our team of researchers begins analyzing the blueprint and testing the compliance of LLMs.
March 2023
OpenAI makes the ChatGPT API generally accessible, with multiple companies swiftly adopting it in production.
April 2023
Our team releases the first comprehensive toxicity analysis of ChatGPT, covering over half a million generations. Our findings show the perils of persona-assigned language models, a common practice when deploying these models. Our analysis shows that ChatGPT does not comply with the provisions of the blueprint.
April 2023
TechCrunch and VentureBeat feature our analysis and highlight the issue with companies blindly using ChatGPT API without any prior safety analysis.
April 2023
Several media houses like MarkTechPost, Mashable, Tech Xplore, Global Business Leaders Magazine, Gigazine, and Tech Times follow suit.
May 2023
CNBC features our article and highlights how layoffs have impacted toxicity research in industry.
May 2023
As a swift and direct impact of our research, OpenAI recognizes "personalization" as one of the key areas in which it is seeking input on governance rules.
June 2023
TechCrunch recognizes our work as a key milestone in the continued development of LLMs.