Toxicity in ChatGPT: Analyzing Persona-assigned Language Models
Ameet Deshpande*, Vishvak Murahari*, Tanmay Rajpurohit, Ashwin Kalyan, Karthik Narasimhan
Princeton University, Allen Institute for Artificial Intelligence, Georgia Institute of Technology
Summary
With the advent of powerful LLMs that companies are personalizing for specific use cases, our team of researchers was interested in understanding how the behavior of a model depends on how it is customized. Specifically, we examined the behavior of ChatGPT when it is assigned different personas through the system parameter of its API. We found that the toxicity of ChatGPT's outputs depended strongly on the assigned persona (up to a 6x increase) and that the model exhibited patterns of algorithmic discrimination against certain demographics.
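Persona assignment of the kind described above works by placing a persona instruction in the system message of a chat request. The sketch below shows one way to construct such a request payload; the persona string, the instruction template, and the query are illustrative stand-ins, not the exact prompts used in the study.

```python
def persona_messages(persona: str, query: str) -> list[dict]:
    """Build a chat payload that assigns `persona` via the system role.

    The instruction template here is a simplified illustration; the actual
    study used its own prompt templates across many personas.
    """
    return [
        {"role": "system", "content": f"Speak exactly like {persona}."},
        {"role": "user", "content": query},
    ]

# This payload would then be sent to the ChatGPT API (requires an API key),
# e.g. via the chat completions endpoint with model="gpt-3.5-turbo".
msgs = persona_messages("a famous boxer", "Say something about E=mc^2.")
print(msgs[0]["role"])   # system message carries the persona
print(msgs[1]["role"])   # user message carries the query
```

Because the persona lives entirely in the system message, the same user query can be replayed across many personas, which is how toxicity can be compared across persona assignments at scale.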
Our work was inspired by the Blueprint for an AI Bill of Rights released in October 2022, and we note that many LLMs released since then violate multiple of its provisions.
Our study contributed to OpenAI recognizing personalization as a problem that needs to be studied carefully. We detail the timeline of events below.
Timeline of events
October 2022
Blueprint for an AI Bill of Rights is released by the White House. It details several provisions such as (a) Safe and Effective Systems and (b) Algorithmic Discrimination Protections which AI systems should follow.
November 2022
OpenAI releases ChatGPT.
December 2022
Our team of researchers begins analyzing the blueprint and testing the compliance of LLMs.
March 2023
OpenAI makes the ChatGPT API generally accessible, with multiple companies swiftly adopting it in production.
April 2023
Our team releases the first comprehensive toxicity analysis of ChatGPT, covering over half a million generations. Our findings show the perils of persona-assigned language models, a common practice when deploying these models. Our analysis shows that ChatGPT does not comply with the provisions of the blueprint.
April 2023
TechCrunch and VentureBeat feature our analysis and highlight the issue with companies blindly using ChatGPT API without any prior safety analysis.
April 2023
Several media houses like MarkTechPost, Mashable, Tech Xplore, Global Business Leaders Magazine, Gigazine, and Tech Times follow suit.
May 2023
CNBC features our article and highlights how layoffs have impacted toxicity research in industry.
May 2023
As a swift and direct impact of our research, OpenAI recognizes "personalization" as one of the key areas in which it is seeking input on governance rules.
June 2023
TechCrunch recognizes our work as a key milestone in the continued development of LLMs.