A prohibited scenario refers to a real-world conversational context in which ChatGPT is forbidden from providing meaningful output. OpenAI enumerates all prohibited scenarios in its official usage policies.
In each prohibited scenario, ChatGPT warns users that the conversation potentially violates OpenAI's usage policy.
For simplicity, we use 'scenario' to refer to such contexts throughout the paper.
This table presents the descriptions and examples of OpenAI's disallowed usages.
To evaluate the effectiveness of the jailbreak prompts in bypassing ChatGPT's security measures, we designed a series of experiments grounded in prohibited scenarios. This section outlines how we generated these scenarios, which serve as the basis for our empirical study.
We derived eight distinct prohibited scenarios from OpenAI's disallowed usage policy, as illustrated in the table above. These scenarios represent potential risks and concerns associated with the use of ChatGPT. Because no existing dataset covers these prohibited scenarios, we created our own scenario dataset tailored to this purpose. To do so, the authors of this paper collaboratively wrote five question prompts for each of the eight prohibited scenarios, ensuring a diverse representation of perspectives and nuances within each scenario. This collaborative process helps minimize potential bias and subjectivity during prompt generation.
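The following sketch illustrates one way the resulting scenario dataset could be organized and validated; the scenario names and the file name are placeholders for illustration, not the exact labels or prompts used in our study.

```python
import json

# Placeholder scenario labels; the actual categories follow OpenAI's
# disallowed usage policy as listed in the table above.
SCENARIOS = [
    "scenario_1", "scenario_2", "scenario_3", "scenario_4",
    "scenario_5", "scenario_6", "scenario_7", "scenario_8",
]

PROMPTS_PER_SCENARIO = 5


def build_scenario_dataset(prompts_by_scenario):
    """Assemble and validate the scenario dataset (8 scenarios x 5 prompts = 40)."""
    dataset = []
    for scenario in SCENARIOS:
        prompts = prompts_by_scenario[scenario]
        assert len(prompts) == PROMPTS_PER_SCENARIO, f"expected 5 prompts for {scenario}"
        for i, question in enumerate(prompts):
            dataset.append({"scenario": scenario, "prompt_id": i, "question": question})
    return dataset


# Example usage (author_written_prompts maps each scenario to its 5 questions):
# dataset = build_scenario_dataset(author_written_prompts)
# with open("scenario_dataset.json", "w") as f:
#     json.dump(dataset, f, indent=2)
```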
The final scenario dataset comprises 40 question prompts (8 scenarios × 5 prompts), covering all prohibited scenarios outlined in OpenAI's disallowed usage policy. In subsequent sections, we describe how we combined this scenario dataset with the jailbreak prompt dataset to investigate the capability and robustness of jailbreak prompts in bypassing ChatGPT's restrictions.
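As a rough illustration of how the two datasets can be combined, the sketch below pairs each jailbreak prompt with each scenario question and records ChatGPT's response. It assumes the official openai Python client and prepends the jailbreak prompt directly to the question; the model name, data field names, and this simple concatenation strategy are assumptions for illustration rather than the exact procedure used in our experiments.

```python
from openai import OpenAI  # assumes the official openai Python client

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def query_with_jailbreak(jailbreak_prompt, scenario_question, model="gpt-3.5-turbo"):
    """Prepend a jailbreak prompt to a scenario question and return ChatGPT's reply."""
    response = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": f"{jailbreak_prompt}\n\n{scenario_question}",
        }],
    )
    return response.choices[0].message.content


# Cross every jailbreak prompt with every scenario question and collect the answers.
# results = [
#     {"jailbreak": jb["name"], "scenario": q["scenario"],
#      "answer": query_with_jailbreak(jb["text"], q["question"])}
#     for jb in jailbreak_prompts
#     for q in scenario_dataset
# ]
```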