Here we introduce the taxonomy of jailbreak prompt patterns.
Prompts in this category try to alter the conversation background or context while keeping the intention unchanged. For instance, a pretending prompt may engage ChatGPT in a role-playing game, thereby transforming the conversation context from a direct question-and-answer scenario into a game environment. However, the intention of the prompt remains the same: to obtain an answer to a prohibited question. Throughout the conversation, the model remains aware that it is being asked to answer the question, albeit within the game's context.
Prompts in this category aim to change both the conversation context and the intention. For example, one typical attention-shifting pattern is text continuation. In this scenario, the attacker diverts the model's attention from a question-and-answer scenario to a story-generation task. Accordingly, the intention of the prompt shifts from asking the model questions to making it construct a paragraph of text. The model may be unaware that it could implicitly reveal prohibited answers when generating responses to such a prompt.
This is a distinct category of prompts that seek to directly circumvent the imposed restrictions. In contrast to the previous categories, these prompts attempt to induce the model to break the restrictions in place rather than merely bypass them. Once attackers have elevated their privilege level, they can ask the prohibited question and obtain the answer without further impediment.
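The contrast among the three categories above can be summarized along two dimensions: whether a prompt changes the conversation context and whether it changes (or disguises) the underlying intention. A minimal sketch of this taxonomy as a data structure follows; the class and field names are illustrative assumptions, not part of the taxonomy's formal definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JailbreakPattern:
    """One category in the jailbreak-prompt taxonomy (illustrative model)."""
    name: str
    changes_context: bool    # does the prompt alter the conversation background?
    changes_intention: bool  # does the prompt disguise the underlying goal?
    breaks_restrictions: bool  # does it induce the model to break, not bypass, restrictions?


# The three categories as described in the text above.
TAXONOMY = [
    JailbreakPattern("pretending", changes_context=True,
                     changes_intention=False, breaks_restrictions=False),
    JailbreakPattern("attention shifting", changes_context=True,
                     changes_intention=True, breaks_restrictions=False),
    JailbreakPattern("privilege escalation", changes_context=False,
                     changes_intention=False, breaks_restrictions=True),
]
```

Encoding the distinctions this way makes the key difference explicit: only attention shifting changes the intention, and only privilege escalation directly breaks restrictions rather than bypassing them.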
The table above summarizes this taxonomy of jailbreak prompts.