PRSA: Prompt Stealing Attacks against Real-World Prompt Services
Paper Overview
Recently, large language models (LLMs) have garnered widespread attention for their exceptional capabilities. Prompts are central to the functionality and performance of LLMs, making them highly valuable assets. The increasing reliance on high-quality prompts has driven significant growth in prompt services. However, this growth also expands the potential for prompt leakage, increasing the risk that attackers could replicate original functionalities, create competing products, and severely infringe on developers' intellectual property. Despite these risks, prompt leakage in real-world prompt services remains underexplored.
In this paper, we present PRSA, a practical attack framework designed for prompt stealing. PRSA infers the detailed intent of a prompt from limited input-output analysis (even a single input-output pair) and generates stolen prompts that replicate the original functionality. Extensive evaluations demonstrate PRSA's effectiveness across two main types of real-world prompt services. Compared to previous works, it improves the attack success rate from 17.8% to 46.1% in prompt marketplaces (at an attack cost of only 1.3%-12.3% of the original prompt price) and from 39% to 52% in LLM application stores. Notably, in the attack on "Math", one of the most popular educational applications in OpenAI's GPT Store with over 1 million conversations, PRSA uncovered a hidden Easter egg that had not been revealed previously. In addition, our analysis reveals that higher mutual information between a prompt and its output correlates with an increased risk of leakage. This insight guides the design and evaluation of two potential defenses against the security threats posed by PRSA and its future counterparts. We have reported these findings to the corresponding prompt service vendors, including PromptBase and OpenAI, and are actively collaborating with them to implement defensive measures.
Figure 1. Overview of PRSA framework.
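To make the mutual-information observation concrete, the sketch below estimates the pointwise mutual information between a prompt and an output, PMI = log P(output | prompt) - log P(output), using an off-the-shelf GPT-2 scorer from Hugging Face transformers. This is a minimal illustration under our own assumptions (the choice of scorer, the function names, and the neutral-context trick are ours, not the paper's implementation): a larger conditional log-likelihood relative to the unconditional one suggests the output carries more information about the prompt, and hence a higher leakage risk.

```python
# Minimal sketch (our illustration, not the paper's method): estimate the
# pointwise mutual information between a prompt and an output with GPT-2.
# Requires: pip install torch transformers
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def output_log_prob(output: str, context: str = "") -> float:
    """Sum of log-probabilities of the output tokens under the scoring model,
    conditioned on `context` (the prompt) or on a neutral BOS-like context."""
    if not context:
        context = tokenizer.eos_token  # neutral context for the unconditional term
    ctx_ids = tokenizer.encode(context)
    out_ids = tokenizer.encode(output)
    input_ids = torch.tensor([ctx_ids + out_ids])
    with torch.no_grad():
        log_probs = torch.log_softmax(model(input_ids).logits, dim=-1)
    total = 0.0
    for i, tok in enumerate(out_ids):
        # Logits at position p predict the token at position p + 1.
        total += log_probs[0, len(ctx_ids) + i - 1, tok].item()
    return total

def pmi(prompt: str, output: str) -> float:
    """Pointwise mutual information estimate: log P(output | prompt) - log P(output)."""
    return output_log_prob(output, context=prompt) - output_log_prob(output)

if __name__ == "__main__":
    prompt = "Rewrite the user's sentence as a pirate would say it:"
    output = "Arr, the weather be mighty fine today, matey!"
    print(f"Estimated PMI: {pmi(prompt, output):.2f} nats")  # hypothetical example pair
```

Under this reading, outputs that are much more predictable given the prompt than on their own expose more of the prompt's intent, which is consistent with the correlation between mutual information and leakage risk reported above.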
Attack Demos
Attack on Prompt Marketplaces: Stealing Prompts from PromptBase.
Please note that the prompt examples displayed below have been substantially modified to protect the confidentiality of the original purchased prompts. Additionally, the prompt developers have authorized this academic presentation via email.
Real-Time Interactive Version.
For verification purposes, we also anonymously provide a Real-Time Interactive Version (PRSA - a Hugging Face Space by prsa-prompt-stealing-attack), implemented within the bounds of our ethical considerations. This interactive version will be continuously updated.
Attack on LLM Application Stores: Stealing System Prompts in GPTs.
Considering the data privacy concerns associated with the aforementioned GPTs, we have not disclosed the system prompts obtained through PRSA. Instead, we demonstrate the effectiveness of PRSA attacks by creating corresponding pirated versions. These demo versions are strictly for academic research purposes and are not intended for commercial use.
Notice of Important Changes
Due to circumstances beyond our control, the developers of Math have recently updated the Math link.
(1) This change has affected Math’s ranking, with its position in the Education category falling from 3rd to 11th.
(2) Based on our latest stealing-attack evaluation, the trigger for the Easter egg in calculator mode appears to have changed from 4 to 6. We have updated the pirated version accordingly.
We have confirmed this update with the Math developers via email, and they have explained the change. We have updated the paper accordingly.
Figure 2. Email response from Math developers regarding link update (shared with their consent).
Ethics and Disclosure
We promptly disclosed our findings and examples to the providers of the prompt services targeted in this paper via emails.
For attacks on prompt marketplaces, the examples were extensively modified and anonymized. These modifications and displays were authorized and supported by the original prompt owners.
For attacks on LLM application stores involving GPTs, we communicated our findings to the GPTs developers via email. They acknowledged the system prompts recovered by PRSA and authorized and supported the demos of the pirated versions of these GPTs. We also received a positive response and support from OpenAI.
We declare that this work is solely intended for academic research purposes.
Acknowledgements: The website template was borrowed from DRA Jailbreak.