Examines security concerns of in-context learning (ICL) with large language models (LLMs) like ChatGPT and GPT-4 from an adversarial perspective.
Introduces a novel attack method, advICL, that manipulates the demonstrations without altering the test input in order to deceive LLMs (a minimal sketch of this threat model follows this summary).
Demonstrates that increasing the number of demonstrations reduces the robustness of ICL.
Identifies the intrinsic property that demonstrations are reused across different test inputs, which poses practical security threats.
Proposes Transferable-advICL, a transferable variant of advICL that remains effective against unseen test inputs.
Highlights critical security risks in ICL and the need for extensive research on its robustness, especially given the growing importance of LLMs.
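Below is a minimal sketch of the demonstration-only attack setting, assuming a black-box text-classification setup. The `build_prompt`, `perturb_word`, and `adv_demo_attack` names, and the random character-substitution search, are illustrative stand-ins and not the paper's actual advICL algorithm (which uses a more careful, similarity-constrained perturbation); the sketch only shows that the adversary edits demonstrations while leaving the test input untouched.

```python
import random
import string


def build_prompt(demos, test_input):
    """Format (text, label) demonstrations plus the test input as an ICL prompt."""
    blocks = [f"Review: {text}\nSentiment: {label}" for text, label in demos]
    blocks.append(f"Review: {test_input}\nSentiment:")
    return "\n\n".join(blocks)


def perturb_word(word, rng):
    """Small character-level edit: replace one character with a random letter."""
    if len(word) < 2:
        return word
    i = rng.randrange(len(word))
    return word[:i] + rng.choice(string.ascii_lowercase) + word[i + 1:]


def adv_demo_attack(predict, demos, test_input, true_label, budget=100, seed=0):
    """Black-box search over small edits to the demonstrations only;
    the test input itself is never modified (the demonstration-attack setting)."""
    rng = random.Random(seed)
    adv = [list(pair) for pair in demos]          # mutable copies of (text, label)
    for _ in range(budget):
        d = rng.randrange(len(adv))               # pick one demonstration to edit
        words = adv[d][0].split()
        if not words:
            continue
        w = rng.randrange(len(words))
        saved = words[w]
        words[w] = perturb_word(saved, rng)       # candidate perturbation
        adv[d][0] = " ".join(words)
        pred = predict(build_prompt([tuple(p) for p in adv], test_input))
        if pred != true_label:                    # misclassification: attack succeeded
            return [tuple(p) for p in adv]
        words[w] = saved                          # revert and keep searching
        adv[d][0] = " ".join(words)
    return None                                   # no successful perturbation found
```

Here `predict` is any callable mapping a full prompt string to a predicted label, e.g. a thin wrapper around an LLM API client. Roughly speaking, a transferable variant in the spirit of Transferable-advICL would score each candidate perturbation against a set of test inputs rather than a single one, so the perturbed demonstrations remain adversarial for inputs not seen during the attack.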