Embodied AI Attack

Summary

Decision-making is crucial for embodied AI systems like intelligent vehicles and robots, enabling effective planning and action in physical environments. Recently, deep learning has gained attention for decision-making tasks, in addition to its widespread use in perception and prediction. However, challenges remain in handling open-world corner cases and achieving generalized performance.

Large language models (LLMs), with strong generalization and reasoning abilities from massive data, offer promising solutions. Recent works have built LLM-based systems that generate decisions and behavior plans for physical agents from textual descriptions. These systems must accurately respond to environmental inputs and goals. Yet, generic LLMs often struggle with complex, domain-specific tasks. Fine-tuning and retrieval-augmented generation (RAG) are used to adapt LLMs efficiently, avoiding costly training from scratch and enabling domain-specific reasoning.

Nonetheless, LLMs are vulnerable to attacks like jailbreaking and in-context backdoors, posing physical risks in embodied applications. Existing research doesn’t fully address threats from fine-tuning, RAG, and real-world grounding, which create new attack surfaces.

We propose BALD, a comprehensive Backdoor Attack framework against LLM-based Decision-making in embodied AI. BALD explores: (1) Word injection – embedding triggers in prompts, (2) Scenario manipulation – altering physical environments to trigger malicious behavior, and (3) Knowledge injection – placing triggers in RAG databases. Tested on GPT-3.5, LLaMA-2, PaLM, and platforms like HighwayEnv, nuScenes, and VirtualHome, BALD causes behaviors like incorrect lane changes, crashes, and placing a knife on a bed. Our attacks reach near 100% success for word and knowledge injection with minimal performance loss and over 65% success in scenario manipulation.

Attack Overview

Overview of our proposed BALD (Backdoor Attacks against LLM-enabled Decision-making systems) framework. We propose three distinct attack mechanisms: word injection, scenario manipulation, and knowledge injection, with each targeting different stages of the representative abstraction of the LLM-based decision-making system pipeline.

Attack Methodology

BALD-word: Word Injection Attack

We treat LLMs as a one-step optimizer to select the trigger words. The proposed strategies can: (1) maximize the attack effectiveness; (2) minimize the performance degradation on benign inputs; and (3) minimize the influence on benign LLMs

BALD-scene: Scenario Manipulation Attack

We propose a novel scenario-based trigger mechanism, utilizing a high-level distinct semantic scenario or environment as the trigger. We propose strategies to maximize the attack effectiveness while minimizing the rate the trigger is falsely triggered by other scenarios

BALD-RAG: Knowledge Injection Backdoor Attacks

We propose a novel dual trigger for retrieval and attack.

Retrieval: The correct data with trigger words is retrieved when similar scenarios are encountered
Attack: The trigger words activate the attack

Selected Results and Takeaways

Motivation: Fine-tuning can largely increase the performance of LLMs on specific embodied tasks.
Existing Approach: Attacks on ICL are much less effective in the fine-tuned embodied agent.
Effectiveness: BALD-word and BALD-RAG can achieve nearly 100% attack success rate, while BALD-scene is slightly less effective. All have shown severe safety consequences.
Mitigation: Inference stage defense can hardly defend against our attacks.

Research Paper

[ICLR'25] Can We Trust Embodied Agents? Exploring Backdoor Attacks against Embodied LLM-Based Decision-Making Systems

Ruochen Jiao*, Shaoyuan Xie* (co-first author), Justin Yue, Takami Sato, Lixu Wang, Yixuan Wang, Qi Alfred Chen, Qi Zhu

The Thirteenth International Conference on Learning Representations (ICLR) 2025. Acceptance rate 32.08% = 3689/11500

[PDF] [Code]

BibTex for citation:

@inproceedings{

jiao2025can,

title={Can We Trust Embodied Agents? Exploring Backdoor Attacks against Embodied {LLM}-Based Decision-Making Systems},

author={Ruochen Jiao and Shaoyuan Xie and Justin Yue and TAKAMI SATO and Lixu Wang and Yixuan Wang and Qi Alfred Chen and Qi Zhu},

booktitle={International Conference on Learning Representations (ICLR)},

year={2025}

}

Team

Ruochen Jiao, Ph.D. student, Northwestern University

Shaoyuan Xie, Ph.D. student, University of California, Irvine

Justin Yue, Undergraduate student, University of California, Irvine

Takami Sato, Ph.D. student, University of California, Irvine

Lixu Wang, Ph.D. student, Northwestern University

Yixuan Wang , Ph.D. student, Northwestern University

Qi Alfred Chen, Assistant Professor, University of California, Irvine

Qi Zhu, Professor, Northwestern University

Acknowledgments

This research was supported by

NSF under grants CNS-2145493;
USDOT under Grant 69A3552348327 for the CARMEN+ University Transportation Center.

Google Sites

Report abuse