Rohan Sinha¹, Amine Elhafsi¹, Christopher Agia¹, Matthew Foutter¹, Edward Schmerling², Marco Pavone¹ ²
¹ Stanford University
² NVIDIA
Presented at Robotics: Science and Systems 2024.
Winner: Outstanding Paper Award
Abstract
Foundation models, e.g., large language models (LLMs), trained on internet-scale data possess zero-shot generalization capabilities that make them a promising technology towards detecting and mitigating out-of-distribution failure modes of robotic systems. Fully realizing this promise, however, poses two challenges: (i) mitigating the considerable computational expense of these models such that they may be applied online, and (ii) incorporating their judgement regarding potential anomalies into a safe control framework. In this work, we present a two-stage reasoning framework: First is a fast binary anomaly classifier that analyzes observations in an LLM embedding space, which may then trigger a slower fallback selection stage that utilizes the reasoning capabilities of generative LLMs. These stages correspond to branch points in a model predictive control strategy that maintains the joint feasibility of continuing along various fallback plans to account for the slow reasoner's latency as soon as an anomaly is detected, thus ensuring safety. We show that our fast anomaly classifier outperforms autoregressive reasoning with state-of-the-art GPT models, even when instantiated with relatively small language models. This enables our runtime monitor to improve the trustworthiness of dynamic robotic systems, such as quadrotors or autonomous vehicles, under resource and time constraints. Videos illustrating our approach in both simulation and real-world experiments are available on this project page: https://sites.google.com/view/aesop-llm.
Q: How can we enable real-time reactive reasoning with LLMs for dynamic robotic systems?
A: By reacting fast with LLM embeddings and reasoning slow with autoregressive generation!
Proposed Approach: AESOP
Offline: Construct a cache of language-based embedding vectors of the robot's prior nominal experiences.
(Online) FAST Reasoner: Detect anomalies with respect to the prior-experience cache by computing cosine similarity in the embedding space (see the sketch after this list).
(Online) SLOW Reasoner: Reason about the safety consequences of an anomaly and appropriate safety-preserving interventions using chain-of-thought autoregressive generation.
(Online) Real-Time Control: Control the robot in real time with an MPC that maximizes nominal performance while 1) maintaining several safety-preserving recovery plans and 2) accounting for the slow reasoner's inference latency.
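Below is a minimal sketch of the fast reasoner, assuming a generic sentence-embedding model (sentence-transformers here, as a stand-in for the LLM embedder used in the paper); the cached scene descriptions and the similarity threshold are illustrative, and in practice the threshold would be calibrated on held-out nominal data.

# Sketch of the FAST reasoner: embedding-cache anomaly detection.
# Assumptions: sentence-transformers stands in for the paper's LLM embedder;
# the scene descriptions and threshold below are hypothetical.
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")

# Offline: embed the robot's prior nominal experiences (as text descriptions).
nominal_scenes = [
    "a red landing box on flat ground with no obstacles nearby",
    "a red landing box and a blue box, with some cardboard clutter off to the side",
]
cache = embedder.encode(nominal_scenes, normalize_embeddings=True)  # shape (N, d)

def is_anomalous(scene_description: str, threshold: float = 0.6) -> bool:
    """Flag an anomaly when the current observation is dissimilar from every
    cached nominal experience (max cosine similarity below the threshold)."""
    z = embedder.encode([scene_description], normalize_embeddings=True)[0]
    max_similarity = float(np.max(cache @ z))  # cosine similarity (unit vectors)
    return max_similarity < threshold

# Online: run this check every control step; only a True result triggers
# the slow, generative reasoning stage.
print(is_anomalous("a person is standing on the red landing box"))

Only when is_anomalous returns True does the system invoke the slow reasoner, which keeps the expensive generative model off the real-time control path.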
Anomaly Detection with LLM Embeddings
Takeaway #1: Grounding the runtime monitor in prior robot experiences via embeddings outperforms generative reasoning (e.g., with GPT-4) for anomaly detection.
Takeaway #2: Smaller models can do just as well as larger models!
Takeaway #3: Embedding-based anomaly detectors purely detect differences w.r.t. prior experiences, and do not directly predict failure.
Text-based models first use an open-vocabulary object detector to generate scene descriptions, then apply an LLM to the resulting text.
Vision-based models directly operate on image observations.
Takeaway: The vision-to-text pipeline performs strongest in semantic anomaly detection; multi-modal encoders are an exciting direction for future work! A minimal sketch of the pipeline follows below.
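The sketch below shows the text-based pipeline, where detect_objects is a hypothetical placeholder for an open-vocabulary object detector; the point is only to show how raw images become text descriptions that the LLM-based monitors consume.

# Sketch of the text-based (vision-to-text) anomaly detection pipeline.
# Assumption: detect_objects is a hypothetical stand-in for an
# open-vocabulary object detector running on the camera image.
from typing import List

def detect_objects(image) -> List[str]:
    # Placeholder: a real open-vocabulary detector would run here.
    return ["red landing box", "blue box", "person"]

def describe_scene(image) -> str:
    """Convert a raw image observation into a text scene description that the
    LLM-based monitors (fast or slow reasoner) can consume."""
    labels = detect_objects(image)
    return "The robot's camera sees: " + ", ".join(labels) + "."

# The resulting description is what gets embedded by the fast reasoner or
# passed to the generative slow reasoner.
print(describe_scene(image=None))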
The Necessity of FAST and SLOW Reasoning
We require both:
the speed of the fast reasoner for real-time control, and
the quality of CoT reasoning with large SoTA models for safety assessment and decision making (a prompt sketch follows below this list).
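A minimal sketch of the slow reasoner, assuming the OpenAI chat completions API as a stand-in for the large generative model; the prompt wording, model name, and fallback options are hypothetical and only illustrate the chain-of-thought fallback-selection step.

# Sketch of the SLOW reasoner: chain-of-thought fallback selection with a
# generative LLM. Assumptions: the OpenAI chat API is a stand-in for the large
# model used in the paper; the prompt and fallback names are hypothetical.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

FALLBACKS = ["land on the blue box", "hover in the holding zone"]

def select_fallback(scene_description: str) -> str:
    prompt = (
        "You are a safety monitor for a quadrotor that is about to land on a red box.\n"
        f"Current observation: {scene_description}\n"
        f"Available fallback plans: {FALLBACKS}\n"
        "Reason step by step about whether the observation threatens a safe "
        "landing, then end your answer with exactly one fallback plan from the list."
    )
    response = client.chat.completions.create(
        model="gpt-4o",  # any capable instruction-following model
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Invoked only after the fast reasoner flags an anomaly; the MPC keeps all
# fallbacks feasible while this (slow) call is in flight.
print(select_fallback("a person is standing on the red landing box"))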
Hardware Quadrotor Experiments
Overview:
Goal: Land on the red box
Recovery: Land on the blue box, or hover in the holding zone
While ignoring inconsequential clutter
Simulated Quadrotor Experiments
Quantitative Planner Evaluation:
Naive MPC: does not ensure fallback plans remain feasible; it only constructs a fallback plan post hoc once the LLM output arrives
FS-MPC: plans fallback trajectories, but does not account for LLM latency
AESOP (Ours): plans fallback trajectories and accounts for LLM latency
Takeaway: Accounting for LLM latencies is a must! A simplified sketch of latency-aware fallback planning follows below.
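A simplified sketch of the latency-aware idea on a 2D double integrator with cvxpy; the dynamics, horizon, control limits, and recovery waypoints are illustrative assumptions rather than the paper's quadrotor formulation. The nominal plan and every fallback trajectory are constrained to share their first L control inputs, so whichever recovery the slow reasoner selects after its latency of L steps is still feasible.

# Simplified sketch of latency-aware fallback planning (not the paper's exact
# formulation). Assumptions: 2D double-integrator dynamics, two hypothetical
# recovery states, and illustrative horizon/limit values.
import numpy as np
import cvxpy as cp

dt = 0.1
A = np.block([[np.eye(2), dt * np.eye(2)],
              [np.zeros((2, 2)), np.eye(2)]])
B = np.vstack([0.5 * dt**2 * np.eye(2), dt * np.eye(2)])

H = 30   # planning horizon (steps)
L = 5    # slow-reasoner latency (steps): the fallback decision arrives after L steps
x0 = np.zeros(4)
goal = np.array([5.0, 0.0, 0.0, 0.0])            # nominal landing target (illustrative)
recoveries = [np.array([1.5, 1.5, 0.0, 0.0]),    # e.g., blue box
              np.array([0.0, 2.0, 0.0, 0.0])]    # e.g., holding zone

x_nom = cp.Variable((4, H + 1))
u_nom = cp.Variable((2, H))
constraints = [x_nom[:, 0] == x0]
cost = 0
for t in range(H):
    constraints += [x_nom[:, t + 1] == A @ x_nom[:, t] + B @ u_nom[:, t],
                    cp.norm(u_nom[:, t], "inf") <= 2.0]
    cost += cp.sum_squares(x_nom[:, t + 1] - goal) + 0.1 * cp.sum_squares(u_nom[:, t])

# One fallback trajectory per recovery option. Each shares the first L control
# inputs with the nominal plan, so every recovery remains reachable even though
# the slow reasoner only reports back L steps from now.
for x_goal in recoveries:
    x_fb = cp.Variable((4, H + 1))
    u_fb = cp.Variable((2, H))
    constraints += [x_fb[:, 0] == x0, x_fb[:, H] == x_goal]
    for t in range(H):
        constraints += [x_fb[:, t + 1] == A @ x_fb[:, t] + B @ u_fb[:, t],
                        cp.norm(u_fb[:, t], "inf") <= 2.0]
        if t < L:
            constraints += [u_fb[:, t] == u_nom[:, t]]  # shared latency prefix

problem = cp.Problem(cp.Minimize(cost), constraints)
problem.solve()
print("status:", problem.status, "| first control:", u_nom.value[:, 0])

In this sketch, dropping the shared-prefix constraint roughly corresponds to the FS-MPC baseline, and dropping the fallback trajectories altogether to the naive MPC.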
@inproceedings{SinhaElhafsiEtAl2024,
title = {Real-Time Anomaly Detection and Reactive Planning with Large Language Models},
author = {Rohan Sinha and Amine Elhafsi and Christopher Agia and Matthew Foutter and Ed Schmerling and Marco Pavone},
year = 2024,
booktitle = {Robotics: Science and Systems}
}
Acknowledgements: The authors would like to thank Brian Ichter and Fei Xia for insightful discussions and feedback throughout the project. In addition, the authors are indebted to Jun En Low, Keiko Nagami, and Alvin Sun for their assistance in setting up the hardware experiments. The NASA University Leadership initiative (grant #80NSSC20M0163) and the Toyota Research Institute (TRI) provided funds to assist the authors with their research, but this article solely reflects the opinions and conclusions of its authors and not any NASA or TRI entity.