RSS Workshop on
June 25th, 2025 (Los Angeles, USA)
Motivation
Research on AI and embodied systems promises to enable many future technologies in application areas such as wildfire drones, assistive robots, and autonomous space exploration. Much progress has been made in the design of AI-enabled components in robotics, e.g., for perception, localization and state estimation, and decision-making and control, as well as components built on generative AI and LLMs. However, the integration of AI-enabled components is currently limited by their fragility, e.g., perception systems that are unreliable in off-nominal conditions, foundation models that are confidently wrong, and LLM-controlled robots that can be jailbroken. This raises serious questions regarding the safety of AI-enabled robots.
We thus require analysis and design frameworks for AI-enabled components that equip robots with the ability to know when they don't know, so that they can seek help from humans, actively explore the environment to reduce uncertainty, and collect new data for learning. However, existing tools for the analysis of AI-enabled robots, which are often optimization-based, cannot deal with the complexity of the task at hand. The community has hence started exploring statistical uncertainty quantification techniques as a scalable alternative that applies to general classes of robots and systems. Indeed, statistical uncertainty quantification techniques scale to complex AI-enabled robots, come with probabilistic safety guarantees, and are easy to understand and extend. While these techniques have shown initial success, many open questions remain, which we explore in this workshop (see "Topics and Tentative Schedule").
The expected audience for the workshop comprises theoreticians as well as practitioners broadly interested in AI and safety in robotics. The workshop is specifically intended for researchers from the robotics, formal methods, systems and control, machine learning and AI, and statistics communities, all of which will be well represented at RSS. We have selected presenters and panelists from these sub-communities accordingly, to ensure coverage of a diverse spectrum of topics and opinions.
In-Person and Remote Participation
The workshop will be hosted in Room 120 of the Olin Hall of Engineering at the University of Southern California; see https://roboticsconference.org/program/workshops/ for more details.
There will also be a remote Zoom option:
https://usc.zoom.us/j/92556243873?pwd=8YHbIaxqnxS43wOnvNoOF4xrGg7Yxi.1
(Meeting ID: 925 5624 3873, Passcode: 387145)
Speakers and Panelists
Anastasios Angelopoulos
Ph.D. Candidate in Electrical Engineering and Computer Science
University of California, Berkeley
Stephen Tu
Assistant Professor in Electrical and Computer Engineering
University of Southern California
Sayan Mitra
Professor in Electrical and Computer Engineering
University of Illinois at Urbana Champaign
Masha Itkina
Research Lead and Manager
Toyota Research Institute
Anqi Liu
Assistant Professor in Computer Science
Johns Hopkins University
Nikolay Atanasov
Associate Professor in Electrical and Computer Engineering
University of California, San Diego
Rohan Sinha
Ph.D. Candidate in Aeronautics and Astronautics
Stanford University
Matthew O'Kelly
Research Scientist
Waymo
Topics and Tentative Schedule
What existing uncertainty quantification (UQ) methods are there (e.g., conformal prediction, PAC-Bayes, …), and what are guidelines for when to use which method?
Where is the magic in UQ? Is it in theoretical tools that provide strong statistical assurances but are ultimately wrappers around heuristic measures of uncertainty? Or is it in domain-specific measures that capture task-relevant uncertainty?
Are formal assurances that come from UQ relevant in practice (e.g., to the industry participants)?
Does robotics require novel UQ methods, or do established methods from statistics and ML suffice?
How are UQ methods integrated into the design of robots? Do we require novel analysis and design techniques to do so?
How can UQ methods deal with distribution shifts in robotics?
Time | Scheduled Event
08:30 am to 08:40 am | Initial Remarks
08:40 am to 09:20 am | Anastasios Angelopoulos (University of California, Berkeley)
09:20 am to 10:00 am | Stephen Tu (University of Southern California)
10:00 am to 10:30 am | Coffee Break
10:30 am to 11:10 am | Sayan Mitra (UIUC)
11:10 am to 11:50 am | Masha Itkina (Toyota Research Institute)
11:50 am to 12:05 pm | Mingle + Interactive Session: Preparing Questions for Panel
12:05 pm to 12:30 pm | Panel Discussion (with previous 4 speakers)
12:30 pm to 02:00 pm | Lunch Break
02:00 pm to 02:40 pm | Anqi Liu (JHU)
02:40 pm to 03:20 pm | Nikolay Atanasov (University of California, San Diego)
03:20 pm to 04:00 pm | Coffee Break
04:00 pm to 04:40 pm | Rohan Sinha (Stanford)
04:40 pm to 05:20 pm | Matthew O'Kelly (Waymo)
05:20 pm to 05:45 pm | Mingle + Interactive Session: Preparing Questions for Panel
05:45 pm to 06:15 pm | Panel Discussion (with previous 4 speakers)
06:15 pm to 06:30 pm | Workshop Summary and Future Workshops
Talk Details
1. Anastasios Angelopoulos
Title: An Introduction to Conformal Prediction
Abstract: High-risk machine learning deployments demand rigorous uncertainty quantification certifying the safety of the prediction algorithm. This will be an introductory talk about conformal prediction, a way of constructing distribution-free "confidence intervals" for black-box algorithms like neural networks. These intervals are guaranteed to contain the ground truth with high probability regardless of the underlying algorithm or dataset. The focus will be equally on reviewing foundational results and practical examples.
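To make the guarantee concrete, here is a minimal sketch of split conformal prediction for regression, assuming a pre-trained black-box model with a `predict` method and a held-out calibration set; all names are illustrative placeholders, not code from the talk.

```python
# Minimal split conformal prediction sketch (illustrative, not from the talk).
import numpy as np

def conformal_interval(model, X_cal, y_cal, x_test, alpha=0.1):
    # Nonconformity scores: absolute residuals on a held-out calibration set.
    scores = np.abs(y_cal - model.predict(X_cal))
    n = len(scores)
    # Finite-sample-corrected quantile level ceil((n+1)(1-alpha))/n.
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(scores, level, method="higher")
    pred = model.predict(x_test)
    # [pred - q, pred + q] covers the true label with probability >= 1 - alpha,
    # provided calibration and test points are exchangeable.
    return pred - q, pred + q
```

The coverage guarantee holds for any model and data distribution, which is exactly the distribution-free property highlighted in the abstract.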
2. Stephen Tu
Title: Challenges in Learning Behavior Certificates from Data
Abstract: A major challenge in deploying autonomous systems in the real world is ensuring that they operate safely and predictably, even in novel environments not encountered during development. A promising approach is to certify system behavior using behavior certificates, which provide sufficient conditions for guaranteeing performance and safety. Recently, learning-based statistical methods have gained popularity as a computationally tractable means of constructing such certificates. In this talk, we will first explore the fundamental statistical limitations of these approaches. We will then discuss recent work that introduces additional structure through latent dynamics models as a promising strategy for overcoming these limitations.
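As a rough illustration of how a candidate certificate can be validated statistically, here is a generic scenario-style sampling check under an i.i.d. assumption on sampled states; this is a textbook construction, not the specific methods of the talk.

```python
# Hedged sketch: statistically validating a candidate certificate V for
# discrete-time dynamics f by sampling states. If all N i.i.d. samples
# satisfy the decrease condition, then with confidence 1 - delta the
# violation probability is at most eps whenever N >= ln(1/delta) / eps.
import numpy as np

def validate_certificate(V, f, sample_state, eps=0.01, delta=1e-3, seed=0):
    rng = np.random.default_rng(seed)
    N = int(np.ceil(np.log(1.0 / delta) / eps))
    for _ in range(N):
        x = sample_state(rng)
        if V(f(x)) - V(x) >= 0.0:  # decrease condition violated
            return False, N
    return True, N

# Toy example: V(x) = ||x||^2 certifies the stable linear map f(x) = 0.9 x.
ok, n_samples = validate_certificate(
    V=lambda x: float(x @ x),
    f=lambda x: 0.9 * x,
    sample_state=lambda rng: rng.uniform(-1.0, 1.0, size=2),
)
print(ok, n_samples)  # True, 691
```

Note that the resulting guarantee is only probabilistic and the required sample size grows as eps shrinks; the fundamental limits of such sample-based certificates are among the questions the talk examines.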
3. Sayan Mitra
Title: Towards Verified Visual Autonomy: Perception Contracts and Abstract Rendering
Abstract: We address the challenge of verifying vision-based autonomous systems, where uncertainty in sensing and perception has long hindered formal guarantees. These autonomous systems combine modules for sensing, perception, decision-making, and control, and verifying their correctness requires tracking how sets of states propagate through the system while preserving certificates such as invariants or Lyapunov functions. While formal methods like reachability analysis can rigorously handle control and dynamics, perception components—especially image classification and rendering—have resisted formal treatment. We present perception contracts, a framework that combines formal and statistical reasoning to enable verification across sensing and perception layers. Applications in lane-keeping and autonomous landing demonstrate its effectiveness. We further introduce abstract rendering, a method for uncertainty propagation in neural scene representations (e.g., NeRFs, Gaussian splats) via compositional linear bound propagation. This approach enables end-to-end formal verification of vision-based tasks such as semantic classification, pose estimation, and visual control. Together, we believe that these advances pave the way for reliable, minimalistic, and certifiable visual autonomy.
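As a rough paraphrase (in notation of our own choosing, not necessarily the talk's), a perception contract can be thought of as a set-valued map \(M\) that over-approximates the perception output with high probability:

\[
\Pr\big[\, h(\mathrm{sense}(x)) \in M(x) \,\big] \;\ge\; 1 - \delta \qquad \text{for all states } x \in X,
\]

where \(h\) is the learned perception component, \(\mathrm{sense}\) models the (possibly randomized) sensor or renderer, and \(\delta\) is a statistical miss rate. Downstream verification can then propagate the sets \(M(x)\) through the controller and dynamics, e.g., via reachability analysis, without reasoning about raw images directly.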
4. Masha Itkina
Title: Evaluation and Uncertainty in the Age of Robot Learning
Abstract: Large-scale robot learning, colloquially known as Large Behavior Models (LBMs), Embodied Foundation Models (EFMs), or Vision-Language-Action (VLA) models, has increasingly become the norm in the robot learning literature since the success of ChatGPT. Nevertheless, many questions remain surrounding the development of these models in the context of real-world, embodied systems. For example, we need rigorous statistical methodologies for evaluating and comparing existing robot learning models. To deploy these models in human environments, they should be equipped with reliable failure detection systems, despite failure types and conditions that cannot be fully anticipated before deployment. Lastly, these models should have the capacity to explore and adapt to new environments, preferences, and tasks. In this talk, I will give an overview of our work as part of the Trustworthy Learning under Uncertainty (TLU) effort at TRI along a few of these research directions, focusing on evaluation and failure detection.
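For a flavor of what rigorous evaluation can look like, here is a minimal sketch comparing two policies' success rates with exact Clopper-Pearson confidence intervals; this is one standard statistical methodology, not necessarily the one used at TRI, and the counts are made up.

```python
# Hedged sketch: exact binomial (Clopper-Pearson) confidence intervals
# for per-policy success rates over repeated rollouts.
from scipy.stats import beta

def clopper_pearson(successes, trials, alpha=0.05):
    lo = 0.0 if successes == 0 else beta.ppf(alpha / 2, successes, trials - successes + 1)
    hi = 1.0 if successes == trials else beta.ppf(1 - alpha / 2, successes + 1, trials - successes)
    return lo, hi

print(clopper_pearson(42, 50))  # policy A: 42/50 successful rollouts
print(clopper_pearson(31, 50))  # policy B: 31/50 successful rollouts
# Non-overlapping 95% intervals are conservative evidence that A outperforms B;
# overlapping intervals call for more rollouts or a paired test.
```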
5. Anqi Liu
Title: Robust and Uncertainty-Aware Decision Making under Distribution Shifts
Abstract: Decision-making tasks like contextual bandits and reinforcement learning often need to be conducted under data distribution shifts. For example, we may need to use off-policy data to evaluate a target policy and/or learn an optimal policy from logged data. We may need to deal with the sim2real problem when there is a dynamics shift between training and testing environments. We may also need to account for feedback shifts induced by learning-based actions taken during decision making. In this talk, I will introduce three threads of my work on robust and uncertainty-aware decision making under distribution shifts. First, I will introduce distributionally robust off-policy evaluation and learning techniques that feature a more conservative reward estimate under uncertainty. This pessimistic reward estimation benefits both off-policy evaluation and learning under various distribution shifts. Second, I will introduce off-dynamics reinforcement learning via domain adaptation and reward-augmented imitation. We observe that previous methods in off-dynamics reinforcement learning can suffer from performance degradation and propose an imitation-from-observation approach to mitigate it. Finally, I will cover our recent work on uncertainty-aware approaches to safe decision-making problems: conformal neural bandits under stage-wise constraints, which use conformal prediction under feedback covariate shift for learning and exploration under constraints.
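As background for the off-policy evaluation thread, here is the classical inverse-propensity-scoring (IPS) estimator for contextual bandits, the generic starting point that robust and pessimistic estimators build on; the code is an illustrative sketch, not Prof. Liu's method.

```python
# Hedged sketch: IPS off-policy evaluation from logged bandit data.
import numpy as np

def ips_estimate(contexts, actions, rewards, logging_probs, target_policy):
    """target_policy(context, action) returns the probability that the
    target policy takes `action` in `context`."""
    weights = np.array([
        target_policy(x, a) / p
        for x, a, p in zip(contexts, actions, logging_probs)
    ])
    # Unbiased under full support of the logging policy, but high variance
    # when the policies differ -- one motivation for robust/pessimistic variants.
    return float(np.mean(weights * np.asarray(rewards)))
```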
6. Nikolay Atanasov
Title: Particle-Based and Distributionally Robust Uncertainty Quantification for Safe Robot Control
Abstract: Recent progress in machine learning and foundation models has led to an explosion of learned robot models and behaviors. Robots rely on learned features for localization, learned environment representations for mapping, learned dynamics models for control, and reinforcement learning and vision-language-action models for task execution. This explosion of black box approximations raises serious questions about failure modes and safety guarantees. In the context of safe control synthesis, where safety constraints are subject to approximation uncertainty, this talk will explore distribution-based and distribution-free techniques for probabilistic uncertainty quantification. We will discuss uncertainty quantification using Gaussian Processes, particle flow filtering, and distributionally robust optimization and will present applications in autonomous robot navigation and arm motion.
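For intuition, a common distribution-free way to tighten a safety constraint against tail uncertainty is to bound its conditional value-at-risk (CVaR) from samples; the sketch below is generic and illustrative, not the talk's algorithm.

```python
# Hedged sketch: empirical CVaR of a constraint value g(x), where g <= 0
# means safe. Enforcing CVaR_alpha(g) <= 0 is stricter than E[g] <= 0:
# it requires the average of the worst alpha-fraction of outcomes to be safe.
import numpy as np

def empirical_cvar(samples, alpha=0.1):
    samples = np.sort(np.asarray(samples))[::-1]  # worst (largest) first
    k = max(1, int(np.ceil(alpha * len(samples))))
    return float(np.mean(samples[:k]))

g_samples = np.random.default_rng(0).normal(loc=-0.5, scale=0.2, size=1000)
print(empirical_cvar(g_samples, alpha=0.05))
# Negative => the worst 5% of sampled outcomes still satisfy the constraint.
```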
7. Rohan Sinha
Title: Guardrails against failures: Towards trustworthy AI-driven autonomy with runtime monitors
Abstract: We will present a holistic perspective on the use of runtime monitors to detect and mitigate the negative consequences of out-of-distribution (OOD) scenarios -- i.e., scenarios dissimilar from training data -- on the safety and reliability of learning-based robotic systems. First, to detect impending failures, we show how the common-sense reasoning capabilities of foundation models, e.g., large language models (LLMs), make them attractive as runtime monitors. Second, we introduce a "thinking fast & slow" reasoning framework that enables the real-time use of LLMs to avert closed-loop failures caused by system-level deficiencies in a robot's semantic reasoning. Finally, besides averting immediate failures during deployment, we require mechanisms that inform offline development cycles to address observed failure modes. Towards this goal, we develop a precise understanding of how training data contributes to downstream outcomes – such as closed-loop task success or failure – based on the theory of influence functions. We then use our influence-based pipeline to curate training data, improving the performance and robustness of imitation-learned systems.
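To ground the idea of a runtime monitor, here is a minimal sketch that calibrates an alert threshold for an anomaly score on nominal data, so that nominal inputs are flagged at most an alpha fraction of the time; the score function is a placeholder (the monitors in the talk instead build on foundation models).

```python
# Hedged sketch: conformal calibration of an OOD alert threshold.
import numpy as np

def calibrate_threshold(nominal_scores, alpha=0.05):
    # With probability >= 1 - alpha, a fresh nominal input scores below
    # this threshold (assuming exchangeability with the calibration data).
    n = len(nominal_scores)
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    return float(np.quantile(nominal_scores, level, method="higher"))

def monitor(score, threshold):
    # Trigger a fallback (e.g., slow down, ask for help) on suspected OOD inputs.
    return "fallback" if score > threshold else "nominal"
```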
8. Matthew O'Kelly
Title: TBA
Abstract: TBA
Organizers
Lars Lindemann
Assistant Professor in Computer Science
University of Southern California
Homepage: https://sites.google.com/view/larslindemann/main-page
Anastasios Angelopoulos
Ph.D. Candidate in Electrical Engineering and Computer Science
University of California, Berkeley
Homepage: https://people.eecs.berkeley.edu/~angelopoulos/
Anirudha Majumdar
Associate Professor in Mechanical and Aerospace Engineering
Princeton University
Homepage: https://mae.princeton.edu/people/faculty/majumdar