Zhiyuan Yu, Ao Li, Ruoyao Wen, Yijia Chen, Ning Zhang
Washington University in St. Louis
PhySense is designed to protect the perception of autonomous systems.
Autonomous vehicles (AVs) empowered by deep neural networks (DNNs) are bringing transformative changes to our society. However, they are generally susceptible to adversarial attacks, especially physically realizable perturbations that can mislead perception and cause catastrophic outcomes. While existing defenses have shown success, there remains a pressing need for improved robustness that preserves the efficiency required for real-time system operation.
To tackle these challenges, we introduce PhySense, a complementary solution that leverages multi-faceted reasoning for misclassification detection and correction. This defense is built on physical characteristics, including static and dynamic object attributes and their interrelations. To effectively integrate these diverse sources, we develop a system based on the conditional random field that models objects and relationships as a spatial-temporal graph for holistic reasoning on the perceived scene. To ensure the defense does not violate the timing requirements of the real-time cyber-physical control loop, we profile the run-time characteristics of the workloads to parallelize and pipeline the execution of the defense implementation. The efficacy of PhySense is experimentally validated on datasets, in simulation, and through real-world driving tests. It also demonstrates resiliency against adaptive attacks, and the potential of applying the underlying principles to other modalities beyond vision.
The paper has been accepted by the 31st ACM Conference on Computer and Communications Security (CCS), 14-18 October 2024.
Physical world adversarial patterns can cause AVs to malfunction.
Adversarial examples can manifest in diverse forms in the physical world, disrupting object tracking and, in turn, the decision-making process of AVs.
Monitor Display
Light Projection
Paper Patch
Defense Motivation: Human perception is much more robust than machine learning algorithms...
From the view of machine learning models:
Trained on individual objects and labels
Rely on non-explainable latent features
Individually recognize objects within isolated ROIs
Susceptible to perturbations 😈
From the view of human perception:
Perceive the entire scene for better understanding
Take advantage of high-level physical features
Correlate object relationships and interactions in a holistic view
Less affected by restricted (e.g., Lp-bounded) perturbations! 😇
Can we defend by enhancing AVs' capabilities of scene understanding?
Combining statistical modeling, robust physical rules, and pipelining techniques.
Why do we need to build a kinematic model in 3D?
Physical laws are 3D by nature
Need to bring all objects into one shared 3D space to:
Characterize them using universal measurements
Reason about their correlations and interactions
Now we have basic information for individual objects (3D space occupation, 3D locations, 3D velocities, 3D accelerations, etc.)
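As a concrete illustration, the per-object kinematic quantities above can be derived from tracked 3D locations with finite differences. This is only a minimal sketch under our own assumptions (a fixed frame interval, no smoothing or filtering); the function name and parameters are ours, not the paper's.

```python
import numpy as np

def estimate_kinematics(positions, dt):
    """Estimate per-frame 3D velocities and accelerations from a
    time series of 3D locations via finite differences (a real
    tracker would additionally smooth or filter these estimates)."""
    positions = np.asarray(positions, dtype=float)    # shape (T, 3)
    velocities = np.diff(positions, axis=0) / dt      # shape (T-1, 3)
    accelerations = np.diff(velocities, axis=0) / dt  # shape (T-2, 3)
    return velocities, accelerations

# Example: a car moving along x at a constant 10 m/s, sampled at 10 Hz
pos = [[0.1 * t * 10.0, 0.0, 0.0] for t in range(5)]
v, a = estimate_kinematics(pos, dt=0.1)
```

The constant velocity and near-zero acceleration recovered here are exactly the universal measurements used to characterize objects in the shared 3D space.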
What's next?
Inherent Attributes
Texture, Physical dimensions, etc.
Approach: Bayes' rule that maps observed attributes to object class probabilities
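A minimal sketch of how Bayes' rule can turn observed physical attributes into class probabilities. All priors and likelihood values below are invented for illustration and are not taken from the paper.

```python
import numpy as np

def class_posterior(priors, likelihoods):
    """Bayes' rule: combine class priors with the likelihood of the
    observed physical attributes (texture, dimensions, ...) under
    each candidate class, then normalize."""
    posterior = priors * likelihoods
    return posterior / posterior.sum()

classes = ["car", "pedestrian", "truck"]
priors = np.array([0.6, 0.3, 0.1])           # hypothetical class priors
# Hypothetical likelihood of an observed ~4.5 m-long object per class
likelihoods = np.array([0.70, 0.01, 0.20])
post = class_posterior(priors, likelihoods)
```

An object whose physical dimensions are implausible for its predicted class (e.g., a "pedestrian" 4.5 m long) thus receives a low posterior, which is the signal used for misclassification detection.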
Behaviors
e.g., A car turning right
Approach: Interactive human annotation & Attention-BiLSTM model for recognition
Interactions
e.g., A car trailing the preceding vehicle
Approach: Rule-based identification for any given object pairs
Challenge: Existing datasets lack annotated behaviors in the context of transportation.
Approach:
Built on existing datasets (e.g., nuScenes).
We developed interactive HTML files for visualization and human inspection.
We created human annotations via a structured thematic coding process.
🔥 Our interactive HTML files and human annotations are available to facilitate future research in this field!
Challenge: How do we identify diverse behaviors across various objects?
Approach:
Object behaviors are represented as time series of 3D locations.
We developed an Attention-BiLSTM model for behavior identification.
Trained on our annotated object behaviors.
🔥 Our pre-trained model checkpoints are available for download!
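To convey the attention mechanism over a behavior time series, here is a minimal NumPy sketch of attention pooling over per-timestep hidden states. The BiLSTM itself is omitted, random values stand in for its concatenated forward/backward outputs, and this is not the released model, just an illustration of the pooling step.

```python
import numpy as np

rng = np.random.default_rng(0)

def attention_pool(hidden, w):
    """Score each timestep's hidden state, softmax the scores over
    time, and return the attention-weighted context vector. `hidden`
    has shape (T, D); `w` is a learned scoring vector of shape (D,)."""
    scores = hidden @ w                     # (T,) alignment scores
    weights = np.exp(scores - scores.max()) # numerically stable softmax
    weights /= weights.sum()
    return weights @ hidden                 # (D,) context vector

T, D = 20, 8                                # 20 frames, 8-dim states
hidden = rng.normal(size=(T, D))            # stand-in for BiLSTM output
w = rng.normal(size=D)                      # stand-in for learned weights
context = attention_pool(hidden, w)
```

The context vector is a convex combination of the per-frame states, letting the classifier focus on the frames most indicative of a behavior (e.g., the moment a turn begins).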
Challenge: How do we formulate holistic reasoning for scenes with intercorrelated objects?
Approach:
Conditional Random Field (CRF) as the base structure.
Construct a graph for a scene - individual objects as nodes, and interactions as edges.
Develop an adapted energy function - inherent attributes & behaviors as unary terms, and interactions as binary terms.
Learnable weight matrices as factors for different types of physical characteristics
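The energy formulation above can be sketched as follows. For simplicity, scalar weights stand in for the learnable weight matrices, and all potential values in the toy scene are illustrative.

```python
import numpy as np

def crf_energy(unary, pairwise, edges, labels, w_unary=1.0, w_pair=1.0):
    """Energy of one labeling of the scene graph: a weighted sum of
    unary potentials (inherent attributes & behaviors, one cost vector
    per object node) and binary potentials (interactions, one cost
    matrix per edge). Lower energy = more physically consistent."""
    e = sum(w_unary * unary[i][labels[i]] for i in range(len(labels)))
    e += sum(w_pair * pairwise[(i, j)][labels[i], labels[j]]
             for (i, j) in edges)
    return e

# Toy scene: two objects, two candidate classes, one interaction edge
unary = [np.array([0.1, 2.0]), np.array([1.5, 0.2])]     # per-node costs
pairwise = {(0, 1): np.array([[0.0, 1.0], [1.0, 0.0]])}  # per-edge costs
edges = [(0, 1)]

# Brute-force inference: pick the lowest-energy joint labeling
best = min(((a, b) for a in range(2) for b in range(2)),
           key=lambda lab: crf_energy(unary, pairwise, edges, lab))
```

In practice exhaustive search does not scale; this toy merely shows how the unary and binary terms jointly determine the most consistent labeling of the whole scene rather than each object in isolation.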
Run time breakdown of the initial implementation
Our pipelining of individual processing stages
Challenge: A naive implementation can lead to high overhead, especially when tasks are executed sequentially.
Approach:
Identify performance bottlenecks via offline task profiling → computational delays are mostly CPU-bound.
Dispatch fine-grained tasks via a thread pool; memory buffers are pre-allocated to reduce the latency of runtime memory management.
Exploit physical consistency among consecutive frames → determine the optimal number of parallel threads.
Pipelining - Upon completion of a stage, the data required for the subsequent stage is updated via a buffer implemented using a FIFO (first-in, first-out) queue.
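The pipelining scheme can be sketched with FIFO buffers between stages, each stage running in its own worker thread. The stage functions below are placeholders, not the actual defense workloads, and the buffer sizes are arbitrary.

```python
import queue
import threading

def make_stage(fn, inbox, outbox):
    """Run `fn` on items from `inbox` and push results to `outbox`.
    A None sentinel shuts the stage down and is forwarded so that
    downstream stages also terminate."""
    def worker():
        while True:
            item = inbox.get()
            if item is None:
                outbox.put(None)
                break
            outbox.put(fn(item))
    t = threading.Thread(target=worker, daemon=True)
    t.start()
    return t

# Two illustrative stages (stand-ins for e.g. kinematics extraction
# and CRF reasoning), connected by pre-sized FIFO buffers.
q_in, q_mid, q_out = (queue.Queue(maxsize=4) for _ in range(3))
make_stage(lambda x: x * 2, q_in, q_mid)
make_stage(lambda x: x + 1, q_mid, q_out)

for frame in range(5):       # feed five "frames" through the pipeline
    q_in.put(frame)
q_in.put(None)               # sentinel: no more frames

results = []
while (r := q_out.get()) is not None:
    results.append(r)
# results == [1, 3, 5, 7, 9]
```

With one worker per stage and FIFO buffers, frame order is preserved while consecutive frames are processed by different stages concurrently, which is what keeps the defense inside the control loop's timing budget.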
PhySense was evaluated using datasets, simulations, and real-world driving tests.
The measurement metrics focus on detection accuracy, correction accuracy, false positive rate (FPR), false negative rate (FNR), and run time.
Evaluation on Datasets - nuScenes, KITTI, Custom dataset collected from Carla
Over 99% detection accuracy and an average of 98% correction accuracy across datasets
Real-world Driving Tests - 8.6 hours of driving tests with a Tesla Model 3
Similar results to dataset-based evaluation, slight performance drop due to occlusions
Control Deviation - Measure the repeated travel trajectories with and without PhySense
The average deviation between these two trajectories was 0.0266 m for the x-axis and 0.0251 m for the y-axis.
For more details please see our paper. The results are reproducible with our released code and datasets.
Main evaluation results
AV's trajectories with and without PhySense
If you find this work helpful, please cite us at:
@inproceedings{yu2024physense,
title={PhySense: Defending Physically Realizable Attacks for Autonomous Systems via Consistency Reasoning},
author={Yu, Zhiyuan and Li, Ao and Wen, Ruoyao and Chen, Yijia and Zhang, Ning},
booktitle={Proceedings of the 2024 ACM SIGSAC Conference on Computer and Communications Security},
year={2024}
}
Our paper is available at: