Zhiyuan Yu, Ao Li, Ruoyao Wen, Yijia Chen, Ning Zhang
Washington University in St. Louis
PhySense is designed to protect the perception of autonomous systems.
Autonomous vehicles (AVs) empowered by deep neural networks (DNNs) are bringing transformative changes to our society. However, they are generally susceptible to adversarial attacks, especially physically realizable perturbations that can mislead perception and cause catastrophic outcomes. While existing defenses have shown success, there remains a pressing need for improved robustness that preserves the efficiency required for real-time system operation.
To tackle these challenges, we introduce PhySense, a complementary solution that leverages multi-faceted reasoning for misclassification detection and correction. This defense is built on physical characteristics, including static and dynamic object attributes and their interrelations. To effectively integrate these diverse sources, we develop a system based on the conditional random field that models objects and relationships as a spatial-temporal graph for holistic reasoning on the perceived scene. To ensure the defense does not violate the timing requirements of the real-time cyber-physical control loop, we profile the run-time characteristics of the workloads to parallelize and pipeline the execution of the defense implementation. The efficacy of PhySense is experimentally validated on datasets, in simulation, and through real-world driving tests. It also demonstrates resiliency against adaptive attacks, and the potential of applying the underlying principles to other modalities beyond vision.
The paper has been accepted by the 31st ACM Conference on Computer and Communications Security (CCS), 14-18 October 2024.
Physical world adversarial patterns can cause AVs to malfunction.
Adversarial examples can manifest in diverse forms in the physical world, disrupting object tracking and, in turn, the decision-making process of AVs.
Monitor Display
Light Projection
Paper Patch
Defense Motivation: Human perception is much more robust than machine learning algorithms...
From the view of machine learning models:
Trained on individual objects and labels
Rely on non-explainable latent features
Individually recognize objects within isolated ROIs
Susceptible to perturbations 😈
From the view of human perception:
Perceive the entire scene for better understanding
Take advantage of high-level physical features
Correlate object relationships and interactions in a holistic view
Less affected by restricted (e.g., Lp-bounded) perturbations! 😇
Can we defend by enhancing AVs' capabilities of scene understanding?
Combining statistical modeling, robust physical rules, and pipelining techniques.
Why do we need to build a kinematic model in 3D?
Physical laws are 3D by nature
Need to bring all objects into one shared 3D space to:
Characterize them using universal measurements
Reason about their correlations and interactions
Now we have basic information for individual objects (3D space occupation, 3D locations, 3D velocities, 3D accelerations, etc.)
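As a concrete illustration, the per-object kinematic quantities above can be derived from tracked 3D locations with finite differences. This is only a minimal sketch under our own assumptions (a fixed frame interval, no smoothing or filtering); the function name and parameters are ours, not the paper's.

```python
import numpy as np

def estimate_kinematics(positions, dt):
    """Estimate per-frame 3D velocities and accelerations from a
    time series of 3D locations via finite differences (a real
    tracker would additionally smooth or filter these estimates)."""
    positions = np.asarray(positions, dtype=float)    # shape (T, 3)
    velocities = np.diff(positions, axis=0) / dt      # shape (T-1, 3)
    accelerations = np.diff(velocities, axis=0) / dt  # shape (T-2, 3)
    return velocities, accelerations

# Example: a car moving along x at a constant 10 m/s, sampled at 10 Hz
pos = [[0.1 * t * 10.0, 0.0, 0.0] for t in range(5)]
v, a = estimate_kinematics(pos, dt=0.1)
```

The constant velocity and near-zero acceleration recovered here are exactly the universal measurements used to characterize objects in the shared 3D space.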
What's next?
Inherent Attributes
Texture, Physical dimensions, etc.
Approach: Bayes' rule that maps observed attributes to object class probabilities
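A minimal sketch of how Bayes' rule can turn observed physical attributes into class probabilities. All priors and likelihood values below are invented for illustration and are not taken from the paper.

```python
import numpy as np

def class_posterior(priors, likelihoods):
    """Bayes' rule: combine class priors with the likelihood of the
    observed physical attributes (texture, dimensions, ...) under
    each candidate class, then normalize."""
    posterior = priors * likelihoods
    return posterior / posterior.sum()

classes = ["car", "pedestrian", "truck"]
priors = np.array([0.6, 0.3, 0.1])           # hypothetical class priors
# Hypothetical likelihood of an observed ~4.5 m-long object per class
likelihoods = np.array([0.70, 0.01, 0.20])
post = class_posterior(priors, likelihoods)
```

An object whose physical dimensions are implausible for its predicted class (e.g., a "pedestrian" 4.5 m long) thus receives a low posterior, which is the signal used for misclassification detection.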
Behaviors
e.g., A car turning right
Approach: Interactive human annotation & Attention-BiLSTM model for recognition
Interactions
e.g., A car trailing the preceding vehicle
Approach: Rule-based identification for any given object pairs
Challenge: Existing datasets lack annotated behaviors in the context of transportation.
Approach:
Built on existing datasets (e.g., nuScenes).
We developed interactive HTML files for visualization and human inspection.
We created human annotations via a structured thematic coding process.
🔥 Our interactive HTML files and human annotations are available to facilitate future research in this field!
Challenge: How do we identify diverse behaviors across various objects?
Approach:
Object behaviors are represented as time series of 3D locations.
We developed an Attention-BiLSTM model for behavior identification.
Trained on our annotated object behaviors.
🔥 Our pre-trained model checkpoints are available for download!
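To convey the attention mechanism over a behavior time series, here is a minimal NumPy sketch of attention pooling over per-timestep hidden states. The BiLSTM itself is omitted, random values stand in for its concatenated forward/backward outputs, and this is not the released model, just an illustration of the pooling step.

```python
import numpy as np

rng = np.random.default_rng(0)

def attention_pool(hidden, w):
    """Score each timestep's hidden state, softmax the scores over
    time, and return the attention-weighted context vector. `hidden`
    has shape (T, D); `w` is a learned scoring vector of shape (D,)."""
    scores = hidden @ w                     # (T,) alignment scores
    weights = np.exp(scores - scores.max()) # numerically stable softmax
    weights /= weights.sum()
    return weights @ hidden                 # (D,) context vector

T, D = 20, 8                                # 20 frames, 8-dim states
hidden = rng.normal(size=(T, D))            # stand-in for BiLSTM output
w = rng.normal(size=D)                      # stand-in for learned weights
context = attention_pool(hidden, w)
```

The context vector is a convex combination of the per-frame states, letting the classifier focus on the frames most indicative of a behavior (e.g., the moment a turn begins).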
Challenge: How do we formulate holistic reasoning for scenes with intercorrelated objects?
Approach:
Conditional Random Field (CRF) as the base structure.
Construct a graph for a scene - individual objects as nodes, and interactions as edges.
Develop an adapted energy function - inherent attributes & behaviors as unary terms, and interactions as binary terms.
Learnable weight matrices as factors for different types of physical characteristics
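The energy formulation above can be sketched as follows. For simplicity, scalar weights stand in for the learnable weight matrices, and all potential values in the toy scene are illustrative.

```python
import numpy as np

def crf_energy(unary, pairwise, edges, labels, w_unary=1.0, w_pair=1.0):
    """Energy of one labeling of the scene graph: a weighted sum of
    unary potentials (inherent attributes & behaviors, one cost vector
    per object node) and binary potentials (interactions, one cost
    matrix per edge). Lower energy = more physically consistent."""
    e = sum(w_unary * unary[i][labels[i]] for i in range(len(labels)))
    e += sum(w_pair * pairwise[(i, j)][labels[i], labels[j]]
             for (i, j) in edges)
    return e

# Toy scene: two objects, two candidate classes, one interaction edge
unary = [np.array([0.1, 2.0]), np.array([1.5, 0.2])]     # per-node costs
pairwise = {(0, 1): np.array([[0.0, 1.0], [1.0, 0.0]])}  # per-edge costs
edges = [(0, 1)]

# Brute-force inference: pick the lowest-energy joint labeling
best = min(((a, b) for a in range(2) for b in range(2)),
           key=lambda lab: crf_energy(unary, pairwise, edges, lab))
```

In practice exhaustive search does not scale; this toy merely shows how the unary and binary terms jointly determine the most consistent labeling of the whole scene rather than each object in isolation.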
Run time breakdown of the initial implementation
Our pipelining of individual processing stages
Challenge: A naive implementation can lead to high overhead, especially when tasks are executed sequentially.
Approach:
Identify performance bottlenecks via offline task profiling → computational delays are mostly CPU-bound.
Dispatch fine-grained tasks via a thread pool; memory buffers are pre-allocated to reduce the latency of runtime memory management.
Exploit physical consistency among consecutive frames → determine the optimal number of parallel threads.
Pipelining - Upon completion of a stage, the data required for the subsequent stage is updated via a buffer implemented using a FIFO (first-in, first-out) queue.
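The pipelining scheme can be sketched with FIFO buffers between stages, each stage running in its own worker thread. The stage functions below are placeholders, not the actual defense workloads, and the buffer sizes are arbitrary.

```python
import queue
import threading

def make_stage(fn, inbox, outbox):
    """Run `fn` on items from `inbox` and push results to `outbox`.
    A None sentinel shuts the stage down and is forwarded so that
    downstream stages also terminate."""
    def worker():
        while True:
            item = inbox.get()
            if item is None:
                outbox.put(None)
                break
            outbox.put(fn(item))
    t = threading.Thread(target=worker, daemon=True)
    t.start()
    return t

# Two illustrative stages (stand-ins for e.g. kinematics extraction
# and CRF reasoning), connected by pre-sized FIFO buffers.
q_in, q_mid, q_out = (queue.Queue(maxsize=4) for _ in range(3))
make_stage(lambda x: x * 2, q_in, q_mid)
make_stage(lambda x: x + 1, q_mid, q_out)

for frame in range(5):       # feed five "frames" through the pipeline
    q_in.put(frame)
q_in.put(None)               # sentinel: no more frames

results = []
while (r := q_out.get()) is not None:
    results.append(r)
# results == [1, 3, 5, 7, 9]
```

With one worker per stage and FIFO buffers, frame order is preserved while consecutive frames are processed by different stages concurrently, which is what keeps the defense inside the control loop's timing budget.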
PhySense was evaluated using datasets, simulations, and real-world driving tests.
The measurement metrics focus on detection accuracy, correction accuracy, false positive rate (FPR), false negative rate (FNR), and run time.
Evaluation on Datasets - nuScenes, KITTI, Custom dataset collected from Carla
Over 99% detection accuracy and an average of 98% correction accuracy across datasets
Real-world Driving Tests - 8.6 hours of driving tests with a Tesla Model 3
Similar results to dataset-based evaluation, slight performance drop due to occlusions
Control Deviation - Measure the repeated travel trajectories with and without PhySense
The average deviation between these two trajectories was 0.0266 m for the x-axis and 0.0251 m for the y-axis.
For more details please see our paper. The results are reproducible with our released code and datasets.
Main evaluation results
AV's trajectories with and without PhySense
If you find this work helpful, please cite us at:
@inproceedings{yu2024physense,
title={PhySense: Defending Physically Realizable Attacks for Autonomous Systems via Consistency Reasoning},
author={Yu, Zhiyuan and Li, Ao and Wen, Ruoyao and Chen, Yijia and Zhang, Ning},
booktitle={Proceedings of the 2024 ACM SIGSAC Conference on Computer and Communications Security},
year={2024}
}
Our paper is available at: