Invisible for both Camera and LiDAR: Security of Multi-Sensor Fusion based Perception in Autonomous Driving Under Physical-World Attacks
(IEEE S&P'21)
Summary
Today, various companies are developing self-driving cars, e.g., Level-4 Autonomous Driving (AD) vehicles. Some of them (e.g., Google Waymo, TuSimple, Baidu Apollo) are already providing services such as self-driving taxis and trucks on public roads. To ensure correct and safe driving, a fundamental pillar of an AD system is perception, which leverages sensors such as cameras and LiDARs (Light Detection and Ranging) to detect surrounding obstacles in real time.
Various prior works have studied the security of AD perception, but so far all of them are limited to attacks on a single perception source, i.e., camera or LiDAR alone. However, production high-level AD systems such as Waymo typically adopt a Multi-Sensor Fusion (MSF) based design to achieve overall higher accuracy and robustness. In such a design, under the assumption that not all perception sources are (or can be) attacked simultaneously, there always exists a possible MSF algorithm that can rely on the unattacked source(s) to detect/prevent such an attack. This basic security design assumption is believed to hold in general, and thus MSF is widely recognized as a general defense strategy for AD perception.
In this work, we perform the first security analysis on MSF-based perception in AD systems today. We directly challenge the above basic security design assumption by demonstrating the possibility of effectively and simultaneously attacking all perception sources used in state-of-the-art MSF-based AD perception, i.e., camera and LiDAR. This allows us, for the first time, to gain a concrete understanding of how much security guarantee the use of MSF can fundamentally provide as a general defense strategy for AD perception.
Novel Attack Design: MSF-ADV
Attack goal: fundamentally defeat the MSF design assumption. We target an attack goal with a direct safety impact: fool a victim AD vehicle into failing to detect a front obstacle and thus crashing into it. We aim at effectively attacking all perception sources used in state-of-the-art MSF, i.e., both camera and LiDAR, to fundamentally defeat the MSF design assumption above.
Attack vector: Adversarial 3D objects. We discover that adversarial 3D objects can be used as a physically-realizable and stealthy attack vector for MSF-based AD perception. Our key observation is that different shapes of a 3D object can lead to both point position changes in LiDAR point clouds and pixel value changes in camera images. Thus, an attacker can leverage shape manipulations to introduce input perturbations to both camera and LiDAR simultaneously. This attack vector is: (1) easily realizable & deployable in the physical world via 3D-printing services today, and (2) able to achieve high stealthiness by mimicking normal road objects such as traffic cones/barriers.
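To make this observation concrete, below is a toy, self-contained illustration (ours, not the attack code; Python/NumPy) that models a single LiDAR return as a ray-triangle intersection: displacing one vertex of the object's surface moves the hit point, i.e., perturbs a point position in the resulting point cloud. The same kind of shape change also alters the object's silhouette and shading in a camera image.

import numpy as np

def ray_triangle_hit(origin, direction, v0, v1, v2):
    """Möller-Trumbore ray-triangle intersection; returns hit distance or None."""
    e1, e2 = v1 - v0, v2 - v0
    p = np.cross(direction, e2)
    det = e1.dot(p)
    if abs(det) < 1e-9:          # ray parallel to the triangle
        return None
    t_vec = origin - v0
    u = t_vec.dot(p) / det
    q = np.cross(t_vec, e1)
    v = direction.dot(q) / det
    if u < 0 or v < 0 or u + v > 1:
        return None              # hit point falls outside the triangle
    return e2.dot(q) / det       # distance along the ray

origin = np.array([0.0, 0.0, 0.0])
beam = np.array([1.0, 0.0, 0.0])                  # one LiDAR beam direction
tri = [np.array([5.0, -1.0, -1.0]),               # one surface triangle of the object
       np.array([5.0,  1.0, -1.0]),
       np.array([5.0,  0.0,  1.0])]

print(ray_triangle_hit(origin, beam, *tri))       # ~5.000 m on the original shape
tri[0][0] -= 0.3                                  # small shape (vertex) perturbation
print(ray_triangle_hit(origin, beam, *tri))       # ~4.925 m: the LiDAR point moves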
MSF-ADV design. To systematically generate such attacks, we design a novel attack method, MSF-ADV, which automatically generates adversarial 3D object meshes in an optimization framework given an MSF-based AD perception algorithm. We overcome various design challenges, such as the non-differentiable target camera and LiDAR sensing systems and the non-differentiable cell-level aggregated features popularly used in LiDAR-based AD perception. Details are in our paper.
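At a high level, the attack generation can be thought of as gradient-based optimization over the mesh vertex positions. The following is a minimal sketch of one such optimization step (PyTorch), intended only as a simplified illustration rather than the released MSF-ADV implementation: render_lidar, render_camera, lidar_model, and camera_model are hypothetical placeholders for differentiable approximations of the sensing pipelines and the two perception DNNs, and the stealthiness term is simplified to an L2 distance from the benign shape.

import torch

def msf_adv_step(vertices, faces, benign_vertices,
                 render_lidar, render_camera, lidar_model, camera_model,
                 lr=1e-3, w_stealth=0.1):
    """One gradient step over mesh vertex positions to suppress detection by both models."""
    vertices = vertices.clone().detach().requires_grad_(True)

    # Differentiable approximations of the sensing pipelines (the paper describes
    # how the non-differentiable parts of real sensing/pre-processing are handled).
    point_cloud = render_lidar(vertices, faces)   # synthesized LiDAR points
    image = render_camera(vertices, faces)        # rendered camera image

    # Detection confidence of the injected object from each perception DNN.
    conf_lidar = lidar_model(point_cloud)
    conf_camera = camera_model(image)

    # Attack objective: low confidence in BOTH modalities, plus a stealthiness
    # term keeping the adversarial shape close to the benign object.
    loss = conf_lidar + conf_camera + w_stealth * torch.norm(vertices - benign_vertices)
    loss.backward()

    with torch.no_grad():
        vertices -= lr * vertices.grad
    return vertices.detach(), float(loss)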
Evaluation & Impact
Attack evaluation. We evaluate on MSF algorithms in 2 open-source industry-grade AD systems, Baidu Apollo and Autoware, with real-world driving data. MSF-ADV is shown to be (1) highly effective with >=91% success rate across different object types and MSF algorithms; (2) stealthy from the driver's view based on a user study; (3) robust to different victim approaching positions and angles, with >95% average success rates; and (4) transferable across different MSF algorithms, with an average transfer attack success rate of ~75%.
Physical-world evaluation. To understand the attack realizability in the physical world, we 3D-print our adversarial objects, and evaluate them using real LiDAR and camera devices. Using a vehicle with a LiDAR mounted, our 3D-printed adversarial object successfully evades LiDAR detection in 99.1% of collected frames. Using a miniature-scale experiment setting, our 3D-printed adversarial object has an 85-90% success rate to evade both LiDAR and camera detection. See experiment videos below.
End-to-end attack impacts. To understand end-to-end safety impact, we use LGSVL, a production-grade AD simulator, and find that our adversarial traffic cone can cause a 100% vehicle collision rate for an AD vehicle running industry-grade AD systems across 100 runs. In contrast, the collision rate with a normal traffic cone is 0%. See video demos below.
Attack Video Demos
Combined Video (Miniature-Scale Setup + Real-Vehicle Setup + End-to-End Simulation Setup)
The following video shows a combination of the video demos for the 3 evaluation setups in our work: (1) miniature-scale physical-world setup, (2) real vehicle-based physical-world setup, and (3) end-to-end simulation setup. Detailed information and individual video demos for each setup are in the following sections.
Miniature-Scale Physical-World Setup
In this setup, we 3D-print the adversarial object and obtain its point clouds and images using physical LiDAR and camera devices, as in actual physical-world attack settings, but at a miniature scale.
LiDAR sensor: Velodyne VLP-16 LiDAR
Camera sensor: iPhone 8 Plus back camera
LiDAR object detection model: Baidu Apollo v5.5 LiDAR model
Camera object detection model: Baidu Apollo v5.5 camera model
Setup scale: 1:6.67
3D object type: Traffic cone of 50 cm x 50 cm x 100 cm (real-world scale)
The setup image, 3D-printed benign traffic cone, and 3D-printed adversarial traffic cone are as follows.
3D-Printed Benign Traffic Cone
3D-Printed Adversarial Traffic Cone
Detection Video
Using the miniature-scale setup above, we dynamically move the camera and LiDAR devices to capture both the images and point clouds of the traffic cones from different distances and angles. Specifically, the video below shows live detection results for:
Benign traffic cone case: As shown, the 3D-printed benign traffic cone object can generally be detected by both camera and LiDAR.
Adversarial traffic cone case: We place it in the same position as the benign one and follow the same movement. As shown, in the camera and LiDAR views its appearance looks quite similar to the benign case. However, it is never detected by either the camera or the LiDAR model in any of these frames.
Real Vehicle-based Physical-World Setup
LiDAR sensor: Velodyne HDL-64E
LiDAR object detection model: Baidu Apollo v2.5 LiDAR model
Camera object detection model: Baidu Apollo v5.5 camera model
Benign object: a box of 75 cm x 75 cm x 75 cm
Ethics: We ensure that no other vehicles are affected during the experiments.
The figures below show the settings of our physical-world experiments.
3D-Printed Adversarial Object (looks like a rock)
Vehicle mounted with LiDAR
Benign Object
Physical-World Road
Detection Video
Using the real-vehicle setup above, we manually drive towards the objects to capture their images and point clouds. As shown in the videos below, the same as in the traffic cone case above, the adversarial object is never detected by either the camera or the LiDAR model in any of these frames.
End-to-End Attack Simulation Evaluation
AD system: Baidu Apollo r5.0.0
Enabled modules: Localization, Perception, Prediction, Planning, Routing, Control, Transform, Dreamview
Simulator: LGSVL simulator version 2019.11
Map: Single Lane Road
AD vehicle model: Lincoln MKZ 2017
The video demo below includes:
Benign traffic cone case: As shown, at the 8th second the AD vehicle is able to correctly detect the benign traffic cone and thus comes to a full stop before the cone at the 13th second.
Adversarial traffic cone case: As shown, at the 28th second, the AD vehicle fails to detect the adversarial traffic cone and thus directly crashes into it.
FAQ
Is the MSF-ADV attack specific to the MSF-based AD perception algorithms in Baidu Apollo and Autoware.AI?
No, we take the Apollo and Autoware.AI MSF-based AD perception modules as concrete evaluation targets in our work because they are the closest publicly-accessible ones to industry-grade AD systems today. In essence, MSF-ADV is a fundamental attack on Deep Learning (DNN)-based in-road obstacle detection, whether camera and LiDAR sensors are used in combination (i.e., MSF) or individually, since (1) DNNs are generally vulnerable to small noises in their inputs and (2) state-of-the-art object detection DNNs are generally found to have difficulty detecting small objects; both are what our attack leverages (a generic illustration of (1) is sketched after this answer). Thus, from the design point of view, MSF-ADV is general to different DNN-based MSF perception designs.
Due to the lack of public access, it is unclear whether other AD companies are vulnerable to our attack. However, since our attack is general to DNN-based MSF by design, if other AD vehicle companies also adopt such a representative design, they are, at least at the design level, also susceptible to the MSF-ADV attack.
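For point (1) above, the classic Fast Gradient Sign Method (FGSM) by Goodfellow et al. is a standard, model-agnostic illustration of how a small, carefully-crafted input perturbation can flip a DNN's prediction. The sketch below (PyTorch) is generic textbook code, not tied to any AD perception model.

import torch
import torch.nn.functional as F

def fgsm_perturb(model, x, label, eps=0.01):
    """Return x plus a small adversarial perturbation (L-infinity bounded by eps)."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), label)   # loss w.r.t. the correct label
    loss.backward()
    # Step in the direction that increases the loss, i.e., degrades the prediction.
    return (x + eps * x.grad.sign()).detach()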
Why is crashing into a traffic cone a threat to safety?
Benign traffic cones are indeed usually small, light, and soft, and thus unlikely to cause severe crashes. However, for an attacker, the traffic cone shape is just a disguise for stealthiness; she can choose a large traffic cone size (e.g., a 1-meter one) and fill it with granite or even metal to make it harder and heavier, which can at least trip a moving car and cause it to lose control, especially when it is driving at high speed. Besides causing damage through the crash itself, the attacker can also exploit the semantic meaning of traffic cones. For instance, she can design an AD-specific attack by placing nails or glass debris behind an adversarial traffic cone object so that failing to detect it can lead to a tire blowout of the targeted AD vehicle. Here, the safety damages are not directly caused by the traffic cone crash itself, and thus, in this case, the adversarial traffic cone can be smaller and lighter like normal ones to make it easier to 3D-print, carry, and deploy.
Also, we would like to note that our attack is general to any 3D object type, not just traffic cones. Considering attack stealthiness in the AD context, it is better to choose object types common on real-world roadways. A traffic cone is one such example, and the attacker can also choose traffic barriers or even a rock-like object as shown in the real vehicle-based demo above. For physical-world realizability (e.g., 3D-printing cost) and deployability, it is better to choose objects with smaller sizes but fill them with denser materials to cause larger safety threats (detailed above). From this perspective, a traffic cone is also a good choice, e.g., compared with a full-size car.
To achieve the same goal, why can't the attacker just throw stones/rocks, nails, or glass debris in front of a victim AD vehicle?
We are computer security researchers, so our goal is to study security vulnerabilities specific to computer technology (in our case, autonomous driving), with the ultimate goal of fixing them at the computer technology level. Throwing stones/rocks, nails, or glass debris can also cause crashes for human-driven cars; thus, AD technology may not be deemed at fault there, and such an attack vector is out of our scope and expertise. By contrast, the attack objects generated by MSF-ADV can be correctly detected by human eyes but not by AD technology, which is of great interest to us: by discovering and fixing such problems, AD technology can get closer to human performance and thus better approach the ultimate goal of such technology: replacing human drivers.
Can common fail-safe driving features such as Automatic Emergency Brake (AEB) prevent this attack?
It may mitigate the risk of severe crashes, but it can neither fully prevent such crashes nor eliminate the need to defend against our attack. First, AD software must be designed to handle as many safety hazards as possible by itself, instead of fully counting on AEB. AEB is only designed as an emergency-case/backup safety protection measure; it is not meant to, and thus should not, substitute for remaining alert at the wheel. Just as in human driving, nobody fully counts on AEB to ensure safety; you have to always stay cautious and proactively make safe decisions first, and then rely on AEB only as best-effort backup protection in corner cases. Thus, failing to avoid crashes at the AD software level is clearly a flaw that we have to fix at the AD software level.
Second, AEB itself is far from perfect today and can have high false negative rates. For example, AAA reported that the AEB on popular vehicle models (e.g., Chevrolet Malibu, Honda Accord, Tesla Model 3, and Toyota Camry) has a 60% failure rate. In addition, even if the AEB on the victim can successfully perform an emergency stop, it cannot prevent the victim from being hit by rear vehicles that fail to yield in time.
Why do the MSF algorithms you evaluated not include RADAR inputs? Can RADAR be used to defeat your attack?
Although RADAR can perform object detection, we find that it is rarely used in state-of-the-art MSF algorithms. In particular, we found in total 10 MSF designs in the AD context from top-tier robotics and computer vision conferences in the most recent 3 years (2017--2019), and all of them choose to fuse camera and LiDAR without considering RADAR. In addition, popular AD perception benchmarks in both academia and industry (e.g., Waymo Open Dataset, KITTI, ApolloScape, and Argoverse) all provide only LiDAR and camera data without RADAR data. This might be because LiDAR and RADAR are both distance-measurement-based sensors, but today LiDAR has much higher resolution and accuracy in such measurements than RADAR, which makes RADAR less useful in the latest fusion designs. Thus, in this paper we focus on MSF algorithms of the most popular design today: fusing camera and LiDAR.
Although RADAR is less preferred in state-of-the-art MSF designs, including it in the fusion process may make it more difficult to generate attack objects if the RADAR perception model is more robust. However, this may not fundamentally defeat our attack, since RADAR point clouds may also be affected by shape manipulations and their state-of-the-art object detection algorithms are still DNN-based. Also, it is unclear whether including RADAR in MSF would degrade the overall MSF performance under normal (benign) conditions. We leave a systematic exploration of these questions to future work.
Can MSF-ADV be effective under black-box attack settings (i.e., without access to victim MSF algorithms)?
In this paper we mainly focus on white-box attack settings (i.e., with access to the victim MSF algorithms) because, as the first study, we need to first understand whether such an attack idea is feasible at all. In black-box attack settings, some parts of the current attack design cannot be used (i.e., the parts requiring gradients of the DNN models). Nevertheless, the remaining parts can still work to generate attack objects:
Case 1: If the attacker can gain ownership of a victim AD vehicle model (e.g., via purchases such as for Tesla AD vehicles)
In this case, the attacker can reverse-engineer the AD software, at least to the level of reading and writing the algorithm inputs/outputs. Such a capability has already been demonstrated by recent work on the DNN model used in the Tesla Model S. The attacker can then replace the gradient-based optimization methods in MSF-ADV with black-box optimization methods such as genetic algorithms to generate adversarial objects (a minimal sketch is shown below). In our paper, we experimented with such an attack setup as a baseline evaluation setup, which achieved a 9% attack success rate given similar execution time (~4h) as the white-box design. This success rate can be higher given more execution time.
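As a rough illustration of this black-box direction, the sketch below (Python/NumPy) evolves vertex perturbations with a simple genetic algorithm, assuming only query access to a detection-confidence score from the victim MSF pipeline. Here query_msf_confidence is a hypothetical black-box oracle and all hyperparameters are illustrative, not the settings used in our paper's baseline experiment.

import numpy as np

def genetic_attack(benign_vertices, query_msf_confidence,
                   pop_size=32, generations=200,
                   mutation_scale=0.01, elite_frac=0.25):
    """Evolve vertex perturbations that minimize the victim's detection confidence."""
    rng = np.random.default_rng(0)
    shape = benign_vertices.shape
    # Start from small random perturbations of the benign mesh vertices.
    population = [rng.normal(0.0, mutation_scale, shape) for _ in range(pop_size)]
    n_elite = max(1, int(elite_frac * pop_size))

    for _ in range(generations):
        # Fitness: lower detection confidence from the black-box MSF pipeline is better.
        scores = [query_msf_confidence(benign_vertices + p) for p in population]
        elite = [population[i] for i in np.argsort(scores)[:n_elite]]

        # Next generation: uniform crossover between random elite parents, plus mutation.
        children = []
        while len(children) < pop_size - n_elite:
            a, b = rng.integers(0, n_elite, size=2)
            mask = rng.random(shape) < 0.5
            child = np.where(mask, elite[a], elite[b]) + rng.normal(0.0, mutation_scale, shape)
            children.append(child)
        population = elite + children

    best = min(population, key=lambda p: query_msf_confidence(benign_vertices + p))
    return benign_vertices + best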
Case 2: If the attacker cannot gain ownership of a victim AD vehicle model
In this case, the attacker can consider leveraging the transferability of adversarial examples, e.g., using the adversarial examples generated for a known DNN model to attack an unknown one, which has been found to be successful for many adversarial attacks today. In our paper, we found a ~75% transfer attack success rate across 4 MSF algorithms from the two open-source full-stack AD systems we can access: Baidu Apollo and Autoware.AI. Although this is already our best effort considering the lack of access to other industry-grade AD systems such as Waymo and GM Cruise, results from algorithms in only 2 AD systems may not be sufficient to conclude anything general. We hope to collect more data points in the future via more engagement with the industry.
How to defend against MSF-ADV?
Fundamentally preventing the MSF-ADV attack requires eliminating DNN model vulnerabilities, which is still far from a solved problem. As a short-term, immediately-applicable mitigation direction, we suggest that AD companies consider fusing more perception sources, e.g., more cameras/LiDARs sharing an overlapping view but mounted at different positions, since it may be more difficult to generate an MSF-ADV attack object if the fused camera/LiDAR perception results come from very different viewing angles and positions. We may also consider including RADAR in MSF, which is less preferred in state-of-the-art MSF designs (detailed above) but may help improve their security. As discussed above, this cannot fundamentally defeat our attack, but it may make generating the attack objects more difficult if the RADAR perception model is more robust.
Why didn't you consider using texture changes to attack camera while using shape changes to attack LiDAR?
Our attack design can indeed be easily extended from only introducing malicious shape changes to introducing both malicious texture and shape changes of the 3D objects during attack generation. However, in this paper we choose the former because the latter does not seem to have a clear benefit over the former. Specifically, adding texture changes is unlikely to substantially improve the current attack effectiveness (shape changes alone already achieve over 90% success rates across 3 object types); meanwhile, it would certainly harm stealthiness and also incur additional printability issues, a common design challenge for existing physical-world adversarial attacks using stickers.
Did you perform responsible vulnerability disclosure to AD companies? What are their replies?
As of 05/18/2021, we have performed responsible vulnerability disclosure to 31 companies developing/testing AD vehicles, among which 19 (~61%) have replied. Based on the replies, most companies are currently investigating whether and to what extent they might be affected. Some have already had meetings with us to facilitate such investigations.
Research Paper
[IEEE S&P 2021] Invisible for both Camera and LiDAR: Security of Multi-Sensor Fusion based Perception in Autonomous Driving Under Physical-World Attacks
Yulong Cao*, Ningfei Wang*, Chaowei Xiao*, Dawei Yang* (co-first authors), Jin Fang, Ruigang Yang, Qi Alfred Chen, Mingyan Liu, and Bo Li
To appear in the 42nd IEEE Symposium on Security and Privacy (IEEE S&P), May 2021 (Acceptance rate 12.0% = 117/972)
[PDF] [Slides] [Preview] [Talk] [Video Demos] [Code/Data Release]
BibTex for citation:
@inproceedings{sp:2021:ningfei:msf-adv,
title={{Invisible for both Camera and LiDAR: Security of Multi-Sensor Fusion based Perception in Autonomous Driving Under Physical World Attacks}},
author={Yulong Cao and Ningfei Wang and Chaowei Xiao and Dawei Yang and Jin Fang and Ruigang Yang and Qi Alfred Chen and Mingyan Liu and Bo Li},
booktitle={Proceedings of the 42nd IEEE Symposium on Security and Privacy (IEEE S\&P 2021)},
year={2021},
month = {May}
}
Team
Yulong Cao, Ph.D. student, EECS, University of Michigan
Ningfei Wang, Ph.D. student, CS, University of California, Irvine
Chaowei Xiao, NVIDIA Research and Arizona State University
Dawei Yang, Ph.D. student, EECS, University of Michigan
Jin Fang, Baidu Research and National Engineering Laboratory of Deep Learning Technology and Application, China
Ruigang Yang, Inceptio
Qi Alfred Chen, Assistant Professor, CS, University of California, Irvine
Mingyan Liu, Professor, EECS, University of Michigan
Bo Li, Assistant Professor, CS, University of Illinois at Urbana-Champaign
Acknowledgments
*header image source: https://www.consumerreports.org/autonomous-driving/self-driving-cars-driving-into-the-future/