Invisible for both Camera and LiDAR:  Security of Multi-Sensor Fusion based Perception in Autonomous Driving Under Physical-World Attacks
(IEEE S&P'21)

[New] Source code of MSF-ADV is released at: https://github.com/ASGuard-UCI/MSF-ADV!

Summary

Today, various companies are developing self-driving cars, e.g., Level-4 Autonomous Driving (AD) vehicles. Some of them (e.g., Google Waymo, TuSimple, Baidu Apollo) are already providing services such as self-driving taxis and trucks on public roads. To ensure correct and safe driving, a fundamental pillar in the AD system is perception, which leverages sensors such as cameras and LiDARs (Light Detection and Ranging) to detect surrounding obstacles in real time.

Various prior works have studied the security of AD perception, but so far all of them are limited to attacks on a single perception source, i.e., camera or LiDAR alone. However, production high-level AD systems such as Waymo typically adopt a Multi-Sensor Fusion (MSF) based design to achieve overall higher accuracy and robustness. In such a design, under the assumption that not all perception sources are (or can be) attacked simultaneously, there always exists a possible MSF algorithm that can rely on the unattacked source(s) to detect/prevent such an attack. This basic security design assumption is believed to hold in general, and thus MSF is widely recognized as a general defense strategy for AD perception.

In this work, we perform the first security analysis on MSF-based perception in AD systems today. We directly challenge the above basic security design assumption by demonstrating the possibility of effectively and simultaneously attacking all perception sources used in state-of-the-art MSF-based AD perception, i.e., camera and LiDAR. This allows us, for the first time, to gain a concrete understanding of how much security guarantee the use of MSF can fundamentally provide as a general defense strategy for AD perception.

Novel Attack Design: MSF-ADV

Attack goal: fundamentally defeat the MSF design assumption. We target an attack goal with a direct safety impact: fool a victim AD vehicle into failing to detect a front obstacle and thus crashing into it. We aim at effectively attacking all perception sources used in state-of-the-art MSF, i.e., both camera and LiDAR, to fundamentally defeat the MSF design assumption above.

Attack vector: Adversarial 3D objects. We discover that adversarial 3D objects can be used as a physically-realizable and stealthy attack vector for MSF-based AD perception. Our key observation is that different shapes of a 3D object lead to both point position changes in LiDAR point clouds and pixel value changes in camera images. Thus, an attacker can leverage shape manipulations to introduce input perturbations to both camera and LiDAR simultaneously. This attack vector is: (1) easily realizable and deployable in the physical world via 3D-printing services today, and (2) able to achieve high stealthiness by mimicking normal road objects such as traffic cones/barriers.
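To make this observation concrete, below is a toy numeric sketch (our illustration, not code from the paper): displacing mesh vertices shifts both the 3D point positions a LiDAR would measure and the pixel locations at which a camera would image the object. The vertex coordinates, displacements, and the ideal pinhole camera model are all illustrative assumptions.

# Toy illustration (not the released MSF-ADV code): a shape change to a 3D object
# perturbs both sensing channels at once.
import numpy as np

# A few mesh vertices of an object ~10 m ahead (x right, y down, z forward), in meters.
vertices = np.array([[ 0.0, -0.3, 10.0],
                     [ 0.2,  0.0, 10.0],
                     [-0.2,  0.0, 10.0]])

# Attacker-chosen shape manipulation: small displacement of each vertex.
perturbation = np.array([[ 0.00,  0.05,  0.02],
                         [ 0.03,  0.00, -0.01],
                         [-0.02,  0.01,  0.00]])
adv_vertices = vertices + perturbation

def project_pinhole(points, f=1000.0, cx=640.0, cy=360.0):
    # Ideal pinhole projection of 3D points to pixel coordinates (assumed intrinsics).
    u = f * points[:, 0] / points[:, 2] + cx
    v = f * points[:, 1] / points[:, 2] + cy
    return np.stack([u, v], axis=1)

# LiDAR channel: the measured 3D point positions shift.
print("LiDAR point shift (m):", np.linalg.norm(adv_vertices - vertices, axis=1))
# Camera channel: the projected pixel locations (and hence pixel values) shift.
print("Pixel shift (px):",
      np.linalg.norm(project_pinhole(adv_vertices) - project_pinhole(vertices), axis=1))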

MSF-ADV design. To systematically generate such attacks, we design a novel attack method, MSF-ADV, which can automatically generate adversarial 3D object meshes in an optimization framework given an MSF-based AD perception algorithm. We overcome various design challenges such as the non-differentiable target camera and LiDAR sensing systems, and the non-differentiable cell-level aggregated features popularly used in LiDAR-based AD perception. Details are in our paper.
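For intuition, the following is a minimal sketch of the kind of joint optimization loop described above, not the released MSF-ADV implementation. The rendering surrogates (render_lidar, render_camera), the stand-in detector networks (lidar_net, cam_net), and all hyperparameters are assumptions that stand in for differentiable approximations of the real sensing pipelines and the victim camera/LiDAR DNNs.

# Minimal sketch (assumption, not the paper's implementation): jointly lower the
# obstacle confidence of both the camera- and LiDAR-side detectors by optimizing
# a vertex perturbation, while keeping the shape change small.
import torch

torch.manual_seed(0)
base_vertices = torch.randn(200, 3)            # benign object mesh vertices (stand-in)
delta = torch.zeros_like(base_vertices, requires_grad=True)

def render_lidar(v):
    # Stand-in for a differentiable surrogate of the LiDAR sensing pipeline.
    return v

def render_camera(v):
    # Stand-in for a differentiable image renderer (here: a crude perspective projection).
    return v[:, :2] / v[:, 2:3].clamp(min=1.0)

lidar_net = torch.nn.Sequential(torch.nn.Linear(3, 1))    # stand-in detectors that output
cam_net = torch.nn.Sequential(torch.nn.Linear(2, 1))      # an obstacle-confidence score

opt = torch.optim.Adam([delta], lr=1e-2)
for step in range(200):
    v = base_vertices + delta
    conf_lidar = torch.sigmoid(lidar_net(render_lidar(v))).mean()
    conf_cam = torch.sigmoid(cam_net(render_camera(v))).mean()
    # Adversarial loss: suppress detection in BOTH sensing channels, plus an L2
    # term that keeps the shape change small (stealthiness/printability).
    loss = conf_lidar + conf_cam + 0.1 * delta.pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

The actual attack must additionally overcome the non-differentiable sensing and cell-level feature aggregation challenges mentioned above; see the paper for how.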

Evaluation & Impact

Attack evaluation. We evaluate on MSF algorithms in 2 open-source industry-grade AD systems, Baidu Apollo and Autoware, with real-world driving data. MSF-ADV is shown to be (1) highly effective with >=91% success rate across different object types and MSF algorithms; (2) stealthy from the driver's view based on a user study; (3) robust to different victim approaching positions and angles, with >95% average success rates; and (4) transferable across different MSF algorithms, with an average transfer attack success rate of ~75%.

Physical-world evaluation. To understand the attack realizability in the physical world, we 3D-print our adversarial objects and evaluate them using real LiDAR and camera devices. Using a vehicle with a LiDAR mounted, our 3D-printed adversarial object successfully evades LiDAR detection in 99.1% of the collected frames. Using a miniature-scale experiment setting, our 3D-printed adversarial object has an 85-90% success rate in evading both LiDAR and camera detection. See the experiment videos below.

End-to-end attack impacts. To understand end-to-end safety impact, we use LGSVL, a production-grade AD simulator, and find that our adversarial traffic cone can cause a 100% vehicle collision rate for an AD vehicle running industry-grade AD systems across 100 runs. In contrast, the collision rate with a normal traffic cone is 0%. See video demos below.

Attack Video Demos

Combined Video (Miniature-Scale Setup + Real-Vehicle Setup + End-to-End Simulation Setup)

The following video shows a combination of the video demos for the 3 evaluation setups in our work: (1) miniature-scale physical-world setup, (2) real vehicle-based physical-world setup, and (3) end-to-end simulation setup. Detailed information and individual video demos for each setup are in the following sections.

Miniature-Scale Physical-World Setup

In this setup, we 3D-print the adversarial object and obtain its point clouds and images using physical LiDAR and camera devices, as in the actual physical-world attack settings but at a miniature scale.

The setup image, 3D-printed benign traffic cone, and 3D-printed adversarial traffic cone are as follows.

3D-Printed Benign Traffic Cone

3D-Printed Adversarial Traffic Cone

Detection Video

Using the miniature-scale setup above, we dynamically move the camera and LiDAR devices to capture both the images and point clouds of the traffic cones from different distances and angles. Specifically, the video below shows live detection results for:

Real Vehicle-based Physical-World Setup

The figures below show the settings of our physical-world experiments.

3D-Printed Adversarial Object (looks like a rock)

Vehicle mounted with LiDAR

Benign Object

Physical-World Road

Detection Video

Using the real-vehicle setup above, we manually drive towards the objects to capture their images and point clouds. As shown in the videos below, the same as in the traffic cone case above, the adversarial object is never detected by either camera or LiDAR in any of these frames.

End-to-End Attack Simulation Evaluation

The video demo below includes:

FAQ

Is the MSF-ADV attack specific to the MSF-based AD perception algorithms in Baidu Apollo and Autoware.AI?

No. We take the Apollo and Autoware.AI MSF-based AD perception modules as concrete evaluation targets in our work because they are the closest publicly-accessible ones to industry-grade AD systems today. In essence, MSF-ADV is a fundamental attack on Deep Learning (DNN)-based in-road obstacle detection, whether camera and LiDAR sensors are used in combination (i.e., MSF) or individually, since (1) DNNs are generally vulnerable to small input noises and (2) state-of-the-art object detection DNNs are generally found to have difficulty detecting small objects, both of which our attack leverages. Thus, from the design point of view, MSF-ADV is general to different DNN-based MSF AD perception designs.

Due to the lack of public access, it is unclear whether other AD companies' systems are vulnerable to our attack. However, since our attack is general to DNN-based MSF by design, if other AD vehicle companies also adopt such a representative design, they are also susceptible to the MSF-ADV attack, at least at the design level.

Why is crashing into a traffic cone a threat to safety?

Benign traffic cones are indeed usually small, light, and soft, and thus unlikely to cause severe crashes. However, for an attacker, the traffic cone shape is just a disguise for stealthiness; she can choose a large traffic cone size (e.g., a 1-meter one) and fill it with granite or even metal to make it harder and heavier, which can at least trip a moving car and cause it to lose control, especially when it is driving at a high speed. Besides causing damage through the crash itself, the attacker can also exploit the semantic meaning of traffic cones. For instance, she can design an AD-specific attack by placing nails or glass debris behind an adversarial traffic cone object, so that failing to detect it can lead to a tire blowout of the targeted AD vehicle. Here, the safety damage is not directly caused by the traffic cone crash itself, and thus, in this case, the adversarial traffic cone can be smaller and lightweight like normal ones, making it easier to 3D-print, carry, and deploy.

Also, we would like to note that our attack is general to any 3D object type, not just traffic cones. Considering attack stealthiness in the AD context, it is better to choose object types common on real-world roadways. A traffic cone is one such example, and the attacker can also choose traffic barriers or even a rock-like object as shown in the real vehicle-based demo above. For physical-world realizability (e.g., 3D-printing cost) and deployability, it is better to choose objects with smaller sizes but fill them with denser materials to cause larger safety threats (detailed above). From this perspective, a traffic cone is also a good choice here, e.g., compared with full-size cars.

To achieve the same goal, why can't the attacker just throw stones/rocks, nails, or glass debris in front of a victim AD vehicle?

We are computer security researchers, so our goal is to study security vulnerabilities specific to computer technology (in our case, autonomous driving), with the ultimate goal of fixing them at the computer technology level. Throwing stones/rocks, nails, or glass debris can also cause crashes to human-driven cars; thus, AD technology may not be deemed at fault there, so such an attack vector is out of our scope and expertise. By contrast, the attack objects generated by MSF-ADV can be correctly detected by human eyes but not by AD technology, which is thus of great interest to us, because by discovering and fixing such problems, AD technology can get closer to human performance and thus better approach its ultimate goal of replacing human drivers.

Can common fail-safe driving features such as Automatic Emergency Brake (AEB) prevent this attack?

It may mitigate the risk of severe crashes, but it can neither fully prevent such crashes nor eliminate the need to defend against our attack. First, AD software must be designed to handle as many safety hazards as possible by itself, instead of fully counting on AEB. AEB is only designed as an emergency-case/backup safety protection measure; it is never meant, and thus should not be used, as a substitute for remaining alert at the wheel. Just like in human driving, nobody fully counts on AEB to ensure safety; you have to always stay cautious and proactively make safe decisions first, and only then rely on AEB as best-effort backup protection in corner cases. Thus, not being able to avoid crashes at the AD software level is clearly a mistake that we have to solve at the AD software level.

Second, AEB itself is actually far from perfect today and can have high false negative rates. For example, AAA reports that the AEB on popular vehicle models (e.g., Chevrolet Malibu, Honda Accord, Tesla Model 3, and Toyota Camry) has a 60% failure rate. In addition, even if the AEB on the victim can successfully perform an emergency stop, it cannot prevent the victim from being hit by rear vehicles that fail to yield in time.

Why do the MSF algorithms you evaluated not include RADAR inputs? Can RADAR be used to defeat your attack?

Although RADAR can perform object detection, we find that it is rarely used in state-of-the-art MSF algorithms. In particular, we found a total of 10 MSF designs in the AD context from top-tier robotics and computer vision conferences in the most recent 3 years (2017--2019), and all of them choose to fuse camera and LiDAR without considering RADAR. In addition, popular AD perception benchmarks in both academia and industry (e.g., Waymo Open Dataset, KITTI, ApolloScape, and Argoverse) all provide only LiDAR and camera data, without RADAR data. This might be because LiDAR and RADAR are both distance-measurement-based sensors, but today LiDAR has much higher resolution and accuracy in such measurements than RADAR, which makes RADAR less useful in the latest fusion designs. Thus, in this paper we focus on MSF algorithms with the most popular design today: fusing camera and LiDAR.

Although RADAR is less preferred in state-of-the-art MSF designs, including it in the fusion process may make it more difficult to generate attack objects if the RADAR perception model is more robust. However, this may not fundamentally defeat our attack, since RADAR point clouds may also be affected by shape manipulations and their state-of-the-art object detection algorithms are still DNN-based. Also, it is unclear whether including RADAR in MSF will degrade the overall MSF performance under normal (benign) conditions. We leave a systematic exploration of these questions to future work.

Can MSF-ADV be effective under black-box attack settings (i.e., without access to victim MSF algorithms)?

In this paper we mainly focus on white-box attack settings (i.e., with access to the victim MSF algorithms) because, as the first study, we need to first understand whether such an attack idea is feasible at all. In black-box attack settings, some parts of the current attack design cannot be used (i.e., the parts requiring gradients of the DNN models). Nevertheless, the remaining parts can still work to generate attack objects:
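For intuition only, below is a sketch of one gradient-free alternative an attacker could try in such a black-box setting (our assumption for illustration, not a design from the paper): a query-only random search over small shape changes, where query_msf_confidence is a hypothetical oracle returning the obstacle confidence reported by the victim MSF perception.

# Illustrative black-box sketch (assumption, not the paper's design): keep a
# candidate shape change whenever it further lowers the victim's detection
# confidence, using queries only (no gradients).
import numpy as np

rng = np.random.default_rng(0)

def query_msf_confidence(vertices):
    # Hypothetical oracle: the fused obstacle confidence the victim MSF
    # perception assigns to the (rendered/printed) object.
    return float(np.tanh(np.abs(vertices).mean()))

best = rng.normal(size=(200, 3))              # benign object mesh vertices (stand-in)
best_conf = query_msf_confidence(best)

for _ in range(500):
    candidate = best + 0.01 * rng.normal(size=best.shape)   # small random shape change
    conf = query_msf_confidence(candidate)
    if conf < best_conf:                      # keep changes that reduce detection confidence
        best, best_conf = candidate, conf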

How to defend against MSF-ADV?

To fundamentally prevent the MSF-ADV attack, we need to eliminate DNN model vulnerabilities. However, this is far from a solved problem. As a short-term, immediately-applicable mitigation direction, we suggest that AD companies consider fusing more perception sources, e.g., more cameras/LiDARs sharing an overlapped view but mounted at different positions, since it may be more difficult to generate the MSF-ADV attack if the fused camera/LiDAR perception results come from very different viewing angles and positions. Also, we may consider including RADAR in MSF; it is less preferred in state-of-the-art MSF designs (detailed above) but may help improve their security. As discussed above, this cannot fundamentally defeat our attack, but it may make it more difficult to generate the attack objects if the RADAR perception model is more robust.

Why didn't you consider using texture changes to attack the camera while using shape changes to attack the LiDAR?

Our attack design can indeed be easily extended from only introducing malicious shape changes to introducing both malicious texture and shape changes to the 3D objects during attack generation. However, in this paper we choose the former because the latter does not seem to have a clear benefit over the former. Specifically, adding texture changes is not likely to substantially improve the current attack effectiveness (shape changes alone can already achieve over 90% success rates across 3 object types); meanwhile, it will certainly harm stealthiness and also incur additional printability issues, which is a common design challenge for existing physical-world adversarial attacks using stickers.

Did you perform responsible vulnerability disclosure to AD companies? What are their replies?

As of 05/18/2021, we have performed responsible vulnerability disclosure to 31 companies developing/testing AD vehicles, among which 19 (~61%) have replied. Based on the replies, most companies are currently investigating whether and how much they might be affected. Some have already had meetings with us to facilitate such investigation.

Research Paper

[IEEE S&P 2021] Invisible for both Camera and LiDAR: Security of Multi-Sensor Fusion based Perception in Autonomous Driving Under Physical-World Attacks

Yulong Cao*, Ningfei Wang*, Chaowei Xiao*, Dawei Yang* (co-first authors), Jin Fang, Ruigang Yang, Qi Alfred Chen, Mingyan Liu, and Bo Li

To appear in the 42nd IEEE Symposium on Security and Privacy (IEEE S&P), May 2021 (Acceptance rate 12.0% = 117/972)

[PDF] [Slides] [Preview] [Talk] [Video Demos] [Code/Data Release]

BibTex for citation:

@inproceedings{sp:2021:ningfei:msf-adv,
  title={{Invisible for both Camera and LiDAR: Security of Multi-Sensor Fusion based Perception in Autonomous Driving Under Physical World Attacks}},
  author={Yulong Cao and Ningfei Wang and Chaowei Xiao and Dawei Yang and Jin Fang and Ruigang Yang and Qi Alfred Chen and Mingyan Liu and Bo Li},
  booktitle={Proceedings of the 42nd IEEE Symposium on Security and Privacy (IEEE S\&P 2021)},
  year={2021},
  month={May}
}


Team

Yulong Cao, Ph.D. student, EECS, University of Michigan

Ningfei Wang, Ph.D. student, CS, University of California, Irvine

Chaowei Xiao, NVIDIA Research and Arizona State University

Dawei Yang, Ph.D. student, EECS, University of Michigan

Jin Fang, Baidu Research and National Engineering Laboratory of Deep Learning Technology and Application, China

Ruigang Yang, Inceptio

Qi Alfred Chen, Assistant Professor, CS, University of California, Irvine

Mingyan Liu, Professor, EECS, University of Michigan

Bo Li, Assistant Professor, CS, University of Illinois at Urbana-Champaign


Acknowledgments

*header image source: https://www.consumerreports.org/autonomous-driving/self-driving-cars-driving-into-the-future/