Research Questions

Here, we provide more details and the raw experimental data from our experiments as a supplement to our paper.

We first investigate the robustness of existing AI-enabled MSF systems from three perspectives: (1) against corrupted signals (RQ1), (2) against spatial/temporal misalignments (RQ2), and (3) against partial/complete signal loss (RQ3). Then, we investigate the potential for enhancing the robustness of these MSF systems (RQ4).

The Architecture of MSF

As shown in the figure (right part), MSF-based perception systems first leverage multiple complementary sensors to sense the surrounding environment individually. Then, signals from different sensors are transformed into the same coordinate system and matched across timestamps based on the temporal and spatial calibration among sensors. Finally, the fusion module receives the calibrated and synchronized signals from different sensors and fuses them to make predictions for downstream tasks.

Research Questions Design

Based on the architecture and workflow of MSF, we design the following research questions and identify several challenges and opportunities:

RQ1 How well do AI-enabled MSF-based perception systems perform against common corrupted signals? This RQ aims to investigate the potential risks of AI-enabled MSF systems posed by common signal corruptions that typically occur in operational environments (① in the figure).

RQ2 How sensitive is AI-enabled MSF when facing spatial and temporal misalignment of sensors? In practice, it is almost impossible to maintain perfect calibration and precise time synchronization across sensors at all times. RQ2 aims to evaluate the sensitivity of AI-enabled MSF to spatial and temporal misalignment (② in the figure).

RQ3 To what extent are the sensing components of an AI-enabled MSF system coupled? A robust MSF system should not completely fail when one of its sensing modules partially or fully loses its source signal. RQ3 aims to investigate how AI-enabled MSF systems are impacted when source signals are partially or completely lost (③ in the figure).

RQ4 What are the weaknesses of different AI-enabled MSF mechanisms, and is it possible to enhance their robustness? RQ4 aims to investigate the unique advantages of each fusion mechanism and potential opportunities for improving the robustness of AI-enabled MSF systems (④ in the figure).

RQ1: How well do AI-enabled MSF-based perception systems perform against common corrupted signals?

Though a few AI-enabled MSF perception systems have been proposed and deployed, there is no systematic study on the robustness (an important indicator of quality and reliability) of these systems. Specifically, we intend to study how AI-enabled MSF systems perform against corrupted data, which often occurs in operational environments. To investigate this RQ, we first design and implement a set of corruptions to synthesize large-scale datasets. Then, we evaluate the performance of the collected MSF systems on these datasets.

In this experiment, we focus on signals corrupted by weather, sensor, and noise effects. For each corruption pattern, we use three severity levels. For RN (rain) and FG (fog), the three severity levels represent 10 mm/h, 25 mm/h, and 50 mm/h of rainfall and 104 m, 80 m, and 51 m of visibility, respectively. For the other corruption patterns, we adopt levels 1/3/5 from ImageNet-C as the three severity levels. In total, we conduct experiments with 231 different configurations (11 corruptions × 3 severity levels × 7 MSF systems) to investigate this RQ.
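To illustrate how a corruption at a given severity level can be applied, here is a minimal sketch of one noise-type corruption. The severity-to-parameter mapping (`SEVERITY_SIGMA`) is a hypothetical placeholder for illustration, not the actual ImageNet-C parameters.

```python
import numpy as np

# Hypothetical mapping from severity level to noise strength; the actual
# values follow ImageNet-C levels 1/3/5 and are not reproduced here.
SEVERITY_SIGMA = {"W": 0.04, "M": 0.08, "S": 0.12}

def corrupt_gaussian_noise(image, severity="W", seed=0):
    """Apply additive Gaussian noise to an image with values in [0, 1]."""
    rng = np.random.default_rng(seed)
    sigma = SEVERITY_SIGMA[severity]
    noisy = image + rng.normal(0.0, sigma, size=image.shape)
    return np.clip(noisy, 0.0, 1.0)

clean = np.full((4, 4, 3), 0.5)
corrupted = corrupt_gaussian_noise(clean, severity="S")
```

The same pattern (one parameterized transform per corruption, indexed by severity) extends to the other corruption types.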

Experimental Data

The following table shows the raw experimental data for the different configurations of MSF systems in RQ1.

We use the AP (%), MOTA (%), and RMSE (mm) metrics to evaluate the performance of MSF systems on the object detection, object tracking, and depth completion tasks, respectively. For AP and MOTA, a higher value indicates better performance; for RMSE, a lower value indicates better performance. "W", "M", and "S" in the Severity column denote the three severity levels, i.e., weak, moderate, and strong. "C" and "L" denote camera and LiDAR.

Raw evaluation metric scores of seven MSF systems against different corruption patterns.

Then, we use the robustness evaluation metrics (Section IV in the paper) to calculate the robustness score of each MSF system. Note that we normalize the metric of each task into [0, 1]. Specifically, for the object tracking task, we treat a negative MOTA as 0. For the depth completion task, we further qualitatively check the generated depth maps: if the RMSE is more than 5 times TWISE's performance on the clean dataset, we consider the generated depth map meaningless.
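The normalization rules above can be sketched as follows. The handling of AP and negative MOTA comes directly from the text; the linear scaling of RMSE below the 5x cutoff is our own assumption for illustration (the paper's Section IV gives the precise definition), and `normalized_score` is a hypothetical helper name.

```python
def normalized_score(task, value, clean_rmse_twise=None):
    """Normalize a per-task metric into [0, 1] (higher is better)."""
    if task == "detection":      # AP is a percentage
        return value / 100.0
    if task == "tracking":       # a negative MOTA is treated as 0
        return max(value, 0.0) / 100.0
    if task == "depth":
        # RMSE: lower is better; more than 5x the clean TWISE baseline
        # is treated as a meaningless depth map (score 0).
        cutoff = 5 * clean_rmse_twise
        if value > cutoff:
            return 0.0
        return 1.0 - value / cutoff   # assumed linear scaling below cutoff
    raise ValueError(f"unknown task: {task}")
```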

Robustness performance of seven MSF systems against different corruption patterns. 

Findings. Existing AI-enabled MSF systems are not robust enough against common corruption patterns. Moreover, among the 11 common corruptions, adverse weather causes the most severe robustness degradation.

RQ2: How sensitive is AI-enabled MSF when facing spatial and temporal misalignment of sensors?

RQ2 aims to evaluate the AI-enabled MSF system's sensitivity to spatial and temporal misalignment, which often happens in practice. To investigate this RQ, we add a small perturbation to the MSF system's extrinsic calibration matrix and a small delay to input signals. 

Specifically, to simulate the calibration errors (spatial misalignment), we rotate the LiDAR sensor around the x, y, and z axes of the coordinate system by 0.5°, 1°, and 2°, respectively. To simulate temporal misalignment, we create five levels of LiDAR and camera signal delay, i.e., 0.1s, 0.2s, 0.3s, 0.4s, 0.5s, respectively.
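A minimal sketch of the spatial-misalignment simulation: perturbing the rotation part of a 4x4 extrinsic calibration matrix by a small rotation about one axis. The matrix layout and the left-multiplication convention are assumptions for illustration, not the exact implementation.

```python
import numpy as np

def rotation_z(deg):
    """Rotation matrix for a rotation of `deg` degrees about the z axis."""
    t = np.deg2rad(deg)
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

def perturb_extrinsics(T_lidar_to_cam, deg=2.0):
    """Inject a small rotational calibration error into a 4x4
    LiDAR-to-camera extrinsic matrix (left-multiplying the rotation part)."""
    T = T_lidar_to_cam.copy()
    T[:3, :3] = rotation_z(deg) @ T[:3, :3]
    return T

T = np.eye(4)                      # placeholder extrinsics
T_err = perturb_extrinsics(T, deg=2.0)
```

The same construction with x- and y-axis rotation matrices gives the other two perturbation directions.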

Note that the refresh rate of the LiDAR used in KITTI is 10 Hz, so each 0.1 second of delay corresponds to a delay of one data frame. We only investigate temporal misalignment's effects on object tracking MSF systems, as the other two tasks (i.e., object detection and depth completion) are not time-sensitive.
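Under this setting, temporal misalignment amounts to feeding the system a stale frame. A small sketch of the delay-to-frame mapping (the helper name is hypothetical):

```python
def delayed_frame_index(frame_idx, delay_s, refresh_hz=10):
    """Map a query frame index to the stale frame actually received
    under a fixed signal delay; 0.1 s -> 1 frame behind at 10 Hz."""
    lag = int(round(delay_s * refresh_hz))
    return max(frame_idx - lag, 0)   # clamp at the first frame
```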

Experimental Data

As in RQ1, we use the AP (%), MOTA (%), and RMSE (mm) metrics to evaluate the performance of MSF systems on the object detection, object tracking, and depth completion tasks, respectively. Then we use the robustness evaluation metrics (Section IV in the paper) to calculate the robustness score of each MSF system.

Raw evaluation metric scores of seven MSF systems against spatial misalignment.

Raw evaluation metric scores of seven MSF systems against temporal misalignment.

Findings. AI-enabled MSF perception systems are sensitive to both temporal and spatial misalignment, especially for the LiDAR branch. Even small synchronization errors (0.3 seconds) or calibration errors (2°) can cause AI-enabled MSF systems to fail.

RQ3: To what extent are the sensing components of an AI-enabled MSF system coupled?

Though MSF systems have shown superior performance on several perception tasks, they may also introduce coupling among modules. A tightly-coupled MSF system might suffer a system-wide failure when one specific sensor branch breaks, posing reliability concerns. This RQ aims to investigate how existing AI-enabled MSF systems are coupled and whether they are robust against sensor failures. To investigate this RQ, we simulate signal loss at five different levels (10%, 25%, 50%, 75%, 100%) for each branch. These settings cover scenarios from partial signal loss to complete signal loss. For the camera branch, we reshape the image into a one-dimensional array and drop pixels according to the percentage of signal loss. For the LiDAR branch, we randomly remove points at the corresponding percentages.
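A minimal sketch of the two signal-loss simulations described above. The helper names are hypothetical, and filling dropped pixels with 0 is an assumption for illustration.

```python
import numpy as np

def drop_image_pixels(image, ratio, seed=0):
    """Zero out a random fraction of pixels, operating on the
    flattened (one-dimensional) view of the image."""
    rng = np.random.default_rng(seed)
    flat = image.reshape(-1).copy()
    n_drop = int(ratio * flat.size)
    idx = rng.choice(flat.size, size=n_drop, replace=False)
    flat[idx] = 0.0
    return flat.reshape(image.shape)

def drop_lidar_points(points, ratio, seed=0):
    """Randomly remove a fraction of LiDAR points (an N x 4 array
    of x, y, z, intensity)."""
    rng = np.random.default_rng(seed)
    n_keep = points.shape[0] - int(ratio * points.shape[0])
    keep = rng.choice(points.shape[0], size=n_keep, replace=False)
    return points[keep]
```

Setting `ratio=1.0` yields the complete-signal-loss case for either branch.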

Experimental Data

As in RQ1, we use the AP (%), MOTA (%), and RMSE (mm) metrics to evaluate the performance of MSF systems on the object detection, object tracking, and depth completion tasks, respectively. Then we use the robustness evaluation metrics (Section IV in the paper) to calculate the robustness score of each MSF system.

Raw evaluation metric scores of seven MSF systems when partially or completely losing one source of signals.

Findings. AI-enabled MSF systems could be vulnerable when partially or completely losing one source of signals. In particular, partially losing camera signals could be more critical for AI-enabled MSF systems. We also find that though tightly-coupled AI-enabled MSF systems have promising performance, they could be less robust when completely losing either camera or LiDAR signals.

RQ4: What is the weakness of different AI-enabled MSF mechanisms and is it possible to enhance their robustness?

In the previous research questions, we analyzed the potential robustness threats to AI-enabled MSF systems from three perspectives (corruption, misalignment, and coupling issues). RQ4 aims to investigate the properties of different fusion mechanisms and analyze the weaknesses or potential threats of each when deployed in the real world.

To investigate this RQ, we first divide the selected MSF systems into three categories according to their fusion mechanisms, as discussed in Section II (i.e., deep fusion, late fusion, and weak fusion).

Then we analyze and discuss each fusion mechanism based on our previous findings. To further investigate the possibility of enhancing the MSF system's robustness, we make an early attempt to enhance MSF systems' robustness by improving the fusion mechanism used in late and weak fusion-based MSF.

Improved AI-enabled MSF mechanisms 

While there is no systematic evidence indicating that one specific fusion mechanism is the most robust and reliable, we find that different fusion mechanisms have unique advantages and potential threats due to their inherent properties. According to our findings from RQ1, the three deep fusion MSF systems (i.e., EPNet, JMODT, TWISE) are more robust against blurred images (MB, DB) and noise patterns (IN(C), IN(L)) than the others. According to our findings from RQ3, these systems also perform robustly when partially losing camera signals.

Though deep fusion MSF systems may achieve better performance, the feature interaction operations between branches inevitably lead to a relatively tightly-coupled architecture, which makes it difficult to fix or repair specific robustness issues. By contrast, we find that late and weak fusion-based MSF systems are usually easier to enhance.

Improved AI-enabled MSF mechanisms.

Improved Late Fusion. To improve late fusion, we leverage a shortcut between the LiDAR branch and the fusion layer to enhance MSF robustness. Specifically, we design a matching method that aggregates high-confidence, unique results from an individual branch into the fusion results. This is motivated by our findings in RQ1 and RQ3 that the camera is more susceptible to external environmental interference. Furthermore, inaccurate 2D detections may interfere with correct 3D detections, causing high-confidence 3D detections to disappear or become inaccurate after fusion, resulting in information loss.

The details are as follows:
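One possible realization of the matching step is sketched below, using bird's-eye-view center distance as the matching criterion. The thresholds, the detection representation, and the function name are illustrative assumptions, not the paper's exact implementation.

```python
def augment_fusion_results(fused, lidar_only, conf_thresh=0.9, dist_thresh=2.0):
    """Add high-confidence LiDAR-only detections that have no nearby
    counterpart in the fused results (matched by BEV center distance)."""
    def center_dist(a, b):
        return ((a["x"] - b["x"]) ** 2 + (a["y"] - b["y"]) ** 2) ** 0.5

    out = list(fused)
    for det in lidar_only:
        if det["score"] < conf_thresh:
            continue                 # not confident enough to bypass fusion
        if all(center_dist(det, f) > dist_thresh for f in fused):
            out.append(det)          # unique and high-confidence: keep it
    return out
```

In effect, a confident LiDAR detection that the camera-guided fusion dropped is routed around the fusion layer instead of being lost.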

Improved Weak Fusion. Weak fusion uses a cascade architecture that connects two modules in series, so its robustness bottleneck stems from inaccurate or missing guidance signals. Therefore, for weak fusion, we leverage a neural network to extract extra guidance from another modality and connect it to the downstream module as an additional guidance branch.

The details are as follows:
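A sketch of how guidance proposals from the two modalities might be merged before frustum extraction. The IoU-based de-duplication, thresholds, and names are illustrative assumptions.

```python
def combine_guidance(rgb_boxes, front_view_boxes, iou_thresh=0.5):
    """Union 2D proposals from the RGB detector with proposals from a
    (hypothetical) front-view detector, dropping duplicates by IoU.
    Boxes are (x1, y1, x2, y2) tuples in image coordinates."""
    def iou(a, b):
        x1, y1 = max(a[0], b[0]), max(a[1], b[1])
        x2, y2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, x2 - x1) * max(0, y2 - y1)
        area_a = (a[2] - a[0]) * (a[3] - a[1])
        area_b = (b[2] - b[0]) * (b[3] - b[1])
        union = area_a + area_b - inter
        return inter / union if union > 0 else 0.0

    guidance = list(rgb_boxes)
    for fb in front_view_boxes:
        if all(iou(fb, rb) < iou_thresh for rb in rgb_boxes):
            guidance.append(fb)      # box recovered by the extra branch
    return guidance
```

A box the RGB detector missed (e.g., under darkness corruption) but the front-view branch found survives into the merged guidance and can still seed a 3D frustum.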

Here is an example: under darkness (DK) corruption, the camera branch misses the 2D bounding box of a car (bottom left of the RGB image) used as guidance, while guidance can still be extracted correctly from the front view.

An example of using both the 2D front view and the RGB image as guidance to extract 3D frustums.

Experimental Data

To evaluate the effectiveness of the improved fusion mechanisms, we choose CLOCs and FConv as the late and weak fusion systems, respectively, and conduct the same experiments as in RQ1 and RQ3. These results show the performance of the original and enhanced MSF systems against corruption patterns. We find that the enhanced MSF systems are significantly more robust against common corruption patterns. Furthermore, the enhanced CLOCs (CLOCs-Rb) and FConv (FConv-Rb) show promising robustness against partial and even complete image signal loss.

Raw evaluation metric scores of CLOCs-RL and FConv-RL against different corruption patterns.

Raw evaluation metric scores of CLOCs-RL and FConv-RL when partially or completely losing one source of signals.

Performance of the original and enhanced MSF systems.

Findings. MSF systems with the same type of fusion mechanism may share similar robustness issues due to their inherent properties. Deep fusion performs better against some of the corruption patterns. However, weak fusion and late fusion are easier to repair when facing specific robustness issues.