This website provides supplementary materials for the paper "MultiTest: Physical-Aware Object Insertion for Testing Multi-sensor Fusion Perception Systems".
Multi-sensor fusion (MSF) stands as a pivotal technique in addressing numerous safety-critical tasks and applications, e.g., self-driving cars and automated robotic arms. With the continuous advancement of data-driven Artificial Intelligence (AI), MSF's potential for sensing and understanding intricate external environments has been further amplified, profoundly impacting intelligent systems and their perception systems in particular. Similar to traditional software, AI-enabled MSF systems also require adequate testing. Yet, existing testing methods primarily concentrate on single-sensor perception systems (e.g., image-/point cloud-based object detection systems). There remains a lack of emphasis on generating multi-modal test cases for MSF systems.
To address these limitations, we design and implement MultiTest, a fitness-guided metamorphic testing method for complex MSF perception systems. MultiTest employs a physical-aware approach to synthesize realistic multi-modal object instances and insert them into critical positions of background images and point clouds. A fitness metric is designed to guide and boost the test generation process. We conduct extensive experiments with five SOTA perception systems to evaluate MultiTest from the perspectives of: (1) generated test cases' realism, (2) fault detection capabilities, and (3) performance improvement. The results show that MultiTest can generate realistic and modality-consistent test data and effectively detect hundreds of diverse faults of an MSF system under test. Moreover, retraining an MSF system on the test cases generated by MultiTest can improve the system's robustness.
The website is organized as follows:
Home page: The motivation for why MultiTest is urgently needed, followed by an illustration and introduction of our research workflow.
Approach: This section contains details and visualizations illustrating the main functional modules (e.g., Pose Estimation Module and Sensor Simulation Module) in MultiTest's workflow.
Data visualization: This section contains examples of test data generated by MultiTest, presented from three perspectives: image, point cloud, and modality consistency.
Research Questions: This section contains further details and visualizations of our experiments as supplements to our paper.
Replication Package: This section provides the essential procedures required to reproduce the experimental results in our paper.
Lou et al. [1] conducted semi-structured interviews with developers from 10 autonomous driving companies and surveyed 100 developers who have worked on autonomous driving systems. They analyzed the gap between ADS research and practitioners’ needs and proposed several future directions for SE researchers.
First, the survey indicates that utilizing multi-modal fusion strategies to improve the performance of autonomous driving systems is increasingly popular.
70% of interviewees and over 68% of survey participants said their driving systems used at least three of four types of sensors, e.g., cameras, LiDARs, radars, GPS.
Second, in the "Unit Testing" section, the survey indicates that testing perception systems is a common task during ADS development. In addition, construct test data manually is time-consuming.
In addition to writing unit tests for control logic, ADS developers also need to test DL models, especially those models in the perception and prediction modules. ... It is time-consuming to manually process driving recordings to construct test scenarios.
Besides, the survey shows that it is a common practice to test perception systems with multi-modal sensor data as input.
Common Practice 4: In addition to testing control logic, ADS developers also need to construct segments of driving recordings to test DL models, which take multi-modal sensor data as input, not just road images.
Third, the survey indicates that existing techniques mainly focus on single-modal sensor data.
These multi-module ADSs take multiple types of sensor data as input. Yet the majority of test generation techniques only generate road image data. Therefore, it is worth investigating how to generate multi-modal sensor data for new driving scenarios.
In addition, due to the consistency constraints across different modalities, generating multi-modal data is worthwhile but challenging.
This multi-modality of sensor data makes it more difficult to generate test cases. ... When transforming one type of sensor data, such as adding an object, other types of sensor data must be updated consistently.
Therefore, we design and implement MultiTest to generate realistic and modality-consistent test data that satisfies the test input specifications of MSF perception systems.
The workflow of MultiTest
MultiTest employs a physical-aware approach to render modality-consistent object instances with virtual sensors for testing multi-sensor fusion (MSF) perception systems.
The figure above presents the high-level workflow of MultiTest.
First, MultiTest employs real-world multi-modal data (i.e., images and point clouds) as the target scene and leverages well-designed 3D models to build an object database.
Second, given background multi-modal data recorded from the real world and an object instance selected from the object database, MultiTest first executes the pose estimation module to calculate valid locations and orientations of the object to be inserted. The multi-sensor simulation module then renders the object instance as both an image and a point cloud at the calculated poses in a physical-aware virtual simulator. It further merges the synthesized image and point cloud of the inserted object with the background data and carefully handles occlusion. These two modules form MultiTest's multi-modal test data generation pipeline.
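To make the occlusion handling concrete, below is a minimal Python/NumPy sketch of how an inserted object's simulated LiDAR returns could be merged with a background scan: a background point is dropped when a simulated object return lies on (nearly) the same ray at a shorter range. This is an illustrative assumption, not MultiTest's actual implementation; the names `to_spherical`, `merge_with_occlusion`, and `angle_tol` are hypothetical. Image-side merging can follow the same idea with a per-pixel depth comparison.

```python
# Minimal sketch of occlusion-aware point-cloud merging (illustrative only).
# `obj_points` are the inserted object's simulated LiDAR returns and
# `bg_points` is the background scan, both N x 3 arrays in the sensor frame.
import numpy as np

def to_spherical(points):
    """Convert x, y, z coordinates to (azimuth, elevation, range)."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    rng = np.linalg.norm(points, axis=1)
    azim = np.arctan2(y, x)
    elev = np.arcsin(z / np.maximum(rng, 1e-9))
    return azim, elev, rng

def merge_with_occlusion(bg_points, obj_points, angle_tol=0.003):
    """Drop background points hidden behind the inserted object, then merge.

    A background point is considered occluded when an object return lies
    along (almost) the same LiDAR ray but at a shorter range.
    """
    bg_azim, bg_elev, bg_rng = to_spherical(bg_points)
    ob_azim, ob_elev, ob_rng = to_spherical(obj_points)

    keep = np.ones(len(bg_points), dtype=bool)
    for a, e, r in zip(ob_azim, ob_elev, ob_rng):
        # Background points within the angular tolerance of this object return...
        same_ray = (np.abs(bg_azim - a) < angle_tol) & (np.abs(bg_elev - e) < angle_tol)
        # ...and farther away than the object surface are occluded.
        keep &= ~(same_ray & (bg_rng > r))

    return np.vstack([bg_points[keep], obj_points])

# Toy usage: a background wall at 20 m and an object inserted 10 m in front of it.
bg = np.array([[20.0, 0.0, 0.0], [20.0, 5.0, 0.0]])
obj = np.array([[10.0, 0.0, 0.0]])
merged = merge_with_occlusion(bg, obj)
print(merged)  # the wall point directly behind the inserted object is removed
```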
Third, realistic multi-modal test data can be efficiently generated through fitness-guided metamorphic testing. The fitness metric is designed to measure how likely a test case is to reveal errors.
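The sketch below illustrates, under simplifying assumptions, how such a fitness-guided loop could rank candidate insertions and flag metamorphic-relation violations. The helper names (`synthesize`, `detect`, `generate_tests`) are hypothetical, and the fitness used here, the system's detection confidence on the inserted object, is only a stand-in for MultiTest's actual metric.

```python
# Minimal sketch of a fitness-guided generation loop (illustrative only).
import random

def generate_tests(poses, synthesize, detect, keep_k=5, fault_threshold=0.1):
    """Rank candidate insertions by fitness and flag likely faults."""
    candidates = []
    for pose in poses:
        test_case = synthesize(pose)     # image + point cloud with the object inserted
        confidence = detect(test_case)   # SUT's confidence on the inserted object
        candidates.append((confidence, pose, test_case))

    # Lower confidence is assumed more likely to reveal an error, so sort ascending.
    candidates.sort(key=lambda c: c[0])
    selected = candidates[:keep_k]

    # Metamorphic relation: the inserted object should still be detected.
    # Candidates violating it (confidence below the threshold) are reported as faults.
    faults = [c for c in selected if c[0] < fault_threshold]
    return selected, faults

# Toy usage with placeholder synthesis and detection.
random.seed(0)
poses = [(random.uniform(5, 40), random.uniform(-10, 10)) for _ in range(20)]
selected, faults = generate_tests(
    poses,
    synthesize=lambda pose: {"pose": pose},   # placeholder test case
    detect=lambda case: random.random(),      # placeholder SUT confidence
)
print(f"kept {len(selected)} candidates, {len(faults)} potential faults")
```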
[1] Guannan Lou, Yao Deng, Xi Zheng, Mengshi Zhang, and Tianyi Zhang. 2022. Testing of autonomous driving systems: where are we and where should we go?. In Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering.