BehAVExplor: Behavior Diversity Guided Testing for Autonomous Driving Systems

Authors: Mingfei Cheng, Yuan Zhou and Xiaofei Xie

Workflow of BehAVExplor

Figure 1. Overview of BehAVExplor workflow

Figure 1 gives an overview of the BehAVExplor workflow. Like a typical fuzzer, BehAVExplor maintains a seed corpus that stores "interesting" seed inputs, i.e., seeds that help identify failing or diverse test cases. At each iteration, BehAVExplor uses an energy mechanism to select a seed from the corpus, favoring seeds with higher energy. The energy of a seed quantifies how promising it is for generating failing test cases. An adaptive mutation strategy then generates new test cases by mutating the selected seed.
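The energy-based selection step can be pictured with a small sketch. It assumes energy-proportional (roulette-wheel) sampling, which is only one plausible reading of "select a seed with higher energy"; the `Seed` class and `select_seed` function are hypothetical, and the actual energy formula is defined in the paper.

```python
import random
from dataclasses import dataclass


@dataclass
class Seed:
    """Hypothetical seed record: a scenario description plus its current energy."""
    scenario: dict
    energy: float = 1.0


def select_seed(corpus: list[Seed], rng: random.Random) -> Seed:
    """Pick a seed with probability proportional to its energy.

    Assumption: proportional sampling stands in for BehAVExplor's own rule.
    """
    weights = [s.energy for s in corpus]
    if sum(weights) <= 0:
        # Every seed has zero energy: fall back to a uniform choice.
        return rng.choice(corpus)
    return rng.choices(corpus, weights=weights, k=1)[0]


# Usage: the high-energy seed is chosen far more often than the low-energy one.
corpus = [Seed({"id": "s1"}, energy=0.2), Seed({"id": "s2"}, energy=3.0)]
chosen = select_seed(corpus, random.Random(42))
```

Sampling proportionally, rather than always taking the maximum-energy seed, keeps low-energy seeds reachable so the search does not lock onto a single scenario.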

To search for diverse and critical test cases, BehAVExplor uses behavior diversity and scenario criticality as fuzzing feedback, keeping test cases that increase either the diversity or the violation degree. Specifically, each new mutant is executed by the target ADS, which produces observation traces, and BehAVExplor characterizes the behavior of each mutant with BehaviorMiner. Diversity is measured as the difference between the behavior of the new input and the behaviors of the existing seeds in the corpus. In addition, general violation functions for AV failures (e.g., collisions, hitting illegal lane lines) evaluate the criticality of the mutant. Mutants that exhibit new behaviors or higher violation degrees are added to the corpus.
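The corpus-admission rule can be sketched under clearly stated assumptions: behaviors are fixed-length numeric vectors, diversity is approximated by the Euclidean distance to the nearest existing behavior, and criticality is a hypothetical collision degree computed from the minimum gap (in meters) between the Ego and any NPC. The names `behavior_novelty`, `collision_degree`, and `is_interesting`, and the threshold value, are illustrative only and do not come from BehAVExplor.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def behavior_novelty(new: Vector, corpus_behaviors: Sequence[Vector]) -> float:
    """Euclidean distance from a new behavior vector to its nearest corpus behavior.

    Assumption: a nearest-neighbour distance stands in for the paper's diversity measure.
    """
    if not corpus_behaviors:
        return math.inf  # An empty corpus makes any behavior novel.
    return min(math.dist(new, b) for b in corpus_behaviors)


def collision_degree(min_gap_m: float) -> float:
    """Hypothetical violation degree: 1.0 at contact, decaying toward 0 as the gap grows."""
    return 1.0 / (1.0 + max(min_gap_m, 0.0))


def is_interesting(
    new_behavior: Vector,
    corpus_behaviors: Sequence[Vector],
    degree: float,
    best_degree: float,
    novelty_threshold: float = 1.0,
) -> bool:
    """Keep a mutant if its behavior is new or it is more critical than the best seen so far."""
    novel = behavior_novelty(new_behavior, corpus_behaviors) > novelty_threshold
    more_critical = degree > best_degree
    return novel or more_critical
```

Using an OR of the two signals mirrors the text above: a mutant earns a place in the corpus either by behaving differently or by coming closer to a violation, so neither feedback dimension can starve the other.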

Statistical Measures in BehaviorMiner

In BehaviorMiner, we merge a state sequence into a single merged state using eight statistical measures, as formulated in Eq. (6) of our paper. Given a time series x, the eight statistical measures are defined as follows:

Mean: the mean value of x.
Minimum: the minimum value of x.
Maximum: the maximum value of x.
Mean Change: the mean of the differences between consecutive values of x.
Mean Abs-Change: the mean of the absolute differences between consecutive values of x.
Variance: the variance of x.
Non-linearity: the c3 statistic, which measures nonlinearity in the time series.
Time Series Complexity: a complexity estimate of x based on CID (complexity-invariant distance); a more complex time series has more peaks, valleys, etc.
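The eight measures can be computed with NumPy as in the sketch below. The formulas mirror the tsfresh feature calculators whose descriptions the list above follows (`mean_change`, `mean_abs_change`, `c3`, `cid_ce`). The c3 lag of 1, the unnormalized CID, and concatenating per-dimension features into one merged state are assumptions, not necessarily the exact form of Eq. (6); `statistical_measures` and `merge_states` are hypothetical names.

```python
import numpy as np


def c3(x: np.ndarray, lag: int = 1) -> float:
    """Nonlinearity statistic: mean of x[i] * x[i + lag] * x[i + 2 * lag]."""
    n = len(x)
    if n <= 2 * lag:
        return 0.0  # Too short to form a single triple.
    return float(np.mean(x[: n - 2 * lag] * x[lag : n - lag] * x[2 * lag :]))


def cid_ce(x: np.ndarray, normalize: bool = False) -> float:
    """CID complexity estimate: sqrt(sum((x[i] - x[i-1])**2))."""
    if normalize:
        std = np.std(x)
        if std == 0:
            return 0.0
        x = (x - np.mean(x)) / std
    d = np.diff(x)
    return float(np.sqrt(np.dot(d, d)))


def statistical_measures(x) -> np.ndarray:
    """Eight measures for a 1-D series, in the order listed above."""
    x = np.asarray(x, dtype=float)
    d = np.diff(x)
    return np.array([
        np.mean(x),                            # Mean
        np.min(x),                             # Minimum
        np.max(x),                             # Maximum
        np.mean(d) if d.size else 0.0,         # Mean Change
        np.mean(np.abs(d)) if d.size else 0.0, # Mean Abs-Change
        np.var(x),                             # Variance (population)
        c3(x),                                 # Non-linearity
        cid_ce(x),                             # Time Series Complexity
    ])


def merge_states(states) -> np.ndarray:
    """Merge a (T, D) state sequence into one vector of length 8 * D (assumed layout)."""
    states = np.asarray(states, dtype=float)
    if states.ndim == 1:
        states = states[:, None]
    return np.concatenate(
        [statistical_measures(states[:, j]) for j in range(states.shape[1])]
    )
```

For example, `merge_states` applied to a 50-step sequence of (speed, acceleration, heading) states would yield a 24-element vector, one block of eight measures per state variable.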

Four Functional Scenarios

We run BehAVExplor on the following four functional scenarios, which cover real-world intersections and roads:

S1: Ego goes straight through a non-signalized intersection.
S2: Ego turns left at a non-signalized intersection.
S3: Ego follows a lane on a straight four-lane road.
S4: Ego changes lanes on a straight four-lane road.

The four functional scenarios are illustrated in the figures below. During testing in these scenarios, the start and destination of the Ego (red route line) remain unchanged, while the trace of each NPC vehicle is not fixed.

S1 Go straight through an intersection

S3 Follow a lane on a straight road

S2 Turn left at an intersection

S4 Change lanes on a straight road
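The test setup shared by these scenarios, a fixed Ego route with free NPC traces, can be illustrated by a mutation sketch. Assumption: a scenario is a plain dict whose NPCs carry (x, y, speed) waypoints, and mutation is simple Gaussian perturbation. `mutate_npcs` and the scenario layout are hypothetical stand-ins, not BehAVExplor's scenario format or its adaptive mutation strategy.

```python
import copy
import random


def mutate_npcs(
    scenario: dict,
    rng: random.Random,
    pos_sigma: float = 1.0,
    speed_sigma: float = 0.5,
) -> dict:
    """Return a mutated copy of a hypothetical scenario.

    The Ego's start and destination are copied unchanged; only NPC waypoints
    (x, y, speed) receive Gaussian noise, with speed clamped at zero.
    """
    mutant = copy.deepcopy(scenario)  # Never modify the seed in place.
    for npc in mutant["npcs"]:
        npc["waypoints"] = [
            (
                x + rng.gauss(0.0, pos_sigma),
                y + rng.gauss(0.0, pos_sigma),
                max(0.0, v + rng.gauss(0.0, speed_sigma)),
            )
            for x, y, v in npc["waypoints"]
        ]
    return mutant


# Usage: a lane-following setup with one NPC in the adjacent lane (values illustrative).
base = {
    "ego": {"start": (0.0, 0.0), "destination": (120.0, 0.0)},
    "npcs": [{"waypoints": [(10.0, 3.5, 8.0), (40.0, 3.5, 8.0)]}],
}
mutant = mutate_npcs(base, random.Random(0))
```

Deep-copying the seed keeps the corpus entry intact, so the same seed can be mutated again in later iterations.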

Violations Discovered by BehAVExplor

This figure shows the abstract violation patterns discovered by BehAVExplor. The red vehicle is the Ego, and the black vehicles are NPCs. Each abstract violation pattern is explained in our paper.