RQ3: Falsification

Falsification has been widely adopted for the safety assurance of CPS, where it is used to detect input cases that trigger system behaviors violating safety requirements. However, existing optimization-based falsification techniques for traditional CPS have recently been found to be ineffective for AI-CPS, in which AI components are key parts of the system. As another promising direction for safety analysis, in this subsection we therefore present a novel offline model-guided falsification method designed for AI-CPS.

The table below shows the falsification results, comparing Mosaic with three state-of-the-art methods.

Falsification performance comparison among four falsification algorithms

The experimental results of falsification are presented in Table 6.

We run 30 falsification trials for each falsification approach and report the number of successful trials, i.e., FSR, as the indicator of the approach's effectiveness. In the table, we highlight the best approach for each benchmark system, ranking first by FSR and then by #sim. Based on the results, we can observe that:

First, Mosaic outperforms the other three falsification techniques. In 13 out of 16 falsification problems, Mosaic has the best performance. In 9 cases, Mosaic is strictly better than all other approaches, i.e., its FSR is strictly greater than that of every other method.
There are some experiments, e.g., ACC-DDPG with 𝜑^ACC_1 and AFC-DNN1 with 𝜑^AFC_2, where Mosaic does not perform as well as Mosaic_rand. One possible reason is that, compared to Mosaic_rand, Mosaic balances exploration and exploitation more carefully. It therefore spends more time searching for suspicious regions, which in these cases leads to worse performance than random exploration combined with optimization-based exploitation. There is also one case where Breach performs best (ACC-SAC with 𝜑^ACC_2), but it finds only one more falsifying input than Mosaic.
Random does not outperform any other method in the falsification trials. This suggests that Random is the weakest falsification approach, as it only conducts random exploration of the input space.
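The evaluation protocol described above (a fixed number of trials per approach, FSR as the primary indicator, and #sim as the tie-breaker) can be illustrated with a minimal Python sketch. It is not the authors' evaluation script. It assumes that #sim is the average number of simulations over successful trials and that fewer simulations are better, neither of which is stated on this page. The `Trial` type, the function names, and the approach names and numbers in `demo` are hypothetical and are not taken from Table 6.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Trial:
    falsified: bool   # True if the trial found a falsifying input
    simulations: int  # simulations used in the trial (hypothetical field)


def summarize(trials: List[Trial]) -> Tuple[int, float]:
    """Return (FSR, #sim) for one approach on one falsification problem.

    FSR is the number of successful trials. #sim is assumed here to be the
    mean simulation count over successful trials (infinity if none succeed).
    """
    successes = [t for t in trials if t.falsified]
    fsr = len(successes)
    if fsr == 0:
        return 0, float("inf")
    return fsr, sum(t.simulations for t in successes) / fsr


def best_approach(results: Dict[str, List[Trial]]) -> str:
    """Pick the best approach: higher FSR first, lower #sim breaks ties."""
    def key(name: str) -> Tuple[int, float]:
        fsr, avg_sim = summarize(results[name])
        return (-fsr, avg_sim)
    return min(results, key=key)


# Illustrative data only (4 trials per approach instead of 30).
demo = {
    "approach_a": [Trial(True, 100)] * 3 + [Trial(False, 300)],
    "approach_b": [Trial(True, 50)] * 3 + [Trial(False, 300)],
    "approach_c": [Trial(True, 10)] * 2 + [Trial(False, 300)] * 2,
}

if __name__ == "__main__":
    for name, trials in demo.items():
        print(name, summarize(trials))
    print("best:", best_approach(demo))
```

In this toy data, `approach_a` and `approach_b` tie on FSR, so the lower #sim decides in favor of `approach_b`. `approach_c` uses the fewest simulations but is ranked last because FSR takes priority.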
