Safety Analysis for AI-enabled Cyber-Physical Systems
Quality assurance of AI-CPSs is urgently needed, since many of their applications are in safety-critical domains. Consequently, research activities in this direction have been increasing in recent years. Among these methods, the two most popular are \emph{verification} and \emph{falsification}.
Verification can provide a \emph{definite} guarantee for the system, at the cost of scalability. Bae and Lee adopt syntactic separation, which decomposes an STL formula into an equivalent first-order formula, to provide a bounded model checking technique for CPSs against STL specifications. Other research focuses on learning-based control systems, based on the layer-wise computation of over-approximate reachable sets or on the transformation of the AI controller into an equivalent hybrid system. However, these approaches can only handle safety specifications, whereas we can handle general STL specifications. Our approach can also be considered a lightweight formal analysis technique.
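To illustrate the layer-wise reachability idea mentioned above, the following Python sketch propagates an input box through a ReLU network using interval arithmetic. It is a simplified illustration under our own assumptions (interval abstraction, fully connected layers), not the implementation of the cited verification tools.
\begin{verbatim}
# Minimal sketch of layer-wise interval (over-approximate) reachability
# for a ReLU network controller. Illustrative only.
import numpy as np

def interval_affine(lower, upper, W, b):
    """Propagate an input box [lower, upper] through x -> W x + b."""
    W_pos, W_neg = np.maximum(W, 0.0), np.minimum(W, 0.0)
    new_lower = W_pos @ lower + W_neg @ upper + b
    new_upper = W_pos @ upper + W_neg @ lower + b
    return new_lower, new_upper

def relu_network_reach(lower, upper, layers):
    """layers: list of (W, b); returns an over-approximate output box."""
    for i, (W, b) in enumerate(layers):
        lower, upper = interval_affine(lower, upper, W, b)
        if i < len(layers) - 1:  # ReLU on hidden layers only
            lower, upper = np.maximum(lower, 0.0), np.maximum(upper, 0.0)
    return lower, upper
\end{verbatim}
The resulting box is sound but possibly loose; more precise set representations (e.g., zonotopes or star sets) tighten the bounds at a higher computational cost.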
For traditional CPSs, researchers have proposed various effective \emph{falsification} approaches that try to show the violation of a specification rather than prove its satisfaction. For example, ForeSee leverages QB-robustness, a novel semantics targeting the scale problem, to guide Monte Carlo tree search, which searches over the structure of the formal specification, towards the violation of the specification. However, existing work shows that classic optimization-based falsification techniques, which are designed for traditional CPSs, are not effective in finding falsifying inputs for AI-CPSs. To address this problem, in this work, we propose MOSAIC, a novel safety analysis framework designed for AI-CPSs. The results confirm that MOSAIC is effective in guiding the search process towards the violation of the specification, which leads to an efficient falsification technique.
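To make the optimization-based falsification workflow concrete, the Python sketch below shows the standard loop of sampling an input, simulating the system, and checking the quantitative STL robustness of the resulting trace. Here, simulate and stl_robustness are hypothetical placeholders, and the random sampler stands in for the more sophisticated optimizers (e.g., CMA-ES or the MCTS-guided search of ForeSee) used in practice; this is not the ForeSee or MOSAIC implementation.
\begin{verbatim}
# Minimal sketch of optimization-based falsification: search for an input
# whose simulation trace has negative STL robustness (i.e., a violation).
import numpy as np

def falsify(simulate, stl_robustness, input_dim, bounds, budget=100, seed=0):
    rng = np.random.default_rng(seed)
    best_input, best_rob = None, np.inf
    for _ in range(budget):
        # Random sampling here; real tools use hill climbing, CMA-ES,
        # Bayesian optimization, or MCTS over the formula structure.
        u = rng.uniform(bounds[0], bounds[1], size=input_dim)
        trace = simulate(u)              # run the CPS model
        rob = stl_robustness(trace)      # quantitative STL semantics
        if rob < best_rob:
            best_input, best_rob = u, rob
        if rob < 0:                      # specification violated
            return u, rob
    return best_input, best_rob          # no falsifying input found
\end{verbatim}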
For CPSs with AI components, some recent works have also been proposed to assure their safety and reliability. For example, Al-Nima et al. propose a Genetic Algorithm of Neuron Coverage (GANC) to improve the performance and robustness of DRL controllers in self-driving applications. Their results show that the neuron coverage of the DRL network is increased by injecting embedded information from the inputs, and that the reliability and robustness of the DRL controllers are enhanced against abnormal weather conditions. Zolfagharian et al. leverage ML models and genetic algorithms to test the reliability of DRL agents against faulty episodes. They observe that the search space for testing a DRL agent can be substantially narrowed by an ML model that predicts the probability of functional faults within a generated episode. Beyond offline testing of AI systems, our work provides a general safety analysis framework that also offers online safe-operation advice through safety monitoring. Our framework not only provides AI-aware guidance to a two-stage falsification but also improves the performance of AI controllers during the execution phase.
Abstraction-based Analysis of AI Systems
Building abstract models for DNNs has recently attracted researchers' interest, as it facilitates the explainability of AI models. Several approaches have been introduced to address this problem, and most of them utilize deterministic finite automata (DFA) or Markov models. For example, Weiss et al. design an active learning algorithm, based on Angluin's algorithm, to extract a DFA, in which the refinement process uses counterexamples returned from equivalence queries. Wang et al. propose a method that applies learning and abstraction to system log traces to automatically enable formal verification of discrete-time systems. Later, Du et al. proposed MARBLE, a quantitative adversarial robustness analysis technique for recurrent neural networks (RNNs). MARBLE extracts an abstract model from the RNN to quantitatively measure the robustness of the RNN, which proves effective in detecting adversarial examples, and builds state-level and transition-level coverage criteria on top of the DTMC, which support adversarial sample detection and coverage-guided testing. Khmelnitsky et al. propose to combine statistical model checking and active automata learning to perform robustness certification for recurrent neural networks. In contrast, our work focuses on constructing a general-purpose safety analysis framework and on performing safety guidance and detection for AI-CPSs at the system level, where DNNs serve as the key components of the system.
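As a concrete illustration of the abstraction idea shared by these works, the Python sketch below builds a simple DTMC from concrete state traces by discretizing states into grid cells and estimating transition probabilities from counts. The grid abstraction is an assumption made for illustration; the cited works, as well as our SA-MDP construction, use their own abstraction functions.
\begin{verbatim}
# Minimal sketch: abstract concrete state traces into a DTMC by grid
# discretization and transition counting. Illustrative only.
from collections import defaultdict
import numpy as np

def abstract_state(x, grid_size=1.0):
    """Map a concrete state vector to a discrete cell id (tuple)."""
    return tuple(np.floor(np.asarray(x) / grid_size).astype(int))

def build_dtmc(traces, grid_size=1.0):
    """traces: list of lists of concrete states. Returns {s: {s': prob}}."""
    counts = defaultdict(lambda: defaultdict(int))
    for trace in traces:
        cells = [abstract_state(x, grid_size) for x in trace]
        for s, s_next in zip(cells, cells[1:]):
            counts[s][s_next] += 1
    dtmc = {}
    for s, succ in counts.items():
        total = sum(succ.values())
        dtmc[s] = {s_next: c / total for s_next, c in succ.items()}
    return dtmc
\end{verbatim}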
Testing of AI Systems
Pei et al. propose to utilize \emph{neuron coverage}, which computes the percentage of neurons activated by the test inputs, to systematically test DNNs. Ma et al. consider the outputs of the neurons and introduce a set of multi-granularity testing criteria, including neuron-level and layer-level criteria. TensorFuzz and DeepHunter are coverage-guided fuzz testing frameworks, which are applied to a set of well-known DNN models for bug hunting. However, existing work mostly focuses on the image classification domain and does not consider the temporal behavior of the AI system. In the domain of AI-enabled CPSs, the decisions of the AI controller over time greatly influence the safety of the system, which is captured by our constructed SA-MDP.
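For concreteness, the neuron coverage criterion discussed above can be sketched in Python as follows; the activation threshold and the network interface are assumptions made for illustration, not the cited tools' implementations.
\begin{verbatim}
# Minimal sketch of neuron coverage: the fraction of neurons whose
# activation exceeds a threshold on at least one test input.
import numpy as np

def neuron_coverage(layer_outputs_per_input, threshold=0.5):
    """layer_outputs_per_input: list (one entry per test input) of lists
    of activation arrays, one array per layer of the same network."""
    activated = None
    for layer_outputs in layer_outputs_per_input:
        flat = np.concatenate([a.ravel() for a in layer_outputs])
        hits = flat > threshold
        activated = hits if activated is None else (activated | hits)
    return float(activated.mean()) if activated is not None else 0.0
\end{verbatim}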