Many state-of-the-art deep learning (DL) systems are vulnerable to adversarial examples, which hinders their adoption in safety- and security-critical scenarios. While some recent progress has been made in analyzing the robustness of feed-forward neural networks, the robustness analysis of stateful DL systems, such as recurrent neural networks (RNNs), remains largely uncharted. In this paper, we propose MARBLE, a model-based approach for quantitative robustness analysis of real-world RNN-based DL systems. MARBLE builds a probabilistic model that compactly characterizes the robustness of RNNs through abstraction. Furthermore, we propose an iterative refinement algorithm to derive a precise abstraction, which enables accurate quantification of the robustness measures. We evaluate the effectiveness of MARBLE on both LSTM and GRU models trained separately on three popular natural language datasets. The results demonstrate that (1) our refinement algorithm derives an accurate abstraction more efficiently than the random strategy, and (2) MARBLE enables quantitative robustness analysis with better efficiency, accuracy, and scalability than state-of-the-art techniques.
The following two tables present the complete evaluation results of the refinement algorithm, on the LSTM models and the GRU models respectively. Here, we treat the random refinement strategy as a baseline and conduct a full comparison against it.
The evaluation metrics include the number of iterations ("#Iter") and the time ("Time (s)") used to refine and yield an MDP, the number of states ("#State") and transitions ("#Transition") in the MDP, and the mean squared error ("MSE_t") and the missing rate ("Miss") of the MDP when it is applied to estimate the robustness of new samples from the test dataset.
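The last two metrics can be illustrated with a minimal sketch. The function and data structures below are assumptions for illustration, not MARBLE's actual API: we assume an estimator that returns the MDP's robustness estimate for a test sample, or None when the sample traverses a state or transition missing from the abstraction (which counts toward the missing rate).

```python
def evaluate_mdp(mdp_estimate, ground_truth, test_samples):
    """Sketch of computing MSE_t and the missing rate on a test set.

    mdp_estimate(x): hypothetical estimator; returns the abstract MDP's
    robustness estimate for sample x, or None if x is not covered.
    ground_truth: mapping from sample to its measured robustness value.
    """
    errors, missed = [], 0
    for x in test_samples:
        est = mdp_estimate(x)
        if est is None:
            # Sample falls outside the abstraction: count it as missed.
            missed += 1
        else:
            errors.append((est - ground_truth[x]) ** 2)
    # MSE over the covered samples; missing rate over all samples.
    mse = sum(errors) / len(errors) if errors else float("nan")
    miss_rate = missed / len(test_samples)
    return mse, miss_rate
```

A smaller MSE_t and a smaller missing rate together indicate an abstraction that both predicts accurately and generalizes to unseen samples.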
For each dataset, we highlight the averages of the above six metrics with a grey background. We can draw the following conclusions, which hold for RNNs with different architectures: MARBLE is superior to the random-split refinement strategy and always delivers a more accurate abstract MDP model with a smaller size and better generalization ability. MARBLE takes fewer iterations but slightly more time, as it explores a better refinement to accomplish the abstraction.
The table below shows the full results on the estimation efficiency ("Time (s)" columns) of MARBLE, as well as the attack success rate (ASR) when launching robustness-guided adversarial attacks with the help of MARBLE. Specifically, we treat the random attack as a baseline and present its results in the "ASR_r" columns. The ASR of attacking the least/most robust locations is presented in the "ASR_l"/"ASR_m" column. The increase rate compared with the baseline is marked with a grey background. We also calculate the average value for each of the six RNN models and highlight it with a yellow background.
Compared with the random baseline, for the LSTM models, attacking the locations with the least robustness according to MARBLE increases the ASR by 71.26%, 214.24%, and 120.85% on the three datasets, respectively. By contrast, the ASR drops significantly when attacking the most robust locations, with decreases of 64.44%, 26.81%, and 36.40%, respectively. Similar results hold for the GRU models.
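The reported increase and decrease rates are relative changes of the ASR with respect to the random baseline, which can be computed as follows. The ASR figures in the usage example are invented for illustration only and are not taken from the tables.

```python
def relative_change(asr, asr_baseline):
    """Percentage change of an attack success rate w.r.t. the baseline.

    Positive values are increases (e.g. attacking least robust locations);
    negative values are decreases (e.g. attacking most robust locations).
    """
    return (asr - asr_baseline) / asr_baseline * 100.0

# Illustrative only: a guided attack with ASR 0.60 against a random
# baseline of 0.35 yields roughly a +71% increase.
increase = relative_change(0.60, 0.35)
```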
Compared with POPQORN, MARBLE offers better scalability in handling large and complicated models that accept longer inputs. It is also efficient in calculating robustness, and can thus be applied for real-time robustness monitoring of larger applications.
MARBLE calculates robustness measurements more accurately than POPQORN and achieves up to twice the attack success rate of the random strategy.