Abstract
Deep learning (DL) has been widely adopted across many application domains. Meanwhile, the quality of DL systems is becoming a major concern, especially in safety-critical scenarios. To evaluate the quality of DL systems, a number of DL testing techniques, such as coverage-guided testing (CGT), have been proposed. To generate test cases, a set of initial seed inputs is required. Existing testing techniques usually construct the seed corpus by randomly selecting inputs from the training or test dataset. To date, there has been no study of how initial seed inputs affect the performance of DL testing or of how to construct an optimal seed corpus. To fill this gap, we conduct the first systematic study evaluating the impact of seed selection strategies on DL testing. Specifically, considering three popular goals of DL testing (i.e., coverage, error detection, and robustness), we develop five seed selection strategies: three based on single-objective optimization (SOO) and two based on multi-objective optimization (MOO). We evaluate these strategies on 5 testing tools. Our results (9,800 testing runs) demonstrate that the selection of initial seed inputs greatly affects testing performance. Specifically, SOO-based seed selection can construct the best seed corpus for boosting DL testing with respect to a specific testing goal, but not for the other goals. In contrast, the MOO-based seed selection strategies construct seed corpora that achieve balanced improvements across multiple objectives. Based on these results, we suggest that researchers carefully consider the selection of seed inputs when conducting DL testing-related research in the future.
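For intuition, below is a minimal sketch of what an SOO-based, coverage-driven seed selection could look like. This is an illustrative example, not the paper's actual implementation: the function names (`soo_select_seeds`, `coverage_of`) and the greedy strategy shown here are assumptions for demonstration purposes only.

```python
# Illustrative sketch of single-objective (coverage-driven) seed selection.
# NOT the paper's implementation; `coverage_of` is a hypothetical stand-in
# for any per-input coverage measure (e.g., the set of neurons an input
# activates under some coverage criterion).
import random
from typing import Callable, List, Set


def soo_select_seeds(
    candidates: List[object],
    coverage_of: Callable[[object], Set[int]],
    corpus_size: int,
) -> List[object]:
    """Greedily pick seeds that add the most new coverage (single objective)."""
    selected: List[object] = []
    covered: Set[int] = set()
    pool = list(candidates)
    while pool and len(selected) < corpus_size:
        # Pick the candidate contributing the most not-yet-covered units.
        best = max(pool, key=lambda x: len(coverage_of(x) - covered))
        if not (coverage_of(best) - covered):
            # No candidate improves coverage; fall back to a random pick.
            best = random.choice(pool)
        covered |= coverage_of(best)
        selected.append(best)
        pool.remove(best)
    return selected
```

An MOO-based variant would instead score each candidate on several objectives at once (e.g., coverage gain, error-revealing ability, and robustness impact) and retain Pareto-optimal seeds rather than maximizing a single score.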
Overview
Our code and experimental materials are available at https://github.com/AAA-Iris/Seed-Selection.
We will upload the remaining results as soon as possible.