With the recent rapid progress in deep learning (DL) libraries, it has become more intuitive for developers to build complex DL software by directly leveraging operators from these libraries. Because bugs and faults in DL libraries can lead to severe outcomes in DL software, quality assurance of DL libraries has become an urgent industrial demand. However, it is still unclear how well, and to what extent, existing approaches detect bugs in different DL libraries across different task domains. To bridge this gap, we first conduct an empirical study of four representative, state-of-the-art DL library testing approaches. Our results reveal that the effectiveness of existing approaches is limited in specific task domains. We also find that the test inputs generated by these approaches usually lack diversity and come with a high false-positive rate (up to 36%).
To address these issues, we propose Gandalf, a generation-based guided differential fuzzing approach. To generate test inputs effectively across diverse task domains, Gandalf adopts a context-free grammar to ensure validity and a Deep Q-Network to achieve diversity. We carefully design 15 metamorphic relations to reduce false positives. We evaluate the effectiveness of Gandalf on nine versions of three representative DL libraries, covering 309 operators from computer vision, natural language processing, and automated speech recognition. The evaluation results demonstrate that Gandalf can effectively and efficiently generate diverse test inputs. Moreover, Gandalf detects five categories of bugs with a false-positive rate of only 3.1%. We have reported all 49 new unique bugs found during the evaluation to the developers of the DL libraries, and most of these bugs have been confirmed. Details of our empirical study and evaluation results are available on our project website.
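As an illustration of how a metamorphic relation can complement differential testing, the following minimal sketch compares two softmax implementations and then checks a relation that any correct softmax must satisfy (shift invariance: softmax(x + c) = softmax(x)). It is not taken from Gandalf; the operator, the relation, the tolerance, and all function names are assumptions chosen for illustration only, and the 15 relations used by Gandalf are not reproduced here.

```python
import math


def softmax_naive(xs):
    """Reference-style softmax: exponentiate, then normalize."""
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_stable(xs):
    """Second implementation: subtract the maximum before exponentiating."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_buggy(xs):
    """Hypothetical faulty implementation that forgets to normalize."""
    return [math.exp(x) for x in xs]


def close(a, b, tol=1e-9):
    """Element-wise comparison with a tolerance (the tolerance is an assumption)."""
    return len(a) == len(b) and all(
        math.isclose(p, q, rel_tol=tol, abs_tol=tol) for p, q in zip(a, b)
    )


def differential_check(impl_a, impl_b, xs):
    """Differential testing: do two implementations agree on the same input?"""
    return close(impl_a(xs), impl_b(xs))


def shift_invariance_holds(impl, xs, shift=3.0):
    """Metamorphic relation: adding a constant to every input leaves softmax unchanged."""
    shifted = [x + shift for x in xs]
    return close(impl(xs), impl(shifted))
```

In this sketch, a disagreement reported by `differential_check` can be cross-checked with `shift_invariance_holds` on each implementation: an implementation that violates the relation is the more likely culprit, whereas a disagreement in which both implementations satisfy the relation may only reflect benign numerical differences.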
We implement our approach as an open-source tool, which is available at https://github.com/Gandalf401/Gandalf.