Authors: Michele Tufano, Cody Watson, Gabriele Bavota, Massimiliano Di Penta, Martin White, and Denys Poshyvanyk
Mutation testing has been widely accepted as an approach to guide test case generation or to assess the effectiveness of existing test suites. Empirical studies have shown that mutants are representative of real faults, yet they have also indicated a clear need for better, possibly customized, mutation operators and strategies. While some recent papers have tried to devise domain-specific or general-purpose mutation operators by manually analyzing real faults, such an activity is effort- and error-prone and does not address the important practical question of how to actually mutate a given source code element. We propose a novel approach to automatically learn mutants from faults in real programs. First, our approach processes bug-fixing changes using fine-grained differencing, code abstraction, and change clustering. Then, it learns mutation models using a deep learning strategy. We trained and evaluated our technique on a set of ∼787K bugs mined from GitHub. Starting from code fixed by developers in the context of a bug-fix, our empirical evaluation shows that our models are able to predict mutants that resemble the originally fixed bugs in 9% to 45% of cases (depending on the model). Moreover, over 98% of the automatically generated mutants are lexically and syntactically correct.
The figure above shows an overview of our approach, where the black boxes represent the main phases, the arrows indicate data flows, and the dashed arrows indicate dependencies on external tools or data. We begin by mining bug-fixing commits from thousands of GitHub repositories using GitHub Archive. From the bug-fixes, we extract method-level pairs of buggy and corresponding fixed code that we call transformation pairs (TPs). TPs are the examples from which we learn how to mutate code (fixed → buggy). We rely on GumTree to extract the list of edit actions (A) performed between the buggy and fixed code. Then, we use a Java lexer and parser to abstract the source code of the TPs into a representation that is more suitable for learning; during the abstraction, we keep frequent identifiers and literals, which we call idioms, within the representation. The output of this phase is the set of abstracted TPs and their corresponding mapping M, which allows us to reconstruct the original source code (a sketch of this step follows below). Next, we generate different datasets of TPs. Finally, for each set of TPs, we use an encoder-decoder model to learn how to transform a fixed piece of code into the corresponding buggy version. The trained models can then be used to generate new mutants that resemble real bugs.
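To make the abstraction step concrete, here is a minimal sketch, using the `javalang` tokenizer as a stand-in for the Java lexer and parser: non-idiom identifiers and literals are replaced with typed placeholders, and a mapping M is recorded so the original code can be reconstructed. The placeholder names (`VAR_n`, `LIT_n`) and the idiom list are illustrative assumptions; the actual approach distinguishes more token categories (e.g., method and type names) with the help of the parser.

```python
import javalang

# Frequent identifiers/literals kept verbatim ("idioms"); this list is illustrative.
IDIOMS = {"i", "j", "index", "size", "0", "1", "null", "true", "false"}

def abstract_method(source):
    """Replace non-idiom identifiers/literals with typed placeholders.

    Returns the abstracted token stream and the mapping M needed to
    reconstruct the original source code.
    """
    mapping, counters, out = {}, {}, []
    for tok in javalang.tokenizer.tokenize(source):
        if isinstance(tok, javalang.tokenizer.Identifier) and tok.value not in IDIOMS:
            kind = "VAR"  # the real pipeline also tags method/type tokens via the parser
        elif isinstance(tok, javalang.tokenizer.Literal) and tok.value not in IDIOMS:
            kind = "LIT"
        else:
            out.append(tok.value)  # keywords, operators, separators, and idioms pass through
            continue
        if tok.value not in mapping:  # same name -> same placeholder throughout the method
            counters[kind] = counters.get(kind, 0) + 1
            mapping[tok.value] = f"{kind}_{counters[kind]}"
        out.append(mapping[tok.value])
    return " ".join(out), {v: k for k, v in mapping.items()}

abstracted, M = abstract_method("int total = price * quantity;")
print(abstracted)  # int VAR_1 = VAR_2 * VAR_3 ;
print(M)           # {'VAR_1': 'total', 'VAR_2': 'price', 'VAR_3': 'quantity'}
```

Applying this abstraction to both elements of each TP means the encoder-decoder learns transformations over a small, consistent vocabulary rather than over raw project-specific identifiers, while the mapping M lets generated mutants be mapped back to concrete source code.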