Improving software quality with programming patterns
Prior research shows that programming idioms and patterns are prevalent in the source code of software systems. This thesis focuses on developing new models of source code for recovering programming patterns and using them in relevant software engineering tasks. In the first part of the thesis, we work toward that goal via a graph-based approach. We propose a graph-based representation of object usage called GROUM (Graph-based Object Usage Model). A GROUM abstracts any portion of code by its object usages and represents those usages as a graph: the nodes correspond to the function calls and control structures (e.g., if or while statements) involved in a usage, while the edges capture both the control and data dependencies between those nodes. Based on GROUM, we have developed several techniques to support different software engineering tasks. For example, we proposed GrouMiner, a usage pattern mining technique that recovers the usage patterns of APIs and uses them to detect bugs. Those usage patterns are also used for recommending and completing code. GROUMs are also used to find similar code, recommend similar bug fixes, detect recurring vulnerabilities, and identify cross-cutting concerns.
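To make the GROUM representation concrete, the following is a minimal sketch in Python of how such a graph might be stored. It is an illustration under stated assumptions, not the thesis's implementation: the class names Node and Groum, the helper build_example, the FileReader usage, and the particular edges are all hypothetical, and the graph is built by hand rather than extracted from code.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    kind: str   # "action" for a function call, "control" for a control structure
    label: str  # e.g. "FileReader.read" or "WHILE" (illustrative labels)


@dataclass
class Groum:
    """Hypothetical container for a graph-based object usage model."""
    nodes: list = field(default_factory=list)
    edges: set = field(default_factory=set)  # pairs of (source index, target index)

    def add_node(self, kind, label):
        self.nodes.append(Node(kind, label))
        return len(self.nodes) - 1

    def add_edge(self, src, dst):
        self.edges.add((src, dst))

    def successors(self, idx):
        # Labels of all nodes that depend directly on node idx, sorted for stable output.
        return sorted(self.nodes[d].label for s, d in self.edges if s == idx)


def build_example():
    # Hand-built graph for an assumed usage of the form:
    #   reader = new FileReader(path); while (reader.ready()) reader.read(); reader.close();
    g = Groum()
    new = g.add_node("action", "FileReader.<init>")
    ready = g.add_node("action", "FileReader.ready")
    loop = g.add_node("control", "WHILE")
    read = g.add_node("action", "FileReader.read")
    close = g.add_node("action", "FileReader.close")
    # Control dependencies follow the order of execution; the edges leaving
    # the constructor node are data dependencies on the reader object.
    for src, dst in [(new, ready), (ready, loop), (loop, read),
                     (read, close), (new, read), (new, close)]:
        g.add_edge(src, dst)
    return g


if __name__ == "__main__":
    g = build_example()
    print(g.successors(0))
```

In this sketch, a usage pattern would correspond to a subgraph that recurs across many such graphs; the mining step itself is not shown.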
In the remaining part of the thesis, we propose a new approach to capturing and utilizing programming idioms and patterns. The core of this approach is a novel language model specially designed for source code, which uses several code-based factors such as the local code context, the global concerns, and the pairwise associations of code elements. Unlike GROUM-based techniques, this model is able to model and recommend code that is infrequent in the training codebase. It also captures most meaningful code elements (e.g., function calls, data types, variables, control structures) and is thus applicable beyond API usages. We plan to develop our language models and evaluate their usefulness in two software engineering problems: code completion and code authorship.