Mining Design Rules from Code

Project Overview

Motivation

Proper documentation is an integral part of developing and maintaining software, especially given the increasing size of code bases and the prevalence of teams dispersed across time and location. As code is written and a project develops, various design decisions are made that affect both existing and future code. Whether these decisions and the rationale behind them are documented depends on the developer, and what is recorded is often scattered between a formal document and comments in the code itself. Moreover, documentation is often not written, read, or updated, so developers end up relying largely on the code rather than on existing documentation.

Project

Dr. Thomas LaToza and Sahar Mehrpour have researched mining design rules given an HTML code base, and have developed an independent tool, RulePad, which provides a textual representation of design rules that is synchronized with the project code base. Our project combines these two efforts. That is, we are working to help create a tool that uses mixed human-AI authoring of code patterns in order to document and update code, thus enabling developers to efficiently and accurately apply and maintain a unified set of design rules. An infinite number of patterns could be found in a given code base; therefore, the crux of the problem lies in determining which patterns in the given code should be used to train the machine learning algorithm. Reporting an excessive number of spurious rules, or failing to find important or obvious ones, would discourage developers from using the tool to document their code. Hence our focus this summer will be on mining design rules from a given code base in any language in order to facilitate active documentation and maintenance of code.