alattin_ase2009

Alattin: Mining Alternative Patterns for Detecting Neglected Conditions

PROJECT SUMMARY

To improve software quality, static or dynamic verification tools accept programming rules as input and detect their violations in software as defects. As these programming rules are often not well documented in practice, previous work developed various approaches that mine programming rules as frequent patterns from program source code. Then these approaches use static defect-detection techniques to detect violations in source code under analysis. These existing approaches often produce many false positives due to various factors. To reduce false positives produced by these mining approaches, we develop a novel approach, called Alattin, that includes a new mining algorithm and a technique that detects neglected conditions based on our mining algorithm. Our new mining algorithm mines alternative patterns in example form "P1 or P2", where P1 and P2 are alternative rules such as condition checks on method arguments or return values related to the same API. We conduct two evaluations to show the effectiveness of our Alattin approach. Our evaluation results show that (1) alternative patterns reach more than 40% of all mined patterns for APIs provided by six open source libraries; (2) the mining of alternative patterns help reduce nearly 28% of false positives among detected violations..

The paper accepted as ASE 2009 can be found as an attachment below.

PEOPLE

Faculty

Tao Xie (Principal Investigator)

Graduate Students

Suresh Thummalapenta (PhD Student)

EMPIRICAL RESULTS

Mined Patterns Sheet Format

    • PID : Pattern ID. As each pattern can have multiple alternatives, we show the same pattern ID.
    • API Name: Shows the name of the API method for which the rule is associated with.
    • Pattern: Gives an alternative of the pattern. Multiple alternatives are shown in multiple rows with the same PID.
    • SUP Val: Shows the support value assigned by the frequent subsequence miner for the pattern. For infrequent alternatives, SUP value can be seen as quite low such as 0.02 or something.
    • ABS Val: Support values computed by the ImMiner for the infrequent alternatives. ABS Val is shown as zero for frequent alternatives
    • Category: Manually classified category through documentation or source code. Possible values are Rule, Partial Rule, and False Positive.
    • Type (Single check, Balanced, Imbalanced): Shows the further classification of rules and partial rules.
    • Comments: Additional comments regarding the pattern.

Detected Violations Sheet Format

    • PID: Pattern ID that used to detect the violation.
    • Filename: Name of the source file.
    • Method: Name of the violated method in the preceding source file.
    • Violated API: The API whose properties are violated.
    • Violated Pattern: Actual pattern used to detect the violation.
    • Missing Condition: Condition check inside the pattern that is not existing in the code example.
    • Support: Support for the associated pattern. The value ranges between 0 and 1.
    • Violation Type: Manually identified category of the violation. Possible values are Defect and False Positive.
    • Comments: Any comments that describe the reasons for selecting the described violation category.

The patterns mined for each subject application along with the violations detected in each mode (Mode 1, Mode 2, Mode 3) are available in the attachments.

SPONSORS

Army Research Office Award Program (09/08/2008-08/30/2011)

National Science Foundation Award CNS-0720641, Computer Systems Research (CSR) Program (08/01/2007-07/31/2008)

For any details, please contact