1. Human Aspects in Software Engineering

1.1 Detection of sentiments in Software Engineering texts

Automated sentiment analysis of software engineering textual artifacts has long suffered from the inaccuracy of the few tools available for the purpose. We conduct an in-depth qualitative study to identify the difficulties responsible for this low accuracy. We then address the exposed difficulties in developing SentiStrength-SE, a sentiment analysis tool designed specifically for the software engineering domain.

1.2 Detection of emotions in Software Engineering texts

We develop DEVA, the first sentiment analysis tool that is designed specifically for software engineering text and is also capable of capturing the emotional states excitement, stress, depression, and relaxation. We also create a ground-truth dataset containing 1,795 JIRA issue comments. In a quantitative evaluation using this dataset, DEVA achieves more than 82% precision and more than 78% recall.
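As a rough illustration of how per-emotion precision and recall can be computed against a ground-truth dataset, the following Python sketch may help. The function name `precision_recall` and the toy labels are assumptions made for this example; they are not part of DEVA or its evaluation data.

```python
from collections import Counter

# The four emotional states named in the text.
EMOTIONS = ("excitement", "stress", "depression", "relaxation")

def precision_recall(gold, predicted, labels=EMOTIONS):
    """Return {label: (precision, recall)} for two parallel label lists."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1          # correct prediction for label g
        else:
            fp[p] += 1          # p was predicted but is wrong
            fn[g] += 1          # g was missed
    scores = {}
    for label in labels:
        predicted_total = tp[label] + fp[label]
        gold_total = tp[label] + fn[label]
        precision = tp[label] / predicted_total if predicted_total else 0.0
        recall = tp[label] / gold_total if gold_total else 0.0
        scores[label] = (precision, recall)
    return scores

# Toy, hand-made labels (hypothetical; not taken from the JIRA dataset).
gold = ["stress", "stress", "excitement", "relaxation", "depression"]
pred = ["stress", "depression", "excitement", "relaxation", "depression"]
scores = precision_recall(gold, pred)
```

In this toy run, one "stress" comment is misclassified as "depression", so "stress" loses recall while "depression" loses precision.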

1.3 Applying Machine Learning techniques to detect emotions in Software Engineering texts

We develop MarValous, the first machine-learning-based tool capable of capturing the emotional states excitement, stress, depression, and relaxation. We evaluate MarValous using a dataset containing 5,122 comments collected from JIRA and Stack Overflow. In a quantitative evaluation, MarValous is found to substantially outperform DEVA, achieving more than 83% precision and more than 79% recall.


2. Software Security & Source Code Analysis

2.1 Analysis of security vulnerabilities in source code

Software security has gained immense importance in recent years. While efforts are expected to minimize security vulnerabilities in source code, the developers’ practice of code cloning often causes such vulnerabilities and program faults to multiply. Although previous studies examined the bug-proneness, stability, and changeability of clones against non-cloned code, the security aspects remained completely ignored. In this work, we present an in-depth study of the security vulnerabilities in different categories of code clones and in non-cloned code. The comparative study along this new direction examines 8.7 million lines of code over 34 software systems, and derives results based on quantitative analysis with statistical significance. The findings from this work add to our comprehension of the characteristics and impacts of clones, which will be useful in clone-aware software development with improved software security.

2.2 Discovering bug-fix edit patterns by mining software repositories (ongoing project)

A deep understanding of the common patterns of bug-fixing changes is useful in several ways: (a) such knowledge can help developers proactively avoid coding patterns that lead to bugs, and (b) bug-fixing patterns can be exploited in devising techniques for automatic program repair. We plan to study thousands of bug-fixing changes to identify bug-fix edit patterns and their nested locations in source code.
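As a minimal sketch of what extracting an edit from a bug-fixing change could look like, the following Python example compares a buggy and a fixed version of a snippet line by line using the standard `difflib` module. The helper `edit_operations` and the sample code lines are hypothetical and only illustrate the general idea; they are not the project's mining technique.

```python
import difflib

def edit_operations(buggy_lines, fixed_lines):
    """List the non-equal line-level edits that turn buggy_lines into fixed_lines."""
    matcher = difflib.SequenceMatcher(a=buggy_lines, b=fixed_lines, autojunk=False)
    ops = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # tag is one of "replace", "delete", or "insert"
            ops.append((tag, buggy_lines[i1:i2], fixed_lines[j1:j2]))
    return ops

# Hypothetical before/after versions of a fix (assignment used instead of comparison).
buggy = ["if (x = 0) {", "    return;", "}"]
fixed = ["if (x == 0) {", "    return;", "}"]
ops = edit_operations(buggy, fixed)
```

Grouping many such edits by their shape (for example, a changed operator inside a condition) is one plausible way to arrive at recurring edit patterns, though line-level diffs ignore syntactic nesting.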