Program Comprehension Research Overview

Motivation:

Programs simulate parts of the real world: they respond to the actions of their users (domain experts) as if they knew about a particular situation in the real world. Besides knowledge of the business domain, developers who write and maintain programs also need knowledge of other domains, such as programming technologies (e.g., GUIs), software architecture, computer science, and general knowledge. The availability of domain knowledge and its mapping onto the code is a prerequisite for program understanding, the most expensive activity of software maintenance.

Even though domain knowledge is of central importance for understanding programs, most state-of-the-art automatic program analysis techniques do not consider it and thereby miss important information. Typical static program analyses examine the structure of the code and require engineers to interpret the results manually in terms of their domain knowledge.

My research focuses on making domain knowledge a first-class citizen in program analysis. By systematically interpreting programs from the point of view of the domain knowledge they implement, we raise the level of abstraction of existing program analyses, for example by:

    • enriching analyses of structural modularity by considering logical (conceptual) modularity (WCRE '09),

    • enriching analyses of software redundancy by considering logical redundancy (ICPC '08, WCRE '06),

or we define entirely new analyses, such as:

    • assessing the domain appropriateness of domain-specific APIs (ICPC '08, CSMR '07),

    • assessing how explicitly domain concepts are implemented in the code (ICPC '07),

    • assessing the quality of identifiers (WCRE '06).

For concrete examples of our analyses and the results of applying them to the Java standard API, please look here.

Approach:

To use domain knowledge systematically in program analysis, we need three ingredients (WSR '09):

    • a program abstraction that leaves out the uninteresting details contained in the code,

    • an adequate semantic domain to represent the domain knowledge,

    • a well-defined interpretation that links the program abstraction to the semantic domain.

We realize the program abstraction by regarding programs as knowledge bases (ICPC '06) whose content is given by the program identifiers and whose representation language is a subset of the programming language; we use domain ontologies as the semantic domain; and we use the 'intentional interpretation' to map program entities onto the domain concepts they implement. In this way we define and recover the intentional meaning of programs (PhD Thesis), which is the starting point for more advanced, domain-knowledge-driven program analyses.
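To make the three ingredients more concrete, the following Python sketch shows one very simplified way they could fit together. It is an illustration under stated assumptions, not the implementation used in the cited work: the program abstraction is a hypothetical, hand-written table of identifiers per program entity, the semantic domain is a hypothetical toy ontology that maps each concept to lexical labels, and the interpretation is approximated by naive term matching after splitting identifiers at camelCase and underscore boundaries. The names `PROGRAM`, `ONTOLOGY`, `split_identifier` and `interpret` are invented for this example.

```python
import re

# Hypothetical program abstraction: each program entity (here, a class) is
# reduced to the identifiers it declares. A real tool would extract these from
# source code; here they are written by hand for illustration.
PROGRAM = {
    "BankAccount": ["BankAccount", "getBalance", "deposit", "withdraw"],
    "TransferService": ["TransferService", "transferMoney", "fromAccount"],
    "XmlUtils": ["XmlUtils", "parseNode"],
}

# Hypothetical toy domain ontology used as the semantic domain:
# each concept is denoted by a set of lowercase lexical labels.
ONTOLOGY = {
    "Account": {"account"},
    "Balance": {"balance"},
    "Deposit": {"deposit"},
    "Withdrawal": {"withdraw", "withdrawal"},
    "Transfer": {"transfer"},
    "Money": {"money"},
}


def split_identifier(name: str) -> list[str]:
    """Split camelCase, PascalCase or snake_case identifiers into lowercase terms."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [part.lower() for part in parts]


def interpret(program: dict[str, list[str]],
              ontology: dict[str, set[str]]) -> dict[str, set[str]]:
    """Approximate the interpretation: map each entity to the concepts whose
    labels occur among the terms of its identifiers (naive lexical matching)."""
    meaning = {}
    for entity, identifiers in program.items():
        terms = {term for ident in identifiers for term in split_identifier(ident)}
        meaning[entity] = {concept for concept, labels in ontology.items()
                           if labels & terms}
    return meaning


if __name__ == "__main__":
    for entity, concepts in interpret(PROGRAM, ONTOLOGY).items():
        print(entity, "->", sorted(concepts))
```

With these toy inputs, `BankAccount` maps to the concepts Account, Balance, Deposit and Withdrawal, while `XmlUtils` maps to no domain concept at all, which hints at code that implements technical rather than business knowledge. Real ontologies and interpretations would also have to deal with synonyms, abbreviations and relations between concepts, all of which this lexical matching ignores.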

In more detail, these ingredients are instantiated differently depending on the analysis use case. For example, when analyzing the appropriateness of APIs, the program abstraction represents only the public interfaces, the semantic domain is an ontology of the API's domain (CSMR '08), and the interpretation relates API elements to the domain concepts they reference or define.
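A minimal sketch of the API use case, again under loudly hypothetical assumptions: the public interface is given as a table of public types and their method names, the domain ontology is a toy concept-to-label table, a public type is assumed to "define" the concepts its name matches, and its methods are assumed to "reference" the concepts their names match. The report fields (`defined`, `referenced_only`, `missing`, `coverage`, `non_domain_types`) are invented indicators for illustration, not the metrics of the cited papers.

```python
import re

# Hypothetical public interface of a small banking API: only public types and
# the names of their public methods are kept (the program abstraction).
PUBLIC_API = {
    "Account": ["getBalance", "deposit"],
    "Customer": ["getName", "getAccounts"],
    "StringHelper": ["trimAll"],
}

# Hypothetical ontology of the API's domain: concept -> lowercase labels.
DOMAIN_ONTOLOGY = {
    "Account": {"account", "accounts"},
    "Customer": {"customer"},
    "Balance": {"balance"},
    "Deposit": {"deposit"},
    "Withdrawal": {"withdraw", "withdrawal"},
    "Loan": {"loan"},
}


def terms_of(identifier: str) -> set[str]:
    """Split a camelCase/PascalCase identifier into a set of lowercase terms."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", identifier)
    return {part.lower() for part in parts}


def concepts_of(identifier: str, ontology: dict[str, set[str]]) -> set[str]:
    """Concepts whose labels occur among the identifier's terms."""
    terms = terms_of(identifier)
    return {concept for concept, labels in ontology.items() if labels & terms}


def appropriateness_report(api, ontology):
    """Relate public API elements to domain concepts and summarize the result."""
    defined, referenced = set(), set()
    non_domain_types = []
    for type_name, methods in api.items():
        # Assumption: a public type "defines" the concepts named by its type name ...
        type_concepts = concepts_of(type_name, ontology)
        # ... and its public methods "reference" the concepts named by their names.
        method_concepts = set().union(*(concepts_of(m, ontology) for m in methods))
        defined |= type_concepts
        referenced |= method_concepts
        if not (type_concepts or method_concepts):
            non_domain_types.append(type_name)  # public type with no domain meaning
    covered = defined | referenced
    return {
        "defined": sorted(defined),
        "referenced_only": sorted(referenced - defined),
        "missing": sorted(set(ontology) - covered),  # concepts the API does not express
        "coverage": len(covered) / len(ontology),
        "non_domain_types": non_domain_types,
    }


if __name__ == "__main__":
    print(appropriateness_report(PUBLIC_API, DOMAIN_ONTOLOGY))
```

For the sample inputs, the report marks Loan and Withdrawal as missing, Balance and Deposit as referenced only through method names, and `StringHelper` as a public type without domain meaning, giving a coverage of 4 out of 6 ontology concepts.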

We also investigated different ways to obtain adequate ontologies for program analysis by extracting domain knowledge from domain-specific APIs (CSMR '08, STSM '08); for more details, please see our knowledge repository.