Identifiers Driven Analyses Examples

Examples of domain knowledge driven program analyses

Research statement: The explicit use of domain knowledge for analyzing programs leverages the classical program analyses and opens opportunities for completely new program analyses.

Below are several categories of conceptual program analyses together with a set of concrete examples of conceptual defects in the Java standard API. Conceptual defects originate from the inadequate representation of business domain knowledge in programs. They are not bugs per se but can lead to bugs, difficulties in maintaining the API clients, redundancy in the clients, or different hacks.

The examples were identified automatically by mapping the Java API to different ontologies. We have chosen parts of the Java standard API in order to demonstrate the need for domain knowledge driven analyses, to show the pervasiveness of logical defects, and to make our examples easier to follow. We are convinced that the Java API has a much higher quality than common programs and therefore that logical defects are even more pervasive in practice.

Naming problems

Good naming of program entities is a central prerequisite for easy understanding of the code. In the case of APIs, bad naming can cause confusion or difficulties in the use of the API. Good naming implies that the concepts implemented by program elements are reflected accurately and consistently throughout the program.

    • polysemy example: in the name of the class 'java.util.Date' and in some constructors, the word 'date' denotes a specific instant in time, whereas in the parameters of other constructors of the same class the word 'date' denotes the day of the month (1 to 31). This polysemy can cause confusion in the use of the class Date and thereby makes the Java API easier to misuse.

    • synonymy example: the concept 'ellipse' is referenced under two names: 'ellipse' in the name of the class 'java.awt.geom.Ellipse2D', and 'oval' in the name of the method 'java.awt.Graphics.drawOval()'. This inconsistency lowers the homogeneity of the API and makes it more difficult to use.

    • ambiguous names: in the class 'java.awt.BorderLayout', both the positioning constants and the components situated at the corresponding positions have the same name. Note that, due to this ambiguity, the Java programming team themselves wrote poor comments. Luckily, the Component fields have the visibility level 'package', so this issue does not disturb the users of the Java API.
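The polysemy of 'date' can be made concrete with a small Java sketch. The class name DatePolysemyDemo and its two helper methods are hypothetical names introduced only for illustration; the deprecated constructor 'Date(int year, int month, int date)' is part of the standard API. The sketch shows that the parameter called 'date' is in fact a day of the month, which an unambiguous accessor would name as such.

```java
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;

final class DatePolysemyDemo {

    // Sense 1: 'Date' as a specific instant in time (the class itself).
    // Sense 2: 'date' as the day of the month (third constructor parameter).
    @SuppressWarnings("deprecation")
    static Date fromLegacyConstructor(int year, int month, int date) {
        // The legacy constructor expects the year as an offset from 1900.
        return new Date(year - 1900, month, date);
    }

    // The same value read back under an unambiguous name.
    static int dayOfMonth(Date instant) {
        Calendar calendar = new GregorianCalendar();
        calendar.setTime(instant);
        return calendar.get(Calendar.DAY_OF_MONTH);
    }

    public static void main(String[] args) {
        Date instant = fromLegacyConstructor(2024, Calendar.JANUARY, 15);
        // The 'date' passed to the constructor comes back as the day of the month.
        System.out.println(dayOfMonth(instant));
    }
}
```

A name such as 'dayOfMonth' for the constructor parameter would remove the second meaning of 'date' without changing any behavior.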

Inheritance defects

The inheritance hierarchy should mirror the is-a relation between a sub-concept and a super-concept. By doing this, we make the APIs natural and easy to use in analogy to the domain knowledge of programmers.

    • inverted inheritance: in the collections framework of the Java API, the class 'java.util.LinkedList' implements the interface 'java.util.Queue', and thereby whenever a Queue object is requested we can pass a LinkedList object. This can lead to unexpected errors when the same object is used both as a Queue (where an ordering of elements is expected) and as a LinkedList (which allows random access to its elements), as shown on the right-hand side of the figure below. This is an example of a violation of the Liskov Substitution Principle.
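The following Java sketch illustrates the kind of surprise described above. The class name QueueOrderDemo is hypothetical; the collection calls are standard 'java.util' methods. One object is referenced both as a LinkedList and as a Queue, and a positional insertion through the list reference changes what the queue client sees at its head.

```java
import java.util.LinkedList;
import java.util.Queue;

final class QueueOrderDemo {

    // Fills a queue in FIFO order, then modifies the same object as a list.
    static String headAfterListInsert() {
        LinkedList<String> list = new LinkedList<>();
        Queue<String> queue = list; // same object, seen as a Queue

        queue.offer("first");
        queue.offer("second");

        // Positional insertion is legal on the list view of the object.
        list.add(0, "intruder");

        // A client relying on FIFO order now finds the newest element at the head.
        return queue.peek();
    }

    public static void main(String[] args) {
        System.out.println(headAfterListInsert()); // prints "intruder"
    }
}
```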

Logical modularity

In order to be easy to understand and maintain, programs should be built in a modular manner. The layered nature of the Java standard API is clearly suggested in the figure below (taken from here): we can see, for example, that 'java.lang' belongs to the base libraries and 'java.awt' to the UI toolkits. Furthermore, even if not shown in this figure, it is well known that the Swing API is built upon the AWT API. Is this really so? Well, no :-(

    • 'java.lang' knows about 'java.awt': the class 'java.lang.SecurityManager' has the method 'checkAwtEventQueueAccess', and thereby there is a logical dependency between 'java.lang' and 'java.awt'. This dependency is a violation (at the logical level) of the Java platform architecture shown in the above figure.

    • 'java.awt' knows about 'javax.swing': in the class 'java.awt.Component', the method 'doSwingSerialization' depends on the 'javax.swing' framework. The method 'doSwingSerialization' instantiates and invokes these classes through the Java reflection mechanism, and thereby the structural dependency is avoided (even the AWT developers wrote in a comment that their solution 'is a hack').

Please note that these violations of the architecture cannot be discovered by structural analyses (there is no structural reference from 'java.lang' to 'java.awt' or from 'java.awt' to 'javax.swing'). This is a typical example of an architecture that looks good in the documentation but is implemented differently in the code. Traditionally, logical violations of the architecture can be identified only through manual code reviews.
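A minimal sketch of how such logical dependencies could be flagged from identifiers alone is given below. It is not the analysis used to produce the examples on this page: the class LogicalDependencyCheck, its methods, and the two-entry term table are assumptions made for illustration. The idea is to split an identifier into terms and to report every term that names a concept owned by a different package.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class LogicalDependencyCheck {

    // Hypothetical mini-ontology: domain term -> package that owns the concept.
    static final Map<String, String> TERM_OWNERS = Map.of(
            "awt", "java.awt",
            "swing", "javax.swing");

    // Splits a camel-case identifier into lower-case terms.
    static List<String> terms(String identifier) {
        List<String> result = new ArrayList<>();
        for (String part : identifier.split("(?<=[a-z0-9])(?=[A-Z])")) {
            result.add(part.toLowerCase(Locale.ROOT));
        }
        return result;
    }

    // Returns the packages that an identifier mentions by name although it
    // is declared in a different package.
    static List<String> logicalDependencies(String declaringPackage, String identifier) {
        List<String> found = new ArrayList<>();
        for (String term : terms(identifier)) {
            String owner = TERM_OWNERS.get(term);
            if (owner != null && !owner.equals(declaringPackage)) {
                found.add(owner);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(logicalDependencies("java.lang", "checkAwtEventQueueAccess"));
        System.out.println(logicalDependencies("java.awt", "doSwingSerialization"));
    }
}
```

Because the check works on names rather than on type references, it also reports the reflective case, where no structural reference exists.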

Logical redundancy

Ideally, in order for an API to be concise, a domain concept should be implemented only once. In practice, due to different constraints or bad API design, domain concepts are implemented several times. Even worse, the implementations are often not consistent with each other. Redundancy in the API leads to redundancy and heterogeneity in its clients.

    • redundant definition of the 'point' concept in 'java.awt': in 'java.awt' there are three classes that define the 'point' concept: 'java.awt.Point', 'java.awt.geom.Point2D.Double', and 'java.awt.geom.Point2D.Float'. We consider these implementations to be redundant. In this case, the redundancy is due to performance constraints.

    • redundant representation of the 'months of the year' concepts in the Java API: the months are represented as static constants in two classes: 'java.util.Calendar' and 'sun.util.calendar.BaseCalendar'. Besides the redundancy per se, different constant values were chosen to represent the same month. By using, for example, 'sun.util.calendar.BaseCalendar.MARCH' within the 'java.util' part, we can produce unexpected results (e.g. set the 31st day of February).

    • redundant representation of the 'NORTH' concept in 'javax.swing': in the class 'javax.swing.SwingConstants', 'NORTH' is a constant of type 'int', while in the class 'javax.swing.SpringLayout', 'NORTH' is a constant of type 'String'. This redundancy in the representation requires developers to write additional code whose only purpose is to convert between the integer and the string representation of a position.
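The conversion code that the 'NORTH' redundancy forces on clients might look like the sketch below. The class CompassAdapter and the method toSpringEdge are hypothetical; only the constants from 'javax.swing.SwingConstants' and 'javax.swing.SpringLayout' come from the API.

```java
import javax.swing.SpringLayout;
import javax.swing.SwingConstants;

final class CompassAdapter {

    // Maps the int representation of a compass position to the String
    // representation of the same concept used by SpringLayout.
    static String toSpringEdge(int swingConstant) {
        switch (swingConstant) {
            case SwingConstants.NORTH: return SpringLayout.NORTH;
            case SwingConstants.SOUTH: return SpringLayout.SOUTH;
            case SwingConstants.EAST:  return SpringLayout.EAST;
            case SwingConstants.WEST:  return SpringLayout.WEST;
            default:
                throw new IllegalArgumentException(
                        "No SpringLayout edge for constant " + swingConstant);
        }
    }

    public static void main(String[] args) {
        System.out.println(toSpringEdge(SwingConstants.NORTH));
    }
}
```

The adapter carries no domain logic of its own; it exists only because the same concept has two representations.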

Conceptual coverage

Before we use a domain-specific API, we should be informed about its conceptual coverage. Ideally, the API should provide direct access to all the domain concepts that we need and to all the relations between them. When the API does not implement one of the concepts we need, we have to extend the API ourselves (or migrate the application to another API, which is normally very difficult). We measure the domain coverage of an API by mapping it to a domain ontology that covers the same domain (for example, to an ontology from our knowledge repository).

    • collections framework: until Java 1.5, the collections framework did not contain an implementation of 'queues'.

    • AWT: the AWT part of the Java library does not cover typical graphical concepts like 'tables', 'tool tips', 'tool bars', and 'trees', or more advanced dialogs like 'print dialog', 'font dialog', etc.
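Once the API has been mapped to an ontology, coverage can be computed as a simple set comparison, as in the sketch below. The class ConceptualCoverage, its methods, and the concept sets in 'main' are illustrative assumptions, not the output of an actual mapping.

```java
import java.util.Set;
import java.util.TreeSet;

final class ConceptualCoverage {

    // Concepts of the ontology that have no counterpart in the API.
    static Set<String> missingConcepts(Set<String> ontology, Set<String> api) {
        Set<String> missing = new TreeSet<>(ontology);
        missing.removeAll(api);
        return missing;
    }

    // Fraction of the ontology concepts that the API implements.
    static double coverage(Set<String> ontology, Set<String> api) {
        if (ontology.isEmpty()) {
            return 1.0; // nothing is required, so nothing is missing
        }
        int covered = ontology.size() - missingConcepts(ontology, api).size();
        return (double) covered / ontology.size();
    }

    public static void main(String[] args) {
        // Hypothetical concept sets, chosen only to illustrate the measure.
        Set<String> ontology = Set.of("button", "table", "tree", "tool tip");
        Set<String> api = Set.of("button", "label");
        System.out.println(missingConcepts(ontology, api));
        System.out.println(coverage(ontology, api));
    }
}
```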