Chapter 5 Software Reflexion Modelling
A Ph.D. Thesis by Andrew Le Gear
“Any sufficiently advanced technology is indistinguishable from magic.”
-Arthur C. Clarke.
One of the most successful software understanding and design recovery techniques of recent years has been Reflexion Modelling (Murphy et al., 2001). Impressive reductions in the time taken by software engineers to understand systems have been reported when using the technique.
The details of the Software Reflexion Modelling process are explained in the next section; however, the underlying principles of the technique are based on what are known as “collapsing strategies” (Stoermer et al., 2004). Given a software system, modelled as a dependency graph, we can decide to group certain elements of the graph together. This is the essence of a collapsing strategy. Figure 5.1 explains this in three steps:
1. We start with a dependency graph of a system (A).
2. Certain elements of this graph are chosen to be collapsed together (B) (marked in black).
3. After the collapse is performed we have a slightly more abstract graph (C).

Many of the clustering techniques discussed in section 3.5.2 use collapsing strategies to achieve their goal. Importantly, a collapsing strategy alone is not a clustering technique. Clustering consists of an analysis of the software followed by a decision to cluster based upon that analysis. For example, dominance clustering is a popular clustering technique (Girard and Koschke, 1997). First, dominance analysis determines the dominating nodes in a call graph (the analysis); this is followed by a decision to cluster based upon those dominating nodes (the collapsing strategy).
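The collapsing step of figure 5.1 can be sketched as a simple graph operation. The sketch below is only an illustration (the node names are invented): it assumes the dependency graph is stored as a set of directed edges, merges a chosen group of nodes into one abstract node, and drops any self-loops that result.

```python
def collapse(edges, group, new_node):
    """Collapse every node in `group` into `new_node`,
    rewriting edges and dropping resulting self-loops."""
    rename = lambda n: new_node if n in group else n
    return {(rename(a), rename(b))
            for (a, b) in edges
            if rename(a) != rename(b)}

# Step (A): a small dependency graph as a set of directed edges.
edges = {("p", "q"), ("q", "r"), ("r", "s"), ("p", "s")}

# Steps (B) and (C): collapse q and r into a single abstract node.
abstracted = collapse(edges, {"q", "r"}, "qr")
# abstracted == {("p", "qr"), ("qr", "s"), ("p", "s")}
```

Note that the edge between q and r disappears in the abstracted graph: dependencies internal to a collapsed group become invisible, which is exactly what makes the resulting graph more abstract.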
Collapsing strategies that are preceded by a manual analysis guided by human input have been shown to help software engineers recover the designs of software systems (Refl, 2005; Murphy and Notkin, 1997; Robillard and Murphy, 2002; Chung et al., 2005). The manual analysis performed is typically a mnemonic analysis (Refl, 2005; Murphy and Notkin, 1997; Robillard and Murphy, 2002; Chung et al., 2005). Mnemonics, in this context, usually refers to the naming conventions of software elements in a system (Refl, 2005). Software Reflexion Modelling (Murphy and Notkin, 1997), FEAT (Robillard and Murphy, 2002) and the CME (Chung et al., 2005) are all modern examples of this. The next section discusses the Software Reflexion Modelling process in greater detail.
Figure 5.1: Collapsing strategy in operation.
Software Reflexion Modelling is a semi-automated, diagram-based, structural summarisation technique that programmers can use to aid their comprehension of particular software systems. Introduced by Murphy et al. (Murphy et al., 1995), the technique is primarily aimed at aiding software understanding.
Reflexion Modelling follows a six step process, illustrated in figure 5.2:
1. The programmer who wishes to understand the subject system hypothesises a high-level conceptual model of the system.
2. The computer extracts, using a program analysis tool, a dependency graph of the subject system’s source code called the source model.
3. The programmer then creates a map that maps elements of the source model onto individual nodes of the high-level model (a collapsing strategy).
4. The computer then assesses the call relationships and data accesses in the source code to generate its own high-level model (called the reflexion model). This model shows the relationships between the source code elements mapped to different nodes in the programmer’s high-level model. This allows the computer’s model to be compared with the programmer’s model, and the tool can report consistencies or inconsistencies in three ways:
• A dashed edge in the reflexion model represents a dependency between elements of the programmer’s high-level model that exists in the source model, but was not included in the high-level model.
• A dotted edge in the reflexion model represents a hypothesised dependency edge of the programmer’s high-level model that does not actually exist in the source model.
• A solid edge in the reflexion model represents a hypothesised edge of the programmer’s high-level model that was also found to exist in the source model.
5. By targeting and studying the inconsistencies highlighted by the reflexion model, the programmer can alter either their hypothesised map or their high-level model to produce a better recovered model of the system.
6. The previous two steps repeat until the software engineer is satisfied that the recovered model is correct.

Many approaches to software understanding and design recovery have since used Software Reflexion Modelling as a basis for their techniques (Tran et al., 2000; Kosche and Daniel, 2003; Hassan and Holt, 2004; Chung et al., 2005; Robillard and Murphy, 2002). The success of Reflexion as a software understanding technique can be attributed to its parallels with the state of the art in cognitive psychology (explored in detail later in this chapter) and the state of the art in component recovery (Koschke, 2000a).
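The comparison at the heart of steps 4 and 5 can be sketched in a few lines. This is not Murphy et al.’s published formalisation, only a minimal illustration with invented names: given a source model as a set of dependency edges, a map from source elements to high-level nodes, and a set of hypothesised high-level edges, it classifies each high-level edge as solid, dashed or dotted.

```python
def reflexion_model(source_edges, mapping, hypothesised):
    """Classify edges of the high-level model.

    solid  : hypothesised and present in the source model
    dashed : present in the source model but not hypothesised
    dotted : hypothesised but absent from the source model
    """
    # Lift source dependencies up to the high-level model via the map.
    lifted = {(mapping[a], mapping[b])
              for (a, b) in source_edges
              if a in mapping and b in mapping
              and mapping[a] != mapping[b]}
    return {
        "solid":  hypothesised & lifted,
        "dashed": lifted - hypothesised,
        "dotted": hypothesised - lifted,
    }

# A toy system: three source files mapped to two high-level nodes.
source = {("parse.c", "sym.c"), ("sym.c", "emit.c")}
mapping = {"parse.c": "FrontEnd", "sym.c": "FrontEnd", "emit.c": "BackEnd"}
hypothesis = {("FrontEnd", "BackEnd"), ("BackEnd", "FrontEnd")}

model = reflexion_model(source, mapping, hypothesis)
# model["solid"]  == {("FrontEnd", "BackEnd")}  -- confirmed
# model["dotted"] == {("BackEnd", "FrontEnd")}  -- hypothesised, not found
```

The iterative character of the process falls out naturally: each dashed or dotted edge prompts the programmer to change either the map or the hypothesised edges and rerun the comparison.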
Figure 5.2: The Software Reflexion Modelling Process
In Gail Murphy’s seminal paper on Reflexion Modelling, she formally defines the technique and applies it in an example to the NetBSD UNIX operating system (Murphy et al., 1995). In only a few hours the user of the technique was able to produce an accurately recovered model of the system.
A case study that better demonstrates the usefulness of Reflexion Modelling shortly followed this publication in (Murphy and Notkin, 1997). A Microsoft software engineer, with 10 years’ development experience, applied the technique in order to gain a sufficient understanding of the Microsoft Excel spreadsheet product with the aim of later performing experimental engineering. After one month of applying the technique, the software engineer said that he had gained a level of understanding of the system that would normally have taken two years using other available approaches. While this reported improvement may seem dramatic, the result is consistent with subsequent Reflexion Modelling case studies (Murphy et al., 2001), including those undertaken in this thesis.
In (Kosche and Daniel, 2003) the original Reflexion Modelling technique is adapted to incorporate the use of hierarchies in the high-level model description. To demonstrate the approach, it is applied to two C compilers, using a generic compiler reference architecture as an initial high-level model. While the authors’ evaluation is weak, the case studies were useful in highlighting a number of points:
• Architectural recovery remains a difficult task.
• The Reflexion Modelling approach to architectural recovery is highly iterative.
• Much of the Reflexion Modelling process remains manual.
• Domain knowledge is necessary for the success of Reflexion Modelling, at least in the absence of other software understanding aids. This finding is also supported in (Christl et al., 2005).
In (Tran et al., 2000) a form of Reflexion Modelling is used to repair the architecture of two open source systems. Unlike the original Reflexion Modelling technique, and similar to the approach used in (Kosche and Daniel, 2003), hierarchies are used to describe the high-level model, which the authors call the architectural level. The authors suggest four possible courses of action if the architecture is in need of repair:
1. Splitting: A high-level entity may be split into two separate nodes.
2. Kidnapping: Source model elements may have their mappings changed, thus moving them from one high-level entity to another.
3. Change high level model dependencies: Unexpected dependencies may be placed in the high-level model, thus making them expected.
4. Change source model dependencies: The source code may be refactored to make it conform to architectural constraints. This is a highly invasive procedure compared with the other options.
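The first three repair actions operate purely on the model, never on the code. As a rough sketch (with invented file and node names), kidnapping amounts to re-assigning an entry in the map, and changing the high-level model dependencies amounts to adding the unexpected edge to the hypothesised model:

```python
def kidnap(mapping, element, new_node):
    """Kidnapping: remap a source model element to a different
    high-level entity, leaving the source code untouched."""
    repaired = dict(mapping)  # copy, so the original map survives
    repaired[element] = new_node
    return repaired

def accept_dependency(hypothesised, edge):
    """Change high-level model dependencies: make an unexpected
    dependency expected by adding it to the hypothesised model."""
    return hypothesised | {edge}

mapping = {"cache.c": "Storage", "net.c": "Transport"}
mapping = kidnap(mapping, "cache.c", "Transport")
hypothesis = accept_dependency({("Transport", "Storage")},
                               ("Storage", "Transport"))
```

Splitting is the same kind of map surgery in reverse: the elements mapped to one high-level node are partitioned between two new nodes.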
The authors applied their technique to the Linux kernel and the VIM text editor. They used the original, published architectures of both systems as a base and repaired the architecture of both systems. However, it is important to note that this repair occurred without changing the source code (i.e. only splitting, kidnapping and changes to the high-level model dependencies were made). This makes their evaluation purely a modelling exercise that does not examine repair or refactoring in the traditional sense.
In (Christl et al., 2005) the authors augment the Reflexion Modelling technique with two automatic clustering techniques to aid the user in the mapping process. The two clustering techniques, called MQAttract and CountAttract, identify candidate mappings based on measures of coupling and cohesion in the system. The techniques assume that a good set of modules in a system is one with a low level of inter-module coupling. Their technique was applied in a case study of the SHriMP graph visualisation tool (Storey and Muller, 1995) and, consistent with other Reflexion-based case studies, showed good results.
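As an illustration of the idea (a simplification, not the published definition of CountAttract, which additionally weights dependencies that would violate the high-level model), an attraction function can count the dependencies between an unmapped entity and the elements already mapped to each candidate cluster. All names below are invented for the example:

```python
def attraction(entity, cluster, mapping, deps):
    """Count dependencies between `entity` and source elements
    already mapped to `cluster` (a simplified CountAttract)."""
    return sum(1 for (a, b) in deps
               if (a == entity and mapping.get(b) == cluster)
               or (b == entity and mapping.get(a) == cluster))

# Where should the not-yet-mapped file ui.c go?
deps = {("ui.c", "draw.c"), ("ui.c", "font.c"), ("ui.c", "sock.c")}
mapping = {"draw.c": "Graphics", "font.c": "Graphics", "sock.c": "Network"}

graphics_pull = attraction("ui.c", "Graphics", mapping, deps)  # 2
network_pull = attraction("ui.c", "Network", mapping, deps)    # 1
# ui.c is more strongly attracted to Graphics, so the tool would
# propose mapping it to that high-level node.
```

The user remains free to reject such a proposal, which is what keeps the augmented process semi-automated rather than fully automatic.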
Their approach is similar to the one proposed in this thesis in the sense that Reconn-exion augments Reflexion Modelling with a further reengineering technique. However, in (Christl et al., 2005) the authors use only static analysis techniques. As identified in the previous chapter, dynamic analysis techniques can facilitate the software comprehension stage of component recovery by providing a link from the initial, behavioural understanding of a system down to the architectural and implementation details of that system. Software Reconnaissance and Reflexion Modelling are proposed as a means of facilitating this understanding in Reconn-exion.
Another dimension of information that static analysis techniques will not address is historical information about the software system being analysed (temporal analysis). The authors in (Hassan and Holt, 2004) derive historical information for software systems from source code control repositories and use it to augment the original Reflexion Modelling approach using what they call sticky notes. When iterating through the Reflexion Modelling process, the authors suggest, extra information from the source control system can help explain divergences in a model and answer important questions for the software developer, such as:
• Who introduced the unexpected dependency?
• When was the unexpected dependency introduced?
• Why was the dependency introduced?
The authors evaluate their approach on a large, open source operating system called NetBSD to demonstrate how their approach could be applied.
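The sticky-note idea can be approximated today with standard version-control tooling. The sketch below uses git (Hassan and Holt themselves worked from CVS data; the file name and snippet here are illustrative) to find the commits that added or removed the code creating an unexpected dependency, answering the who and when questions, with the commit message hinting at why:

```python
import subprocess

def dependency_history(repo, path, snippet):
    """List commits that added or removed `snippet` (e.g. the
    '#include' line creating the unexpected dependency) in `path`."""
    result = subprocess.run(
        ["git", "-C", repo, "log", "-S", snippet,
         "--format=%h %an %ad %s", "--", path],
        capture_output=True, text=True, check=True)
    return result.stdout.splitlines()
```

Each returned line carries the commit hash, author, date and message: in effect, the sticky note that Hassan and Holt attach to a divergent edge in the reflexion model.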
In (Aldrich et al., 2002) the authors propose and apply a new architectural description language extension to Java called ArchJava. While the focus of the paper is not Reflexion Modelling, the authors do use Reflexion Modelling to help them reengineer a case study system to apply their architectural description language. In effect, Reflexion Modelling is used, in this instance, to help componentise the system. However, no changes to the Reflexion Modelling process are made to suit componentisation, and no comments or evaluation are attributed to Reflexion Modelling as a means of componentisation in the publication.
A further application of Reflexion Modelling that has gained popularity is its use as a means of enforcing architectural decisions in an evolving software system (Tvedt et al., 2002; Eick et al., 2001; Sefika et al., 1996; Hochstein and Lindvall, 2003; Tvedt et al., 2004; Lindvall et al., 2002). Using Reflexion Modelling in this fashion requires the software engineer to produce a high-level model at design time, as part of the normal forward engineering process. Then, as parts of the system are implemented, they are mapped to the high-level model and the reflexion model is created to reveal whether dependencies exist that should not be there.
In (Tvedt et al., 2002) and (Lindvall et al., 2002) the authors use this approach during the reimplementation of a commercial software system, written in Java, called the Experience Management System. Using the approach as part of development in the company, the authors observed several benefits:
• The premise of the reimplementation project was to produce a better system. The concrete, as-implemented view provided by the reflexion model was useful in demonstrating to management that the project was succeeding.
• Developers were initially resistant to the new architecture, which was based upon the mediator pattern. Using Reflexion Modelling, programmers who did not adhere to the pattern could be identified immediately.
• Even with the team’s best efforts, the desired architecture could not be fully achieved. Using Reflexion Modelling they were at the very least able to document these violations, where they would normally have been overlooked.
• Catching architectural problems early ultimately led to an easier-to-understand and more evolvable software system and prevented the normally inevitable decay of the architecture (Eick et al., 2001).
A more detailed record of this study by the authors is available in (Tvedt et al., 2004) and (Hochstein and Lindvall, 2003).
The component recovery process described in this thesis is as much an exercise in memory efficiency and recall as it is in creating a new component, since the process is semi-automated and the recovered artifact is based on a system that already exists. Reflexion Modelling, and hence the tailored version of Reflexion Modelling that Reconn-exion employs, can be seen as software and process support for many memory retention and recall practices, as proposed and demonstrated by the field of cognitive psychology. The following is a review of the state of the art in cognitive psychology with respect to recovery, retention and recall practices, much of which is collated in (Brace and Roth, 2005), (Littleton et al., 2005) and (Fleming, 2006).
The human memory processing structure can be divided into three main workflows (Brace and Roth, 2005), as illustrated in figure 5.3:
1. Encoding involves taking information from the outside world via the senses and creating a representation for that piece of information so it can be stored internally in the brain.
2. Storage is the process of retaining the encoded information in the brain for long periods.
3. Retrieval involves getting coded information from where it is stored in memory.
Two types of retrieval process exist: recognition and recall. The success of Reflexion can be directly attributed to its support for encoding and retrieval. These are discussed in the following sections.
Figure 5.3: Adapted from (Brace and Roth, 2005).
5.2.3.1 Encoding
Encoding is responsible for taking in, preparing and placing information into storage in the brain. Information can be encoded to different depths (Craik and Lockhart, 1972). For example, learning a passage of a book by simply repeating it over and over would constitute a shallow depth of encoding, since the information is encoded using only its physical characteristics (in this case, the sound of the passage). This type of encoding is known as maintenance rehearsal and is not a very robust means of encoding information.
A deeper means of processing the information would be to encode it in terms of its meaning. Encoding in this fashion requires more effort, but tends to be far more lasting and useful in memory. It is known as semantic processing or elaborative rehearsal.
Reflexion can be seen as a form of elaborative rehearsal, since the user is not only linking the system to appropriate abstractions (which may typically correspond to domain concepts) but also learning the syntax of the system. This also begins to explain why a lack of domain knowledge can have a detrimental effect on the use of the technique. Elaborative rehearsal has been shown to dramatically improve encoding (Craik and Tulving, 1975) and there is no reason to believe that it should be any different with the application of Reflexion.
A further interesting addendum to elaborative rehearsal is what is known as the generation effect (Craik and Tulving, 1975). This states that an individual is more likely to remember information they generated themselves than information presented to them by a third party. This helps to further explain the success of Reflexion, where high-level elements and mappings are created and named by the user, thus making it easier for the individual to remember information recovered from the system.
The encoding of large bodies of information is also more robust if spread over many separate sessions rather than confined to a single one. This is known as the spacing effect (Ebbinghaus, 1913). The technique described in this thesis places no restrictions on the software engineer to complete his task in a single sitting. The work may be saved and restored over the course of many sessions. Indeed, in many of the case studies reported in this thesis, this is exactly what happened. This may in part contribute to a more effective process.
For large bodies of information, organising parts into meaningful categories, and hierarchies of categories, has been shown to dramatically increase the amount of information that can be encoded in a given time frame (Bousfield, 1953; Bower et al., 1969). The information encoded also tends to be far more lasting and easily retrievable from memory, and subsequent retrieval tends to occur by category. This is known in cognitive psychology as clustering. This kind of categorisation is almost identically supported by the Reflexion process and the tool support provided in this thesis, through the use of high-level models and mappings.
Evidence also exists that new information is not encoded exactly as received. It has been shown that when we are presented with new information we use our knowledge of past experiences to make sense of it. This has been termed effort after meaning (Bartlett, 1932), and describes schemas in memory that exist based upon our past experiences. New information is interpreted in terms of these schemata. With respect to Reflexion Modelling, the existing schemata are the application domain knowledge that the person has. In this instance, by application domain knowledge we mean the typical implementation of a programming solution for that domain. For example, while all compilers are implemented quite differently, they may all have many design concepts, specific to that domain, in common. Knowing this constitutes application domain knowledge. Having no application domain knowledge is equivalent to having no schemata into which new information can fit. The presence of these schemata makes it easier to encode information. This further explains why the Reflexion Modelling technique can be used more effectively by those with extensive knowledge of the application and programming domains. A user of Reflexion Modelling who has domain knowledge will have an existing schema in memory, and therefore the creation of an initial high-level model will be much easier.
5.2.3.2 Retrieval
Retrieval processes are responsible for fetching information from memory and bringing it to consciousness. A well-encoded piece of information is useless if it cannot be effectively retrieved; however, it will come as no surprise that the effectiveness of retrieval is heavily dependent on how the information was previously encoded, as described in Tulving’s encoding specificity principle (Tulving, 1983, 1975). Information is retrieved from memory by accessing existing cues (routes to the piece of information in the brain). Information previously encoded using elaborative rehearsal provides far more cues than a shallowly encoded piece of information and is therefore far easier to retrieve.
The previous section showed how the Reflexion process helps the user to encode information using hierarchical clustering, elaborative rehearsal, the spacing effect and the generation effect. This, combined with the fact that recognition provides stronger cues than recall, makes for a very deeply encoded piece of information with many cues for retrieval (Tulving, 1983, 1975). In practice this means that information discovered about the system using Reflexion is very easily accessible to that person at a later date. Furthermore, the presence of domain knowledge also provides cues at discovery time, making it easier again to encode new information into an existing schema.
5.2.3.3 Human Learning
Learning is the process of change as a result of experience (Littleton et al., 2005). Several models of the human learning process have been proposed in the literature. Of most interest to us is a learning approach called category learning, again a theory proposed by the field of cognitive psychology. A state of the art review of learning approaches proposed in cognitive psychology is collated in (Brace and Roth, 2005), (Littleton et al., 2005) and (Fleming, 2006). We have already touched on the subject when we spoke of clustering in the memory process in section 5.2.3.1. Category learning is the learning that occurs when people come to understand that certain objects or entities belong together in particular categories (Littleton et al., 2005). The learning process is theorised as having these steps:
• A group of entities is observed by the individual.
• The individual forms a hypothesis that a portion of these belong together.
• The individual tests his hypothesis against his existing knowledge.
• The hypothesis is accepted, rejected indefinitely, or put on hold until new knowledge causes a change in understanding.
We can see from this brief overview that the Reflexion Modelling process is almost an identical implementation of category learning. This direct support for what is thought to be the way in which human learning processes actually operate in part explains the success of the Reflexion Modelling technique (Littleton et al., 2005; Brace and Roth, 2005; Fleming, 2006). Furthermore, studies in category learning have convincingly shown that prior knowledge and past experiences play an integral part in the formation of categories (Murphy and Medin, 1985; Murphy and Allopenna, 1994; Kaplan and Murphy, 2000). In the same way we can explain why existing domain knowledge is of benefit to those who use the Reflexion Modelling technique.
Another learning model popular in cognitive psychology is constructivism (Huitt, 2003; Ausbel, 1968; Bruner, 1990; Seaman, 1999). An authoritative collation of constructivism research can be found in (Huitt, 2003). Constructivism states that an individual learner must actively “build” knowledge and skills, and that information exists within these built constructs rather than in the external environment (Huitt, 2003; Bruner et al., 1956). This learning model suggests that the learner’s processing of stimuli from the environment produces adaptive behaviour, thus inducing learning. The Reflexion Modelling process supports this form of learning by continually creating stimuli for the user. Each iteration of the reflexion model produces stimuli in the form of feedback regarding the correctness of his high-level model. If the stimuli challenge the user’s understanding of the environment (in this case the model of the system), an adaptive change is induced (i.e. the learner learns something new about the system). This adaptive change is eventually realised when the learner alters his high-level model for the following iteration.
5.2.3.4 Learning Preferences
Separate from the underlying learning process are the individual preferences of people with respect to learning. People exhibit different likes and dislikes when assimilating new information and integrating it with their understanding of the world. These are our learning preferences, over which we have little control; they form part of a learning style that includes a range of other influencing factors over which we do have control (e.g. diet or environment) (Fleming, 2006). One of the more popular learning preference models is known as VARK (which stands for Visual, Aural, Read/write and Kinesthetic, the four types of learning preference examined by the VARK assessment) and was first pioneered in 1992 (Fleming and Mills, 1992; Fleming, 2006). The model suggests that there are four learning modes that individuals may have a preference towards (or a mixture of these, making that person multimodal) (Fleming, 2006):
Visual This is the preference to take in information in the form of charts, graphs, symbols or hierarchies. It does not include movies or text-based presentations.
Aural This is the preference to learn from information that is heard or spoken to you. For example, lectures, tapes, email or group discussions.
Read/Write This is a preference to learn from information presented to you in the form of words. This is typically reading or writing or both.
Kinesthetic This is the preference to “learn by doing.” People of this preference learn better from practical experience or dialogue rather than through direct tuition.
Statistically the mixture of preferences among people is diverse, with over 58% of people being multimodal. The process and tool support examined in this thesis are particularly suited to the multimodality of the general population. Three of the four learning preferences are directly appealed to by the process. The kinesthetic preference is immediately satisfied through the hypothesise, test and refine nature of the process, strongly appealing to a learn-by-doing approach. The interpretation of results in the reflexion models and the creation of maps is presented in a textual form to the user, thus satisfying the read/write preference. Finally, the visual preference is addressed through the presentation of reflexion models using a graphical output.
This broad appeal, allied with the other theories on learning, encoding and retrieval discussed above, may explain the positive results and comments with respect to Reflexion and Reconn-exion as learning tools. Experience from this thesis would also seem to support this conjecture.
Component Reconn-exion by Andrew Le Gear 2006