Chapter 3 Software Reengineering
A Ph.D. Thesis by Andrew Le Gear
“It is the neglect of timely repair that makes rebuilding necessary.”
-Richard Whately
When software needs to evolve to prolong its lifetime, software development teams may have only three choices:
1. Purchase a new system.
2. Develop a new system.
3. Leverage the existing system.
The third choice is often the only feasible option, since the former two routes are generally too expensive (Rochester and Douglass, 1991). As a result, a large body of research has been produced in the areas of reengineering, maintaining and leveraging the existing systems.
Reengineering is a subset of software maintenance, specifically directed at leveraging existing systems. Several definitions for reengineering exist (Chikofsky and Cross II, 1990; Arnold, 1993; Corp., 1989). These definitions differ only in whether they allow the behaviour of a system to be altered as a result of applying a reengineering technique. We will use the widely accepted definition of reengineering by Chikofsky and Cross (Chikofsky and Cross II, 1990):
“... the examination and alteration of a subject system to reconstitute it in a new form and the subsequent implementation of the new form.”
Thus, we can view reengineering as an extension of maintenance in which the new form is an evolved version of the system (Tilley et al., 1994; Leintz and Swanson, 1980). This definition does not explicitly exclude alteration of the system's behaviour (Arnold, 1993); it remains ambiguous on that point.
Many fields of research exist within the category of Software Reengineering and Maintenance:
• Software Comprehension (O’Brien and Buckley, 2001).
• Design Recovery (Biggerstaff, 1989).
– Architectural Recovery (Aldrich et al., 2002).
– Component Recovery (Koschke, 2000a).
• Refactoring and Restructuring (Chikofsky and Cross II, 1990).
– Language Transformations (Terekhov, 2000).
– Rearchitecting (Fowler et al., 1997).
– Wrapping (Aldrich et al., 2002).
• Data Analysis.
– Slicing (Weiser, 1982).
– Control flow analysis (Urschler, 1975).
– Normalisation (Connolly and Begg, 2004).
• Reuse Identification.
– Clone Detection (Baxter et al., 1998).
– Frequency Spectrum Analysis (Ball, 1999).
– Fan-in analysis (Fenton, 1991).
The following sections in this chapter focus only on the relevant topics from software reengineering and maintenance that are applicable to the objectives of this thesis, namely component recovery, reuse identification and dynamic and static analysis.
In reengineering some form of analysis of the software artifact must be undertaken. This analysis can derive information such as call relations, data flows or metrics of complexity, some of which may be necessary before reconstituting the system in a new form.
Techniques employed to analyse software can be broadly categorised as static and dynamic (Tip, 1995; Ritsch and Sneed, 1993). The difference between the two lies in the distinction between programs and processes (this is the operating system notion of a process). A program is a static representation and is characterised by source code. A process is an instance of that program executing and is dynamic. The scenario is analogous to a recipe and the baking of a cake (O’Gorman, 2001); the recipe being the program and the baking being the process. Thus, static analysis presents information based upon the source representation of the system, while dynamic analysis gleans its information from the execution of that source at runtime. This runtime information is typically retrieved in the form of a coverage profile or program trace (Ball, 1999) using a form of software instrumentation (Wilde, 1998).
Deciding which approach to employ is a matter of context. Consider, for example, control or data flow analysis: a static analysis can produce a program-wide result, but where programs are large this can be problematic, yielding a massive data set after analysis.
Attempts to identify software components within legacy software, for the purpose of extraction or modernization are well documented in the reengineering literature with varying degrees of success (Riva and Deursen, 2001; Johnson, 2002; Cimitile and Visaggio, 1995; Girard and Koschke, 1997; Quilici, 1993; Eisenbarth, Koschke and Simon, 2001; Zhao et al., 2004). With the exception of a few solutions such as concept analysis based feature location (Koschke, 2004) described by Eisenbarth et al. (Eisenbarth, Koschke and Simon, 2001), most rely heavily upon static analyses and utilise little or no information gleaned from the dynamic execution of the software. However, static and dynamic approaches may be viewed as complementary when analysing software (Ball, 1999). Techniques that exclude dynamic analyses deny access to key information regarding (Ritsch and Sneed, 1993):
1. The software elements that are used and those that are not, for given execution scenarios.
2. Performance information.
3. Relationships between code and particular business transactions.
4. Sequence of execution.
The first and third points are particularly relevant to this thesis' core agenda (reengineering towards components), as they state that it is possible to relate source code to a prescribed execution scenario (often realised by a test case) and then to further relate an execution scenario to the business transaction that it instantiates. This offers the potential to identify implementations of behaviours of interest during the targeting phase of the component recovery process.
As identified for component-based development in the previous chapter, software reuse is a core concern for the software engineer. The software reengineering and maintenance literature provides several means of identifying reuse in software systems. Several types of reuse exist; based on the review of reuse undertaken here, two broad categories emerge:
1. Reuse internal to a system.
2. Reuse across systems.
The latter is probably the most familiar type of reuse and realises the “Software Reuse” approach to software development (Naur and Randell, 1968), defined as:
“the process of creating software from existing software rather than building software systems from scratch.” (Krueger, 1992)
This type of reuse can be realised in any number of ways, including component deployment from repositories, the use of libraries in the form of header files, or web services (Prieto-Díaz, 1991). The principles of component-based development are intended to foster this type of reuse (Cheesman and Daniels, 2001). Reuse internal to a system can exist in several forms, identifiable by their detection techniques, as illustrated in the following sections.
A software clone is duplicated code within a system (Baxter et al., 1998). Typically between 5% and 10% of an application consists of code clones (Baxter et al., 1998). Clones in a system tend to be viewed as a maintenance risk, since a change to a cloned piece of code may require changes to the other clones of that piece of code that are not immediately obvious to the maintainer. This is particularly true given that identifying clones in a system is not straightforward, since subtle changes to the piece of code being cloned may have occurred in the cloning process. As a result, several algorithms have been developed that attempt clone detection in software (Baxter et al., 1998; Baker, 1997; Johnson, 1994). These algorithms work off the source code text, or the abstract syntax tree of the partially compiled program, to identify its clones. It is also worth noting that the existence of a clone is not always bad and may indicate to the software engineer portions of code that are highly reusable, since a clone is the explicit reuse, by a programmer, of some implementation abstraction (Johnson, 1994). Clones will not be made apparent to the user if a dynamic analysis approach such as Software Reconnaissance is used. For example, if code that implements logging is cloned in a system and the logging feature is traced using a test case, only one instance of the duplicated logging code will be captured.
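A minimal text-based clone detector illustrates how such algorithms can operate on source text. The sketch below hashes windows of whitespace-normalised lines and reports repeated windows; the logging fragment used as input is invented for illustration, and real tools such as those cited above are far more tolerant of renamed identifiers and edited clones:

```java
import java.util.*;

public class CloneDetector {
    // Invented sample source: a three-line logging fragment appears twice.
    static final String[] SRC = {
        "log.open();", "log.write(msg);", "log.close();",
        "x = x + 1;",
        "log.open();", "log.write(msg);", "log.close();"
    };

    // Report pairs of start lines whose k consecutive normalised lines match.
    static List<int[]> findClones(String[] lines, int k) {
        Map<String, Integer> seen = new HashMap<>();
        List<int[]> clones = new ArrayList<>();
        for (int i = 0; i + k <= lines.length; i++) {
            StringBuilder window = new StringBuilder();
            for (int j = i; j < i + k; j++)
                window.append(lines[j].replaceAll("\\s+", "")).append('\n');
            String key = window.toString();
            if (seen.containsKey(key)) clones.add(new int[]{seen.get(key), i});
            else seen.put(key, i);
        }
        return clones;
    }

    public static void main(String[] args) {
        for (int[] pair : findClones(SRC, 3))
            System.out.println("clone: lines " + pair[0] + " and " + pair[1]);
    }
}
```

A window size of three catches the duplicated logging fragment; an abstract-syntax-tree approach would additionally match clones that differ only in identifier names.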
Calls to a procedure from various parts of a system demonstrate another type of reuse, known as fan-in. A fan-in analysis determines the number of incoming calls for a procedure or class (Fenton, 1991). Fan-in analysis can also provide other valuable insights into a system, including the identification of aspects, since procedures called from many diverse locations can indicate the presence of aspects (Marin et al., 2004). However, while fan-in is useful in identifying direct procedural reuse, the reuse is not shown to be associated with any particular domain feature set, as it is with the reuse perspective defined in this thesis.
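Fan-in counting itself is straightforward once call relations have been extracted. The following sketch computes fan-in over the call relations of code example 4 (figure 3.7); the extraction of those relations from source is assumed to have already happened:

```java
import java.util.*;

public class FanInAnalysis {
    // caller -> list of callees, taken from code example 4 (figure 3.7)
    static Map<String, List<String>> calls = new HashMap<>();

    static void addCall(String caller, String callee) {
        calls.computeIfAbsent(caller, k -> new ArrayList<>()).add(callee);
    }

    static {
        addCall("procedure1", "procedure2");
        addCall("procedure1", "procedure3");
        addCall("procedure2", "procedure3");
    }

    // Fan-in of a procedure = number of call sites that invoke it.
    static int fanIn(String procedure) {
        int count = 0;
        for (List<String> callees : calls.values())
            for (String callee : callees)
                if (callee.equals(procedure)) count++;
        return count;
    }

    public static void main(String[] args) {
        System.out.println("fan-in(procedure3) = " + fanIn("procedure3")); // 2
    }
}
```

Here procedure3 has a fan-in of two, marking it as directly reused code; a high fan-in from diverse modules would be the signal for a candidate aspect.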
Execution traces can be used to determine how often a software element is used for a given run of the program. Measuring this type of reuse is called frequency spectrum analysis (FSA)(Ball, 1999). This analysis can provide runtime reuse frequency information for particular elements or patterns of reuse for groups of elements. Calculation of the reuse perspective also relies on dynamically generated information.
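As a sketch of the idea, frequency spectrum analysis reduces to counting how often each element appears in an execution trace; the trace below is invented for illustration:

```java
import java.util.*;

public class FrequencySpectrum {
    // Count how often each traced element (e.g. a procedure) was executed.
    static Map<String, Integer> spectrum(List<String> trace) {
        Map<String, Integer> freq = new TreeMap<>();
        for (String element : trace) freq.merge(element, 1, Integer::sum);
        return freq;
    }

    public static void main(String[] args) {
        // A hypothetical trace for one run of a program.
        List<String> trace = Arrays.asList(
            "main", "parse", "lookup", "lookup", "lookup", "emit");
        System.out.println(spectrum(trace));
    }
}
```

Elements with unusually high frequencies (here, `lookup`) point to heavily reused runtime behaviour, while groups of elements sharing the same frequency often belong to the same feature.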
int x;
int y;
int z;
x = 1;
z = 1;
y = x + z;
Figure 3.1: Code example 1.
Figure 3.2: A possible graph representation of code example 1 in figure 3.1.
A dependency graph is a graph representation of dependencies in a software system. This intermediate representation is a convenient depiction of the source code that lends itself readily to analysis (Larsen and Harrold, 1996) and code optimisation (Ferrante and Warren, 1987). The code example in figure 3.1 contains a number of declarations and assignments. To help understand data flow, one might model the assignments and declarations in the program as in the graph in figure 3.2.
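A data dependency graph of this kind can be represented simply as a map from each assigned variable to the variables its assignment reads. The sketch below encodes the assignments of code example 1 (figure 3.1):

```java
import java.util.*;

public class DataDependencies {
    // variable -> the variables its assignment reads from (graph edges)
    static Map<String, Set<String>> deps = new HashMap<>();

    static void assign(String target, String... sources) {
        deps.computeIfAbsent(target, k -> new TreeSet<>())
            .addAll(Arrays.asList(sources));
    }

    static {
        // the statements of code example 1 (figure 3.1)
        assign("x");            // x = 1;
        assign("z");            // z = 1;
        assign("y", "x", "z");  // y = x + z;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Set<String>> e : deps.entrySet())
            System.out.println(e.getKey() + " depends on " + e.getValue());
    }
}
```

The edges of the map correspond to the arrows one would draw in a graph such as that of figure 3.2: `y` depends on both `x` and `z`, which have no dependencies of their own.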
The next code example is a program with an ‘if’ statement and a ‘while’ loop (figure 3.3). It is possible to model the control flow structures of this program in graph form, as shown in figure 3.4.
Early use of dependency graphs saw them used to help implement code optimizations and analyses such as program slicing (Ottenstein and Ottenstein, 1984; Ferrante and Warren, 1987). Ferrante and Ottenstein describe what they call the “Program Dependency Graph” which combines data flow and control flow for a program in a single graph.
To model programs written in object oriented languages, dependencies such as inheritance relationships may also be included. Take the code example in figure 3.5 where a three tier inheritance hierarchy exists. A dependency graph modelling this type of dependency can be seen in figure 3.6. Similarly, other constructs that introduce dependencies in object oriented languages such as polymorphism and the friend construct may be modeled in a graph representation.
int x = 0;
int y = 0;
while (x == 0)
{
    if (y > 10)
    {
        x = -1;
    }
    else
    {
        x++;
    }
    y++;
}
x = 0;
y = 0;
Figure 3.3: Code example 2.
Figure 3.4: A possible graph representation of code example 2 in figure 3.3.
public class Animal { }
public class Dog extends Animal { }
public class Cat extends Animal { }
public class Greyhound extends Dog { }
Figure 3.5: Code example 3.
Figure 3.6: A possible graph representation of code example 3 in figure 3.5.
void procedure1()
{
    procedure2(7);
    procedure3("hello", 3);
}

void procedure2(int x)
{
    procedure3("hello again", x);
}

void procedure3(String str, int y)
{
}
Figure 3.7: Code example 4.
Another commonly derived dependency is the method or procedure calls made in a program. Take the code example in figure 3.7 and its corresponding graph representation in figure 3.8. The resulting dependency graph is known as a call graph (Fenton, 1991).
In (Larsen and Harrold, 1996) the authors describe a “System Dependency Graph” where the special dependency relations for object oriented software, described above, and the call relation dependencies are combined with the dependencies of the program dependency graph to form a large comprehensive dependency graph of the system. Their application of the graph is to enable slicing in object oriented software.
In this context the call graph can be seen as a subset of the system dependency graph. The analyses performed by the technique proposed in this thesis require only the call relations of a program. Therefore it is the call graph representation that is used in this thesis as a basis for analyses.
Figure 3.8: A possible graph representation of code example 4 in figure 3.7.
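A call graph such as that of code example 4 can be encoded as an adjacency map, over which analyses such as reachability queries are easily expressed. The following is an illustrative sketch, not an extraction tool:

```java
import java.util.*;

public class CallGraph {
    // caller -> callees, taken from code example 4 (figure 3.7)
    static Map<String, List<String>> edges = new HashMap<>();

    static {
        edges.put("procedure1", Arrays.asList("procedure2", "procedure3"));
        edges.put("procedure2", Arrays.asList("procedure3"));
        edges.put("procedure3", Collections.emptyList());
    }

    // All procedures reachable from a given entry point (depth-first walk).
    static Set<String> reachable(String entry) {
        Set<String> visited = new LinkedHashSet<>();
        Deque<String> work = new ArrayDeque<>();
        work.push(entry);
        while (!work.isEmpty()) {
            String p = work.pop();
            if (visited.add(p))
                for (String callee : edges.getOrDefault(p, Collections.emptyList()))
                    work.push(callee);
        }
        return visited;
    }

    public static void main(String[] args) {
        System.out.println(reachable("procedure1"));
    }
}
```

A reachability query of this kind is the building block for the call-relation analyses used later in this thesis: the code reachable from a chosen entry point approximates the code a candidate component would drag with it.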
Design recovery is a subset of reverse engineering (Chikofsky and Cross II, 1990). Chikofsky and Cross, in their taxonomy, define design recovery as:
“a subset of reverse engineering in which domain knowledge, external information, and deduction or fuzzy reasoning are added to the observations of the subject system to identify meaningful higher level abstractions beyond those obtained directly by examining the system itself”
Other descriptions of design recovery do exist (Stoemer et al., 2003; Dean and Chen, 2003; Sartipi et al., 2000; Malton and Schneider, 2001), but they all capture essentially the same basic concept: the implicit agenda behind design recovery is to help the programmer understand the system and its design.
Biggerstaff (Biggerstaff, 1989) first brought the term into the mainstream in 1989 with his accompanying tool DESIRE, identifying the inadequacy of source code alone in an understanding context. Application domain, programming style and supplementary documentation are just a few of the factors external to the source code that have an impact on the understanding of that code (Shaft, 1995).
Design recovery can include elements of domain knowledge regarding the system, the system’s context, documentation supporting the system and input from an expert developer of the system. Core to this topic is the concept of a domain model. A domain model records the expectations of a programmer regarding the real-world situation the system is modelling, during an understanding process, and attempts to match these expectations with source code; hence introducing traceability from hypotheses to source code. An attempt at automation was made in Biggerstaff’s DESIRE tool (Biggerstaff, 1989). The tool is analyzed further in (Biggerstaff et al., 1993) where he identifies what is known as the concept assignment problem of matching expectations and hypotheses to source code programming implementations (Brooks, 1983). Where these source implementations are clichéd they are known as programming plans (Brooks, 1983).
Creating domain models automatically has proved difficult (Biggerstaff et al., 1993). Research in the area of plan detection (Quilici, 1993; Quilici et al., 1997; Quilici and Yang, 1996; Rich, 1984; Woods and Quilici, 1996), and pattern detection (O’Cinneide, 2001; O’Cinneide and Nixon, 1999, 2000, 2001; Heuzeroth et al., 2003), though worthwhile, and partially grounded in comprehension theory, has not yet reached a level of practical application. At present, the best application for automated design recovery through plan detection would seem to be in vertical domains where a far narrower range of plans and expectations would exist, thus making the solution space manageable (i.e. the coding alternatives for each plan) (Quilici et al., 1997).
Given that automating design recovery is not currently practical, semi-automated approaches are being investigated as viable solutions. In recent years, semi-automated approaches such as Reflexion Modelling, CME and FEAT have been used with very promising results (Kosche and Daniel, 2003; Murphy and Notkin, 1997; Murphy et al., 1995; Sartipi, 2001; Tran et al., 2000; Murphy et al., 2001; Walenstein, 2002; Chung et al., 2005; Robillard and Murphy, 2002; Lindvall et al., 2002). These processes follow these general steps (some of the steps may be implicit in the use of the technique, or appear merged to the user; however, they do exist):
1. Hypothesise categories and relationships between the hypothesised categories in the application under analysis.
2. Map parts of the application into these categories creating a hypothesised model.
3. Extract a concrete, lower level model of the application.
4. Compare the hypothesised model against the concrete model of the system.
5. Refine the results and repeat the process until satisfied.
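Step four, comparing the hypothesised model against the concrete model, can be sketched as simple set operations over dependency edges. Using the terminology of Reflexion Modelling, edges present in both models are convergences, hypothesised edges missing from the concrete model are absences, and concrete edges missing from the hypothesis are divergences. The module names below are invented for illustration:

```java
import java.util.*;

public class ModelComparison {
    // An edge "A->B" means: module A is hypothesised/observed to depend on B.
    static Set<String> hypothesised =
        new TreeSet<>(Arrays.asList("UI->Logic", "Logic->Storage"));
    static Set<String> concrete =
        new TreeSet<>(Arrays.asList("UI->Logic", "UI->Storage"));

    // edges in both models: the hypothesis is confirmed
    static Set<String> convergences() {
        Set<String> s = new TreeSet<>(hypothesised);
        s.retainAll(concrete);
        return s;
    }

    // hypothesised edges not found in the code
    static Set<String> absences() {
        Set<String> s = new TreeSet<>(hypothesised);
        s.removeAll(concrete);
        return s;
    }

    // edges in the code the hypothesis did not predict
    static Set<String> divergences() {
        Set<String> s = new TreeSet<>(concrete);
        s.removeAll(hypothesised);
        return s;
    }

    public static void main(String[] args) {
        System.out.println("convergences: " + convergences());
        System.out.println("absences:     " + absences());
        System.out.println("divergences:  " + divergences());
    }
}
```

The refinement loop of step five then consists of revising the hypothesised map until the divergences and absences that remain are judged acceptable.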
Dynamic analysis techniques have also shown promise as a means of design recovery (Ritsch and Sneed, 1993; Heuzeroth et al., 2003; Komondoor and Horwitz, 2003; Rajlich and Wilde, 2002). Dynamic analysis offers the potential to remove the need for source code domain knowledge (Knowledge of the style of source code written for that domain. E.g. all compilers may have the same approximate design, therefore someone with domain knowledge of compiler development would expect certain modules to exist in the implementation.) prior to analysing the system. For example, dynamic analysis techniques for feature location, such as Software Reconnaissance or concept analysis (Wilde and Scully, 1995; Eisenbarth et al., 2003; Wong et al., 1999) use knowledge of the system’s execution with respect to test cases that exhibit certain business transactions to relate code to business function.
With respect to step one (the encapsulation phase) of reengineering towards components, the most relevant reengineering and maintenance techniques are those that involve clustering. Clustering is a widely used technique in software maintenance and reengineering that identifies the contents of potential modules in a system and the cohesive interfaces between those modules. The contents of these modules are called clusters (Hutchens and Basili, 1985).
Clustering is often used to aid software comprehension, design recovery, component recovery and architectural reconstruction (Doval et al., 1999; Mitchell et al., 2002; Rennard, 2000; Ogando et al., 1994; Choi and Scacchi, 1990; Lindig and Snelting, 1997; Gall and Klösch, 1995; Patel et al., 1992; Valasareddi and Carver, 1998; Yeh et al., 1995; Kazman and Carrière, 1997; Murphy and Notkin, 1997). Component recovery and architectural recovery are highly related and yet are subtly different software analysis tasks, used for different purposes. An architectural recovery process will generally follow two steps (Koschke, 1999):
1. Identify the code that implements each component in a system.
2. Identify dependencies between the code of the components of the system.
This type of analysis is used to redocument systems, communicate their design and help software engineers understand unfamiliar systems. In contrast, the goal of component recovery is to identify individual components in a system and extract them, possibly for reuse in other systems (Koschke, 2000b). To achieve this, a limited form of architectural recovery will occur; however, the global view that architectural recovery achieves is generally not required. We suggest that a component recovery process follows these generic steps (illustrated earlier in section 2.4), which are similar to architectural recovery:
1. Identify the code that implements the component of interest only.
2. Identify dependencies on the code of the component of interest only.
3. Conform with a component model by wrapping the component with a component wrapper. This is discussed in section 3.5.5.
It is important to note the final step, where a component wrapper is applied to achieve conformance with a component model. The majority of component recovery techniques described here recover components that conform to an older and simpler definition (table 2.1 tier 1, basic reuse) of a component and therefore the last step is often not necessary. Unless otherwise explicitly stated the review of clustering techniques for component recovery in this section does not include the final step. However, this does not pose a dilemma, since, if the first two steps are carried out to define a cohesive, reusable component, the application of a component wrapper becomes relatively trivial.
Approaches to clustering can be placed into three broad categories, based upon the type of information they act upon (Koschke, 2000a):
• Dataflow-based approaches.
• Structure-based approaches.
• Domain-model-based techniques.
3.5.2.1 Dataflow-based Approaches
Dataflow-based approaches cluster based upon data relationships in the source. The relationships examined can be data types (Doval et al., 1999), abstract data types (Ogando et al., 1994; Yeh et al., 1995) or simply the declared variables themselves (Hutchens and Basili, 1985; Gallagher and Lyle, 1991). The way in which we clustered parts of a code fragment to form valid encapsulations in the example in section 2.5.1.3 could be considered a form of data clustering.
Hutchens and Basili (Hutchens and Basili, 1985), describe a dataflow clustering technique, based upon whether data is passed, received, used or altered for two or more procedures. Their work demonstrates some of the earliest evaluation of clustering as a means of architectural recovery. They compared the structure recovered by their technique against descriptions produced by software engineers of the systems to determine the success or failure of the approach. Their results exhibited preliminary success for architectural recovery and set a precedent for evaluating future automated clustering techniques geared toward architectural recovery.
Unfortunately, their technique was limited by their inability to analyse abstract data types and pointer usage. The work of Livadas and Johnson (Livadas and Johnson, 1994) overcame some of these shortcomings through the use of system dependency graphs (SDGs).
Livadas and Johnson successfully implemented several clustering algorithms in (Livadas and Johnson, 1994), based upon data type usage identified in an SDG, to recover objects from source code which was not object-oriented. In (Gall and Klösch, 1995) the authors also implemented a clustering technique based on the analysis of data-types. The goal of their work was to identify abstractions in procedural code which could be transformed to object-oriented code, with the specific goal of reuse. Their approach is semi-automated, and human oriented. Their approach begins by extracting low level program representations such as data-flow diagrams, and call graphs. Using these, two types of component are identified algorithmically:
• Data store entities (DSE): that is, a clustering of source code that uses the same persistent data.
• Non-data store entities (NDSE): that is, a clustering of source code that uses the same internal data.
Components identified in this fashion are then compared against a domain model generated from human-derived information such as requirements documents. Mappings are made manually between the domain concepts and the recovered components. In doing so, it becomes clearer which components are valid and which are not.
The human orientation of this component recovery technique was ahead of its time; the semi-automated, human-oriented nature of the approach is now seen as best practice in the component recovery literature (Koschke, 2000a). More recently, dataflow clustering techniques have been incorporated into aggregated approaches for component recovery (Ogando et al., 1994). These are discussed later in section 3.5.3.
3.5.2.2 Structure-Based Approaches
Structure-based approaches to clustering operate by analysing the structure of the system. Examples include (Girard and Koschke, 1997; Schwanke, 1991; Siff and Reps, 1997).
Some structure-based approaches operate by applying a specific metric to the relations between elements in the system. For example, Schwanke (Schwanke, 1991) calculates a similarity measure between variables in procedures to create a weighted relationship between them. Unfortunately, using this method on its own produced poor results. Results improved when the author introduced an AI-based tuning method to weight the metric, over time, in favour of the user's clustering preferences. Other structure-based approaches that derive a weighted relationship between clusters in a system include (Chiricota et al., 2003) and (Muller et al., 1993).
Another style of structure-based approach applies graph-theoretic algorithms to a dependency graph of a system to cluster elements together. One of the most common examples is dominance analysis (Cimitile and Visaggio, 1995; Girard and Koschke, 1997), which clusters procedures together in a similar fashion to the first example shown in section 2.5.1.3. Thus the technique suggests code clusters based on high encapsulation in the system. In (Cimitile and Visaggio, 1995) the authors effectively demonstrate how modules of software can be identified using dominance analysis with the goal of reuse in mind.
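As an illustration of dominance analysis, the sketch below computes dominator sets over a small invented call graph using the classic iterative dataflow formulation: a procedure reachable only through one other procedure is dominated by it, suggesting the pair be clustered into one module.

```java
import java.util.*;

public class DominanceAnalysis {
    // An invented call graph: caller -> callees.
    static Map<String, List<String>> calls = new TreeMap<>();

    static {
        calls.put("main", Arrays.asList("a", "b"));
        calls.put("a", Arrays.asList("helper"));
        calls.put("b", Collections.emptyList());
        calls.put("helper", Collections.emptyList());
    }

    // dom(n) = {n} union (intersection of dom(p) over all predecessors p of n)
    static Map<String, Set<String>> dominators(String root) {
        Map<String, List<String>> preds = new TreeMap<>();
        for (String n : calls.keySet()) preds.put(n, new ArrayList<>());
        for (Map.Entry<String, List<String>> e : calls.entrySet())
            for (String callee : e.getValue()) preds.get(callee).add(e.getKey());

        Map<String, Set<String>> dom = new TreeMap<>();
        for (String n : calls.keySet()) dom.put(n, new TreeSet<>(calls.keySet()));
        dom.put(root, new TreeSet<>(Collections.singleton(root)));

        boolean changed = true;
        while (changed) {
            changed = false;
            for (String n : calls.keySet()) {
                if (n.equals(root)) continue;
                Set<String> next = new TreeSet<>(calls.keySet());
                for (String p : preds.get(n)) next.retainAll(dom.get(p));
                next.add(n);
                if (!next.equals(dom.get(n))) { dom.put(n, next); changed = true; }
            }
        }
        return dom;
    }

    public static void main(String[] args) {
        // "helper" is reachable only through "a", so "a" dominates it:
        // dominance analysis would suggest clustering the two together.
        System.out.println(dominators("main").get("helper"));
    }
}
```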
A more recent flavour of clustering has seen the use of concept analysis, a mathematical technique for analysing binary relations (Koschke, 2004). In recent years it has been successfully applied in the software engineering field (Snelting, 1998, 1996; Eisenbarth et al., 2003). One potential application of concept analysis is the identification of modules in source code. For example, in (Lindig and Snelting, 1997) the authors attempt to reengineer modules from legacy code by examining the binary relationship between procedures and global variables. Their results were mixed: an architectural recovery seemed possible only where an underlying structure existed in the first place. Two of the three case studies they analysed had undergone serious structural degradation as a result of years of ongoing maintenance and did not yield a recovered architecture. In (Siff and Reps, 1997) the authors also apply concept analysis to recover modules, this time placing the binary relation over functions and their properties (i.e. arguments and return values).
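The binary relation between procedures and global variables can be sketched with the two derivation operators of concept analysis: the extent of a set of variables (all procedures that use every one of them) and the intent of a set of procedures (all variables they all share). A pair of sets closed under both operators is a concept, i.e. a candidate module. The procedure and variable names below are invented:

```java
import java.util.*;

public class ConceptAnalysis {
    // The binary relation: procedure -> global variables it uses.
    static Map<String, Set<String>> uses = new TreeMap<>();

    static {
        uses.put("push", new TreeSet<>(Arrays.asList("stack", "top")));
        uses.put("pop",  new TreeSet<>(Arrays.asList("stack", "top")));
        uses.put("log",  new TreeSet<>(Arrays.asList("logfile")));
    }

    // Extent: the procedures that use every variable in vars.
    static Set<String> extent(Set<String> vars) {
        Set<String> procs = new TreeSet<>();
        for (Map.Entry<String, Set<String>> e : uses.entrySet())
            if (e.getValue().containsAll(vars)) procs.add(e.getKey());
        return procs;
    }

    // Intent: the variables shared by every procedure in procs.
    static Set<String> intent(Set<String> procs) {
        Set<String> vars = null;
        for (String p : procs) {
            if (vars == null) vars = new TreeSet<>(uses.get(p));
            else vars.retainAll(uses.get(p));
        }
        return vars == null ? new TreeSet<>() : vars;
    }

    public static void main(String[] args) {
        Set<String> procs = extent(new TreeSet<>(Arrays.asList("stack")));
        // {pop, push} share {stack, top}: a candidate stack module.
        System.out.println(procs + " share " + intent(procs));
    }
}
```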
3.5.2.3 Domain-Model Based Approaches
The concept of a domain model was introduced in section 3.5.1. A domain model, formed using the domain knowledge of a user, can be very effective in producing an accurate decomposition of a software system (Murphy and Notkin, 1997; Kazman and Carrière, 1997), particularly as a prelude to reuse (Patel et al., 1992). One of the most common domain-model-based approaches to software clustering is the Reflexion Modelling technique (Murphy and Notkin, 1997; Murphy et al., 1995, 2001; Murphy, 1996, 2003). Reflexion Modelling forms part of the component recovery process proposed by this thesis and is described in greater detail in chapter 5. Other domain-model-based approaches include FEAT (Robillard and Murphy, 2002) and the CME (Chung et al., 2005).
It has been shown in the literature that clustering techniques generally perform poorly as a means of component or architectural recovery when used in isolation (Koschke, 2000a; Kazman and Carrière, 1997). Furthermore, it has been suggested that future approaches to solving this problem should aggregate existing individual approaches, involve the human more in the decisions of the process, make use of dataflow information and place more emphasis on domain knowledge (Koschke, 1999). The approach proposed in this thesis is an aggregated process for component recovery that satisfies these recommendations.
A good example of an aggregated approach is that of Ogando et al. (Ogando et al., 1994), who recover objects from source code using a combination of bottom-up and top-down understanding. From a top-down standpoint, objects are identified using two techniques:
•Routines are grouped together based on what global variables they use. For example, if a single global variable is used by four procedures then they are grouped together.
• User-defined data types and the routines that use them are grouped together.
These clustering techniques provide an initial architectural recovery of the system, thus facilitating understanding from the top down. From a bottom-up perspective, a human-oriented grouping of subcomponents is performed. For example, the automated clustering techniques used will sometimes suggest that a routine belongs to many different objects. These conflicts are resolved in a bottom-up, semi-automated fashion by presenting them to the user. The type of domain knowledge used was reported to be mainly derived from the naming conventions in the code.
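The first of these grouping rules can be sketched directly: each global variable, together with every routine that uses it, forms one candidate object. The routine and variable names below are invented for illustration:

```java
import java.util.*;

public class GlobalVariableClustering {
    // global variable -> routines that use it (one cluster per variable)
    static Map<String, List<String>> usedBy = new TreeMap<>();

    static void use(String routine, String global) {
        usedBy.computeIfAbsent(global, k -> new ArrayList<>()).add(routine);
    }

    static {
        // hypothetical use relations extracted from a legacy program
        use("open", "fileTable");
        use("close", "fileTable");
        use("read", "fileTable");
        use("write", "fileTable");
        use("formatDate", "locale");
    }

    public static void main(String[] args) {
        // each global variable and its routines form one candidate object
        for (Map.Entry<String, List<String>> e : usedBy.entrySet())
            System.out.println("candidate object around '" + e.getKey()
                + "': " + e.getValue());
    }
}
```

The conflicts mentioned above arise when one routine uses several globals and is therefore placed in several candidate objects; in Ogando et al.'s approach such routines are presented to the user for a manual decision.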
In (Girard and Koschke, 1997) a framework for component recovery is proposed that uses dominance analysis as the primary technique and combines it with two dataflow clustering techniques and another graph-based structural clustering technique.
1. First, all mutually recursive routines are clustered.
2. Then each global variable and the procedures that use it are clustered together. These are called abstract state encapsulations (ASEs).
3. Then each user-defined data type, and the procedures that use it, are clustered together. These are called abstract data types (ADTs).
4. Finally, dominance analysis is performed on the collapsed call graph to yield further component suggestions.
The results of the authors' studies showed a marked improvement upon using dominance analysis alone.
Two aggregated approaches to architectural/component recovery are described in (Koschke, 1999) and (Kazman and Carrière, 1997), where over fifteen clustering techniques are placed at the user's disposal to apply at their discretion. These approaches demonstrate the most comprehensive solutions to date. Importantly, both incorporate domain knowledge input from the user, which increased the effectiveness of the solutions. The ultimate goal of this thesis is not to replace techniques like these, but rather to evaluate our technique with an eventual view to integrating it into larger aggregated processes.
In this thesis, componentisation refers to techniques used to convert entire systems to a component-based implementation; hence it is more similar to architectural recovery than component recovery. In a componentisation process, improved encapsulation mechanisms are applied to the non-component-based parts of a system, similar to what was described in section 2.5.3. An early componentisation process was described in (Choi and Scacchi, 1990). Using their module interconnection language, NuMIL, the authors describe a process for augmenting a program with what would, in modern terms, be described as a component architecture description. The modules that they describe, however, are not consistent with the definition of component used in this thesis.
Another componentisation approach related to the recovery of components is Aldrich et al.'s application of the ArchJava component-based software development language to a legacy system (Aldrich et al., 2002). Reflexion Modelling is used to explicitly and accurately identify component boundaries in the system before the ArchJava language is applied to it (see chapter 5). Though successful in its goal of applying a component language extension to an existing system, this work does not explore the objective of identifying individual components with the goal of reuse in mind.
P. D. Johnson (Johnson, 2002) describes another componentisation approach using black-box reengineering. Black-box reengineering is any reengineering approach that only requires the maintainer to understand the system down to the functionality level and not the detail of its implementation (understanding to the implementation level during reengineering is known as white-box reengineering). Good examples of black-box reengineering techniques are the many feature location approaches that exist (where a feature is any operation that produces a result of observable value (Eisenbarth et al., 2003)) (Eisenbarth et al., 2003; Wilde and Scully, 1995; Zhao et al., 2004; Wong et al., 1999). These techniques often do not require an understanding of the implementation to locate the source code responsible for implementing a feature. Removing the requirement for a detailed understanding of implementation details presents clear time savings at the comprehension stage of a “reengineering to components” process. P. D. Johnson's process follows three steps:
1. Identify business components: Apply a chosen technique that identifies components in code.
2. Create wrapper components: Supplement the code chosen to be recovered as a component with wrapping code so that it may conform to the definition of a component (Bergey et al., 2000). This is a necessary step when reengineering towards components and is described in the next section.
3. Deploy wrapper components: Use the recovered components in a system.
In the case of P. D. Johnson's process, deployment is only considered in the existing system, replacing the same piece of the system that was chosen to be wrapped as a component. Importantly, this process is only a framework for componentisation: it leaves many of the details of the precise process steps, and of the tools and techniques used to achieve them, to the discretion of the user. The process proposed by this thesis partially fits into this framework by describing in detail a set of steps that can be used to fulfil step one (component encapsulation) of componentisation. It is also crucial to understand the differences between what is proposed by this thesis and the above componentisation process: Reconn-exion implements targeted component recovery, where individual components are chosen by the software engineer and encapsulated, whereas componentisation is a process applied to every component in a system.
For this thesis, only the first step is within scope, i.e. only the encapsulation of a new component is of concern, not its alteration to conform with a component model or its integration into a new system. However, for completeness, step two is considered briefly in the next section.
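To illustrate how a black-box technique such as the software reconnaissance approach cited above (Wilde and Scully, 1995) can fulfil step one, consider comparing two execution traces: one gathered while the feature of interest is exercised, and one without it. Methods that appear only in the first trace are candidate feature code. The sketch below uses hypothetical trace contents in place of real instrumentation output:

```java
import java.util.Set;
import java.util.TreeSet;

// A minimal sketch of software reconnaissance: the set difference of
// two execution traces locates candidate feature code. The method
// names below are hypothetical placeholders for real trace data.
public class Reconnaissance {

    // Methods observed while exercising the feature of interest.
    static Set<String> withFeature = Set.of(
            "Parser.parse", "Report.render", "Report.exportPdf");

    // Methods observed in a run that avoids the feature.
    static Set<String> withoutFeature = Set.of(
            "Parser.parse", "Report.render");

    // Methods executed only when the feature runs are the candidates.
    static Set<String> candidates() {
        Set<String> diff = new TreeSet<>(withFeature);
        diff.removeAll(withoutFeature);
        return diff;
    }

    public static void main(String[] args) {
        System.out.println(candidates()); // [Report.exportPdf]
    }
}
```

Note that the result is only a starting point for encapsulation: the located methods must still be examined to establish the component's boundary and interfaces.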
3.5.5 Component Wrappers
Wrapping is a mechanism by which legacy source code may be supplemented so that the system conforms with new development paradigms (Bergey et al., 2000). By legacy system we mean a system that meets the following requirements (Juric et al., 2000):
•Has existing code.
•Is currently useful.
•Is currently used.
•Does not conform to the component model for which we are applying the wrapping technique.
Due to the recent appearance of component-based development, few legacy systems, or the software assets that constitute them, conform to the requirements of existing component models and frameworks. (Interestingly, legacy source code does conform to a model of sorts, the operating system, which enforces constraints that today seem ubiquitous, such as process management, memory management and file management.) Therefore, before source code from a legacy system can be considered fully reengineered towards components, it must first be amended to achieve conformance with the component model and framework to which it will be applied. This is known as component wrapping (Comella-Dorda et al., 2000).
A simple process for wrapping a legacy system as a JavaBean, for example, follows these three steps (Comella-Dorda et al., 2000):
1. Modularise by identifying the component’s code and distinct interfaces in the legacy system.
2. Identifying the interfaces establishes the points of contact with the remainder of the system.
3. With this information about the component in hand, implement a wrapper bean for each component.
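The three steps above can be sketched for a hypothetical legacy module. Assuming a procedural legacy class `LegacyLedger` has already been identified as a component and its interface located (all names here are illustrative, not taken from any real system), a wrapper bean exposes that interface through JavaBean conventions, i.e. a public no-argument constructor, get/set accessor pairs and serialisability:

```java
import java.io.Serializable;

// Hypothetical legacy code identified as a component in step 1.
// Its procedural interface is the point of contact found in step 2.
class LegacyLedger {
    static double balanceFor(String account) {
        // Placeholder for the original legacy computation.
        return "savings".equals(account) ? 120.50 : 0.0;
    }
}

// Step 3: a wrapper bean conforming to JavaBean conventions while
// delegating all real work to the unchanged legacy code.
public class LedgerBean implements Serializable {
    private String account = "";

    public LedgerBean() { }  // required public no-argument constructor

    public String getAccount() { return account; }
    public void setAccount(String account) { this.account = account; }

    // Read-only property delegating to the legacy implementation.
    public double getBalance() {
        return LegacyLedger.balanceFor(account);
    }
}
```

Clients then manipulate the legacy functionality solely through the bean's properties, never through the legacy interface directly, so the legacy code can later be replaced behind the wrapper.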
A more generic approach would be to apply a component wrapper using a language-independent component model. In this model, a state-of-the-art architectural description language that has been extended to allow the definition of components, such as xADL (Galvin et al., 2004), could be used. An example of this can be found in (Le Gear et al., 2004).
A generic approach using this technology would follow these steps:
1. Identify a portion of legacy source to wrap as a component.
2. Identify its interface boundaries.
3. Apply the mark-up language appropriately to describe an xADL component type.
Component wrapping is a necessary step when reengineering towards components, but it is not a core contribution of this thesis and is not discussed further.
Component Reconn-exion by Andrew Le Gear 2006