Chapter 6 Research Methodology
A Ph.D. Thesis by Andrew Le Gear
“In the fields of observation chance favors only the prepared mind.”
-Louis Pasteur, lecture.
A secondary goal of this thesis is to evaluate the proposed approach. To do this, research methodology must first be discussed. A research methodology is a strategy of inquiry which moves from the underlying philosophical assumptions of the researcher through empirical research design and data collection (Myers, 1997). Science offers the modern researcher an arsenal of methodologies that may be used when evaluating a hypothesis. It is the purpose of this chapter to explain existing research methods, relate them to the approach adopted by this thesis and justify this approach to the reader.
The foundations of modern science rest upon what is known as “the scientific method.” The purpose of the method is to ensure repeatable, rigorous evaluation of hypotheses across science. The method varies with its specific context; however, across many scientific disciplines it follows these common steps (O’Callaghan, 2005; Basili, 1996; Perry et al., 1997a):
1. Observation
In this initial step a scientist observes some interesting property of the world he wishes to investigate. This guides the formation of subsequent hypotheses and experimentation.
2. Hypothesis
Based on these observations a scientist will form a hypothesis. The hypothesis aims to explain the observations made.
3. Evaluation
Next an evaluation of the hypothesis is designed and implemented. Special care is needed when designing the evaluation.
4. Collection and Interpretation of data
Once the evaluation is performed the gathered data must be scrutinised and understood. The degree of certainty with which we can make statements regarding the data is known as validity (O’Brien et al., 2005).
5. Conclusion
Having assessed the validity and interpreted the data it is now possible to draw conclusions with respect to the hypothesis. The conclusions may support or refute the hypothesis. Sometimes it may not be possible to draw a conclusion from an evaluation and the results may be deemed inconclusive.
6. Relating the conclusion to existing knowledge
In order to attain a greater understanding of the meaning of the conclusions drawn, the hypothesis, the experiments and the results should be positioned within existing literature.
7. Reporting and publishing results
This final step ensures that the knowledge gained is not lost. The publication of results will allow other researchers to confirm claims made and more importantly to build further hypotheses that extend the work.
All research is based on some underlying assumptions about what constitutes valid research evaluation (step 3 of the scientific method) and which research methods are appropriate (Myers, 1997). Validity refers to the meaning of research results (O’Brien et al., 2005) and the degree to which one can make statements about those results from within the researcher’s adopted research method. Thus it specifically concerns steps 4 and 5 of the scientific method presented at the beginning of the chapter. Validity is described in three ways (Perry et al., 1997b):
External validity is the degree to which the conclusions of a study are applicable to the wider population. The larger and more representative the sample population used, the more applicable the results will be.
Internal validity is the certainty with which we can say that the known independent variables in the study are the only causes of what was observed in the dependent variables. Internal validity can be maintained by producing many streams of complementary evidence (Kitchenham et al., 2005) that support the hypothesis being researched.
Construct validity refers to the degree to which the structure of the experiment affords the measurement of what the experimenter set out to measure. For example, the experiment may be valid, but the variable under scrutiny may not describe what is proposed in the hypothesis.
These descriptions of validity are often in conflict and difficult to balance. For example, to maintain a high level of external validity we may decide to perform studies of industrial programmers in their workplace as opposed to the laboratory. This makes the population more representative and is known as ecological validity (Kellogg, 2003; Perry et al., 1997b). However, in doing so, control over the variables of the experiment may not be possible, thus affecting internal validity.
Once the researcher has decided upon a guiding philosophy for his research, he must then choose, from the array of research methods, those that will be appropriate in evaluating his hypothesis. The types of research method can be categorised in many ways; however, the most common distinction made is between qualitative and quantitative research methods (Myers, 1997).
Quantitative methods imply the ability to numerically measure facets of an experimental setting and include methods such as:
• Surveys - A written or oral survey of questions can be presented to a population and statistical results inferred from the answers (Yip, 1995).
• Laboratory experiments - Using a laboratory experiment the researcher can control independent variables that affect the object of the hypothesis under scrutiny. Usually a single variable is adjusted by the researcher and the resulting effect observed. If the outcome is in line with the prediction of the hypothesis then one can say that the experiment has produced evidence in support of the hypothesis. Evaluating the outcome of such experiments is often associated with statistical hypothesis testing approaches (Carew et al., 2005).
• Formal methods - An example of a formal method would be econometrics, which is a combination of mathematical economics, statistics, economic statistics and economic theory (Myers, 1997).
Qualitative methods produce data of a textual nature, as opposed to the numerical output of quantitative methods. Qualitative methods can use a range of types of qualitative data (often produced by techniques that can also generate quantitative data) to evaluate the hypothesis under scrutiny, such as data produced by:
• Action research - This type of research is aimed at examining hypotheses that can be applied directly to an industrial setting and their benefit assessed. This is not to be confused with applied science. In the case of action research there is a real contribution back to the scientific community, as well as industry, as a result of the application of the hypothesis (Myers, 1997).
• Case study - This method is an empirical enquiry that investigates one’s hypothesis in a real-life context, known as in-vivo (Basili, 1996). However, the boundaries between what is under evaluation and the context are not necessarily clear (Myers, 1997). Often with a case study there are only one, or a few, data points (participants from a population). Therefore the data is not suited to a statistical evaluation. A richer insight into the context is achieved through qualitative data capture.
• Ethnography - In an ethnographic study the researcher immerses himself in the context of the hypothesis under study. This is often very time consuming; however, it also provides a rich data set. A typical ethnographic study in an IT organisation may involve spending several months working as part of a software development team (Myers, 1997).
• Grounded theory - This research method suggests that a hypothesis to explain a certain phenomenon can emerge from an analysis of the gathered data, rather than an a priori hypothesis guiding the formation of the data gathering (Myers, 1997).
Data used to realise these qualitative research methods can be gathered from various data sources including:
• Observation - The participant or object is simply observed with no interference apart from the study set-up.
• Interviews - The participant is prompted or questioned to express their views and answers on various topics of relevance to the study.
• Questionnaires - These are similar to the quantitative surveys mentioned above. However, using qualitative methods the questionnaire can also include essay-style answers allowing the participant to express his or her opinion.
• Documents or texts - Documentation, emails, letters, memos, faxes, dictations and diaries can all be used as valid data sources.
• Researcher’s impressions - The researcher himself may draw conclusions from observation during the study before analysis.
• Think-aloud - The think-aloud method, pioneered by Ericsson and Simon during the 1980s (Ericsson and Simon, 1993), is implemented by having the participant of a study speak his thoughts out loud while performing the tasks of the experiment. Think-aloud is known to provide the richest insight into a person’s mental state at a given moment in time (Russo et al., 1989) when carried out in line with Ericsson and Simon’s best practice guidelines (Ericsson and Simon, 1993).
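To make the quantitative, hypothesis-testing style of evaluation described above concrete, the following is a minimal sketch of a two-sample comparison. The scenario and data are entirely hypothetical (task-completion times for a control group and a group using some tool), and Welch’s t statistic is used here only as a representative example of this family of techniques:

```python
import math
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Welch's t statistic for two independent samples with unequal variances."""
    na, nb = len(sample_a), len(sample_b)
    va, vb = variance(sample_a), variance(sample_b)  # sample variances (n - 1)
    standard_error = math.sqrt(va / na + vb / nb)
    return (mean(sample_a) - mean(sample_b)) / standard_error

# Hypothetical task-completion times in minutes; invented for illustration only.
control = [30.1, 28.4, 35.2, 31.0, 29.8]
treatment = [24.3, 26.1, 22.8, 25.5, 23.9]

t = welch_t(control, treatment)
print(round(t, 2))  # a large |t| is evidence that the two groups differ
```

The statistic would then be compared against a t distribution to decide whether to reject the null hypothesis of no difference between the groups; that lookup step is omitted here for brevity.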
Thus far research methods in general have been discussed. While generally applicable to the research question of this thesis, the more specific research culture of computer science must also be considered when arguing for a chosen set of research methods.
Existing reviews highlight a severe lack of research evaluation of any philosophical tradition in computer science (Glass et al., 2002; Segal et al., 2005). Only 14% of research papers surveyed in (Glass et al., 2002) were found to be evaluative. Even in a journal such as Empirical Software Engineering, whose focus is intended to be that of empirical studies, it was found that between the years of 1997 and 2003 only 53% of the papers within it were evaluative (Segal et al., 2005). Of the evaluative papers, a hypothesis-testing-based, quantitative approach dominated the evaluations. Furthermore, the evaluations tended to be laboratory-based, did not refer to other scientific disciplines and were not people-focussed.
The culture of research in computer science, as highlighted in the previous section and by Basili in (Basili, 1996), indicates that computer science is an emerging discipline with an immature research model. Other, more established disciplines have seen a research scenario emerge where the research community divides into two groups: theorists and practitioners. In physics, for example, theoretical physicists create mathematical models of the universe, while experimental physicists test these models. Likewise in medicine, theorists and practitioners of the science exist. However, their fundamental difference to computer science is that the essence of what they are studying is unchanging: the nature of the universe will always be examined by physicists, and medical researchers will always be concerned with the human species. Computer scientists, on the other hand, not only attempt to improve the process that operates on an artifact in question, but the artifact itself can also be improved. Thus, in computer science the model of evaluation must be cognizant of both the process and the product (Basili, 1996). The closest scientific analogy to this scenario can be found in the manufacturing domain, where research is undertaken to improve the processes for producing products. However, similar to computer science, the product itself can also be improved. Therefore the role of the researcher in computer science is to understand the evolving nature of processes and products and the relationship between them (Basili, 1996).
Moreover, in evaluating technologies or techniques that aim to improve software development, the human will always be a key element in their operation, and therefore in their evaluation. This complicates experimentation, since different results will be obtained depending upon the people involved (Basili, 1996). Research in the cognitive sciences has developed a long-established evaluation approach called the socio-cultural perspective, which suggests that to evaluate a hypothesis involving people, studies should be undertaken using real activities, in real situations, in their natural environment (O’Brien et al., 2005). Advocates of the approach argue that the richness of context of such a setting cannot be replicated by any feasible laboratory-controlled evaluation. From a computer science perspective this translates to the suggestion that all models created by computer science theorists should eventually be evaluated by computer science practitioners in software laboratories where real, commercial software is actually being developed (known as in-vivo) (Harrison, 2006). As it currently stands, however, there is a gross imbalance, in favour of the former, between the body of theoretical models produced by computer science theorists and the corresponding body of work that evaluates these models (Buckley, 2002; Basili, 1996).
Proponents of purist positivist research philosophies often argue that in-vivo evaluation is not repeatable and that its results therefore cannot be corroborated. Segal provides a retort that best counters this standpoint:
“An argument often made against field studies is that they cannot be replicated - but neither can a software engineering activity in the real world (one cannot dip one’s toes in the same river twice!). Validation of the study cannot be based on the replication of the study but on the replication of the interpretation: the question to ask is, would other researchers from the same scientific cultural tradition as the original researcher(s), and given the same data, come to the same conclusions?”
It should be noted that performing experiments in-vivo does not preclude the gathering of quantitative data. However, the degree to which one can draw conclusions from quantitative data gathered in an in-vivo experiment can be limited, since many subtle, immeasurable factors may be occurring alongside the measurable ones. Such is the nature of human-based evaluations. Therefore, there is a convincing need for the use of qualitative data sources.
Take, for example, a hypothetical evaluation in which the performance of a programmer using a tool is investigated and the user unexpectedly underperforms. Quantitative measures of time and productivity gathered will record this underperformance and results can be reported on this data. However, upon later gathering data from a qualitative data source, such as an interview, we find that that particular participant had a headache that day, impeding his performance, thus highlighting that his underperformance had little to do with the process under evaluation. Nor was it an accurate reflection of the average person using the process under evaluation. Therefore, vital information would have been overlooked had only quantitative methods been used. Also notice how the interpretivist and positivist philosophies complement one another in this instance.
Furthermore, the key to a compelling evaluation is to provide a convincing argument in favour of one’s hypothesis. To this end, mounting evidence should ideally be provided by many streams (Kitchenham et al., 2005). For example, a combination of quantitative measures and qualitative data sources, assessing both the product and the process, would evaluate a hypothesis from many angles. This is known as triangulation and is a means of creating a large body of evidence in support of one’s hypothesis while also appeasing a range of research philosophies (Myers, 1997).
Deciding upon a research methodology depends primarily upon the research objectives. This thesis is attempting to perform an initial evaluation of a repeatable process for component encapsulation that is useful to software engineers and is industrially applicable. These objectives immediately highlight certain requirements when choosing an appropriate methodology for the thesis:
• Industrial applicability requires an ecologically valid setting for evaluation.
• Investigating the usefulness of the component encapsulation approach to programmers requires methods that can reveal the full complexity of human-computer interaction. Again, an ecologically valid setting would be advisable; however, this also presents strong motivation for the usage of qualitative methods of evaluation.
• The outcome of the component encapsulation process is a software artifact, which is quantifiable. Therefore, quantitative measures in the form of software metrics would seem appropriate in this case when assessing the product. Complementary qualitative measures should also be used to buttress the findings.
An important observation on the requirements cited is that the required research methods do not fall under a single research philosophy or method grouping. Immediately we see the opportunity for the triangulation that was discussed in the previous section.
Some quantitative measures are used in an attempt to provide an objective evaluation of the product of our process:
• Software metrics used to assess the product of the process.
• Project data, such as the length of time for a project or lines of code.
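As an illustration of how simple some of these quantitative measures can be, a lines-of-code count (one of the project data items above) can be computed by excluding blank and comment-only lines. This is a generic sketch, not the metric tooling used in the thesis; the comment convention and the example module are hypothetical:

```python
def count_loc(source_text, comment_prefix="#"):
    """Count non-blank, non-comment lines in a piece of source text.

    A deliberately naive lines-of-code measure: blank lines and lines
    consisting solely of a comment are excluded from the count.
    """
    loc = 0
    for line in source_text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith(comment_prefix):
            loc += 1
    return loc

# A hypothetical module used only to demonstrate the measure.
example_module = """\
# utility functions
def add(a, b):
    return a + b
"""

print(count_loc(example_module))  # → 2 (the def line and the return line)
```

Real metric suites refine this in many ways (handling block comments, counting logical rather than physical lines), but the principle of reducing a software artifact to a comparable number is the same.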
Assessing the process for its usefulness requires more intricate methods that can examine the full richness and complexity of context and human-computer interaction. The available qualitative methods can account for this complexity. Importantly, the evaluation is carried out in-vivo (Basili, 1996), helping to preserve a high level of ecological validity (O’Brien et al., 2005). This in-vivo evaluation takes the form of several case studies. Qualitative data sources used in the evaluation include:
• Observation: The participant will be observed and video recorded. The result of this form of observation can then be analysed to gain insight into the process.
• Diaries: The participant will produce a diary of his experience of the process.
• Note-taking: At any point of the case study, interesting information with respect to the study that becomes apparent can be recorded in the form of notes taken by the investigator.
• Think-aloud: During the participant’s actuation of the process the participant will be encouraged to speak his thoughts out loud. This data, it is expected, will provide a deep insight into the mental state and impressions of the participant during the process.
• Interviews: After the process has taken place, interviews will be used to further assess the process and to also assess the product components produced as a result of the process.
• Project documents: Existing documentation with respect to the subject system can be used to further help the assessment of both the process and product of the process.
From these streams of evidence a strong triangulation of evidence is built. Both the process and the product are evaluated using several research methods from both the quantitative and qualitative categories, with think-aloud data being the most used stream of qualitative evidence.
Several actions have been taken to raise the validity of the studies:
• All of the studies performed in the thesis are designed to have high external, ecological validity. This is due to the in-vivo nature of all the evaluations.
• A high level of internal validity is maintained by creating several streams of evidence through triangulation.
• Construct validity is kept to a high level by clearly identifying the attributes of the process and the product that lead to quality components. This has been discussed during the earlier literature review.
The appropriateness of these measures has also been confirmed in a pilot study undertaken on Reconn-exion, found in (Le Gear and Buckley, 2005a).
Component Reconn-exion by Andrew Le Gear 2006