Empirical Studies

Capture & Replay vs Automated Testing for Android apps

A study has been carried out to evaluate and compare different GUI testing approaches in the context of Android apps, in order to better understand the effectiveness of Capture and Replay (C&R) test-generation techniques in comparison with Automated Input Generation (AIG) techniques.

To this aim, 20 Computer Engineering students from the University of Naples “Federico II” were enrolled in the Advanced Software Engineering course, held by one of the authors, for the Master's degree in Computer Engineering.

The students had university-level programming and software engineering skills and, during the course, received several lectures on testing techniques, test automation, GUI testing, and C&R techniques. In addition, they received several lessons on Android programming (at the end of the course they had to develop an Android app), and all of them were Android phone users.

Each student was asked to carry out two tasks for each of the four Android apps considered in the experiment:

UET: In the first task, each student had to produce test cases by exploring the application under test with the Capture and Replay tool Robotium. The students had no previous knowledge of the applications under test. Robotium automatically generates Android JUnit test cases corresponding to the recorded interactions (a sketch of such a generated test case is shown below, after the task descriptions).

IET: In the second task, the same students had to design further test cases with the aim of maximizing the coverage of the source code of the applications under test. They had access to the source code of the applications under test and to the coverage achieved by the previously recorded tests.
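
As an illustration, a test case recorded with Robotium has roughly the following shape. This is only a minimal sketch: the activity class (MainActivity), the entered text, the button label, and the assertion are hypothetical and do not refer to any of the apps used in the study.

import android.test.ActivityInstrumentationTestCase2;
import com.robotium.solo.Solo;

// Minimal sketch of a recorded Robotium (Android JUnit) test case.
// MainActivity and all UI labels are hypothetical.
public class RecordedScenarioTest extends ActivityInstrumentationTestCase2<MainActivity> {

    private Solo solo;

    public RecordedScenarioTest() {
        super(MainActivity.class);
    }

    @Override
    protected void setUp() throws Exception {
        super.setUp();
        // Solo is the Robotium driver used to replay the recorded interactions
        solo = new Solo(getInstrumentation(), getActivity());
    }

    public void testRecordedInteraction() {
        // Replay of the captured interactions
        solo.enterText(0, "John");
        solo.clickOnButton("Save");
        // Oracle derived from the behavior observed during recording
        assertTrue("Confirmation message not shown", solo.waitForText("Saved"));
    }

    @Override
    protected void tearDown() throws Exception {
        solo.finishOpenedActivities();
        super.tearDown();
    }
}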


Random Testing Termination Criteria


Developing and Evaluating Objective Termination Criteria for Random Testing

Porfirio Tramontana, Domenico Amalfitano, Nicola Amatucci, Atif M. Memon, Anna Rita Fasolino:

Developing and Evaluating Objective Termination Criteria for Random Testing. ACM Trans. Softw. Eng. Methodol. 28(3): 17:1-17:52 (2019)


Random testing is a software testing technique through which programs are tested by generating and executing random inputs. Because of its unstructured nature, it is difficult to determine when to stop a random testing process. Faults may be missed if the process is stopped prematurely, and resources may be wasted if the process is run too long. In this article, we propose two promising termination criteria, “All Equivalent” (AEQ) and “All Included in One” (AIO), applicable to random testing. These criteria stop random testing once the process has reached a code-coverage-based saturation point after which additional testing effort is unlikely to provide additional effectiveness. We model and implement them in the context of a general random testing process composed of independent random testing sessions. Thirty-six experiments involving GUI testing and unit testing of Java applications have demonstrated that the AEQ criterion is generally able to stop the process when a code coverage equal or very close to the saturation level is reached, while AIO is able to stop the process earlier, in the cases in which it reaches the saturation level of coverage. In addition, the performance of the two criteria has been compared against other termination criteria adopted in the literature.
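
To give an intuition of the two criteria, they can be read as set-based checks over the coverage achieved by the independent testing sessions. The following sketch only illustrates this reading (coverage is assumed to be represented as a set of covered code entities, AEQ is read as “all sessions cover the same entities”, and AIO as “one session's coverage includes all the others”); the precise formulations and their implementation in the paper may differ.

import java.util.List;
import java.util.Set;

// Illustrative sketch (not the paper's implementation): each session's coverage is
// modeled as a set of covered code entities (e.g., branch or method identifiers).
public final class TerminationCriteria {

    // "All Equivalent" (AEQ): stop when all sessions cover exactly the same set of entities.
    public static boolean allEquivalent(List<Set<String>> sessionCoverage) {
        if (sessionCoverage.isEmpty()) {
            return false;
        }
        Set<String> first = sessionCoverage.get(0);
        return sessionCoverage.stream().allMatch(coverage -> coverage.equals(first));
    }

    // "All Included in One" (AIO): stop when one session's coverage subsumes all the others.
    public static boolean allIncludedInOne(List<Set<String>> sessionCoverage) {
        return sessionCoverage.stream()
                .anyMatch(candidate -> sessionCoverage.stream().allMatch(candidate::containsAll));
    }
}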

Support Material : Experiment on GUI Testing of Java Applications


This experiment has been carried out on the following set of Java GUI-based applications.

By clicking on the name of each application, you can download the source code of the tested application and the test harness that was developed to test it.

A1 Remember Password

A2 Scientific Calculator

A3 Address Book

A4 Dietetics

A5 Minesweeper

A6 Simple Library

A7 Quiz

A8 BlackJack

A9 UPM

A10 JNotepad

A11 Simple Calculator

A12 Movie Catalog

A13 Bibtex Manager

A14 QRCode Generator

A15 Shopping List Manager

A16 HangMan

A17 Unit Converter

A18 Beagtex

In detail, in the folder called source you can find the source code of the application under test, while in the folder called executable you can find the material needed to automatically test it.

In the folder called executable, you can find two batch files, executable from the Windows shell, called ripperciclico.bat and ripper.bat.

All the tests have been carried out in Windows 7 and 10 environments, but the batch files could easily be adapted to run in a Unix environment.

To start a testing session, you have to execute the ripperciclico.bat batch file or, if this file does not exist, the ripper.bat file.

By executing ripperciclico.bat, 3 testing sessions will be carried out. To change the number of sessions, you can edit the number 3 in the following line of ripperciclico.bat:

for /L %%n in (1,1,3) do call ripper.bat %%n

Alternatively, it is possible to change the value of the num_executions parameter in the ripper.properties file.

The number of events constituting a testing session can be modified by editing the num_events parameter in the ripper.properties file.
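
For example, the relevant part of the ripper.properties file might look like the following excerpt (the values are only illustrative, and the other entries of the file are omitted):

num_executions=3
num_events=500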

During the execution of the tests, the mouse is directly driven by the testing tool, so any interaction with the computer should be avoided (running the experiment in a virtual machine is preferable).

The main outputs generated by the testing process are the following:

- coverage.em, the metadata file generated by the Emma instrumentation tool;

- coverage.ec files, one for each executed testing session, placed in the coverage folder (the name of this folder can be set in the ripper.properties file).

Please refer to the Emma documentation to learn how to visualize the obtained coverage; a typical report command is shown after this list.

- fired_events XML files, reporting information about the randomly executed user events;

- tempi.txt, reporting information about the execution time of the testing process;

- unhandled_exceptions.txt, reporting a log of crashes and other unhandled exceptions that occurred during the testing process.
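
As mentioned above, the coverage reports can be produced with the Emma command-line tool. For instance, a typical invocation (assuming that emma.jar is available and that the metadata and coverage files are in the current folder) is:

java -cp emma.jar emma report -r html -in coverage.em,coverage.ec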

Support Material : Unit Testing of Java Libraries Supported by Randoop

The second experiment has been carried out on the following set of Java classes.

The complete set, including all the resources needed to execute the testing process, can be downloaded from this link (152 MB).

A1 A4J

A2 AspectJ

A3 Battlecry

A4 Calendar

A5 Classviewer

A6 Dcparseargs

A7 Diffi

A8 Falselight

A9 Http-utils

A10 Jargs

A11 Javax

A12 Jclo

A13 Jgaap

A14 Jipa

A15 Noen

A16 SaxPath

A17 Sugar

A18 Tullibee

The starting point of the process is the batch file starter6Randoop.bat. It has been developed and tested on Windows 7 and 10. To see the syntax and to start the testing process for all the 18 available applications, you can use the batchtest6.0.bat batch file, which sequentially opens a series of testing sessions on the 18 considered applications.

The three main parameters of starter6Randoop are, respectively, the name of the jar of the application under test (the jar file has to be placed in the jars folder), the number of testing sessions to be executed, and the duration (in seconds) of each Randoop test.
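
For example, a hypothetical invocation (the jar name and the values are only illustrative) could be:

starter6Randoop.bat jclo.jar 3 60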

In this implementation the testing process continues indefinitely and has to be manually stopped by the user. It is possible to continue an interrupted testing process by means of the starter6RandoopRecever.bat batch file, which takes as parameters the name of the folder containing the already executed tests and the number of already executed cycles of sessions.
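
For example, a hypothetical invocation to resume an interrupted process (the folder name is only a placeholder for one of the output folders described below, and the number of cycles is illustrative) could be:

starter6RandoopRecever.bat <folder-with-already-executed-tests> 2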

For each application under test, the jars folder also contains a text file with the list of classes that have to be considered for coverage evaluation. The lib subfolder of the jars folder includes the libraries on which the tested applications depend.

The outputs of the testing processes are included in folders whose names start with the name of the application under test, followed by the date and time of the start of the test. In these folders there are .ec and .es files reporting the Emma coverage information corresponding to the executed tests and their cumulative coverage. The .txt files provide a summary of the obtained coverage. In particular, the sessioncoverage.txt file reports the trend of the cumulative coverage for each testing session, while the unioncoverage.txt file reports the trend of the cumulative coverage obtained by the union of all the executed testing sessions.


Memory Leaks of Android Applications

Comparing Testing Automation Techniques for Android Applications

A general framework for comparing automatic testing techniques of Android mobile apps

Domenico Amalfitano, Nicola Amatucci, Atif M. Memon, Porfirio Tramontana, Anna Rita Fasolino:

A general framework for comparing automatic testing techniques of Android mobile apps. J. Syst. Softw. 125: 322-343 (2017)


As an increasing number of new techniques are developed for quality assurance of Android applications (apps), there is a need to evaluate and empirically compare them. Researchers as well as practitioners will be able to use the results of such comparative studies to answer questions such as, “What technique should I use to test my app?” Unfortunately, there is a severe lack of rigorous empirical studies on this subject. In this paper, for the first time, we present an empirical study comparing all existing fully automatic “online” testing techniques developed for the Android platform. We do so by first reformulating each technique within the context of a general framework. We recognize the commonalities between the techniques to develop the framework. We then use the salient features of each technique to develop parameters of the framework. The result is a general recasting of all existing approaches in a plug-in based formulation, allowing us to vary the parameters to create instances of each technique, and empirically evaluate them on a common set of subjects. Our results show that (1) the proposed general framework abstracts all the common characteristics of online testing techniques proposed in the literature, (2) it can be exploited to design experiments aimed at performing objective comparisons among different online testing approaches, and (3) some parameters that we have identified influence the performance of the testing techniques.


Effects of Abbreviated vs Full-Word Identifier Names in Fault Fixing

Fixing Faults in C and Java Source Code: Abbreviated vs. Full-Word Identifier Names

Giuseppe Scanniello, Michele Risi, Porfirio Tramontana, Simone Romano:

Fixing Faults in C and Java Source Code: Abbreviated vs. Full-Word Identifier Names. ACM Trans. Softw. Eng. Methodol. 26(2): 6:1-6:43 (2017)


We carried out a family of controlled experiments to investigate whether the use of abbreviated identifier names, with respect to full-word identifier names, affects fault fixing in C and Java source code. This family consists of an original (or baseline) controlled experiment and three replications. We involved 100 participants with different backgrounds and experiences in total. Overall results suggested that there is no difference in terms of effort, effectiveness, and efficiency to fix faults, when source code contains either only abbreviated or only full-word identifier names. We also conducted a qualitative study to understand the values, beliefs, and assumptions that inform and shape fault fixing when identifier names are either abbreviated or full-word. We involved in this qualitative study six professional developers with 1--3 years of work experience. A number of insights emerged from this qualitative study and can be considered a useful complement to the quantitative results from our family of experiments. One of the most interesting insights is that developers, when working on source code with abbreviated identifier names, adopt a more methodical approach to identify and fix faults by extending their focus point and only in a few cases do they expand abbreviated identifiers.
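
To give a concrete idea of the two treatments compared in the experiments, the following hypothetical Java class shows the same method written first with abbreviated and then with full-word identifier names (the code is only an illustration and is not taken from the experimental material):

// Hypothetical illustration: the same logic with abbreviated and with full-word identifiers.
public class IdentifierNamesExample {

    // Abbreviated identifier names
    double calcAvg(int[] v) {
        double s = 0;
        for (int e : v) {
            s += e;
        }
        return s / v.length;
    }

    // Full-word identifier names (same logic)
    double calculateAverage(int[] values) {
        double sum = 0;
        for (int element : values) {
            sum += element;
        }
        return sum / values.length;
    }
}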

Studying abbreviated vs. full-word identifier names when dealing with faults: an external replication

Porfirio Tramontana, Michele Risi, Giuseppe Scanniello:
Studying abbreviated vs. full-word identifier names when dealing with faults: an external replication. ESEM 2014: 64:1


Context: the use of abbreviated and full-word identifier names when dealing with faults in source code. Goal: to investigate whether the use of abbreviated identifier names affects the ability of novice professional software developers to identify and fix faults in Java code. Method: an external replication. Results: the results of the original experiment (conducted on C code) were confirmed. Conclusions: the difference between using abbreviated and full-word identifiers is not statistically significant with respect to the time to complete a task and the number of faults identified and fixed.