MODA: Automated Mock Object Generation for Testing Database Applications

PROJECT SUMMARY

Database applications (i.e., software programs interacting with a database) are commonly used in mission-critical systems. To assure the quality of database applications, testing remains the most commonly used technique, but manually writing tests for a complex database application can be labor-intensive. Automated test generation techniques, such as Dynamic Symbolic Execution (DSE), are often used to reduce human effort in testing database applications. However, such techniques share two common disadvantages: (1) during test generation, they require the associated databases to be available, which may not always be possible in practice, and (2) they lack the capability of generating database states in addition to program inputs. To address these issues, we propose an approach that replaces a real database with a mock object. The mock object mimics the real database in that it simulates operations on the database by performing corresponding operations on itself. In the presence of the mock object, our approach then uses a DSE-based test generation tool, called Pex, to generate tests that cover important portions of the database application. The results of our empirical studies on a real-world medical device software system and an open source software system show that our approach can, without producing false warnings, generate tests with higher code coverage.
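As a rough illustration of the mock-object idea, an in-memory mock database can simulate database operations by performing them on its own tables. The following Python sketch uses hypothetical names; MODA itself targets C#/.NET code and interprets the actual SQL queries rather than exposing such an API.

```python
# Illustrative sketch of a stateful mock database (hypothetical API,
# not MODA's actual C# implementation).
class MockDatabase:
    def __init__(self):
        # Table name -> list of rows, each row a dict of column values.
        self.tables = {}

    def create_table(self, name):
        self.tables.setdefault(name, [])

    def insert(self, table, row):
        # Simulates an INSERT by appending a copy of the row.
        self.tables[table].append(dict(row))

    def select(self, table, predicate=lambda row: True):
        # Simulates a SELECT by filtering the in-memory rows.
        return [row for row in self.tables[table] if predicate(row)]

    def update(self, table, predicate, changes):
        # Simulates an UPDATE; returns the number of affected rows.
        count = 0
        for row in self.tables[table]:
            if predicate(row):
                row.update(changes)
                count += 1
        return count
```

Because the mock keeps state across operations, a SELECT issued after an INSERT observes the inserted record, just as it would against a real database.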

PEOPLE

Faculty

Tao Xie (Principal Investigator)

Students

Kunal Taneja (PhD Student)

Collaborators

Yi Zhang (Center for Devices and Radiological Health, US Food and Drug Administration)

EXPERIMENTAL STUDIES

We conducted two empirical studies to evaluate the usefulness and effectiveness of our tool in both ordinary software development and regulatory contexts. In these studies, a real-world medical device software system and a medium-scale open source library were selected as subjects for test generation. Both MODA and a related test generation tool, Pex, were applied to these two subjects, and the performance of the tools was compared and evaluated from the following two perspectives:

RQ1: As compared to a state-of-the-art test generation tool Pex, how effectively can MODA help in improving code coverage while generating tests for database applications?

RQ2: As compared to existing mocking techniques, how effectively can MODA help in reducing the number of false warnings during test generation?

Medical Device Software System

We applied our approach on a point-of-care handheld medical assistant. The software system in the assistant is intended to be installed on smart handheld devices (such as PDAs) and to communicate injury and treatment data with a remote database server. Due to confidentiality concerns, we cannot disclose the identity of this device or its details. To focus our study on database interactions, we chose a component of the device for test generation. This component consists of a sequence of fixed GUI screens of two types: some screens retrieve information from the back-end database and present it to the user; the rest assist the user in editing clinical data and uploading the resulting data to the back-end database. The chosen component comprises about 100,000 lines of C# code (LOC) and contains 62 SQL queries. These 62 queries are spread over 12 classes, which together consist of 2,994 LOC. Among the 62 queries, 54 are SELECT queries, while the remaining 8 are INSERT, UPDATE, or DELETE queries. Our approach could generate non-empty result sets for only 39 of the SELECT queries, leaving the remaining 15 uncovered. The reason is a bug that our approach detected in the component: while interpreting one SQL query in the component, our approach found illegal characters in the query's syntax and threw a parsing exception. Since the execution of this problematic query lies on all execution paths leading to the 15 uncovered SELECT queries, our approach could not reach them. Our approach achieved 91.9% branch coverage of the code under test.
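The way such a parsing check surfaces the bug can be sketched in a few lines. This is a simplified Python illustration with a hypothetical character whitelist; MODA's actual SQL interpreter performs full parsing rather than a regex check.

```python
import re

# Hypothetical whitelist of characters that may appear in the simple
# queries of this sketch; anything outside it is treated as illegal.
LEGAL = re.compile(r"[A-Za-z0-9_ .,*=<>'()%-]+")

def check_query(sql):
    # Before simulating a query, validate its text; a query containing
    # illegal characters triggers a parsing exception, surfacing the
    # kind of bug described above instead of silently ignoring it.
    if not LEGAL.fullmatch(sql):
        raise ValueError(f"illegal character in query: {sql!r}")
    return sql
```

A technique that does not interpret the query text at all would simply pass such a malformed query through, which is why the bug goes unnoticed with conventional mocking.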

This coverage measurement excludes parts of the component that are not executable due to the above-mentioned bug. In addition to enabling test generation, one advantage that distinguishes our approach from existing mocking techniques is the capability of detecting bugs associated with SQL queries. During this empirical study, our approach detected 3 bugs, including the above-mentioned one. The second bug indicates that a column name was referred to in lowercase by one of the SQL queries, whereas the input database schema defines the column in uppercase. According to the SQL standard, column and table names in a database can be configured to be either case-sensitive or case-insensitive. Hence, a mismatched column name referred to by SQL queries could cause a system failure under certain database configurations. The third bug can be categorized as an unsafe composition of SQL queries. In particular, a method in the component takes as input a list and appends each element of the list to the WHERE condition of a SELECT query as an AND clause. Since the input list can be empty, this method may construct SELECT queries with invalid WHERE conditions, such as SELECT * FROM X WHERE (X.Y = 1) AND (). None of these bugs can be detected by other mocking techniques, as these techniques do not interpret the SQL queries embedded in the code under test.
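The unsafe-composition bug can be illustrated with a small sketch. The Python functions below are hypothetical stand-ins for the component's actual C# method: the buggy builder appends the AND clause unconditionally, while a guarded variant skips it when the list is empty.

```python
def build_query(base_condition, extra_conditions):
    # Buggy pattern: the extra conditions are joined and ANDed onto the
    # WHERE clause even when the list is empty, producing invalid SQL
    # such as "SELECT * FROM X WHERE (X.Y = 1) AND ()".
    extras = " AND ".join(extra_conditions)
    return f"SELECT * FROM X WHERE ({base_condition}) AND ({extras})"

def build_query_safe(base_condition, extra_conditions):
    # Guarded variant: append one AND clause per element, so an empty
    # list leaves the base condition untouched.
    sql = f"SELECT * FROM X WHERE ({base_condition})"
    for cond in extra_conditions:
        sql += f" AND ({cond})"
    return sql
```

A test that drives the buggy builder with an empty list produces exactly the invalid query shown above, which an approach that interprets the query text can flag at generation time.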

Open Source Software System

Our second study was on an open source library, Odyssey, which implements a collection of Windows Presentation Foundation controls. In this study, two classes were selected from the library for test generation: class DAL, which realizes the Data Access Layer for the library, and class BizContext, which exchanges business objects with DAL. Within class DAL, 33 methods were found to contain at least one SQL query. Class BizContext, on the other hand, does not contain any SQL query itself, but indirectly interacts with the database by invoking methods of DAL. In particular, we found 20 methods in BizContext that call one or more DAL methods with SQL queries. These 53 methods, 33 from DAL and 20 from BizContext, consist of 1,227 LOC in total. As illustrated in the second row of Table 2, tests generated by MODA covered 89% of the branches of the methods under test. In contrast, tests generated by the Pex approach and by PMock achieved 77.2% and 89% branch coverage, respectively. In terms of false warnings, running the tests produced by MODA fed all SQL queries in the code with valid, non-empty database states. PMock did not generate any false warning for the DAL class; however, it produced false warnings for 8 methods in BizContext, 40% of the methods tested in that class. The detailed results and the subject classes are available on our project webpage.

The code in Figure 6 explains how PMock produces false warnings in the study. Method SaveCategory in Figure 6 was extracted from the BizContext class. This method takes as input a Category-typed object category. If the Id member of category equals 0, SaveCategory invokes method CreateCategory of the DAL class, which inserts a blank record into the Category table. The blank record is then updated by the UpdateCategory method. Moreover, the program throws an exception if either the insertion or the update operation fails.
In the study, PMock did generate a test with category.Id = 0 for the code snippet in Figure 6. Running this test, however, produces a false warning: first, a record is inserted into the Category table, and CreateCategory does not throw an exception; then an exception is thrown by UpdateCategory, indicating that no record can be found in the table. The root cause of this false warning is that PMock cannot remember which records the Category table holds after the insertion operation and before the update operation.
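The contrast between a stateless mock (PMock-style) and a stateful one (MODA-style) can be sketched in a few lines of Python. The class and function names below are illustrative simplifications of the Figure 6 scenario, not the tools' actual APIs.

```python
class StatelessMock:
    # Resembles the PMock behavior described above: each query is
    # answered independently, with no memory of earlier operations.
    def insert(self, table, row):
        pass  # the inserted record is forgotten

    def select(self, table):
        return []  # always empty, regardless of prior inserts

class StatefulMock:
    # Resembles MODA's behavior: operations mutate in-memory state,
    # so later queries observe the effects of earlier ones.
    def __init__(self):
        self.rows = {}

    def insert(self, table, row):
        self.rows.setdefault(table, []).append(row)

    def select(self, table):
        return self.rows.get(table, [])

def save_category(db, category_id):
    # Simplified analogue of SaveCategory: insert a blank record when
    # Id is 0, then fail if the record cannot be found for the update.
    if category_id == 0:
        db.insert("Category", {"Id": 1, "Name": ""})  # CreateCategory
    if not db.select("Category"):                     # UpdateCategory
        raise RuntimeError("no record found")         # false warning here
```

Run against the stateless mock, save_category raises the exception even though the insertion succeeded, reproducing the false warning; against the stateful mock, the update sees the inserted record and no warning is produced.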

In summary, these empirical studies demonstrate the usefulness and effectiveness of MODA, as compared with Pex and PMock:

RQ1: MODA achieves, on average, 15% higher branch coverage than the Pex approach.

RQ2: MODA did not produce any false warnings during the studies, while PMock did generate false warnings when facing consecutive SQL queries. Moreover, MODA surpasses the Pex approach and PMock in detecting errors pertaining to SQL queries.