Characteristic Studies of Loop Problems for Structural Test Generation via Symbolic Execution

PROJECT SUMMARY

Dynamic Symbolic Execution (DSE) is a state-ofthe- art test-generation approach that systematically explores program paths to generate high-covering tests. In DSE, the presence of loops (especially unbound loops) can cause an enormous or even infinite number of paths to be explored. There exist techniques (such as bounded iteration, heuristics, and summarization) that assist DSE in addressing loop problems. However, there exists no literature-survey or empirical work that shows the pervasiveness of loop problems or identifies challenges faced by these techniques on real-world open-source applications. To fill this gap, we provide comprehensive characteristic studies to guide future research on addressing loop problems for DSE. Our proposed study methodology starts with conducting a literature-survey study to investigate how technical problems such as loop problems compromise automated software-engineering tasks such as test generation, and which existing techniques are proposed to deal with such technical problems. Then the study methodology continues with conducting an empirical study of applying the existing techniques on realworld software applications sampled based on the literaturesurvey results and major open-source project hostings. This empirical study investigates the pervasiveness of the technical problems and how well existing techniques can address such problems among real-world software applications. Based on such study methodology, our two-phase characteristic studies identify that bounded iteration and heuristics are effective in addressing loop problems when used properly. Our studies further identify challenges faced by these techniques and provide guidelines for effectively addressing these challenges.

PEOPLE

Faculty

Tao Xie

Students

Xusheng Xiao Sihan Li

Collaborators

Nikolai Tillmann (Microsoft Research)

PUBLICATIONS

  1. Xusheng Xiao, Sihan Li, Tao Xie, and Nikolai Tillmann. Characteristic Studies of Loop Problems for Structural Test Generation via Symbolic Execution. In Proceedings of the 28th IEEE/ACM International Conference on Automated Software Engineering (ASE 2013), Palo Alto, California, November 2013. Download: [PDF][BibTex]

LITARATURE-SURVEY STUDY

In our literature-survey study, we survey the literature from a comprehensive bibliography of papers on symbolic execution and tasks assisted by symbolic execution. The articles are published in a wide range of venues, including conferences in software engineering (e.g., ICSE, FSE, ISSTA), systems (e.g., ASPLOS, OSDI), security (e.g., CCS, Oakland, USENIX Security), software verification (e.g., CAV, TACAS, VMCAI), programming languages (e.g., POPL, PLDI, SAS, CC), and database (e.g., SIGMOD). Since the bibliography includes only a few articles published in 2012, we manually collect more articles in 2012 for study.

  • RQ1.1: Are loop problems a kind of major problems to compromise the effectiveness of symbolic execution in various tasks?
    • Symbolic execution has been applied for various tasks, such as test generation, debugging, security analysis, verification. The answer to this question can help understand which of these tasks would need to specifically address loop problems.
  • RQ1.2: What kinds of techniques are proposed to deal with loops, and how widely are the techniques used across various tasks assisted by symbolic execution?
    • In addition, we study the advantages and disadvantages of different kinds of techniques, providing guidelines on choosing techniques for various tasks.
  • Survey Results

EMPIRICAL STUDY

In our empirical study, we choose certain representative techniques (found from our literature-survey study) to apply on real-world applications. We sample subject applications from certain categories (data structures and parsers) that are found to typically have loop problems from the surveyed articles, and select subject applications randomly from major open-source project hostings including CodePlex and GitHub.

  • RQ2.1: How pervasive are IDLs in open-source applications? What are the distribution of nested loops?
    • We classify loops as Fixed-Iteration Loops (FILs) or Input Dependent Loops (IDLs), and identify nested loops that pose challenges for various techniques, such as loop summarization. The classification results of FILs and IDLs can show the pervasiveness of IDLs, and help understand the causes of IDLs. Since nested loops pose extra challenges, the distributions of nested loops provide insights on what kinds of applications may need manual assistance in dealing with nested loops.
  • RQ2.2: How well can existing techniques, such as bounded iteration and search-guiding heuristics, address the loop problems posed by IDLs on real-world applications?
    • As our literature-survey study identifies bounded iteration and search-guiding heuristics as the most widely used techniques across various tasks, we apply Pex, the state-of-theart DSE-based tool, with the mixed search strategy and Pex with bounded iteration (referred to as Pex-bounded) on the sampled software applications to explore IDLs. The results can show how well these techniques handle loop problems on realworld applications. Moreover, our studies identify challenges faced by existing techniques and propose guidelines on how to address these challenges.

MAJOR FINDINGS AND IMPLICATIONS