JPF and Google Summer of Code 2013

 JPF is participating in the Google Summer of Code (GSoC) 2013. If you are new to the GSoC program, this is an annual event where Google sponsors students to work on selected open source projects, each student being supported by an experienced mentor. Projects have about 3 month scope, carry a relatively low administrative overhead, can be done remotely, and are generally fun. You also get a cool t-shirt, so what's not to like.

Interested Students - Contact Us

If you have any questions or suggestions regarding JPF and GSoC, email us at <jpf.gsoc [at] gmail.com>. Please be sure to describe your interests and background. The more we know about you, the better we will be able to answer any questions you may have about JPF and/or its potential projects. If you are interested in a project that is not listed here but is relevant to JPF, we would love to hear about it. Join our IRC channel #jpf on freenode to engage in a discussion about all things JPF.

Timeline

  • 04/08: JPF accepted as a mentoring org.
  • 04/22 - 05/03: student applications
  • 05/24: mentoring org student selection deadline
  • 05/27: announcement of accepted students
  • 06/17 - 09/23: project work

Required Skills

JPF is written in Java, and it analyzes Java bytecode. The minimum skill required is to be familiar with Java and having some development experience with Java--class projects or industry experience. At a minimum you should know there is more to it than just the language - it's the language, the libraries and the virtual machine/bytecodes. Not all projects require all levels though, please look at the project descriptions to find out which parts are more important.

JPF is a software verification tool. It is a customizable virtual machine that enables the development of various verification algorithms. It will be to your advantage if you are familiar with formal methods, software testing, or model checking. However, JPF is where research meets development, so for many projects it is not a show stopper. We are looking for students who are motivated, bright, willing to learn, and love to code.

JPF is a fairly complex system. The first step to start is to get JPF running and configured. This in itself can be a steep learning curve. It also helps if you already know what listeners, bytecode factories and native peers are, but no worries - the mentors will help you there. One thing you have to look at, but what is now surprisingly simple is how to set up JPF projects.

Application for Students

You will need to submit a proposal to Google once they open the student application phase (04/22 - 05/03). Please check the  details about the process to see how the whole procedure works.

Beginners Projects

Due to popularity of beginner projects last year, we suggest several topics that are suitable for JPF novices:

(A) Generate JPF Documentation: apply and unify jpf-autodoc-options and jpf-autodoc-types to generate and check documentation for existing jpf projects.

(B) Analyze JPF: this project is in a realm of "analyze the analyzer". Possible analyses could include collecting various code metrics (e.g., coupling), coverage for existing test suites, or application of static analysis tools to JPF. Apart from such general metrics, there are also JPF specific questions of interest:

   (1) global singeltons - JPFs classes can be roughly divided into 3 layers: the top layer with global driver/mediator objects such as the JPF, Search and VM instances, a middle layer that especially includes the *Info classes (ThreadInfo, ElementInfo etc.) and their containers (Heap, Statics etc.), and a low level layer for things such as Fields and Instructions. Due to JPFs legacy codebase there are a number of methods in low level classes that have to resort to static top level getters such as VM.getVM(), which essentially turns those into singletons that prevent multiple JPF instances within the same Java process (either concurrently or nested). Identifying all such singletons and their users would help to clean up the design and overcome this restriction.

   (2) back links - there are a number of classes that keep direct references of their respective containers/owners, such as Instructions storing their containing MethodInfos or ClassInfos. While this might be hard to avoid in some cases, it also causes overhead in case containers change or the referencing objects might be shared between containers (e.g. the same instruction used within different methods). Identifying the use of such back links could facilitate the use of Flyweight and other patterns to reduce this overhead.

Both topics can also be combined into a general dependency analysis of JPF classes that includes visualization.

(C) State Diff Viewer - JPF can be configured to save program state details in a readable text format, which is a suitable basis for comparing the differences between two states. This project would add a specialized diff viewer to the JPF shell that allows selection of the compared states, and presents differences broken down into static fields, heap (objects), stack frames and thread status.

(D) Interactive JPF tutorial and help system - prepare a "beginner's guide" to JPF. Select/write simple input programs for demonstrating different extensions of JPF. Reproducible documentation of each step of the process of applying JPF / extensions. Develop a help function as an extension of JPF shell.

Advanced Project Ideas

This is not an exclusive list! If you have variations, or other project ideas altogether, let us know on <jpf.gsoc [at] gmail.com> or the  JPF Google Group. The sooner, the better.

    1. JPF as Concurrency Teaching Assistant: extend the Impendulo system.
    2. Analysis of Biological models: use JPF to analyze biological models.
    3. Analysis of Networked Software: Add features to the  net-iocache extension.
    4. Modeling Network Communication
    5. Swing UI Model Checking: Add features to the scripting language in the jpf-awt extension.
    6. Automatic Delegation of Method Calls in JPF
    7. Automated Model Generation for Library Code: Design/extend static analysis for generation of library models/stubs.
    8. Quantitative Information Flow Analysis
    9. JPF Inspector: add features to the jpf-inspector extension.
    10. Dynamic State Matching
    11. Incremental State Hashing
    12. Trace Server: Store and post-mortem analyze program traces outside JPF.
    13. Abstract Model Checking: add predicate abstractions to the jpf-abstraction extension.
    14. Invariant Discovery: extend analyses in JPF to discover invariants.
    15. Computing Observable Modified Condition/Decision Coverage
    16. JDart Reporting
    17. Experimenting with JDart
    18. Combining JDart and Test Sequence Generation
    19. Experiments with Psyco
    20. Implementing a Habanero Java runtime in JPF
    21. Interfacing JPF with JIVE
    22. Verifying Probabilistic Programs: Develop a connector between JPF and the probabilistic verification tool PRISM.
    23. Guiding JPF by Reinforcement Learning: Implement a new search strategy for jpf-probabilistic.


    JPF as Concurrency Teaching Assistant

    Learning how to write concurrent programs can be a big challenge even for seasoned developers of sequential code. On the other hand model checking is particularly good at finding subtle concurrency errors. This project hopes to marry these two concepts by creating a development environment where a student learning concurrency is given instant feedback on their progress during the development of a concurrent program by running JPF on the code whenever they save. Two main goals will addressed here, firstly, we believe the instant feedback will help the learning process, but secondly, we will also gather a large number of versions of buggy and correct software. From this set of versions we hope to extract how people learn how to develop concurrent programs and this could be very valuable in teaching concurrency. Additionally these versions can be used in research, especially were program evolution is studied.

    The project will involve extending the Impendulo system (ASE 2010 Tool Paper entitled "Debugging the Programmer"), that captures project snapshots during program development in Eclipse, to add JPF to the tool chain that currently consist only of static analysis tools. This will allow one also to track how well JPF compares to static analysis tools in finding concurrency bugs. Note that JPF can also be used for sequential code being saved during an Impendulo session, but that would not be the primary goal of this project; although it will be a natural outcome of the project. Most of the technical challenge of this project will be closing the system during the development when not all the parts are present, how to represent errors to the developers and lastly how to analyse the different versions to extract programming bug patterns (from which one can deduce the common concurrency errors).


    Analysis of biological models

    This project will involve modelling of biological systems in Java or a Statechart dialect of Java and checking properties such as stability. We will start with existing models (e.g. models of C. elegans from MSR Cambridge) and will provide implementations for the particular composition operator and abstractions used in these models.


    Analysis of Networked Software

    Net-iocache is a JPF extension that allows JPF to communicate with other programs over the network, even if backtracking is involved. It has various features such as support for the java.net API, checkpointing tools to run programs in a virtualization environment, and more. At this point, though, UDP (user datagram protocol) is not supported yet. In a first step, basic UDP could be supported by making datagram delivery non-deterministic. In a second step, out-of-order delivery of datagrams could be simulated using choice generators.


    Modeling Network Communication

    One of the main features of JPF v7 will be the capability to model check distributed applications by using a centralized approach that runs each application in its own, type-separated thread, communicating through standard java.net APIs. The goal of this project is to develop a model for java.net.Socket based communication that can be used in conjunction with JPF's multi-process mode.


    Swing UI Model Checking

    The jpf-awt extension allows model checking of Swing applications, using a scripting language to specify user input sequences to explore, which looks like

    $MyTextField:input.setText("whatever")
    ANY { $Option1.doClick(), $Option2.doClick() }
    ...
    

    Apart from only supporting a small number of Swing components, jpf-awt lacks in terms of expressiveness of this scripting language (which happens to be the same syntax like the statemachine scripts). This project would add support for more components like JTree or JTable, and extend the scripting language so that especially choices become more convenient, like ANY_CHECKBOX_COMBINATION <group-name> or ANY_LIST_ITEM <list-name>. The scripting could also be extended to include properties like ?TEXT_EQUALS <textfield-name> <expected value>, so that users could completely script verification runs without also having to provide listeners checking such functional properties.


    Automatic Delegation of Method Calls in JPF

    jpf-nhandler is a JPF extension that provides automatic, on-demand delegation of calls in JPF. It allows for executing some parts of code directly on the host JVM. One of its main components is a translator that converts classes and objects between the JPF and JVM environments.

    The goal of this project is to improve the tool functionalities in various ways. One way is to implement an adaptor class for translating objects that the state of which is inconsistent between JPF and the host JVM, i.e. in most cases JPF model classes do not declare the same fields as declared in the actual class from the Java library. jpf-nhandler can be also improved by incorporating a lazy translation technique. Moreover, it can be extended to maintain the states of objects created on the host JVM side. Finally, jpf-nhandler can be extended to handle applications based on databases.


    Automated Model Generation for Library Code

    When model checking applications belonging to specific domains (e.g., Swing, Android), JPF users have to provide models for libraries that are too complex for JPF to run. The goal of this project is to provide some automated support for generation of library models/stubs for libraries, based on results of domain-specific static analysis. Once generated, such models can be packaged with jpf extensions that require them (e.g., jpf-awt, jpf-android). The project can be implemented as a continuation of the last year's project on model generation or on top of tools supporting static analysis for Java libraries.


    Quantitative Information Flow Analysis

    Use symbolic execution to compute information flow for security analysis. We will have a two step analysis: the first step is a qualitative analysis to decide which symbolic paths leak information up to a bounded depth; the second step is a quantitative analysis to give a guarantee about leakage for both explored and unexplored paths. Use the reliability analysis developed in SPF to deal with loops. This is a continuation from last year.


    JPF Inspector

    jpf-inspector is a tool for inspection and control of JPF execution that works like a debugger. This project aims to make the codebase up-to-date with the current version of JPF core (v7), do some refactoring, and implement new features. The list of features that would be really useful includes command-line interface (batch processing) and support for symbolic values (printing, breakpoints). Another goal would be to improve extensibility of the tool.


    Dynamic State Matching

    Design and implement user-friendly support (API) for dynamically changing what elements of the program state are considered for state matching. Use existing concepts of JPF API (listeners, annotations, dynamic serializer) when possible. Customize the state storage mechanism of JPF. It will be used to dynamically select elements of the concrete program state that will be ignored in state matching at the end of a given transition, or, more generally, to construct abstractions of the program state on-the-fly during the state space traversal. An additional goal is to implement support for dynamically defining custom elements of the program state (abstract values, a set of predicates with truth values) that will be also considered in state matching. It should be possible to turn on/off such custom elements of the program state dynamically.


    Incremental State Hashing

    JPF hashes the system under test program states in order to find out if it has already seen a state and hence can prune the current execution path. This is done by the AbstractSerializer class hierarchy and usually requires visiting each heap object in a canonical order to construct a byte stream that is the input for the hash function. In many cases transitions only change small numbers of objects and stack frames, resulting in a high overhead of a complete state serialization. In addition, the matching is fragile since any change in processing order of objects will cause the serialization to fail.

    In JPF v7, object reference values are unique - each allocation of the "same" object will receive the same reference value, no matter what the current execution path is. This project idea exploits these "Search Global Object Ids" (SGOIDs) of JPF v7 to implement incremental state hashing as described in "An Incremental Heap Canonicalization Algorithm"  by Madanlal Musuvathi and David L. Dill, Spin 2005.


    Trace Server

    Traces are memory hogs. If you have some production code SUT, it is quite normal that you end up with traces that contain millions of steps (Instruction objects). Not good to store millions of objects while you explore the state space and you don't even know yet if you are ever going to need the trace. The normal mitigation is to first run JPF without the "trace" topic in the reports, store the ChoiceGenerator path if you hit a defect or otherwise need a trace, and then replay this path with traces turned on if you need more information. This is a bit complicated. This is why the trace server was created in 2010. With such a database, you can use post mortem analyzers to find out about defects. Post mortem analyzers would not only speed up JPF in the first place, but also avoid having to re-run JPF on a large system under test if you need to try several trace analyzers. Work for 2013 includes extensions of the front end (maybe by using a domain-specific language to analyze traces), creating more analyzers, improvements in the database, and the performance of the entire framework.


    Abstract Model Checking

    Investigate possible approaches to predicate abstraction in the context of JPF. The solution will address various important features of Java - such as objects, classes, fields, arrays, local variables - and constraints over numeric types. Create a prototype implementation that will use a non-standard interpreter of bytecode instructions and attributes to store abstract values (predicates). It will also adapt the serializer for state matching. This project builds on Abstract PathFinder. The first step would be to make Abstract Pathfinder up-to-date with the current version of JPF (v7).


    Invariant Discovery

    Model checkers, such as JPF, require users to provide functional properties of interest to check, e.g., in the form of assertions in code. Manually writing such properties can be burdensome on the user but not providing them can reduce the effectiveness of model checking. This project explores a novel connection between a model checker and properties of code, such as program invariants. Specifically, a model checker is used to discover such properties. An automated technique based on JPF for Java assertions will be developed and evaluated in the course of the project.


    Computing Observable Modified Condition/Decision Coverage

    A significant issue when auto-generating tests is the effectiveness of the resulting test suite at finding faults. Structural coverage metrics, such as branch coverage, are used to guide automated test generation, but previous experiments have shown that such suites, when generated using model checking, often do not provide good fault finding. This project will implement and evaluate automated test generation using a new test metric, Observable Modified Condition/Decision Coverage (OMC/DC), which is an extension of the MC/DC metric that is used in safety-critical domains. OMC/DC combines the coverage of decisions required by MC/DC with additional path conditions that increase the likelihood that faults encountered when executing decisions will propagate to a monitored variable. An ICSE 2013 paper, "Observable Modified Condition/Decision Coverage" by Michael Whalen, Gregory Gay, Dongjiang You, Mats P. E. Heimdahl, and Matt Staats demonstrated that autogenerated OMC/DC test suites are very effective at fault finding for dataflow programs (e.g., Simulink models). In this project, the notion of OMC/DC will be applied to Java programs and will be computed in a JPF extension.


    JDart Reporting

    Improving user interface of the jpf-JDart tool developed at CMU through generating better and more informative reports.


    Experimenting with JDart

    Finding and implementing examples (from the literature, own ones) for the jpf-JDart tool developed at CMU. Running JDart on these examples to document / test JDarts features.


    Combining JDart and Test Sequence Generation

    Enhance the jpf-JDart tool developed at CMU by combining it with other automatic test case generation tools.


    Experiments with Psyco

    Using different learning algorithms and finding more examples for the jpf-Psyco tool developed at CMU; evaluating the performance of Psyco with different learning algorithms.


    Implementing a Habanero Java runtime in JPF

    Habanero Java (HJ) is a mid-level concurrency language developed at RICE University that is similar in spirit to the X10 language from IBM:

    https://wiki.rice.edu/confluence/display/HABANERO/Habanero+Multicore+Software+Research+Project

    HJ hopes to produce portable parallel programming abstractions for future multi-core technologies.
    Currently, HJ adds new keywords to the Java language for asynchronous task creation, mutual exclusion
    in asynchronous tasks, barriers, and phasers. A simple HJ example that creates asynchronous tasks is shown below:

    // Start of Task T1 (main program)
    sum1 = 0; sum2 = 0; // Assume that sum1 & sum2 are fields (not local vars)
    finish {
       // Compute sum1 (lower half) and sum2 (upper half) in parallel
       async for(int i=0; i<X.length/2 ; i++)sum1+=X[i]; //TaskT2
       async for(int i=X.length/2 ; i<X.length ; i++)sum2+=X[i]; //TaskT
    }

    //Task T1 waits for Tasks T2 and T3 to complete
    int sum = sum1 + sum2; // Continuation of Task T1

    The goal of this Google Summer of Code (GSoC project is to modify and extend JPF to be able to verify HJ
    programs for deadlock and user provided assertion violations. Prior work that extends JPF for the X10
    language uses the X10 runtime libraries directly in JPF but requires modifications to JPF, the X10 runtime
    libraries, X10 language compiler, and other static analysis methods
    (see http://ieeexplore.ieee.org/xpl/articleDetails.jsp?reload=true&arnumber=6200092&contentType=Conference+Publications).
    This approach, rather than use the HJ runtime libraries, is to simulate the HJ runtime directly in JPF either
    through a specialized verification version of the runtime or arbitrary state transformations on the JPF state.
    A successful GSoC project would be able to take simple compiled HJ programs that only use async, finish, isolated,
    and next keywords as input to JPF and successfully show  correctness or provide a counterexample that either deadlocks
    or violates a user provided assertion.


    Interfacing JPF with JIVE

    The goal of this project is to facilitate a more visual and higher level understanding of JPF's trace output. A promising approach would be to interface JPF with JIVE, a state-of-the-art visualization and debugging system for Java that has been developed over a number of years and available at: www.cse.buffalo.edu/jive. JIVE supports visualization the current state through object diagrams and the history of execution through sequence diagrams, and both diagrams can be viewed at different levels of granularity. For example, when JPF finds the path leading to a deadlock, JIVE can provide a visualization of the path as well as the deadlock state. JIVE allows queries over the execution history in order pinpoint specific conditions. New visualizations that are customized to JPF can also be explored, e.g., a tree view showing the different alternative scheduling choices that JPF makes before it locates a deadlock.


    Verifying Probabilistic Programs

    Develop a connector between JPF and the probabilistic verification tool PRISM, to allow formal, automated analysis of Java programs that exhibit probabilistic behavior. This might include programs that use randomization, or that rely on unpredictable/unreliable features such as network connectivity. The goal is to leverage various probabilistic algorithms related to correctness, reliability, and performance present in PRISM for the analysis of Java programs.


    Guiding JPF by Reinforcement Learning

    The notorious state space explosion problem is one of the major challenges in model checking in general and probabilistic model checking in particular. Numerous techniques have been developed to tackle this problem. Experimental studies have shown that combining different techniques is most effective. One recent proposal to combat the problem is to measure the amount of progress that has been made towards verifying the property of interest. The order in which the different alternatives of probabilistic choices are explored, also known as a search strategy, impacts the amount of progress being made. Determining this order can be viewed as a planning problem. In particular, techniques from reinforcement learning seem amendable to this problem. In this project, we plan to design a new search strategy based on the notion of progress measure using techniques of reinforcement learning.

    JPF provides search strategies that do not take the probabilities into account such as depth-first search and breadth-first search. An extension of JPF named jpf-probabilistic contains several search strategies that do take the probabilities into account, such as probability-first search and random search. We plan to implement our new search strategy within this framework. Furthermore, we also plan to compare the performance of our search strategy with the above mentioned search strategies.