Empirical Study

Effectiveness and Efficiency of Automated Reverse Engineering

RQ1: What is the effectiveness and efficiency of automated reverse engineering?

We reverse engineer four models: the ASU team project, the kill switch, the viral vector GOI, and the viral vector capsid. For the kill switch and viral vector models, we use 200 generations and 100 runs. Because the ASU model has a large number of features, we use 400 generations and 40 runs for it. We found that validity plateaus at these settings. To see this, view any of the ASU30-r#_refactoring.csv files, which show how validity changes over the generations.
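One way to locate the plateau is sketched below. It assumes the per-generation validity values have already been read from one of the ASU30-r#_refactoring.csv files into a Python list. The CSV column layout is not described here, so parsing is left out. The helper name `plateau_start`, its `tolerance` parameter, and the sample series are hypothetical.

```python
from itertools import accumulate


def plateau_start(validity, tolerance=1e-9):
    """Return the first generation whose best-so-far validity is within
    `tolerance` of the final best value (hypothetical helper)."""
    if not validity:
        raise ValueError("validity series is empty")
    # Best-so-far curve, so a noisy per-generation series still works.
    running_best = list(accumulate(validity, max))
    final_best = running_best[-1]
    return next(gen for gen, value in enumerate(running_best)
                if value >= final_best - tolerance)


# Illustrative values only, not taken from the results files.
series = [0.40, 0.55, 0.55, 0.80, 0.93, 1.0, 1.0, 1.0]
start = plateau_start(series)
# prints: plateau from generation 5; 3 generations on it
print(f"plateau from generation {start}; {len(series) - start} generations on it")
```

A long run of generations on the plateau relative to the total is the pattern the files are meant to show.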

We perform two types of reverse engineering: (1) from a known oracle, and (2) from an incomplete set of products. The first applies when a domain expert can generate a set of products representing the entire product line for use as a predefined oracle. The second applies when a complete set of products cannot be determined. In that case, the user builds the model from whatever resources are available. Note that this is likely to produce an incomplete feature model, meaning that not all products in the product line will be represented. This initial, incomplete feature model serves as a starting point for engineers and can be refined over time and in collaboration with other engineers.

Tool - SPLRevO

We use an existing reverse engineering tool, SPLRevO, developed by Thianniwet and Cohen (2015, 2016); other similar tools could also be used. The tool accepts either (1) a set of constraints, based on domain knowledge, describing the compatibility of the DNA parts, or (2) a set of products, such as the known working composite components. It then uses a genetic algorithm to automatically build a feature model that represents all products. The fitness function (validity) aims to maximize coverage of the set of desired products while penalizing any undesired (additional) products.
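The exact FFValidity formula is defined inside SPLRevO and is not reproduced here. As a rough illustration of the "coverage minus penalty" idea only, the sketch below scores a candidate model's product set by the fraction of desired products it covers, minus a penalty for the fraction of its products that were not desired. The function name, the `penalty_weight` parameter, and the representation of products as frozensets of feature names are all assumptions.

```python
def validity_score(model_products, desired_products, penalty_weight=1.0):
    """Illustrative coverage-minus-penalty fitness, NOT SPLRevO's FFValidity.

    Each product is a frozenset of feature names. With penalty_weight > 0,
    the score is 1.0 exactly when the model's products equal the desired ones.
    """
    model = set(model_products)
    desired = set(desired_products)
    if not desired:
        raise ValueError("desired product set is empty")
    # Reward: fraction of desired products the candidate model can derive.
    coverage = len(model & desired) / len(desired)
    # Penalty: fraction of the model's products that were not desired.
    penalty = len(model - desired) / len(model) if model else 0.0
    return coverage - penalty_weight * penalty


desired = {frozenset({"A", "B"}), frozenset({"A", "C"})}
print(validity_score(desired, desired))                      # 1.0
print(validity_score(desired | {frozenset({"A"})}, desired))  # about 0.667, one extra product
print(validity_score({frozenset({"A", "B"})}, desired))       # 0.5, one product missing
```

Both failure modes (missing a desired product, or admitting an undesired one) pull the score below 1.0, which is the behavior the paragraph describes.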

We provide all models in the models directory of the download package. We configured the genetic algorithm to use the FFValidity fitness function and a 1% mutation rate on a population of 100 models. We ran it for up to 200 generations (400 for the ASU model, as noted above) on a computing cluster with AMD Opteron(TM) CPUs running at 2300 MHz, with a maximum Java memory pool of 32 GB. We then captured the model with the best (maximum) fitness value. The results, in the results directory, show that SPLRevO produced a model that closely resembles the hand-built model and has 100% validity (it represents exactly the same number of products).
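For readers unfamiliar with the setup, the following is a generic genetic-algorithm loop using the parameters above: a population of 100, a 1% mutation rate, 200 generations, and keeping the best individual seen. It is only an illustration. SPLRevO evolves feature models with its own representation and operators, whereas this sketch assumes a bit-string genome, binary tournament selection, and one-point crossover, and takes the fitness function (a stand-in for FFValidity) as a parameter. The name `run_ga` is hypothetical.

```python
import random


def run_ga(fitness, genome_length, population_size=100, generations=200,
           mutation_rate=0.01, seed=None):
    """Generic GA sketch (hypothetical; not SPLRevO's implementation).

    Bit-string genomes, binary tournament selection, one-point crossover and
    per-gene bit-flip mutation. Returns the (genome, fitness) pair with the
    highest fitness seen in any evaluated generation.
    """
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(genome_length)]
                  for _ in range(population_size)]
    best_genome, best_fit = None, float("-inf")

    def tournament(scored):
        # Pick two individuals at random and keep the fitter one.
        return max(rng.sample(scored, 2), key=lambda pair: pair[1])[0]

    for _ in range(generations):
        scored = [(genome, fitness(genome)) for genome in population]
        # Track the best individual ever evaluated (fitness maximization).
        for genome, fit in scored:
            if fit > best_fit:
                best_genome, best_fit = list(genome), fit
        offspring = []
        while len(offspring) < population_size:
            mother, father = tournament(scored), tournament(scored)
            cut = rng.randrange(1, genome_length) if genome_length > 1 else 0
            child = mother[:cut] + father[cut:]
            # Flip each gene independently with probability mutation_rate.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            offspring.append(child)
        population = offspring
    return best_genome, best_fit


if __name__ == "__main__":
    # Toy stand-in fitness: number of 1 bits ("OneMax").
    best, score = run_ga(fitness=sum, genome_length=20, seed=1)
    print(score, best)
```

Keeping the best-ever individual outside the population mirrors the step of capturing the model with the maximum fitness value at the end of a run.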

SPLRevO is limited in scalability when reverse engineering from products (it supports at most 27 features), so we slightly condensed the ASU model in two ways. First, we combined each double terminator (B0010 and B0012) into a single part, B0015, since BioBrick part B0015 is the combination of parts B0010 and B0012. This applies to the sender, receiver, and behavior (e.g., B0010_S and B0012_S become B0015_S). Second, we removed two features that are mandatory in all products (B0034_S_mC and mCherry). The condensed model still yields 30 products.
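The condensing step is mechanical enough to sketch. The helpers below are hypothetical and assume that each product is given as a set of feature names and that terminator features follow the `<part>_<role>` naming shown above (e.g., B0010_S). They merge each B0010/B0012 pair into B0015, drop features after checking that they are mandatory in every product, and report whether the product count is preserved and whether the feature count fits within the 27-feature limit. The toy products are placeholders, not the ASU data.

```python
SPLREVO_FEATURE_LIMIT = 27  # feature limit stated above


def condense_product(product, drop=frozenset()):
    """Merge each B0010_<role>/B0012_<role> pair into B0015_<role> and remove
    the features listed in `drop` (hypothetical helper)."""
    features = set(product) - set(drop)
    roles_b0010 = {f[len("B0010_"):] for f in features if f.startswith("B0010_")}
    roles_b0012 = {f[len("B0012_"):] for f in features if f.startswith("B0012_")}
    # Only a double terminator (both parts present for a role) is merged.
    for role in roles_b0010 & roles_b0012:
        features -= {f"B0010_{role}", f"B0012_{role}"}
        features.add(f"B0015_{role}")
    return frozenset(features)


def summarize(products, drop):
    """Return (products before, products after, features after, fits limit)."""
    original = {frozenset(p) for p in products}
    # Features shared by every product; only these may be dropped safely.
    mandatory = frozenset.intersection(*original)
    if not set(drop) <= mandatory:
        missing = sorted(set(drop) - mandatory)
        raise ValueError(f"not mandatory in every product: {missing}")
    condensed = {condense_product(p, drop) for p in original}
    features = set().union(*condensed)
    return (len(original), len(condensed), len(features),
            len(features) <= SPLREVO_FEATURE_LIMIT)


# Toy products for illustration only; X and Y are placeholder features.
toy = [
    {"B0010_S", "B0012_S", "mCherry", "X"},
    {"B0010_S", "B0012_S", "mCherry", "Y"},
    {"B0010_R", "B0012_R", "mCherry", "X"},
]
print(summarize(toy, drop={"mCherry"}))  # (3, 3, 4, True)
```

Comparing the first two numbers is the check behind the statement that the condensed model still yields 30 products: if condensing made two products identical, the count would drop.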

The tool and instructions for running it on our models can be downloaded here.

You can either run the experiment yourself using the models in the /models/ directory or inspect our results: a sample of result files is in /models/completed_results/sample_results/, and all raw output files are in /models/sample_results/all_results/. Note that for the ASU model, we ran SPLRevO for about 27 hours (13 hours of CPU time) to obtain the correct model. We translated this output into FeatureIDE format. The key for the SPLRevO format can be seen here (it is also in the experimental directory). The translated models appear in the feature models below and in the Eclipse FeatureIDE workspace provided with these artifacts.

Feature Models

We show below the feature models for all four reverse-engineered runs displayed in the paper. For best viewing, please use the FeatureIDE Eclipse interface; our entire workspace, containing all models, is provided here.

ASU Reverse Engineered - Run 8 (Figure 19)

Kill Switch Reverse Engineered - Run 51 (Figure 20)

Viral Vector GOI Reverse Engineered - Run 34 (Figure 21)

Viral Vector Capsid Reverse Engineered - Run 14 (Figure 22)