Feasibility Study

RQ1: SPL Characteristics

RQ1: What characteristics of a DNA repository are consistent with those of a software product line?

We use as our subject repository the Registry of Standard Biological Parts. Our primary data source was an SQL dump of all the parts in the Registry’s API section taken on February 8th 2019*. Our artifacts include the database as well as each query used in our publication for data collection.

We also include the list of 200 randomly selected composite parts that we used to evaluate which assets are present in parts, along with a spreadsheet recording the evaluation of each asset for each part.


*Note that the website labels this dump as taken on 6 October 2017; however, we verified that the latest part in the database was from December 2018.

BioBrick Database

The SQL dump is available on the Registry of Standard Biological Parts' API page; the exact file used in this work can be downloaded here (in future years the online API dump may be updated to include new parts). The complete SQL script can be downloaded here.

Load the SQL dump into your preferred SQL interface. We provide instructions for MySQL (our experiments were run on version 8.0.16 for osx10.14 on x86_64):

    • Start an interactive MySQL session: mysql -u root

    • Create a database to import the dump into: CREATE DATABASE biobrickdb;

    • Leave the interactive session: exit

    • Load the dump into the database (may take a minute): mysql -u root biobrickdb < biobrick_database.sql


The database is called biobrickdb and the table is called parts. To run all the SQL commands and store the results in a file called output.tab, run mysql -u root < SQL_script.sql > output.tab. Alternatively, run each command individually as described next (interactive mode displays the results more readably).


To view all the part types (Table 1) the following SQL command can be run:

    SELECT DISTINCT part_type, COUNT(*) FROM biobrickdb.parts
    GROUP BY part_type ORDER BY COUNT(*) DESC;

To view the parts by their uses (Table 2):

    SELECT
      CASE WHEN uses < 0 THEN '<0'
           WHEN uses = 0 THEN ' 0'
           WHEN uses > 0 AND uses <= 10 THEN ' 1 - 10'
           WHEN uses > 10 AND uses <= 50 THEN '11 - 50'
           WHEN uses > 50 AND uses <= 100 THEN '51 - 100'
           ELSE '101+'
      END AS '# of Uses',
      COUNT(*) AS '# of Parts'
    FROM biobrickdb.parts GROUP BY 1;
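The bucket boundaries in the query above are easy to get wrong by one, so the following is a minimal Python sketch of the same mapping that can be checked at each edge. The function name `uses_bucket` and the sample values are hypothetical, and the labels are written without the padding used in the SQL strings.

```python
from collections import Counter


def uses_bucket(uses: int) -> str:
    """Return the bucket label for one part's `uses` value."""
    if uses < 0:
        return "<0"
    if uses == 0:
        return "0"
    if uses <= 10:
        return "1-10"
    if uses <= 50:
        return "11-50"
    if uses <= 100:
        return "51-100"
    return "101+"


# Hypothetical `uses` values chosen to hit every boundary; not Registry data.
sample_uses = [-1, 0, 1, 10, 11, 50, 51, 100, 101]
bucket_counts = Counter(uses_bucket(u) for u in sample_uses)
print(bucket_counts)
```

One difference to keep in mind: the sketch assumes every value is an integer, whereas in SQL a NULL `uses` value fails every WHEN comparison and would therefore be counted under the ELSE label '101+'.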

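To obtain the data for Figure 6 (cumulative number of parts by year), run the query below. It returns the number of parts created in each year; the cumulative totals are the running sums of these counts:

    SELECT DISTINCT YEAR(creation_date), COUNT(*) FROM biobrickdb.parts
    GROUP BY YEAR(creation_date) ORDER BY YEAR(creation_date) ASC;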
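The query above yields one count per year, so a cumulative series has to be derived from its output. The following is a minimal Python sketch of that step, assuming the rows have already been read into (year, count) pairs. The function name `cumulative_by_year` and the example counts are hypothetical, and this is not necessarily how the provided R script computes the series.

```python
from itertools import accumulate


def cumulative_by_year(yearly_counts):
    """Turn [(year, count), ...] into [(year, running_total), ...] sorted by year."""
    ordered = sorted(yearly_counts)
    years = [year for year, _ in ordered]
    running_totals = accumulate(count for _, count in ordered)
    return list(zip(years, running_totals))


# Hypothetical per-year counts; not taken from the Registry.
example_counts = [(2004, 10), (2003, 5), (2005, 20)]
cumulative = cumulative_by_year(example_counts)
print(cumulative)
```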

To recreate the data for Figure 7 (number of parts by type added each year):

    SELECT YEAR(creation_date), part_type, COUNT(*) FROM biobrickdb.parts
    GROUP BY YEAR(creation_date), part_type ORDER BY YEAR(creation_date);
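The query above returns one row per (year, part type) pair, which usually needs regrouping before it can be plotted as one series per type. A minimal Python sketch of that regrouping follows; the function name `pivot_by_year` and the example rows are hypothetical, and the provided R script may structure the data differently.

```python
from collections import defaultdict


def pivot_by_year(rows):
    """Group (year, part_type, count) rows into {year: {part_type: count}}."""
    table = defaultdict(dict)
    for year, part_type, count in rows:
        table[year][part_type] = count
    return dict(table)


# Hypothetical rows in the shape the query returns; not Registry data.
example_rows = [(2004, "Coding", 3), (2004, "RBS", 1), (2005, "Coding", 7)]
per_year = pivot_by_year(example_rows)

# A type with no parts in a given year has no row, so default it to 0.
print(per_year[2005].get("RBS", 0))
```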

To recreate our measurement of variability in Section 6.1.3:

    SELECT
      SUM(CASE WHEN part_type = 'regulatory' THEN 1 ELSE 0 END) AS Regulatory,
      SUM(CASE WHEN part_type = 'RBS' THEN 1 ELSE 0 END) AS RBS,
      SUM(CASE WHEN part_type = 'Coding' THEN 1 ELSE 0 END) AS Coding,
      SUM(CASE WHEN part_type = 'Terminator' THEN 1 ELSE 0 END) AS Terminator
    FROM biobrickdb.parts;

These SQL commands provide the data needed to generate our graphs. We also provide our R script and input data (here), which can be used to generate the graphs in Figures 6 and 7.

200 Random Composite Parts - Asset Evaluation

We took a sample of 200 random composite parts (Table 3) using the following query (note that this returns a different random set of 200 parts each time you run it, not necessarily the 200 used in our publication):

    SELECT part_name FROM biobrickdb.parts
    WHERE part_type='Composite' ORDER BY RAND() LIMIT 200;
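Because the query above draws a fresh sample on every run, readers who want a repeatable sample of their own may prefer to draw it outside the database. The following is a minimal Python sketch, assuming the composite part names have already been exported to a list. The function name `sample_part_names`, the seed, and the placeholder names are hypothetical, and this will not reproduce the 200 parts used in the publication, which are listed in the spreadsheet.

```python
import random


def sample_part_names(part_names, k=200, seed=0):
    """Draw k distinct names reproducibly for a given seed."""
    rng = random.Random(seed)
    # Sorting first makes the result independent of the input order.
    return rng.sample(sorted(part_names), k)


# Placeholder names standing in for exported composite part names.
names = [f"part_{i:04d}" for i in range(1000)]
first_draw = sample_part_names(names, k=5, seed=42)
second_draw = sample_part_names(names, k=5, seed=42)
print(first_draw == second_draw)
```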

We manually looked up each of the 200 parts (e.g. http://parts.igem.org/Part:BBa_K611032) in the Registry and recorded whether each of several assets was present. The assets and the criteria for determining their presence were as follows:

  • SBOL model: The “Subparts” section contains symbols that have a direct translation into symbols found in the SBOL 2.0 set of glyphs.

  • Ruler model: The “Ruler” section contains at least one subpart name.

  • DNA sequence: The “Get part sequence” section returns a non-empty result.

  • Single/Double Strand (aka SS/DS): The “SS” or “DS” section contains a DNA sequence.

  • Textual description: The page provides some textual information about the part. The information must be more than could be determined by the name of the part alone.

  • Experimental results: The page provides any experimental results regarding the use of the part or a direct link to a results page.

All recorded evaluations can be found in this spreadsheet (also shown below; a 1 means the asset was present and a 0 means the asset was not present).
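Given the 0/1 encoding, the share of parts carrying each asset is a column average. The following is a minimal Python sketch of that tally, assuming the spreadsheet has been read into one dictionary per part. The function name `asset_fractions`, the part labels, and the values are hypothetical and do not come from our data.

```python
ASSETS = [
    "SBOL model",
    "Ruler model",
    "DNA sequence",
    "SS/DS",
    "Textual description",
    "Experimental results",
]


def asset_fractions(evaluations):
    """Return {asset: fraction of parts marked 1 for that asset}."""
    if not evaluations:
        raise ValueError("no parts to evaluate")
    total = len(evaluations)
    return {
        asset: sum(row[asset] for row in evaluations.values()) / total
        for asset in ASSETS
    }


# Two made-up parts with made-up marks, for illustration only.
toy_evaluations = {
    "part_A": dict(zip(ASSETS, [1, 1, 1, 1, 1, 0])),
    "part_B": dict(zip(ASSETS, [0, 0, 1, 1, 1, 1])),
}
fractions = asset_fractions(toy_evaluations)
print(fractions)
```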

For example, the evaluation for part BBa_K611032 can be seen below. This part has a textual description explaining what the part is and what it could be used for, the SBOL model, and the raw DNA sequence (found by clicking on the text "Get part sequence"). This part lacks the experimental results asset.

200RandomParts.xlsx

Part BBa_K2757004 (seen below) has the textual description asset, the raw DNA sequence, and the experimental results asset (usually as part of a "characterization"). However, this part lacks the SBOL model asset (the "Subparts" section is grayed out and no model is present).