RQ1-SPL Characteristics

RQ1: Does the BioBrick repository have the characteristics of a Software Product Line?

We use as our subject repository the Registry of Standard Biological Parts. Our primary data source was an SQL dump of all the parts in the Registry’s API section taken on February 8th 2019*. Our artifacts include the database as well as each query used in our publication.

We also include the list of 100 randomly selected composite parts that we used to evaluate which assets are present, along with a spreadsheet recording the evaluation of each asset for each part.


*Note that the website labels this dump as taken on 6 October 2017; however, we verified that the latest part in the database is from December 2018.


BioBrick Database

The SQL dump is available on the Registry of Standard Biological Parts' API page; the exact file used in this work can be downloaded here (the online API dump may be updated in future years to include new parts).

Load the SQL dump into your preferred SQL interface. We provide instructions for MySQL (our experiments were run on version 8.0.16 for osx10.14 on x86_64):

    • start an interactive MySQL session: mysql -u root

    • create a database to import the dump into: CREATE DATABASE biobrickdb;

    • exit the interactive session: exit

    • load the dump into the database (may take a minute): mysql -u root biobrickdb < biobrick_database.sql


The database is called biobrickdb and the table is called parts. To run all the SQL commands and store the results in a file called output.tab, run mysql -u root < SQL_script.sql > output.tab. Alternatively, you can run each command individually as described next (the interactive mode formats the results more readably).


To view all the part types (Table 1) the following SQL command can be run:

SELECT DISTINCT part_type, COUNT(*) FROM biobrickdb.parts
GROUP BY part_type ORDER BY COUNT(*) DESC;

To view the parts by their uses (Table 2):

SELECT
  CASE WHEN uses < 0 THEN '<0'
       WHEN uses = 0 THEN '0'
       WHEN uses > 0 AND uses <= 10 THEN '1-10'
       WHEN uses > 10 AND uses <= 50 THEN '11-50'
       WHEN uses > 50 AND uses <= 100 THEN '51-100'
       ELSE '101+'
  END AS '# of Uses',
  COUNT(*) AS '# of Parts'
FROM biobrickdb.parts GROUP BY 1;
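The bucket boundaries in the query above can be checked without a MySQL server. The following is a minimal sketch, not part of the published artifacts: it runs the same CASE logic against an in-memory SQLite table. The toy rows, the two-column table layout, the result column names bucket and n_parts, and the plain ASCII bucket labels are all assumptions made for illustration.

```python
import sqlite3

# Hypothetical toy rows (part_name, uses); NOT data from the Registry.
TOY_PARTS = [
    ("toy_a", -1),
    ("toy_b", 0),
    ("toy_c", 0),
    ("toy_d", 1),
    ("toy_e", 10),
    ("toy_f", 11),
    ("toy_g", 50),
    ("toy_h", 51),
    ("toy_i", 100),
    ("toy_j", 101),
]

# Same CASE logic as the Table 2 query, with illustrative column aliases.
BUCKET_QUERY = """
SELECT
  CASE WHEN uses < 0 THEN '<0'
       WHEN uses = 0 THEN '0'
       WHEN uses > 0 AND uses <= 10 THEN '1-10'
       WHEN uses > 10 AND uses <= 50 THEN '11-50'
       WHEN uses > 50 AND uses <= 100 THEN '51-100'
       ELSE '101+'
  END AS bucket,
  COUNT(*) AS n_parts
FROM parts
GROUP BY bucket;
"""


def bucket_counts(rows):
    """Load rows into an in-memory SQLite table and return {bucket: count}."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE parts (part_name TEXT, uses INTEGER)")
        conn.executemany("INSERT INTO parts VALUES (?, ?)", rows)
        return dict(conn.execute(BUCKET_QUERY).fetchall())
    finally:
        conn.close()


if __name__ == "__main__":
    for bucket, count in bucket_counts(TOY_PARTS).items():
        print(bucket, count)
```

The toy rows sit on the bucket boundaries (0, 10, 11, 50, 51, 100, 101), so each comparison is exercised at least once. A row whose uses value is NULL fails every WHEN comparison and is counted under '101+'; this sketch does not check whether the dump contains such rows.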


100 Random Composite Parts - Asset Evaluation

We took a sample of 100 random composite parts (Table 3) using the following query. Note that it returns a different random set of 100 parts each time it is run, so it will not necessarily reproduce the 100 parts used in our publication.

SELECT part_name FROM biobrickdb.parts
WHERE part_type='Composite' ORDER BY RAND() LIMIT 100;
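Because the query draws a fresh sample on every run, anyone who wants a repeatable sample of their own needs a fixed seed. The sketch below shows one way to do that in Python once the composite part names have been exported from the database. It is not the procedure used in the publication (the published 100 parts are in the included list). The function name reproducible_sample, the seed value, and the toy part names are hypothetical.

```python
import random


def reproducible_sample(part_names, k=100, seed=0):
    """Return k names drawn without replacement, repeatable for a given seed."""
    # Sort first so the result does not depend on the order rows came back in.
    ordered = sorted(set(part_names))
    if k > len(ordered):
        raise ValueError("sample size exceeds number of distinct parts")
    # A private Random instance avoids touching the global random state.
    return random.Random(seed).sample(ordered, k)


if __name__ == "__main__":
    # Hypothetical stand-in names, not real Registry identifiers.
    toy_names = [f"toy_part_{i:04d}" for i in range(1000)]
    first = reproducible_sample(toy_names, k=100, seed=2019)
    second = reproducible_sample(toy_names, k=100, seed=2019)
    print(first == second, len(set(first)))
```

Sorting the names before sampling matters because a SELECT without ORDER BY does not guarantee row order, and the same seed applied to a differently ordered list would pick different parts.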

We manually looked up each of the 100 parts (e.g. http://parts.igem.org/Part:BBa_K611032) in the Registry and recorded whether the following assets were present:

  • SBOL model: Are the subparts (in SBOL) on the part’s page?

  • DNA sequence: Is the raw DNA sequence listed on the part’s page?

  • Textual Description: Is there a basic textual description of what the part does or is used for?

  • Experimental Results: Are there any experimental results associated with the part on the part’s page?

All recordings can be found in this spreadsheet (also shown below; a 1 means the asset was present and a 0 means it was not).

RQ1-Parts.xlsx
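The 0/1 encoding makes it straightforward to total how many parts carry each asset. The following is a minimal sketch that assumes the spreadsheet has been exported to CSV with one row per part and one column per asset. The column names and the three toy rows are hypothetical, and the real spreadsheet's layout may differ.

```python
import csv
import io

# Hypothetical column names; adjust to match the exported spreadsheet.
ASSETS = ["sbol_model", "dna_sequence", "textual_description", "experimental_results"]

# Toy rows for illustration only; NOT values from RQ1-Parts.xlsx.
TOY_CSV = """part_name,sbol_model,dna_sequence,textual_description,experimental_results
toy_part_1,1,1,1,0
toy_part_2,0,1,1,1
toy_part_3,1,1,0,0
"""


def asset_totals(csv_text):
    """Count, per asset column, how many parts are marked 1 (present)."""
    totals = {asset: 0 for asset in ASSETS}
    n_parts = 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        n_parts += 1
        for asset in ASSETS:
            value = row[asset].strip()
            # Reject anything other than the documented 0/1 encoding.
            if value not in ("0", "1"):
                raise ValueError(f"unexpected value {value!r} for {asset}")
            totals[asset] += int(value)
    return n_parts, totals


if __name__ == "__main__":
    n_parts, totals = asset_totals(TOY_CSV)
    for asset, count in totals.items():
        print(f"{asset}: {count}/{n_parts}")
```

Rejecting any cell that is not exactly 0 or 1 guards against blank or mistyped entries silently skewing the totals.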

For example, the evaluation for part BBa_K611032 can be seen below. This part has a textual description explaining what the part is and what it could be used for, the SBOL model, and the raw DNA sequence (found by clicking on the text Get part sequence). This part lacks the experimental results asset.

Part BBa_K2757004 (seen below) has the textual description asset, the raw DNA sequence, and the experimental results asset (usually as part of a "characterization"). However, this part lacks the SBOL model asset (subparts is grayed out and no model is present).