BARCODEY - Background

BACKGROUND

Directed Evolution

Directed evolution is a widely used method for engineering proteins and nucleic acids with enhanced or novel functions. By generating a diverse mutant library and applying iterative rounds of selection, researchers can isolate variants with improved biochemical properties such as activity, specificity, or stability. This approach has had transformative impacts across biotechnology, including enzyme design, therapeutics, and synthetic biology, and was recognized with the 2018 Nobel Prize in Chemistry.

The Sequencing Bottleneck

Following one or more rounds of selection, accurate identification of enriched variants is critical for downstream analysis. This step typically involves sequencing individual clones to determine the distribution and identity of beneficial mutations. However, traditional sequencing methods introduce significant limitations.

Constraints of Sanger Sequencing

Sanger sequencing is widely used due to its reliability and accuracy. However, it presents key limitations when applied to high-throughput or full-length variant analysis:

Limited read length (~800–1000 base pairs), which may not capture entire constructs
Low scalability, as each variant must be sequenced individually
Manual processing, including read alignment and mutation mapping
Increased time and cost with growing library size
Poorly suited to pooled or mixed-sample sequencing, since overlapping traces from mixed templates cannot be resolved into individual variants

These factors make Sanger sequencing inefficient for modern directed evolution workflows, especially when analyzing large mutant libraries.
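
To make the scaling concrete, here is a rough back-of-the-envelope sketch in Python. It is not part of the tool itself. It estimates how many overlapping Sanger reads would be needed to tile a linear construct end to end. The function name sanger_reads_needed, the 800 bp usable read length, and the 100 bp overlap between adjacent reads are illustrative assumptions, not measured values.

```python
def sanger_reads_needed(construct_bp: int, read_bp: int = 800, overlap_bp: int = 100) -> int:
    """Estimate the number of overlapping reads needed to tile a linear construct.

    Assumes every read yields `read_bp` usable bases and adjacent reads
    share `overlap_bp` bases (both values are illustrative assumptions).
    """
    if read_bp <= overlap_bp:
        raise ValueError("read length must exceed the overlap")
    if construct_bp <= read_bp:
        return 1
    # n reads cover read_bp + (n - 1) * (read_bp - overlap_bp) bases,
    # so n = ceil((construct_bp - overlap_bp) / (read_bp - overlap_bp)).
    step = read_bp - overlap_bp
    return -(-(construct_bp - overlap_bp) // step)  # integer ceiling division


reads_per_variant = sanger_reads_needed(5000)  # 7 reads for a 5 kb construct
reads_for_plate = 96 * reads_per_variant       # 672 reads for 96 variants
print(reads_per_variant, reads_for_plate)
```

Under these assumptions, a single 96-variant plate of 5 kb constructs already requires several hundred individual reactions, each with its own primer and alignment step.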

Advantages of Long-Read Sequencing

Long-read platforms such as Oxford Nanopore address many of the limitations inherent to Sanger sequencing by enabling high-throughput, full-length analysis:

Extended read lengths, capable of covering entire plasmids or gene constructs
High-throughput potential, suitable for complex or pooled libraries
Single-molecule resolution, allowing detection of rare variants
Fewer preprocessing steps, with reduced need for tiling or assembly
Commercial availability, e.g., Plasmidsaurus Premium PCR, which offers amplification-free sequencing and delivers raw .fastq files directly for flexible downstream use

This approach is ideally suited for directed evolution experiments that require full-length, high-resolution variant identification.

Remaining Challenges in Data Analysis

Despite improvements in sequencing technology, data analysis remains a barrier, particularly for research groups without dedicated computational expertise. Commercial platforms like Geneious offer graphical interfaces for demultiplexing and consensus generation, but they:

Incur substantial licensing costs (e.g., ~$200/year/student)
Require manual processing of individual samples
Are not optimized for high-throughput variant analysis

Our Solution

We built a lightweight, open-source tool that:

Accepts .fastq files and user-supplied barcodes
Filters reads by length
Demultiplexes and assembles full plasmid sequences
Exports final consensus sequences in FASTA format

The goal: make long-read analysis fast, accessible, and scalable without relying on paid software.
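
The first steps listed above (reading .fastq input, filtering by length, and demultiplexing by barcode) can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the tool's actual implementation: all function names, the barcode sequences, the 50 bp search window, and the exact-match rule are hypothetical. Real nanopore reads contain errors, so a practical demultiplexer would likely need approximate matching. The assembly and consensus steps are not sketched here; only a small FASTA formatter is shown for the export step.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Read = Tuple[str, str, str]  # (header, sequence, quality)

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def parse_fastq(lines: Iterable[str]) -> Iterator[Read]:
    """Yield (header, sequence, quality) from simple 4-line FASTQ records."""
    it = (line.rstrip("\n") for line in lines)
    for header in it:
        if not header:
            continue  # skip blank lines between records
        seq = next(it, "")
        next(it, "")  # '+' separator line
        qual = next(it, "")
        yield header[1:], seq, qual


def filter_by_length(reads: Iterable[Read], min_len: int, max_len: int) -> Iterator[Read]:
    """Keep reads whose length lies within [min_len, max_len]."""
    return (r for r in reads if min_len <= len(r[1]) <= max_len)


def demultiplex(reads: Iterable[Read], barcodes: Dict[str, str],
                window: int = 50) -> Dict[str, List[Read]]:
    """Bin reads by an exact barcode match near the start of either strand.

    Exact matching is a simplifying assumption. Reads matching zero or
    several barcodes go to the 'unassigned' bin.
    """
    bins: Dict[str, List[Read]] = defaultdict(list)
    for header, seq, qual in reads:
        rc = reverse_complement(seq)
        hits: Dict[str, Read] = {}
        for name, bc in barcodes.items():
            if bc in seq[:window]:
                hits[name] = (header, seq, qual)
            elif bc in rc[:window]:
                # Re-orient the read so the barcode sits at its start.
                hits[name] = (header, rc, qual[::-1])
        if len(hits) == 1:
            name, read = next(iter(hits.items()))
            bins[name].append(read)
        else:
            bins["unassigned"].append((header, seq, qual))
    return dict(bins)


def to_fasta(name: str, seq: str, width: int = 60) -> str:
    """Format one sequence as a FASTA record wrapped at `width` columns."""
    chunks = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + name + "\n" + "\n".join(chunks) + "\n"


# Toy in-memory example with made-up barcodes and reads.
barcodes = {"bc01": "AAGGTTAA", "bc02": "CCTTGGCC"}
toy_reads = [
    ("read1", "AAGGTTAA" + "ACGT" * 5),  # bc01, forward strand
    ("read2", "ACGT" * 5 + "GGCCAAGG"),  # bc02, reverse strand
    ("read3", "AAGGTTAAACGT"),           # too short, filtered out
    ("read4", "ACGT" * 7),               # no barcode
]
fastq_text = "\n".join(f"@{n}\n{s}\n+\n{'I' * len(s)}" for n, s in toy_reads)

kept = filter_by_length(parse_fastq(fastq_text.splitlines()), min_len=20, max_len=100)
bins = demultiplex(kept, barcodes)
for name, members in bins.items():
    print(name, len(members))  # bc01 1, bc02 1, unassigned 1
```

Sending ambiguous reads to an "unassigned" bin rather than guessing keeps each per-barcode bin clean for any downstream consensus step.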
