The complete pipeline and source code are available here.
Python 3 – Core scripting language used for all processing and automation
Biopython – For parsing .fastq files, reverse complementing sequences, and writing outputs
Pandas – For structured data handling (e.g., sequence tracking and metrics)
Seaborn & Matplotlib – For visualization of read length distributions and barcode statistics
NumPy – For numerical operations within plotting and filtering workflows
Flye – Long-read de novo assembler used to generate consensus sequences for each barcode group
Subprocess & OS Modules – For executing Flye assembly and managing file system operations
Inkscape & BioRender – For figure creation
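
As a rough illustration of how the Subprocess and OS modules can drive Flye once per barcode group, here is a minimal sketch. It is not the project's actual code: the function names (`build_flye_command`, `assemble_barcode_groups`), the one-FASTQ-file-per-barcode layout, the thread count, and the choice of Flye's `--nano-raw` read mode are all assumptions made for the example.

```python
import os
import shutil
import subprocess
from pathlib import Path


def build_flye_command(reads, out_dir, threads=4, read_type="--nano-raw"):
    """Return the Flye command line for one barcode group as an argument list.

    Assumption: reads are Nanopore reads, hence --nano-raw; the real
    pipeline may use a different Flye read mode.
    """
    return [
        "flye",
        read_type,
        str(reads),
        "--out-dir", str(out_dir),
        "--threads", str(threads),
    ]


def assemble_barcode_groups(fastq_dir, results_dir, threads=4):
    """Run Flye once per *.fastq file in fastq_dir.

    Assumption: each barcode group lives in its own FASTQ file.
    Returns a dict mapping the file stem (barcode name) to Flye's exit code.
    """
    if shutil.which("flye") is None:
        raise RuntimeError("flye executable not found on PATH")

    exit_codes = {}
    for reads in sorted(Path(fastq_dir).glob("*.fastq")):
        out_dir = Path(results_dir) / reads.stem
        os.makedirs(out_dir, exist_ok=True)
        # check=False so one failed assembly does not stop the other groups
        result = subprocess.run(
            build_flye_command(reads, out_dir, threads), check=False
        )
        exit_codes[reads.stem] = result.returncode
    return exit_codes
```

Building the command as a list, rather than a single shell string, avoids quoting problems with file paths and keeps the command easy to inspect before it is run.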
Dr. Edward Marcotte (course instructor) and Zoya Ansari (course TA) – for guidance and support throughout the project
Dr. Brian Hew and Dr. Jesse Owens (University of Hawaii at Manoa) – for providing the raw Premium PCR sequencing data
Ira Zibbu – for assistance in troubleshooting and configuring the Flye assembler
The open-source community – for developing tools such as Biopython, Flye, and Inkscape