AWinK's workflow
Evaluation of hifiasm assemblies of AWinK-selected reads at different coverage depths, drawn from a PBSim-simulated HiFi data set of C. elegans with a sequencing depth of 1000×
As the cost of DNA sequencing has dropped rapidly, it is not uncommon for small genomes to have excessive sequencing data, sometimes exceeding 1000× sequencing depth (which we call ultra-deep). Because ultra-deep sequencing data significantly degrade the quality of the final assembly (for reasons not entirely clear to us), one faces the largely unexplored problem of how to select a subsample of the data for optimal assembly.
In this work, we first show that this problem is related to the minimum tiling path (MTP) problem, which is known to be NP-hard. We then propose a heuristic, called AWinK, that uses single-copy k-mers to select a subset of ultra-deep sequencing reads maximizing genomic coverage. Our experiments on both synthetic and real ultra-deep sequencing data demonstrate that AWinK can approximate the minimum tiling path, yielding highly contiguous, accurate, and complete genome assemblies. Compared with six other read selection strategies, the subsets of reads chosen by AWinK produced assemblies with the highest genome fraction and sequence identity.
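To make the idea of selecting reads by single-copy k-mers more concrete, here is a minimal Python sketch that treats read selection as a greedy set-cover problem, a standard heuristic for NP-hard covering problems. This is not AWinK's actual algorithm: the read-count window `[lo, hi]` used as a proxy for "single-copy", the greedy rule, `target_depth`, and all function names are hypothetical choices made for illustration, and reverse complements are ignored for simplicity.

```python
from collections import Counter


def kmers(seq: str, k: int) -> list[str]:
    """All k-mers of seq (forward strand only; reverse complements ignored)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def single_copy_kmers(reads: list[str], k: int, lo: int, hi: int) -> set[str]:
    """Hypothetical proxy for single-copy k-mers: k-mers found in between lo and
    hi reads, a window assumed to bracket the depth of a single-copy region.
    K-mers seen in more reads are treated as repeats, in fewer as likely errors."""
    counts = Counter(km for r in reads for km in set(kmers(r, k)))
    return {km for km, c in counts.items() if lo <= c <= hi}


def greedy_select(reads: list[str], k: int, lo: int, hi: int,
                  target_depth: int = 1) -> list[int]:
    """Greedy set-cover sketch: repeatedly take the read covering the most
    single-copy k-mers still below target_depth; stop when no read helps."""
    solid = single_copy_kmers(reads, k, lo, hi)
    need = {km: target_depth for km in solid}  # coverage still required per k-mer
    read_sets = [set(kmers(r, k)) & solid for r in reads]
    remaining = set(range(len(reads)))
    chosen: list[int] = []

    def gain(i: int) -> int:
        # Number of single-copy k-mers in read i that still need coverage.
        return sum(1 for km in read_sets[i] if need[km] > 0)

    while remaining:
        best = max(sorted(remaining), key=gain)  # ties go to the lowest index
        if gain(best) == 0:
            break
        chosen.append(best)
        remaining.remove(best)
        for km in read_sets[best]:
            if need[km] > 0:
                need[km] -= 1
    return chosen
```

For example, with six reads that are two copies each of `AACCGGTT`, `CCGGTTAA`, and `GGTTAACC`, `k = 4`, and a window of `[2, 4]`, the sketch keeps reads 0 and 4: the k-mer `GGTT`, present in all six reads, falls outside the window and is treated as a repeat, while the two chosen reads together cover every remaining single-copy k-mer.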
GitHub: https://github.com/sakshar/AWinK
RAmbler's workflow
In this work, I have developed a reference-guided assembler specialized in resolving complex repeats using PacBio HiFi reads exclusively. The key idea is to use single-copy k-mers (k-mers that are expected to occur only once in the genome). Across more than 250 synthetic data sets, RAmbler outperforms hifiasm, LJA, HiCANU, and Verkko over a range of repeat lengths, numbers of repeats, heterozygosity rates, and sequencing depths.
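As a rough illustration of how single-copy k-mers can tie reads to a reference in a reference-guided setting, the Python sketch below indexes the k-mers that occur exactly once in a reference and uses them as exact-match anchors that vote on a read's start coordinate. It is not RAmbler's implementation: using the reference as a stand-in for "expected to occur once in the genome", forward-strand exact matching, majority voting, and the function names are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from typing import Optional


def unique_kmer_index(reference: str, k: int) -> dict[str, int]:
    """Map every k-mer occurring exactly once in the reference to its position."""
    positions: dict[str, list[int]] = defaultdict(list)
    for i in range(len(reference) - k + 1):
        positions[reference[i:i + k]].append(i)
    return {km: pos[0] for km, pos in positions.items() if len(pos) == 1}


def anchor_read(read: str, index: dict[str, int], k: int) -> list[tuple[int, int]]:
    """(offset in read, position in reference) for each single-copy k-mer hit."""
    return [(j, index[read[j:j + k]])
            for j in range(len(read) - k + 1)
            if read[j:j + k] in index]


def estimate_start(read: str, index: dict[str, int], k: int) -> Optional[int]:
    """Vote on the read's start coordinate; repeated k-mers never cast a vote."""
    anchors = anchor_read(read, index, k)
    if not anchors:
        return None
    votes = Counter(ref_pos - offset for offset, ref_pos in anchors)
    return votes.most_common(1)[0][0]
```

For a reference `ACGTACGTTTGCAGGCATCG` and `k = 4`, the k-mer `ACGT` occurs twice and is left out of the index, yet the read `GTACGTTT` is still placed at position 2 through its other four single-copy k-mers. Skipping repeated k-mers is what keeps a read from being pulled toward the wrong copy of a repeat.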
RAmbler vs. HG38: NucFreq read mapping coverage & CRAQ circos plots
In this project, we show that RAmbler can reconstruct human centromeres and other complex repeats to a quality comparable to the manually curated Telomere-to-Telomere (T2T) human genome assembly. Specifically, we applied RAmbler to reconstruct five repetitive regions from Chromosomes 8, 19, and X using PacBio HiFi reads exclusively; RAmbler outperformed other assemblers and achieved T2T-level assembly quality.
GFViewer plot of selected multigene families of Babesia divergens MO1 on Chromosomes 2 and 3
In this project, we present GFViewer, a web tool and companion Python library for visualizing the genomic localization of genes in multigene families. GFViewer was initially designed to study multigene families in Babesia species, supporting research on evolution, speciation, virulence factors, and drug targets.
Website: http://gfviewer.cs.ucr.edu