Genome Skim Pipeline

University of California, San Diego, Department of Bioengineering, Group 21

Abstract

Many problems in the biological sciences require identifying an unknown genomic sample. Traditional sequencing and analysis methods classify such samples inefficiently and leave much room for improvement. By using low-coverage shotgun sequencing to create a “genome skim” and then analyzing the data with our efficient tool, organisms can be classified in a fraction of the time. Our tool calculates distances between reference genomes and query inputs to produce a phylogenetic output along with query analysis statistics. By combining existing algorithmic approaches and uploading our pipeline to a third-party, web-based application that anyone can access, we have made genome skim analysis accessible to people at any level of computational skill. Our completed pipeline is available to any researcher through the Galaxy web server; it takes raw genome skim data and publicly available references and turns them into an accurate phylogenetic depiction of their relationships.
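To make "distance calculations between reference genomes and query inputs" concrete, here is a minimal sketch of one common alignment-free approach: reduce each sequence to its set of k-mers, compare sets with the Jaccard index J, and convert J to an evolutionary distance with the Mash-style estimator D = -(1/k) ln(2J / (1 + J)). This is an assumption for illustration, not the pipeline's actual code. Skmer, which the workflow uses, estimates distances from genome skims while also correcting for low coverage and sequencing error, and this sketch omits those corrections. The function names (`canonical_kmers`, `jaccard`, `mash_distance`, `pairwise_distances`) and the default k = 21 are hypothetical choices.

```python
import math
from itertools import combinations

# Translation table used to complement DNA bases (A<->T, C<->G).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_BASES = set("ACGT")


def canonical_kmers(seq: str, k: int) -> set[str]:
    """Return the set of canonical k-mers of seq.

    A canonical k-mer is the lexicographically smaller of a k-mer and its
    reverse complement, so both DNA strands map to the same key.
    K-mers containing non-ACGT characters (e.g. N) are skipped.
    """
    seq = seq.upper()
    out: set[str] = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= _BASES:
            rc = kmer.translate(_COMPLEMENT)[::-1]
            out.add(min(kmer, rc))
    return out


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard index |A & B| / |A | B|; defined as 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mash_distance(j: float, k: int) -> float:
    """Mash-style distance estimate from a Jaccard index j and k-mer size k."""
    if j <= 0.0:
        return 1.0  # no shared k-mers: report the maximal distance
    # max() turns the -0.0 produced for identical sets into 0.0
    return max(0.0, -math.log(2.0 * j / (1.0 + j)) / k)


def pairwise_distances(seqs: dict[str, str], k: int = 21) -> dict[tuple[str, str], float]:
    """Symmetric pairwise distances between named sequences (queries or references)."""
    sets = {name: canonical_kmers(s, k) for name, s in seqs.items()}
    dist: dict[tuple[str, str], float] = {}
    for a, b in combinations(sorted(sets), 2):
        d = mash_distance(jaccard(sets[a], sets[b]), k)
        dist[(a, b)] = dist[(b, a)] = d
    return dist
```

A distance matrix like the one `pairwise_distances` returns is the kind of input that a distance-based placement method such as APPLES works from: each query is positioned on the reference tree according to its distances to the reference genomes.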

Bioengineering Day 2022

Problem Statement

When genome fragments are acquired in bulk from an unclassified sample, there are currently no established workflows that perform phylogenetic placement on this type of data and present the results in a form that all researchers can easily understand.

Subproject 1

Preprocessing

Subproject 2

Workflow: CONSULT, RESPECT, Skmer, APPLES

Subproject 3

Phylogenetic placement tree visualization

Subproject 4

Asynchronous email output


Webpage Lead: Carleen
