SANGSUM Overview

Statistical Analysis of Next Generation Sequencing Using Matlab

This software was created to process 16S pyrosequencing data conveniently. It relies mainly on MOTHUR (from Pat Schloss's group) and on the Matlab Bioinformatics Toolbox (to generate phylogenetic trees).

Because of file size limits, the software itself and the databases it needs are hosted on MediaFire; the download links are listed below.

SANGSUM is a Matlab script that can be run either directly in Matlab or as a standalone application. It processes a large FASTA file containing thousands of sequences and performs the following tasks:

-Uchime chimera check

-Phylotype assignment with the RDP classifier

-OTU clustering

-Automatic BLAST of the representative sequence of each OTU

-Export of an annotated table of the raw and relative abundance of each phylotype

-Multivariate analysis (in particular a principal component analysis indicating the binomial confidence interval for each sample)

-UPGMA phylogenetic tree of the representative sequences of each OTU

-Selection of phylotypes using an exact probe and/or the RDP classification
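To make the "binomial confidence interval" item more concrete, here is a minimal Python sketch of how raw read counts could be turned into relative abundances with a binomial interval around each one. It is not taken from SANGSUM (which is written in Matlab), and the source does not say which interval formula the software uses; the Wilson score interval, the function names and the example counts are all assumptions made for illustration.

```python
import math


def wilson_interval(count, total, z=1.96):
    """Wilson score interval for a binomial proportion count/total.

    z=1.96 corresponds to roughly 95% confidence. The choice of the
    Wilson interval is an assumption; SANGSUM may use another formula.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = count / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half


def abundance_with_ci(counts):
    """Map {phylotype: raw count} to {phylotype: (relative abundance, low, high)}."""
    total = sum(counts.values())
    return {
        name: (n / total, *wilson_interval(n, total))
        for name, n in counts.items()
    }


# Hypothetical read counts for one sample (made-up numbers).
sample = {"OTU1": 120, "OTU2": 30, "OTU3": 0}
for name, (rel, low, high) in abundance_with_ci(sample).items():
    print(f"{name}: {rel:.3f} [{low:.3f}, {high:.3f}]")
```

The interval narrows as the number of reads in a sample grows, which is why two samples with the same relative abundance can carry very different uncertainty.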

Details are described in the manual: http://www.mediafire.com/?s6cnk554bssmhls

For the Matlab-linked version, you need a folder containing:

SANSGUM2.zip: http://www.mediafire.com/?wh1lmwt3pzjv4uz (extract the files, start Matlab, set the Matlab path to the folder and type "SANGSUM" in the command window)

Mothur(withSilvaSeqs): http://www.mediafire.com/?83p2z0bp8o48d0z (extract MOTHUR20.2 and the SILVA reference sequences)

For the standalone version, you need a folder containing:

MCRInstaller: http://www.mediafire.com/?jl5elcaxx0vh2lo (the Matlab Compiler Runtime installer; run it once)

SANGSUM2EXE.zip: http://www.mediafire.com/?b41plpk8f16ak1m (extract the files)

Mothur(withSilvaSeqs): http://www.mediafire.com/?83p2z0bp8o48d0z (extract MOTHUR20.2 and the SILVA reference sequences)

Below is a quick overview of the outputs:

-The main window

-The annotated xls output, with the abundance of each phylotype in every sample and the name and sequence of its representative member. DOW01 to DOW12 are the sample names.

-The tree, with the relative abundance of each phylotype shown next to it. DOW01 to DOW12 are the sample names. This option requires the Bioinformatics Toolbox.
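For readers unfamiliar with UPGMA, the clustering method named for the tree, the following Python sketch shows the basic idea on a small distance matrix. It is only an illustration: SANGSUM builds its tree in Matlab through the Bioinformatics Toolbox, and the function name, the nested-tuple tree format and the example distances here are assumptions, not part of the software.

```python
from itertools import combinations


def upgma(names, matrix):
    """Build a UPGMA tree from a symmetric distance matrix.

    Returns nested tuples (left, right, height), where height is half
    the distance between the two merged clusters. Leaves are names.
    """
    # Active clusters: id -> (subtree, number of leaves).
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    # Pairwise distances keyed by (smaller id, larger id).
    dist = {(i, j): matrix[i][j] for i, j in combinations(range(len(names)), 2)}
    next_id = len(names)
    while len(clusters) > 1:
        # Pick the closest pair of active clusters.
        (a, b), d_ab = min(dist.items(), key=lambda kv: kv[1])
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        # Distance from the merged cluster to each remaining cluster is
        # the size-weighted mean of the two old distances.
        new_dist = {}
        for c in clusters:
            d_ac = dist[(min(a, c), max(a, c))]
            d_bc = dist[(min(b, c), max(b, c))]
            new_dist[(c, next_id)] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        # Drop every distance that involved the two merged clusters.
        dist = {k: v for k, v in dist.items() if a not in k and b not in k}
        dist.update(new_dist)
        clusters[next_id] = ((tree_a, tree_b, d_ab / 2), size_a + size_b)
        next_id += 1
    tree, _size = next(iter(clusters.values()))
    return tree


# Hypothetical distances between three representative sequences.
names = ["OTU1", "OTU2", "OTU3"]
matrix = [
    [0.0, 0.02, 0.10],
    [0.02, 0.0, 0.12],
    [0.10, 0.12, 0.0],
]
print(upgma(names, matrix))
```

In this made-up example OTU1 and OTU2 are merged first because they are the closest pair, and OTU3 joins them at the average of its distances to both.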