Diversity Analysis & PCoA with Mian

In this tutorial, we will perform Diversity Analysis and Principal Coordinates Analysis using Mian. First, go to miandata.org and create a free account. Mian uses Vegan, an ecological diversity tool suite, to perform diversity analysis.

Prepare metadata and BIOM file for the diversity analysis

The metadata file contains information about each sample. The dataset from the Schloss lab that we have been using within Galaxy comes with metadata. Let's download it from the History panel in Galaxy. Type meta in the search datasets text box at the top of the History panel. You should see the file mouse.dpw.metadata, which has two columns, group and dpw. Click the Download icon to download the file.

The file downloaded to your computer will have a filename like this:

GalaxyXX-[https___zenodo.org_record_800651_files_mouse.dpw.metadata].tabular

Rename the file to mouse.dpw.metadata.tsv. (Your computer might ask whether you want to keep the .tabular file extension or use .tsv. Choose Use .tsv.)
Open the file with Microsoft Excel, Numbers for Mac, Google Sheets, or any other spreadsheet program.
The spreadsheet program will automatically recognize the .tsv (tab-separated values) file extension. You should see that this file has two columns, group and dpw.

Recall that the Schloss lab describes the experiment as follows:

“The Schloss lab is interested in understanding the effect of normal variation in the gut microbiome on host health. To that end, we collected fresh feces from mice on a daily basis for 365 days post weaning. During the first 150 days post weaning (dpw), nothing was done to our mice except allow them to eat, get fat, and be merry. We were curious whether the rapid change in weight observed during the first 10 dpw affected the stability microbiome compared to the microbiome observed between days 140 and 150.”

To compare the stability of the microbiome between the early days and the later days post weaning, we will create an additional column with the heading stage. For rows with dpw values between 0 and 9, enter early as the stage value. For the rest of the rows, enter late under stage.

When saving this file, choose Save As and select Tab delimited Text (.txt) for File Format.
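
The same edit can also be scripted rather than done by hand in a spreadsheet. The following is a minimal sketch in Python, assuming the metadata is tab-separated with a header row containing group and dpw as described above. The function name add_stage_column and the sample rows are illustrative only and are not part of Galaxy or Mian.

```python
import csv
import io


def add_stage_column(tsv_text, early_max_dpw=9):
    """Return the TSV text with a 'stage' column appended.

    Rows with dpw <= early_max_dpw are labelled 'early', all others 'late'.
    Assumes a header row that includes a 'dpw' column of integers.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    fieldnames = list(reader.fieldnames) + ["stage"]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row["stage"] = "early" if int(row["dpw"]) <= early_max_dpw else "late"
        writer.writerow(row)
    return out.getvalue()


# Illustrative rows only; the real file has one row per sample.
sample = "group\tdpw\nF3D0\t0\nF3D9\t9\nF3D141\t141\n"
print(add_stage_column(sample))
```

To apply it to the downloaded file, read the file's contents into a string, pass it to the function, and write the result to a new .txt file.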

Create a BIOM file in Galaxy without subsampling (Mian will take care of the subsampling)

Use the Make.biom tool in Galaxy:

Use the OTU shared file generated by Make.shared.
For constaxonomy, use the taxonomy output by Classify.otu before subsampling in the OTU Clustering step. (You might see two Classify.otu taxonomy files in the dropdown list, one before subsampling and one after. Pick the one with the smaller job number.)
Use the metadata file that we uploaded to the Galaxy project in the Obtain and Prepare 16S Data step.

Download the biom file to your computer by clicking the Download button.
After the download, change the file extension from biom1 to biom.

Create a new project in Mian and upload the BIOM file and metadata file

Now that you have registered for an account in Mian, log in to your account at miandata.org.
Click New Project to create a new project
You should see the following page where you will give your project a name and upload the biom file and sample metadata file. Enter "Micro101_mouse_microbiome" in the text field.
Click the Biom button

Upload the biom file that you have downloaded to your computer in the previous step.
For the Sample Metadata, upload the tab-delimited text file that has three columns: group, dpw, and stage.
Click Create on the bottom of the page.

You should see the sample IDs and their total OTU counts listed on the new page. We will use Auto Subsample for normalization. We will use all the samples for our analysis, so we will use Don't Filter for Sample Filtering. Click Create Project.

You will see your project being created like the following:

We will use Mian to do the following data analyses:

Diversity Analysis

Alpha diversity

1. Rarefaction
2. Boxplot of Diversity Index

Beta diversity

1. Boxplot of Diversity Index

Principal Coordinates Analysis (PCoA)

Alpha diversity

To estimate the alpha diversity of the samples, we first generate rarefaction curves, which measure the number of observed OTUs as a function of the subsampling size. We will also visualize alpha diversity using box-plots with different diversity indices.

Rarefaction

Rarefaction curves are typically used to estimate the fraction of species that has been sequenced. A rarefaction curve plots the number of species as a function of the number of individuals sampled. The curve usually begins with a steep slope, which at some point begins to flatten as fewer new species are discovered with each additional individual sampled: the gentler the slope, the less further sampling adds to the total number of operational taxonomic units (OTUs).

In the rarefaction plot on the right, green: most or all species have been sampled; blue: this habitat has not been exhaustively sampled; red: species-rich habitat, only a small fraction has been sampled.

(A Primer on Metagenomics, Wooley et al. 2010)
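
To make the shape of a rarefaction curve concrete, here is a minimal sketch that computes the expected number of OTUs observed when a given number of reads is drawn at random, without replacement, from one sample. It uses the standard hypergeometric expectation; whether Mian computes its curves exactly this way is not stated in this tutorial. The function name expected_otus and the toy counts are invented for illustration.

```python
from math import comb


def expected_otus(counts, depth):
    """Expected number of OTUs seen when `depth` reads are drawn at random,
    without replacement, from a sample with the given per-OTU read counts."""
    total = sum(counts)
    if not 0 <= depth <= total:
        raise ValueError("depth must be between 0 and the total read count")
    all_draws = comb(total, depth)
    # For each OTU: 1 - P(none of its reads are among the drawn reads).
    return sum(1 - comb(total - c, depth) / all_draws for c in counts if c > 0)


# Toy counts (invented): one dominant OTU and several rare ones.
toy_counts = [50, 30, 10, 5, 3, 1, 1]
for depth in (0, 10, 25, 50, 100):
    print(depth, round(expected_otus(toy_counts, depth), 2))
```

Plotting these values against depth gives one rarefaction curve. The rare OTUs are the ones that keep the curve rising at larger depths.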

Click on the Visualize dropdown menu, and select Rarefaction Curves
When the Rarefaction page has loaded, select dpw under Visualization Parameters
This is an interactive plot: hover your mouse over any of the dots along the curves to see the Sample ID, Sample Size, and Number of Species for that data point.
You can save the Snapshot to the project Notebook by clicking the Save Snapshot to Notebook button above the plot.
You can also download the plot by clicking Download.
You can also create a live URL to share this plot by clicking Share. The viewer doesn't need to have a Mian account to view the interactive plot.

You can also change the Color Variable to stage to see which group has more species sampled already.

Question: If you have the budget to perform additional sequencing of the samples, which ones will you select based on the rarefaction curve? Hover over each line to determine which sample it represents. (Hint: the steeper the slope, the more species in the sample are yet to be sampled.)

F3D144, F3D150, F3D145, F3D147, F3D9, and F3D0 have steeper slopes, so more species in these samples are yet to be sampled. In general, the late stage samples seem to need more sequencing to discover most or all of the species.

Visualizing Alpha-diversity using Box-plots

We will use the Simpson Diversity Index for our analysis. The bigger the value of Simpson Diversity, the lower the diversity. Read more about the Simpson Diversity Index here.
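
As a small worked example, the sketch below computes Simpson's index in its classic form, D = sum of p_i squared, where p_i is the proportion of reads in OTU i. In this form a larger value means lower diversity, as described above. Some tools instead report the complement 1 - D (the vegan R package's diversity function does so for its simpson index), in which case larger means more diverse, so it is worth checking which form a given plot shows. The function name and the toy counts are invented for illustration.

```python
def simpson_index(counts):
    """Simpson's index D = sum(p_i^2), with p_i the proportion of reads in OTU i."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    return sum((c / total) ** 2 for c in counts)


even = [25, 25, 25, 25]     # four equally abundant OTUs (toy data)
dominated = [97, 1, 1, 1]   # one OTU dominates (toy data)

for name, counts in (("even", even), ("dominated", dominated)):
    d = simpson_index(counts)
    print(name, "D =", round(d, 4), "1 - D =", round(1 - d, 4))
```

The even community gives the smaller D, and the community dominated by a single OTU gives a D close to 1.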

From the Diversity dropdown menu in Mian, select Alpha Diversity
Select OTU for Taxonomic Level
Select stage for Experimental Variable
Select Boxplot for Plot Type
Select Diversity Index for Diversity Context
Select Simpson for Diversity Index
Select Wilcoxon Rank-Sum (Non-Parametric) for Statistical Test

You should see an interactive box-plot like the following:

Question: Which stage of dpw (early or late) has more diversity in the mouse gut microbiome?

The mouse gut microbiome may be more diverse in the late stage than in the early stage (the bigger the value of Simpson Diversity, the lower the diversity). However, this result is not statistically significant since the P-Value is quite high.

Beta Diversity

Beta diversity measures the dissimilarity between different groups of samples.

Visualizing Beta-diversity using Box-plots

We will use the Bray-Curtis Index for our analysis. The Bray-Curtis dissimilarity is bounded between 0 and 1, where 0 means the two sites have the same composition (that is, they share all the species, in the same abundances), and 1 means the two sites do not share any species.
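
The definition can be illustrated with a minimal sketch that computes the Bray-Curtis dissimilarity between two samples from their per-OTU counts, using the standard formula (sum of absolute differences divided by the sum of all counts). The function name and the toy counts are invented for illustration. Both samples must list the same OTUs in the same order.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two samples given as per-OTU counts."""
    if len(a) != len(b):
        raise ValueError("both samples must cover the same OTUs")
    total = sum(a) + sum(b)
    if total == 0:
        raise ValueError("both samples are empty")
    return sum(abs(x - y) for x, y in zip(a, b)) / total


# Toy counts (invented) for three OTUs.
print(bray_curtis([6, 7, 4], [6, 7, 4]))    # identical samples
print(bray_curtis([5, 0, 0], [0, 3, 2]))    # no shared OTUs
print(bray_curtis([6, 7, 4], [10, 0, 6]))   # partial overlap
```

Identical samples give 0, samples with no OTUs in common give 1, and partially overlapping samples fall in between.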

From the Diversity dropdown menu in Mian, select Beta Diversity
Select OTU for Taxonomic Level
Select stage for Categorical Variable
Select dpw for Color Variable
Select None for Strata Variable
Select Bray-Curtis for Diversity Type
Select 999 for Number of Permutations

You should see an interactive box-plot like the following:

Question: Which group (early or late) shares more common species?

The samples from late stage appear to share more common species than those in the early stage. The p-value is 0.001, so this result is statistically significant.
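
A note on the reported p-value: with 999 permutations, 0.001 is the smallest p-value a permutation test can return under the common convention p = (k + 1) / (n + 1), where k is the number of permuted results at least as extreme as the observed one and n is the number of permutations. The sketch below shows this on a generic permutation test of the difference in group means. It is not the test Mian runs on the distance matrix, and the function name and toy values are invented for illustration.

```python
import random


def permutation_p_value(group_a, group_b, n_permutations=999, seed=0):
    """Permutation p-value for the absolute difference in group means."""
    rng = random.Random(seed)

    def mean_gap(xs, ys):
        return abs(sum(xs) / len(xs) - sum(ys) / len(ys))

    observed = mean_gap(group_a, group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign the group labels
        # Small tolerance guards against floating-point rounding.
        if mean_gap(pooled[:n_a], pooled[n_a:]) >= observed - 1e-12:
            at_least_as_extreme += 1
    # The observed arrangement counts as one permutation, hence the +1 terms.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Toy values (invented): two clearly separated groups.
early = [0.61, 0.58, 0.64, 0.60, 0.66, 0.59, 0.63, 0.62, 0.65, 0.57]
late = [0.41, 0.38, 0.44, 0.40, 0.46, 0.39, 0.43, 0.42, 0.45, 0.37]
print(permutation_p_value(early, late))
```

A p-value of 0.001 from 999 permutations therefore means that essentially none of the shuffled groupings were as extreme as the real one. Running more permutations would be needed to report anything smaller.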

Principal Coordinates Analysis

(PCoA)

Principal Coordinates Analysis (PCoA, also known as multidimensional scaling, MDS) is a method to explore and visualize similarities or dissimilarities in data. It starts with a similarity matrix or a dissimilarity matrix (a distance matrix). In the data, there are many factors (dimensions) that influence the differences between two samples. PCoA reduces the complexity by identifying the principal components that explain the variance in the data, and assigns each sample a location in a low-dimensional space, e.g. as a 2D plot.
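
The core of the method can be sketched in a few lines. The example below is a simplified illustration of classical PCoA, not Mian's implementation: it double-centers the squared distance matrix, finds only the first axis by power iteration, and reports the fraction of the total variation that axis explains. It assumes a symmetric distance matrix whose centered form has no negative eigenvalues (true for Euclidean distances, not guaranteed for Bray-Curtis) and at least two distinct samples. The function names and the toy distances are invented for illustration.

```python
import math


def double_center(dist):
    """Gower-center a symmetric distance matrix: B = -1/2 * J * D^2 * J."""
    n = len(dist)
    squared = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in squared]
    grand_mean = sum(row_mean) / n
    # The matrix is symmetric, so column means equal row means.
    return [[-0.5 * (squared[i][j] - row_mean[i] - row_mean[j] + grand_mean)
             for j in range(n)] for i in range(n)]


def first_axis(dist, iterations=500):
    """Return (sample coordinates on axis 1, fraction of variation explained)."""
    b = double_center(dist)
    n = len(b)
    vec = [float(i + 1) for i in range(n)]  # arbitrary starting vector
    for _ in range(iterations):
        # Power iteration: repeatedly multiply by B and renormalize.
        nxt = [sum(b[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    eigenvalue = sum(vec[i] * sum(b[i][j] * vec[j] for j in range(n))
                     for i in range(n))
    coords = [math.sqrt(eigenvalue) * x for x in vec]
    # The trace of B equals the sum of all eigenvalues.
    explained = eigenvalue / sum(b[i][i] for i in range(n))
    return coords, explained


# Toy distances (invented) between three samples.
toy_dist = [[0, 3, 4],
            [3, 0, 5],
            [4, 5, 0]]
coords, explained = first_axis(toy_dist)
print(coords, explained)
```

The explained fraction computed here is the same kind of quantity as the per-component percentages that Mian shows for each axis. A full implementation would extract several axes at once with an eigendecomposition routine from a numerical library.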

Visualizing the similarities and dissimilarities using PCoA

From the Diversity dropdown menu in Mian, select PCoA
Select OTU for Taxonomic Level
Select stage for Categorical Variable
Select Euclidean (PCA) for PCoA Type
Select 2D for Type of Plot
Select Principal Component 1 for Axis 1
Select Principal Component 2 for Axis 2

You should see an interactive PCA bi-plot like the following:

Below the PCA bi-plot, there is a line plot that shows how much each principal component (PC) explains/captures the variance in the data. This plot will help you decide which principal component to use in the PCA bi-plot. In this example, PC1 explains about 54% of the differences between the samples. PC2 and PC3 explain about 15% and 12%, respectively.
