Obtain & Prepare

16S Sequencing Data

Story behind the data

In this tutorial we use the dataset generated by the Schloss lab.

They describe the experiment as follows:

“The Schloss lab is interested in understanding the effect of normal variation in the gut microbiome on host health. To that end, we collected fresh feces from mice on a daily basis for 365 days post weaning. During the first 150 days post weaning (dpw), nothing was done to our mice except allow them to eat, get fat, and be merry. We were curious whether the rapid change in weight observed during the first 10 dpw affected the stability microbiome compared to the microbiome observed between days 140 and 150.”

To speed up analysis for this tutorial, we will use only a subset of this data. We will look at a single mouse at 19 different time points (9 early, 10 late).

Illumina sequencing data naming scheme

For this tutorial, you are given 19 pairs of files. For example, the following pair of files:

F3D0_S188_L001_R1_001.fastq

F3D0_S188_L001_R2_001.fastq

The first part of the file name indicates the sample; F3D0 here signifies that this sample was obtained from Female 3 on Day 0. The rest of the file name is identical for both files, except for _R1 and _R2, which indicate the forward and reverse reads respectively.
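The naming scheme described above can be decoded programmatically. The following is a minimal Python sketch, not part of the tutorial's Galaxy workflow; the function name `parse_fastq_name` and the returned field names are hypothetical, and the pattern assumes every file follows the same layout as the example pair.

```python
import re

# Layout assumed from the example F3D0_S188_L001_R1_001.fastq:
# <letter><animal>D<day>_S<number>_L<lane>_<R1|R2>_<number>.fastq
# (hypothetical helper; the field names are chosen for illustration only)
ILLUMINA_NAME = re.compile(
    r"^(?P<sample>(?P<letter>[A-Z])(?P<animal>\d+)D(?P<day>\d+))"
    r"_S\d+_L\d+_(?P<read>R[12])_\d+\.fastq$"
)


def parse_fastq_name(filename):
    """Split a file name into sample, animal, day, and read direction."""
    match = ILLUMINA_NAME.match(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename}")
    return {
        "sample": match["sample"],          # e.g. "F3D0"
        "letter": match["letter"],          # "F" means female in this dataset
        "animal": int(match["animal"]),     # animal number, e.g. 3
        "day": int(match["day"]),           # days post weaning, e.g. 0
        # _R1 marks forward reads, _R2 marks reverse reads
        "direction": "forward" if match["read"] == "R1" else "reverse",
    }


if __name__ == "__main__":
    print(parse_fastq_name("F3D0_S188_L001_R1_001.fastq"))
```

A file name that does not match the assumed layout raises a `ValueError` rather than being silently skipped, so a misnamed file is noticed early.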

Activity: Obtaining the data

STEP 1: Create an empty analysis history

Make sure you have an empty analysis history in your Galaxy account. Name it Micro101_16s_rRNA.

STEP 2: Import Sample Data and Reference Data

Import the following Illumina sequencing files (in FASTQ format) and the reference data to your history.

Copy the URLs of the FASTQ files for uploading to the Galaxy project:

https://zenodo.org/record/800651/files/F3D0_R1.fastq
https://zenodo.org/record/800651/files/F3D0_R2.fastq
https://zenodo.org/record/800651/files/F3D141_R1.fastq
https://zenodo.org/record/800651/files/F3D141_R2.fastq
https://zenodo.org/record/800651/files/F3D142_R1.fastq
https://zenodo.org/record/800651/files/F3D142_R2.fastq
https://zenodo.org/record/800651/files/F3D143_R1.fastq
https://zenodo.org/record/800651/files/F3D143_R2.fastq
https://zenodo.org/record/800651/files/F3D144_R1.fastq
https://zenodo.org/record/800651/files/F3D144_R2.fastq
https://zenodo.org/record/800651/files/F3D145_R1.fastq
https://zenodo.org/record/800651/files/F3D145_R2.fastq
https://zenodo.org/record/800651/files/F3D146_R1.fastq
https://zenodo.org/record/800651/files/F3D146_R2.fastq
https://zenodo.org/record/800651/files/F3D147_R1.fastq
https://zenodo.org/record/800651/files/F3D147_R2.fastq
https://zenodo.org/record/800651/files/F3D148_R1.fastq
https://zenodo.org/record/800651/files/F3D148_R2.fastq
https://zenodo.org/record/800651/files/F3D149_R1.fastq
https://zenodo.org/record/800651/files/F3D149_R2.fastq
https://zenodo.org/record/800651/files/F3D150_R1.fastq
https://zenodo.org/record/800651/files/F3D150_R2.fastq
https://zenodo.org/record/800651/files/F3D1_R1.fastq
https://zenodo.org/record/800651/files/F3D1_R2.fastq
https://zenodo.org/record/800651/files/F3D2_R1.fastq
https://zenodo.org/record/800651/files/F3D2_R2.fastq
https://zenodo.org/record/800651/files/F3D3_R1.fastq
https://zenodo.org/record/800651/files/F3D3_R2.fastq
https://zenodo.org/record/800651/files/F3D5_R1.fastq
https://zenodo.org/record/800651/files/F3D5_R2.fastq
https://zenodo.org/record/800651/files/F3D6_R1.fastq
https://zenodo.org/record/800651/files/F3D6_R2.fastq
https://zenodo.org/record/800651/files/F3D7_R1.fastq
https://zenodo.org/record/800651/files/F3D7_R2.fastq
https://zenodo.org/record/800651/files/F3D8_R1.fastq
https://zenodo.org/record/800651/files/F3D8_R2.fastq
https://zenodo.org/record/800651/files/F3D9_R1.fastq
https://zenodo.org/record/800651/files/F3D9_R2.fastq
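The FASTQ URLs follow a regular pattern (sample F3, a day number, and _R1 or _R2), so the list can also be generated rather than copied by hand. The Python sketch below is an optional convenience, not part of the tutorial itself; the names `BASE_URL`, `EARLY_DAYS`, `LATE_DAYS`, and `fastq_urls` are hypothetical, and the day numbers are read off the list above (note that day 4 is absent).

```python
# Base location of the files, taken from the URL list above.
BASE_URL = "https://zenodo.org/record/800651/files"

# Sampling days present in the list above (assumption: the list is complete).
EARLY_DAYS = [0, 1, 2, 3, 5, 6, 7, 8, 9]   # day 4 is not in the list
LATE_DAYS = list(range(141, 151))          # days 141 through 150


def fastq_urls():
    """Return the forward (R1) and reverse (R2) URL for every sampling day."""
    urls = []
    for day in EARLY_DAYS + LATE_DAYS:
        for read in ("R1", "R2"):
            urls.append(f"{BASE_URL}/F3D{day}_{read}.fastq")
    return urls


if __name__ == "__main__":
    # Print one URL per line, ready to be copied.
    print("\n".join(fastq_urls()))
```

The generated list is ordered by day rather than alphabetically, so its order differs from the list above while containing the same 38 URLs.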

Copy the URLs of the reference data files for uploading to the Galaxy project:

https://zenodo.org/record/800651/files/silva.v4.fasta
https://zenodo.org/record/800651/files/trainset9_032012.pds.fasta
https://zenodo.org/record/800651/files/trainset9_032012.pds.tax
https://zenodo.org/record/800651/files/mouse.dpw.metadata

STEP 3: Organize the data into a paired collection

Now that’s a lot of files to manage. Luckily Galaxy can make life a bit easier by allowing us to create dataset collections. This enables us to easily run tools on multiple datasets at once.

Since we have paired-end data, each sample consists of two separate FASTQ files, one containing the forward reads and one containing the reverse reads. We can recognize the pairing from the file names, which differ only by _R1 or _R2. We can tell Galaxy about this paired naming convention so that our tools will know which files belong together. We do this by building a List of Dataset Pairs.
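To illustrate the pairing logic, here is a minimal Python sketch of grouping file names by their shared prefix. It only mirrors the idea and is not how Galaxy implements its collection builder; the function name `build_pairs` and the dictionary layout are hypothetical.

```python
from collections import defaultdict


def build_pairs(filenames):
    """Group file names into {sample: {"forward": ..., "reverse": ...}}.

    Hypothetical helper: the text before _R1 or _R2 is treated as the
    sample name, following the naming convention described above.
    """
    pairs = defaultdict(dict)
    for name in filenames:
        for tag, role in (("_R1", "forward"), ("_R2", "reverse")):
            if tag in name:
                sample = name.split(tag)[0]
                pairs[sample][role] = name
                break

    # Every sample needs both a forward and a reverse file.
    incomplete = sorted(
        sample for sample, files in pairs.items()
        if set(files) != {"forward", "reverse"}
    )
    if incomplete:
        raise ValueError(f"samples missing a mate file: {incomplete}")
    return dict(pairs)


if __name__ == "__main__":
    example = ["F3D0_R1.fastq", "F3D0_R2.fastq",
               "F3D141_R1.fastq", "F3D141_R2.fastq"]
    for sample, files in build_pairs(example).items():
        print(sample, files["forward"], files["reverse"])
```

Raising an error for a sample with only one file reflects the reason pairing matters: a tool that expects paired reads cannot work with a forward file that has no matching reverse file.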

