In this tutorial we use the dataset generated by the Schloss lab.
They describe the experiment as follows:
“The Schloss lab is interested in understanding the effect of normal variation in the gut microbiome on host health. To that end, we collected fresh feces from mice on a daily basis for 365 days post weaning. During the first 150 days post weaning (dpw), nothing was done to our mice except allow them to eat, get fat, and be merry. We were curious whether the rapid change in weight observed during the first 10 dpw affected the stability microbiome compared to the microbiome observed between days 140 and 150.”
To speed up analysis for this tutorial, we will use only a subset of this data. We will look at a single mouse at 10 different time points (5 early, 5 late).
For this tutorial, you are given 19 pairs of files. For example, the following pair of files:
F3D0_S188_L001_R1_001.fastq
F3D0_S188_L001_R2_001.fastq
The first part of the file name indicates the sample; F3D0 here signifies that this sample was obtained from Female 3 on Day 0. The rest of the file name is identical within a pair, except for _R1 and _R2, which indicate the forward and reverse reads respectively.
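To make the naming convention concrete, the following minimal Python sketch splits a file name into its parts. It is only an illustration and is not needed for the Galaxy steps. The pattern is inferred from the single example pair above; the function name parse_fastq_name and the handling of an "M" (male) prefix are assumptions, not something stated in the dataset description.

```python
import re

# Pattern assumed from the example name F3D0_S188_L001_R1_001.fastq:
# <sex><animal>D<day>_<run details>_R<1|2>_<number>.fastq
# "M" for male animals is an assumption; the tutorial only shows "F".
FILENAME_PATTERN = re.compile(
    r"^(?P<sample>(?P<sex>[FM])(?P<animal>\d+)D(?P<day>\d+))"
    r"_.*_R(?P<read>[12])_\d+\.fastq$"
)


def parse_fastq_name(filename):
    """Split a FASTQ file name into sample, animal, day and read direction."""
    match = FILENAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename!r}")
    return {
        "sample": match["sample"],
        "sex": "female" if match["sex"] == "F" else "male",
        "animal": int(match["animal"]),
        "day": int(match["day"]),
        # _R1 marks the forward reads, _R2 the reverse reads.
        "direction": "forward" if match["read"] == "1" else "reverse",
    }


if __name__ == "__main__":
    print(parse_fastq_name("F3D0_S188_L001_R1_001.fastq"))
```

Parsing the day as an integer makes it straightforward to separate early from late time points later on, should you want to do so outside Galaxy.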
STEP 1: Create an empty analysis history
Make sure you have an empty analysis history in your Galaxy account, and name it Micro101_16s_rRNA.
STEP 2: Import Sample Data and Reference Data
Import the following Illumina sequencing files (in FASTQ format) and the reference data to your history.
Copy the URLs of the FASTQ files for uploading to the Galaxy project:
Copy the URLs of the reference data files for uploading to the Galaxy project:
STEP 3: Organize the data into a paired collection
Now that’s a lot of files to manage. Luckily Galaxy can make life a bit easier by allowing us to create dataset collections. This enables us to easily run tools on multiple datasets at once.
Since we have paired-end data, each sample consists of two separate FASTQ files, one containing the forward reads and one containing the reverse reads. We can recognize the pairing from the file names, which differ only by _R1 or _R2. We can tell Galaxy about this paired naming convention, so that our tools will know which files belong together. We do this by building a List of Dataset Pairs.
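The pairing rule can be sketched in a few lines of Python. This is not how Galaxy implements its collection builder; it is a hedged illustration of the idea that two files form a pair when their names are identical apart from the _R1 or _R2 tag. The function name pair_fastq_files, and the choice to use the text before the first underscore as the sample name, are assumptions made for this example.

```python
import re
from collections import defaultdict

# Matches the read tag that distinguishes the two files of a pair.
READ_TAG = re.compile(r"_R([12])_")


def pair_fastq_files(filenames):
    """Group FASTQ file names into (forward, reverse) pairs keyed by sample."""
    grouped = defaultdict(dict)
    for name in filenames:
        match = READ_TAG.search(name)
        if match is None:
            raise ValueError(f"no _R1/_R2 tag in {name!r}")
        # Files belong together when their names are equal once the
        # read tag has been replaced by a neutral placeholder.
        key = READ_TAG.sub("_R?_", name, count=1)
        direction = "forward" if match.group(1) == "1" else "reverse"
        grouped[key][direction] = name

    pairs = {}
    for key, reads in grouped.items():
        if set(reads) != {"forward", "reverse"}:
            raise ValueError(f"incomplete pair for {key!r}")
        # Assumption: the sample name is the text before the first underscore.
        sample = key.split("_", 1)[0]
        pairs[sample] = (reads["forward"], reads["reverse"])
    return pairs


if __name__ == "__main__":
    files = ["F3D0_S188_L001_R2_001.fastq", "F3D0_S188_L001_R1_001.fastq"]
    for sample, (forward, reverse) in pair_fastq_files(files).items():
        print(sample, forward, reverse)
```

The sketch raises an error for a file without a partner, which mirrors the practical point of this step: every sample needs both its forward and its reverse file before the pair can be used.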