1) Register for a Galaxy account at https://usegalaxy.org/
2) Log in
Note: Check that Galaxy is operational at https://status.galaxyproject.org/
We will be using DESeq2 to perform differential gene expression analysis.
Use the search bar on the left side to find the software tool called 'DESeq2'.
It is also under Genomics Analysis > RNA-seq > DESeq2
Click on the DESeq2 result to open the tool.
The tool opens in the middle pane.
A factor is the variable we want to compare across, and its factor levels are the sample groups being compared. In this example, the factor is tissue type, and we compare SAM (factor level 1) vs leaf (factor level 2) samples.
1) Fill in the factor and factor level names in the text boxes.
2) Click the folder button (highlighted in blue) to upload the data sets for the appropriate factor level.
After you click the folder button, a pop-up will appear. Click the 'Upload' button on the bottom left side.
Drag and drop your five soybean SAM quant files into this pop-up. Remember that the quantification files are the output files from Salmon.
Select '.txt' from the drop-down menu for file type (highlighted in magenta). Note that the box is searchable, so you don't have to scroll through the entire list.
Click the 'Start' button highlighted in blue. Once the files have uploaded, the rows will turn green and the Status will be 100%.
Click the 'Select' button. Alternatively, click outside of the pop-up; the files will then appear in the right-hand panel, and you can drag and drop them into the counts files box for the appropriate factor level.
Repeat for the two leaf samples in factor level 2.
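Before uploading, it can help to confirm that each file really is a Salmon quantification table. The sketch below assumes the standard Salmon quant.sf layout (tab-separated, with the header Name, Length, EffectiveLength, TPM, NumReads). The function name and the example rows are hypothetical and only for illustration.

```python
import csv
import io

# Column names of a standard Salmon quant.sf file.
EXPECTED_COLUMNS = ["Name", "Length", "EffectiveLength", "TPM", "NumReads"]


def check_quant_text(text):
    """Return the number of transcript rows, or raise ValueError if malformed."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader, None)
    if header != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected header: {header}")
    n_rows = 0
    for row in reader:
        if not row:
            continue  # skip blank lines
        if len(row) != len(EXPECTED_COLUMNS):
            raise ValueError(f"wrong number of fields: {row}")
        float(row[3])  # TPM must be numeric
        float(row[4])  # NumReads must be numeric
        n_rows += 1
    return n_rows


# Hypothetical two-transcript example (made-up IDs and values).
example = (
    "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
    "tx1\t1500\t1320.5\t12.3\t250\n"
    "tx2\t900\t720.0\t0\t0\n"
)
print(check_quant_text(example))
```

To check a real file, read its contents first (for example with open(path).read()) and pass the text to check_quant_text.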
Set the DESeq2 options to match these:
1) TPM values (magenta line)
2) Salmon for Program used (forest green line)
3) GTF/GFF3 for Gene mapping format (lime green line)
4) Then upload the glyma.Wm82_ISU01.gnm2.ann1.FGFB.gene_models_main.AGAT.gtf for the Annotation file (blue line).
Note that this NEEDS to be the same gtf annotation file we used during Salmon quantification.
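One way to see why the annotation must match is to check that every transcript ID in a quant file can be mapped to a gene in the GTF. The sketch below assumes a standard nine-column GTF whose attribute column carries gene_id and transcript_id entries. The function names and the toy records are hypothetical, and this is not how the Galaxy tool performs the mapping internally.

```python
import re

# GTF attributes look like: gene_id "G1"; transcript_id "G1.1";
ATTRIBUTE_PATTERN = re.compile(r'(\w+) "([^"]*)"')


def transcript_to_gene(gtf_text):
    """Map transcript_id -> gene_id using the attribute column of a GTF."""
    mapping = {}
    for line in gtf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if len(fields) != 9:
            continue  # not a well-formed GTF record
        attributes = dict(ATTRIBUTE_PATTERN.findall(fields[8]))
        if "transcript_id" in attributes and "gene_id" in attributes:
            mapping[attributes["transcript_id"]] = attributes["gene_id"]
    return mapping


def missing_transcripts(quant_ids, mapping):
    """Return quantified transcript IDs that the annotation does not contain."""
    return sorted(set(quant_ids) - set(mapping))


# Made-up annotation with two genes, one transcript each.
gtf = (
    'chr1\tsrc\ttranscript\t1\t100\t.\t+\t.\tgene_id "G1"; transcript_id "G1.1";\n'
    'chr1\tsrc\texon\t1\t100\t.\t+\t.\tgene_id "G1"; transcript_id "G1.1";\n'
    'chr1\tsrc\ttranscript\t200\t300\t.\t-\t.\tgene_id "G2"; transcript_id "G2.1";\n'
)
mapping = transcript_to_gene(gtf)
# "G3.1" stands in for a transcript quantified against a different annotation.
print(missing_transcripts(["G1.1", "G2.1", "G3.1"], mapping))
```

An empty result means every quantified transcript has a gene in the annotation. Any IDs that are listed suggest the GTF differs from the one used for Salmon.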
Hit the execute button to submit your job!
Your job will appear in the right-hand column.
Your job will look gray while it is waiting in the queue.
Your job will turn yellow when it starts running.
It will turn green once it is done!
Click the job to see the outputs and to download the files.
The DESeq2 analysis outputs several summary visualizations for our soybean tissue comparisons. We will go through a few of them here.
This plot shows our samples after the count data have been projected onto the first two principal components. We clearly see one cluster for the meristem and two clusters for the leaf data: the young leaf samples are on the far top left, the old leaf samples are mostly on the right, and the meristem samples are on the far bottom right. The first principal component appears to have captured the majority of the variability (77%), corresponding to the difference between these two tissue types, with the second principal component capturing the between-sample variability.
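To make the "percent of variability captured" idea concrete, here is a minimal sketch that computes the fraction of total variance explained by the first principal component. It uses plain Python and power iteration on made-up numbers. It only illustrates the concept and is not the procedure the Galaxy tool runs; the function name and toy data are hypothetical.

```python
def pc1_variance_fraction(samples, iterations=200):
    """Fraction of total variance captured by the first principal component.

    `samples` is a list of rows (one per sample), each a list of gene values.
    """
    n = len(samples)
    n_genes = len(samples[0])
    # Center each gene (column) at zero.
    means = [sum(row[j] for row in samples) / n for j in range(n_genes)]
    centered = [[row[j] - means[j] for j in range(n_genes)] for row in samples]
    # Gram matrix between samples; its nonzero eigenvalues are proportional
    # to the variances along the principal components.
    gram = [[sum(a * b for a, b in zip(r1, r2)) for r2 in centered]
            for r1 in centered]
    total = sum(gram[i][i] for i in range(n))
    if total == 0:
        raise ValueError("data has no variance")
    # Power iteration converges toward the eigenvector with the largest eigenvalue.
    vec = [float(i + 1) for i in range(n)]
    for _ in range(iterations):
        new = [sum(gram[i][k] * vec[k] for k in range(n)) for i in range(n)]
        norm = sum(v * v for v in new) ** 0.5
        if norm == 0:
            raise ValueError("start vector has no component along the data")
        vec = [v / norm for v in new]
    # Rayleigh quotient of the unit vector gives the largest eigenvalue.
    top = sum(vec[i] * sum(gram[i][k] * vec[k] for k in range(n))
              for i in range(n))
    return top / total


# Made-up values: 4 samples (rows) x 3 genes (columns), two obvious groups.
toy = [
    [10.0, 1.0, 5.0],
    [11.0, 2.0, 5.0],
    [1.0, 10.0, 6.0],
    [2.0, 11.0, 4.0],
]
print(f"PC1 explains {pc1_variance_fraction(toy):.0%} of the variance")
```

When two groups differ strongly, as in the toy data, most of the variance falls on the first component, which is the pattern described for the tissue comparison above.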
Here we have a similarity heatmap between our samples. We clearly see the two groups emerge again, with distinct similarity clusters for the meristem and leaf groups. The young leaf samples are more similar to the SAM samples than the old leaf samples are.
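A heatmap like this is typically built from pairwise distances between sample expression profiles. The sketch below assumes Euclidean distance on already-transformed values; the exact transformation and metric used by the Galaxy tool are not stated here, and the sample names and numbers are made up.

```python
import math


def sample_distances(profiles):
    """Return {(sample_a, sample_b): Euclidean distance} for all sample pairs."""
    names = sorted(profiles)
    return {
        (a, b): math.dist(profiles[a], profiles[b])
        for a in names
        for b in names
    }


# Hypothetical expression profiles (two genes per sample).
profiles = {
    "SAM_1": [5.0, 1.0],
    "SAM_2": [5.0, 2.0],
    "leaf_1": [1.0, 5.0],
}
distances = sample_distances(profiles)
for pair in [("SAM_1", "SAM_2"), ("SAM_1", "leaf_1")]:
    print(pair, round(distances[pair], 2))
```

Small distances correspond to the strongly similar blocks in the heatmap; larger distances separate the groups.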
Here is a plot of the p-values for our differential expression analysis with tissue type (SAM vs leaf) as our factor levels. We can see that the majority of p-values fall in the smallest range, which suggests that many genes differ in expression between the two tissues. It is important to remember, though, that this plot shows uncorrected p-values, so not all of these small values will represent truly differentially expressed genes.
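To see why correction matters, here is a minimal sketch of the Benjamini-Hochberg adjustment, which is DESeq2's default method for adjusted p-values. It ignores DESeq2's independent filtering step, so it will not reproduce the tool's output exactly. The function name and input values are made up.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]
print([round(p, 3) for p in benjamini_hochberg(raw)])
```

Each adjusted value is at least as large as its raw p-value, so some genes that look significant before correction no longer pass the threshold afterwards.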
This MA plot is a visualization of the changes in expression between our factor levels (leaf vs SAM tissue). The dots that fall away from the horizontal line at log fold change = 0 represent genes with large changes in expression. Like the p-value plot, this plot does not reflect p-value adjustments, so it is more useful for eyeballing general expression patterns in the data.
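As a rough illustration of what each dot encodes, the sketch below computes one point from the mean expression of a gene in two groups: the average expression for the horizontal axis and the log2 fold change for the vertical axis. This is a naive calculation, not DESeq2's model-based estimate, and the pseudocount of 1 is an assumption added here to avoid dividing by zero.

```python
import math


def ma_point(mean_a, mean_b, pseudocount=1.0):
    """Return (average expression, log2 fold change of group b over group a)."""
    average = (mean_a + mean_b) / 2
    log2_fold_change = math.log2((mean_b + pseudocount) / (mean_a + pseudocount))
    return average, log2_fold_change


# Made-up group means for two hypothetical genes.
print(ma_point(7.0, 31.0))   # expression is higher in group b
print(ma_point(10.0, 10.0))  # no change, so the point sits on the zero line
```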
We also get a results table from our DESeq2 analysis (linked below). This explicitly shows us which genes were differentially expressed in our analysis after adjusting for multiple testing. The last column holds the adjusted p-values, and the first column holds the identifier for each gene, as pulled from the annotation file for our genome.
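Once the table is downloaded, the significant genes can be pulled out with a few lines of code. The sketch below relies only on what is described above: a tab-separated table with the gene identifier in the first column and the adjusted p-value in the last. The 0.05 cutoff, the function name, and the example rows are assumptions for illustration.

```python
def significant_genes(table_text, alpha=0.05):
    """Return (gene_id, adjusted p-value) pairs below alpha, smallest first."""
    hits = []
    for line in table_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # blank or malformed line
        try:
            padj = float(fields[-1])
        except ValueError:
            continue  # header line or a missing value such as NA
        if padj < alpha:
            hits.append((fields[0], padj))
    return sorted(hits, key=lambda hit: hit[1])


# Made-up rows; the middle column stands in for the other statistics.
table = (
    "geneA\t2.5\t0.001\n"
    "geneB\t0.1\t0.8\n"
    "geneC\t-3.0\t0.0004\n"
    "geneD\t1.2\tNA\n"
)
print(significant_genes(table))
```

For a real results file, read its contents first (for example with open(path).read()) and pass the text to significant_genes.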