Galaxy DESeq2 Analysis

Galaxy

An account to use Galaxy was made at this link:

RNA-Seq was selected, followed by the DESeq2 tool. The sample files were then uploaded and the tool was run.

Galaxy and DESeq2 Visualization Results

Figure 1: PCA Plot. Samples are projected onto the principal components, so the Euclidean distance between points reflects the similarity of their expression profiles.

The PCA plot shows how similar the Young Leaf, Old Leaf, and Meristem samples are to one another. The closer two points are, the more similar the samples; the farther apart they are, the larger the difference between them. In Figure 1, the Young Leaf samples cluster most tightly, the Meristem samples are the next most similar to one another, and the Old Leaf samples are the least similar to one another.
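The idea of placing samples along a principal component can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample names and expression values are hypothetical, not the data from this analysis, and the power-iteration approach stands in for whatever PCA routine DESeq2 uses internally.

```python
import math

# Hypothetical log-scale expression values: one row per sample, one column per gene.
SAMPLE_NAMES = ["YoungLeaf_1", "YoungLeaf_2", "OldLeaf_1", "OldLeaf_2"]
EXPRESSION = [
    [5.0, 2.0, 8.0],
    [5.2, 2.1, 7.9],
    [1.0, 6.0, 3.0],
    [1.5, 5.5, 3.4],
]


def center_columns(rows):
    """Subtract each gene's mean so the data are centered at the origin."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def first_component_scores(rows, iterations=200):
    """Return each sample's coordinate on the first principal component.

    Uses power iteration on X^T X, which converges to the direction of
    greatest variance in the centered data.
    """
    centered = center_columns(rows)
    n_genes = len(centered[0])
    direction = [1.0 / (j + 1) for j in range(n_genes)]  # arbitrary non-zero start
    for _ in range(iterations):
        scores = [sum(x * w for x, w in zip(row, direction)) for row in centered]
        updated = [sum(row[j] * s for row, s in zip(centered, scores))
                   for j in range(n_genes)]
        norm = math.sqrt(sum(w * w for w in updated))
        direction = [w / norm for w in updated]
    return [sum(x * w for x, w in zip(row, direction)) for row in centered]


pc1 = first_component_scores(EXPRESSION)
for name, score in zip(SAMPLE_NAMES, pc1):
    print(f"{name}: PC1 = {score:.2f}")
```

In this toy data the two Young Leaf replicates land close together on the first component, while the Old Leaf replicates land on the opposite side, which is the same pattern of "close points are similar samples" read from the PCA plot.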

Figure 2: Heat map. Sample-to-sample distances that represent similarity between the samples of each tissue type.

The heat map of sample-to-sample distances shows how similar the samples are within and between tissues. The darker the shade of blue, the higher the similarity. Within tissues, the Young Leaf samples are the most similar to one another, followed by the Meristem samples, with the Old Leaf samples the least similar. Between tissues, the Young Leaf and Meristem samples show a fair amount of similarity, the Young Leaf and Old Leaf samples are the next closest, and the Meristem and Old Leaf samples are the most distant.
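A sample-to-sample distance is simply the Euclidean distance between two expression profiles. The sketch below computes it for every pair of samples; the sample names and values are hypothetical and chosen only to illustrate the calculation.

```python
import math
from itertools import combinations

# Hypothetical transformed expression values (one list of gene values per sample).
samples = {
    "YoungLeaf_1": [5.0, 2.0, 8.0],
    "YoungLeaf_2": [5.2, 2.1, 7.9],
    "Meristem_1": [4.0, 3.0, 6.5],
    "OldLeaf_1": [1.0, 6.0, 3.0],
}


def sample_distances(profiles):
    """Return the Euclidean distance for every pair of samples."""
    return {
        (a, b): math.dist(profiles[a], profiles[b])
        for a, b in combinations(profiles, 2)
    }


distances = sample_distances(samples)
# Print pairs from most similar (smallest distance) to least similar.
for (a, b), d in sorted(distances.items(), key=lambda item: item[1]):
    print(f"{a} vs {b}: {d:.2f}")
```

A smaller distance corresponds to a darker cell in the heat map, so the replicate pair would be the darkest off-diagonal cell in this toy example.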

Figure 3: Dispersion Plot. It shows the dispersion estimate (a measure of variability) of each gene against its mean of normalized counts.

The dispersion plot shows a red trend line with blue and black data points scattered on both sides of it. In a DESeq2 dispersion plot, the black points are the gene-wise dispersion estimates, the red line is the trend fitted to them, and the blue points are the final estimates after shrinkage toward the trend. The x-axis is the mean of normalized counts and the y-axis is the dispersion. The trend line starts high, where the mean counts are low and the dispersion is high, and falls as the mean increases. The points should scatter around the trend line. The data points in Figure 3 follow this pattern: as the mean of normalized counts increases, the dispersion decreases.
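The link between mean, variance, and dispersion can be made concrete. DESeq2 models counts with a negative binomial distribution in which variance = mean + dispersion × mean². The sketch below inverts that relation to get a rough method-of-moments dispersion from replicate counts. It is a simplification, not DESeq2's actual estimator (which uses likelihood-based fitting and shrinkage), and the counts are hypothetical.

```python
from statistics import mean, variance


def moment_dispersion(counts):
    """Rough dispersion estimate from var = mu + alpha * mu^2, floored at zero."""
    mu = mean(counts)
    if mu == 0:
        return 0.0
    var = variance(counts)  # sample variance across replicates
    return max((var - mu) / mu**2, 0.0)


# Hypothetical normalized counts for two genes across four replicates.
low_count_gene = [4, 12, 7, 1]
high_count_gene = [950, 1100, 1020, 990]

print(f"low-count gene:  dispersion = {moment_dispersion(low_count_gene):.3f}")
print(f"high-count gene: dispersion = {moment_dispersion(high_count_gene):.4f}")
```

The low-count gene ends up with the larger dispersion, which matches the downward trend of the red line in the dispersion plot.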

Figure 4: Histogram showing p-values for each gene being compared.

The histogram shows the distribution of p-values across the compared genes. Over 20,000 genes fall in the low p-value range and are likely under the 0.05 threshold. The p-values in the histogram are unadjusted. After adjustment for multiple testing with a method such as the Bonferroni correction, fewer than 20,000 genes would be expected to remain significant, which would give a more reasonable picture of what is changing the most between the groups. Transforming the data to make it less skewed and working with adjusted p-values is advised.
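The effect of adjusting p-values can be sketched as follows. The Bonferroni correction multiplies each p-value by the number of tests; the Benjamini-Hochberg procedure, which DESeq2 uses by default for its adjusted p-values, is included for comparison. The p-values here are hypothetical, not taken from Figure 4.

```python
def bonferroni(pvalues):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(p * m, 1.0) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (controls the false discovery rate)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted
    # values never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def count_significant(pvalues, threshold=0.05):
    return sum(p < threshold for p in pvalues)


# Hypothetical unadjusted p-values for five genes.
raw = [0.0001, 0.004, 0.02, 0.03, 0.6]
print("raw:       ", count_significant(raw))
print("Bonferroni:", count_significant(bonferroni(raw)))
print("BH:        ", count_significant(benjamini_hochberg(raw)))
```

Bonferroni is the stricter of the two, so it keeps the fewest genes below the threshold.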

Figure 5: MA Plot. Shows the fold change, which is how much larger or smaller each transcript's expression is in the comparison of Old Leaf samples versus Meristem samples. The log fold change represents the upregulation or downregulation of certain genes.

Moving along the x-axis of the MA plot above, the mean of normalized counts increases and the spread of fold changes becomes smaller. Transcripts with higher counts, further along the axis, are where differences are more likely to be significant. The plot was made with the Young Leaf files maintained as a control, which is why it compares the Old Leaf and Meristem samples.
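The two coordinates of a point in such a plot can be computed directly. The sketch below is a simplified illustration: it uses the average of two hypothetical group means for the x-axis and a log2 ratio with a pseudocount for the y-axis, whereas DESeq2 derives its fold changes from a fitted model. The function name, the pseudocount, and the example values are assumptions.

```python
import math


def ma_point(mean_a, mean_b, pseudocount=1.0):
    """Return (average count, log2 fold change of group B over group A).

    The pseudocount avoids taking the log of zero for unexpressed genes.
    """
    average = (mean_a + mean_b) / 2
    log2_fold_change = math.log2((mean_b + pseudocount) / (mean_a + pseudocount))
    return average, log2_fold_change


# Hypothetical mean normalized counts for three genes in two groups.
genes = {
    "gene_up": (100, 400),
    "gene_down": (800, 200),
    "gene_flat": (50, 50),
}

for name, (group_a, group_b) in genes.items():
    average, lfc = ma_point(group_a, group_b)
    print(f"{name}: mean = {average:.0f}, log2 fold change = {lfc:+.2f}")
```

A positive log2 fold change indicates higher expression in the second group (upregulation), a negative value indicates lower expression (downregulation), and a value near zero indicates no change.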
