Final Portfolio

Image Reference:

https://www.aces.edu/wp-content/uploads/2022/05/Figure-illustrative.jpg

Analyzing Differentially Expressed Genes in Young Leaf, Old Leaf, and Meristem Tissues of Glycine max by RNA-seq Using the Glycine_max_Lee_v1 Reference Genome

Author: Colin Thieken, Department of Chemical and Biomolecular Engineering at North Carolina State University

Biological Question

What changes in soybean gene expression do we uncover when aligning reads to different reference genomes?

Biological Experimental Approach

RNA was isolated from old leaf, young leaf, and meristem samples and used to prepare sequencing libraries.

Sequencing Approach

Next-generation sequencing (Illumina) was used to generate paired-end reads.

Graphical Workflow

Figure 1: Graphical abstract of the RNA-seq workflow for Glycine max young leaf, old leaf, and meristem samples aligned to the indexed Glycine_max_Lee_v1 reference genome.

Bioinformatic Pipeline

Set up a working directory and QC data
- We first set up a personal working directory on the HPC system at NCSU. FastQC was then run from the Linux command line on the raw Glycine max sequence data to check the quality of the reads before continuing.
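As a rough illustration of the QC step, the sketch below builds a FastQC command line without running it. The file and directory names are hypothetical placeholders, not the ones used in this project; `-o` (output directory) and `-t` (threads) are standard FastQC options.

```python
import shlex


def fastqc_command(fastq_files, out_dir, threads=2):
    """Build (but do not run) a FastQC command as a list of arguments."""
    # -o: directory for the HTML/zip reports; -t: files processed in parallel.
    return ["fastqc", "-o", out_dir, "-t", str(threads), *fastq_files]


# Hypothetical paired-end read files; substitute the real raw data.
reads = ["raw/young_leaf_R1.fastq.gz", "raw/young_leaf_R2.fastq.gz"]
cmd = fastqc_command(reads, "qc")

# Print the command as it would be typed in the shell.
print(shlex.join(cmd))
```

On the HPC, a command like this would normally be typed directly or placed in a job script; passing the list to `subprocess.run` is another option once FastQC is installed and the output directory exists.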
Build an indexed reference genome to align your sequences
- The Glycine_max_Lee_v1 reference genome was indexed with STAR, using a script, so that the raw sequence data could be aligned to it.
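A minimal sketch of what a STAR indexing command might look like is given below. It only assembles the command; the FASTA and GTF file names, the thread count, and the 100 bp read length are assumptions made for illustration and may differ from what was actually used.

```python
import shlex


def star_index_command(genome_fasta, annotation_gtf, index_dir,
                       threads=8, read_length=100):
    """Build (but do not run) a STAR genome indexing command."""
    return [
        "STAR",
        "--runMode", "genomeGenerate",
        "--genomeDir", index_dir,                 # where the index is written
        "--genomeFastaFiles", genome_fasta,       # reference genome sequence
        "--sjdbGTFfile", annotation_gtf,          # gene annotation for splice junctions
        "--sjdbOverhang", str(read_length - 1),   # read length minus 1 is the usual choice
        "--runThreadN", str(threads),
    ]


# Hypothetical file names for the Glycine_max_Lee_v1 reference.
index_cmd = star_index_command(
    "ref/Glycine_max_Lee_v1.fa",
    "ref/Glycine_max_Lee_v1.gtf",
    "ref/star_index",
)
print(shlex.join(index_cmd))
```

Indexing only has to be done once per reference genome; every sample can then be aligned against the same index directory.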
Align your sequences to the genome
- A script was written to align the raw Glycine max sequence data to the indexed reference genome using STAR. The alignment outputs a BAM file that is used for quantification.
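The alignment step could be sketched roughly as follows. Sample names and paths are hypothetical. The `--quantMode TranscriptomeSAM` option is included as an assumption, because Salmon's alignment-based mode expects alignments in transcript coordinates; the original pipeline may have been configured differently.

```python
import shlex


def star_align_command(read1, read2, index_dir, out_prefix, threads=8):
    """Build (but do not run) a STAR paired-end alignment command."""
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", index_dir,                      # index built in the previous step
        "--readFilesIn", read1, read2,                 # paired-end reads: R1 then R2
        "--readFilesCommand", "zcat",                  # reads are gzip-compressed
        "--outSAMtype", "BAM", "SortedByCoordinate",   # write a sorted BAM file
        "--quantMode", "TranscriptomeSAM",             # assumption: also write transcriptome alignments
        "--outFileNamePrefix", out_prefix,
    ]


# Hypothetical sample; repeat for each young leaf, old leaf, and meristem library.
align_cmd = star_align_command(
    "raw/young_leaf_R1.fastq.gz",
    "raw/young_leaf_R2.fastq.gz",
    "ref/star_index",
    "aligned/young_leaf_",
)
print(shlex.join(align_cmd))
```

With these options STAR writes `Aligned.sortedByCoord.out.bam` and `Aligned.toTranscriptome.out.bam`, each prefixed with the chosen output prefix.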
Quantify the aligned sequences into counts to be analyzed downstream
- A script was written to quantify the alignments with Salmon. The resulting quantification files report gene expression levels and can be viewed as text files or converted to Excel spreadsheets.
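To make the quantification step more concrete, here is a small sketch that assembles a Salmon command in alignment-based mode and then reads a Salmon `quant.sf` table, converting it to CSV text that Excel can open. The file names and the tiny example table are made up for illustration; only the column layout (Name, Length, EffectiveLength, TPM, NumReads) follows Salmon's output format.

```python
import csv
import io


def salmon_quant_command(transcripts_fasta, bam_file, out_dir, threads=8):
    """Build (but do not run) a Salmon alignment-based quantification command."""
    return [
        "salmon", "quant",
        "-t", transcripts_fasta,  # transcript sequences the reads were aligned to
        "-l", "A",                # let Salmon infer the library type
        "-a", bam_file,           # alignments in transcript coordinates
        "-o", out_dir,
        "-p", str(threads),
    ]


# Made-up rows in the quant.sf layout, standing in for a real output file.
EXAMPLE_QUANT_SF = (
    "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
    "geneA.1\t1500\t1320.5\t12.5\t250.0\n"
    "geneB.1\t800\t620.0\t0.0\t0.0\n"
    "geneC.1\t2200\t2020.0\t48.25\t1300.0\n"
)


def read_quant(handle):
    """Parse a quant.sf table into {name: {"tpm": ..., "num_reads": ...}}."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {
        row["Name"]: {"tpm": float(row["TPM"]), "num_reads": float(row["NumReads"])}
        for row in reader
    }


def to_csv(quant):
    """Render the parsed table as CSV text, sorted by transcript name."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["transcript", "tpm", "num_reads"])
    for name in sorted(quant):
        writer.writerow([name, quant[name]["tpm"], quant[name]["num_reads"]])
    return out.getvalue()


quant_cmd = salmon_quant_command(
    "ref/transcripts.fa",  # hypothetical file name
    "aligned/young_leaf_Aligned.toTranscriptome.out.bam",
    "quant/young_leaf",
)
quant = read_quant(io.StringIO(EXAMPLE_QUANT_SF))
print(to_csv(quant))
```

For a real run, `open("quant/young_leaf/quant.sf")` would replace the in-memory example, and the CSV text could be written to a file for viewing in Excel.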
Off the HPC you will explore your data outputs using a free graphical user interface, Galaxy
- The quantification files were uploaded to Galaxy and analyzed with DESeq2. DESeq2 normalizes the data to account for factors that prevent direct comparison between samples, such as differences in sequencing depth. The normalized output could then be analyzed for differentially expressed genes.
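To illustrate what normalization does, the sketch below implements the median-of-ratios size factor calculation that DESeq2 uses by default, in plain Python. It covers only this one step; DESeq2 itself goes on to estimate dispersions and test for differential expression. The gene names and counts are invented for the example.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts maps each gene to a list of raw counts, one per sample.
    """
    n_samples = len(next(iter(counts.values())))
    log_ratios = [[] for _ in range(n_samples)]
    for gene_counts in counts.values():
        if any(c <= 0 for c in gene_counts):
            continue  # genes with a zero count in any sample are skipped
        # Log of the geometric mean of this gene across samples.
        log_geo_mean = sum(math.log(c) for c in gene_counts) / n_samples
        for j, c in enumerate(gene_counts):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # A sample's size factor is the median ratio over the genes kept above.
    return [math.exp(median(r)) for r in log_ratios]


def normalize(counts):
    """Divide each sample's counts by that sample's size factor."""
    factors = size_factors(counts)
    return {g: [c / f for c, f in zip(row, factors)] for g, row in counts.items()}


# Invented counts; columns are young leaf, old leaf, meristem.
# Each sample here is sequenced twice as deeply as the one before it.
raw_counts = {
    "gene1": [10, 20, 40],
    "gene2": [100, 200, 400],
    "gene3": [50, 100, 200],
    "gene4": [0, 4, 8],
}
factors = size_factors(raw_counts)
normalized = normalize(raw_counts)
print([round(f, 3) for f in factors])
print({g: [round(v, 2) for v in row] for g, row in normalized.items()})
```

In this toy data the three samples differ only in sequencing depth, so after dividing by the size factors (about 0.5, 1, and 2) the counts for gene1 to gene3 become equal across samples, which is what makes them directly comparable.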

Acknowledgements

I want to thank the CPT learning community; my teammates, Monica Judd and Carlos Cofre; and the instructors, Dr. Carly Sjorgen, Dr. Emily Delorean, Dr. Emily Cartwright, and Edmaritz Hernandez Pagan. I also want to acknowledge the NC State HPC and bioinformatics resources that made this research possible.

References

https://sites.google.com/ncsu.edu/bitcpt-spring23/home?authuser=0

https://sites.google.com/ncsu.edu/bitcpt-spring23/working-directory?authuser=0

Images

https://global.discourse-cdn.com/business7/uploads/galaxy/original/2X/1/109079403925f55d010eea44bd9699efadbd1198.png

https://combine-lab.github.io/salmon/images/SalmonLogo.png

https://www.lexogen.com/rna-lexicon-indexing-strategies-and-solutions/

Figure developer:

https://www.biorender.com

Feel free to email with any questions: cdthieke@ncsu.edu
