Spring 2023 BIT CPT Course Instructors:
Course designer and Instructor: Dr. Carly Sjogren, B.A. Biology, Ph.D. Genetics, Genomics & Bioinformatics
Assistant Instructor: Dr. Emily Delorean, B.S. Crop Science, M.S. Plant Pathology, Ph.D. Genetics
Graduate Teaching Assistant: Edmaritz Hernandez Pagan
The Spring 2023 semester of BIT CPT analyzes brand-new RNA-seq data sets. Teams of undergraduate and graduate researchers will collaboratively analyze RNA-seq data from the model plant species Arabidopsis thaliana and the crop species Glycine max (soybean). Student researchers will compare soybean transcriptomes by aligning reads to several reference genomes, ranging from cultivar-specific, newly HiFi-assembled genomes to those of ancient cultivars and related species in the genus Glycine.
Arabidopsis wildtype leaf and meristem samples.
Glycine max cultivar Lee wildtype leaf and meristem samples.
BIOLOGICAL QUESTION: What are the changes in gene expression that exist between differentiated young leaves and undifferentiated stem cells from shoot meristems in different species?
BIOLOGICAL QUESTION: What changes in soybean gene expression do we uncover when our experimental design includes high numbers of biological replicates?
BIOLOGICAL QUESTION: What changes in soybean gene expression do we uncover when aligning reads to different reference genomes?
You and/or your collaborators have completed the hard work of designing your experiment, collecting your biological materials, isolating RNA, generating sequencing libraries and getting your samples sequenced. Now you finally have your sequences to analyze. What do you do with these giant sequence files to get to the interesting stuff you want to know?!
We will use the descriptions provided here to guide you through the analysis of RNA sequence data via a bioinformatic pipeline on Henry2, NC State's High-Performance Computing (HPC) cluster:
Set up your working directory and QC your data
Build an indexed reference genome to align your sequences
Align your sequences to the genome
Quantify the aligned sequences into counts to be analyzed downstream
Off the HPC, you will explore your data outputs using a free graphical user interface, GALAXY.
STAR is used to build an indexed Arabidopsis reference genome and align sequence reads to it.
SALMON is used to quantify and normalize the aligned reads.
GALAXY is used to explore data outputs including differential gene expression.
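As a rough orientation, the four HPC steps listed above (QC, index, align, quantify) can be sketched as a small Python helper that only builds the command lines and does not run anything. This is a minimal sketch under stated assumptions, not the course's actual scripts: the sample name, file names, and directory layout are hypothetical; FastQC is assumed as the QC tool; and the flags shown are common STAR, Salmon, and FastQC options that may differ from what is used on Henry2.

```python
import shlex
from pathlib import Path


def build_pipeline_commands(sample, fastq_1, fastq_2, genome_fasta,
                            annotation_gtf, transcripts_fasta,
                            workdir="rnaseq_work", threads=8):
    """Return the pipeline steps as argument lists (nothing is executed).

    All paths and the directory layout are hypothetical placeholders.
    """
    work = Path(workdir)
    qc_dir = work / "qc"
    index_dir = work / "star_index"
    align_prefix = str(work / "aligned" / f"{sample}_")
    quant_dir = work / "quant" / sample
    # STAR appends this suffix to the prefix when --quantMode TranscriptomeSAM is set.
    transcriptome_bam = align_prefix + "Aligned.toTranscriptome.out.bam"

    return {
        # Step 1: quality control of the raw reads (FastQC is an assumption).
        "qc": ["fastqc", "-o", str(qc_dir), fastq_1, fastq_2],
        # Step 2: build an indexed reference genome with STAR.
        "index": ["STAR", "--runMode", "genomeGenerate",
                  "--runThreadN", str(threads),
                  "--genomeDir", str(index_dir),
                  "--genomeFastaFiles", genome_fasta,
                  "--sjdbGTFfile", annotation_gtf],
        # Step 3: align paired-end reads; gzipped FASTQ files would also
        # need "--readFilesCommand zcat".
        "align": ["STAR", "--runThreadN", str(threads),
                  "--genomeDir", str(index_dir),
                  "--readFilesIn", fastq_1, fastq_2,
                  "--outSAMtype", "BAM", "Unsorted",
                  "--quantMode", "TranscriptomeSAM",
                  "--outFileNamePrefix", align_prefix],
        # Step 4: quantify with Salmon in alignment-based mode.
        "quant": ["salmon", "quant", "-l", "A",
                  "-t", transcripts_fasta,
                  "-a", transcriptome_bam,
                  "-o", str(quant_dir)],
    }


if __name__ == "__main__":
    commands = build_pipeline_commands(
        "leaf_rep1", "leaf_rep1_R1.fastq", "leaf_rep1_R2.fastq",
        "genome.fa", "genes.gtf", "transcripts.fa")
    for step, argv in commands.items():
        print(f"# {step}")
        print(shlex.join(argv))
```

Printing the commands rather than executing them keeps the sketch safe to run anywhere; on a cluster, each line would typically be placed in a job script and submitted through the scheduler.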
To recapitulate the work done during this course, the pages listed below follow our pipeline. Note: we record ALL the work that we did, but recapitulation of this work can and should bypass the erroneous steps.
Noah
Jacob
John
Colin
Monica
Carlos
Haley
James
Jay
Andrew