Fall 2022 BIT CPT Course Instructors:
Course designer and Instructor: Dr. Carly Sjogren, B.A. Biology, Ph.D. Genetics, Genomics & Bioinformatics
Assistant Instructor: Dr. Emily Delorean, B.S. Crop Science, M.S. Plant Pathology, Ph.D. Genetics
Graduate Teaching Assistant: Sahana Prakash, B.S. Biological Sciences, M.S. Physiology (in progress)
The Fall 2022 semester of BIT CPT analyzes brand-new RNA-seq data sets. Teams of undergraduate and graduate researchers will collaboratively analyze RNA-seq data from the model plant species Arabidopsis thaliana and the crop species Solanum lycopersicum (tomato). Student researchers will compare tomato transcriptomes sequenced at different depths against several reference genomes, including cultivar-specific genomes, newly assembled HiFi genomes, ancient cultivars, and related species in the nightshade family, Solanaceae.
Arabidopsis wild-type leaf and meristem samples.
Tomato M82 wild-type leaf and meristem samples.
BIOLOGICAL QUESTION: How does gene expression differ between differentiated young leaves and undifferentiated stem cells from shoot meristems in different species?
BIOLOGICAL QUESTION: What changes in tomato gene expression do we uncover when sequencing at different read depths?
BIOLOGICAL QUESTION: What changes in tomato gene expression do we uncover when sequencing at different read depths and aligning reads to different reference genomes?
You and/or your collaborators have completed the hard work of designing your experiment, collecting your biological materials, isolating RNA, generating sequencing libraries and getting your samples sequenced. Now you finally have your FASTQ/FASTA sequences to analyze. What do you do with these giant sequence files to get to the interesting stuff you want to know?!
We will use the descriptions provided here to guide you through the analysis of RNA sequence data with a bioinformatics pipeline on Henry2, NC State's High-Performance Computing (HPC) cluster:
Set up your working directory and QC your data
Build a reference genome index to align your sequences against
Align your sequences to the genome
Quantify the aligned sequences into counts for downstream analysis
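As a rough illustration of the index, align, and quantify steps, the sketch below only assembles STAR and Salmon command lines in Python; it does not run them. All paths, sample names, thread counts, and the helper names (`star_index_cmd`, `star_align_cmd`, `salmon_quant_cmd`) are hypothetical placeholders, not the course's actual setup. The flags are standard STAR and Salmon options, but check them against the manuals for the versions installed on Henry2 before relying on them.

```python
from __future__ import annotations

import shlex
from pathlib import Path
from typing import Sequence


def star_index_cmd(genome_dir: Path, fasta: Path, gtf: Path, threads: int = 4) -> list[str]:
    """Hypothetical helper: build a STAR genome index from a genome FASTA and a GTF."""
    return [
        "STAR", "--runMode", "genomeGenerate",
        "--genomeDir", str(genome_dir),
        "--genomeFastaFiles", str(fasta),
        "--sjdbGTFfile", str(gtf),  # gene annotation used to add known splice junctions
        "--runThreadN", str(threads),
    ]


def star_align_cmd(genome_dir: Path, reads: Sequence[Path], prefix: str, threads: int = 4) -> list[str]:
    """Hypothetical helper: align one sample's FASTQ file(s) to the STAR index."""
    cmd = [
        "STAR", "--genomeDir", str(genome_dir),
        "--readFilesIn", *(str(r) for r in reads),  # one file (single-end) or two (paired-end)
        "--outSAMtype", "BAM", "Unsorted",
        # Also write transcript-coordinate alignments to <prefix>Aligned.toTranscriptome.out.bam
        "--quantMode", "TranscriptomeSAM",
        "--outFileNamePrefix", prefix,
        "--runThreadN", str(threads),
    ]
    if any(str(r).endswith(".gz") for r in reads):
        cmd += ["--readFilesCommand", "zcat"]  # decompress gzipped FASTQ on the fly
    return cmd


def salmon_quant_cmd(transcripts: Path, bam: Path, out_dir: Path, threads: int = 4) -> list[str]:
    """Hypothetical helper: alignment-based Salmon quantification of a transcriptome BAM."""
    return [
        "salmon", "quant",
        "-t", str(transcripts),  # transcript FASTA matching the annotation used above
        "-l", "A",               # ask Salmon to infer the library type automatically
        "-a", str(bam),          # STAR's transcriptome-coordinate BAM
        "-o", str(out_dir),
        "-p", str(threads),
    ]


if __name__ == "__main__":
    # Placeholder file names for illustration only.
    prefix = "leaf_rep1_"
    for cmd in (
        star_index_cmd(Path("star_index"), Path("genome.fa"), Path("genes.gtf")),
        star_align_cmd(Path("star_index"), [Path("leaf_rep1.fastq.gz")], prefix),
        salmon_quant_cmd(Path("transcripts.fa"),
                         Path(f"{prefix}Aligned.toTranscriptome.out.bam"),
                         Path("salmon_leaf_rep1")),
    ):
        print(shlex.join(cmd))
```

Building each command as a list of arguments, rather than one long string, keeps file names safely quoted and makes the commands easy to paste into a job script or pass to `subprocess.run`.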
Off the HPC, you will explore your data outputs using GALAXY, a free, web-based platform with a graphical user interface.
STAR is used to build an index of the Arabidopsis reference genome and align sequence reads to it.
SALMON is used to quantify the aligned reads and report normalized expression estimates.
GALAXY is used to explore data outputs including differential gene expression.
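To make "normalize" concrete, one normalized unit that Salmon reports is TPM (transcripts per million), found in the `TPM` column of its `quant.sf` output alongside `EffectiveLength` and `NumReads`. The minimal sketch below shows only the TPM arithmetic, using made-up counts and lengths for two hypothetical genes (`geneA`, `geneB`); it is not Salmon's own estimation procedure, which first assigns reads to transcripts statistically.

```python
from __future__ import annotations


def tpm(counts: dict[str, float], eff_lengths: dict[str, float]) -> dict[str, float]:
    """Convert read counts to transcripts per million (TPM).

    counts: estimated reads per transcript (analogous to Salmon's NumReads column)
    eff_lengths: effective lengths in bases (analogous to Salmon's EffectiveLength column)
    """
    # Divide by length so longer transcripts are not credited with more expression
    rates = {name: counts[name] / eff_lengths[name] for name in counts}
    total = sum(rates.values())
    if total == 0:
        raise ValueError("no reads assigned; TPM is undefined")
    # Rescale so the values within this one sample sum to one million
    return {name: rate / total * 1_000_000 for name, rate in rates.items()}


if __name__ == "__main__":
    # Hypothetical numbers: equal read counts, but geneB is twice as long as geneA.
    example = tpm({"geneA": 100, "geneB": 100}, {"geneA": 1000.0, "geneB": 2000.0})
    for name, value in example.items():
        print(f"{name}\t{value:.1f}")
```

With equal read counts, geneB receives half the TPM of geneA because it is twice as long. Because TPM values in each sample sum to one million, they describe relative expression; count-based differential expression tools (for example DESeq2) generally take the estimated counts rather than TPM as input.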
To reproduce the work done during this course, follow the pages listed below, which walk through our pipeline. Note: we recorded ALL the work that we did, including erroneous steps; when reproducing this work, you can and should skip those steps.
Quantification & Normalization
Data Exploration