With the advent of RNA-seq technologies it is now feasible to monitor and quantify the expression of the full transcriptome of a genome. Such a huge volume of information is invaluable for identifying novel transcribed regions or unveiling new alternatively spliced forms of known genes. Here we will briefly see how to process RNA-seq reads that have been previously aligned to a genome in order to quantify the expression of all the genes in that genome. As RPKM quantification is time-consuming, we will directly examine the output of the quantification programs on a real dataset. Next, we will use the UCSC Genome Browser to display RNA-seq data from the ENCODE project, integrating this information with epigenetic ChIP-seq profiles.
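For reference, RPKM (Reads Per Kilobase of transcript per Million mapped reads) divides the number of reads C assigned to a gene by its length in kilobases (L/10^3) and by the sequencing depth in millions of mapped reads (N/10^6), i.e. RPKM = 10^9 x C / (N x L). The Python sketch below only illustrates that arithmetic. It assumes the per-gene read counts and gene lengths (e.g. summed exon lengths) have already been obtained from the aligned reads; the function name `rpkm` and its dictionary inputs are hypothetical, not the interface of the programs used in this session.

```python
from typing import Dict


def rpkm(counts: Dict[str, int], lengths_bp: Dict[str, int], total_mapped: int) -> Dict[str, float]:
    """Hypothetical helper: RPKM = 1e9 * C / (N * L) for each gene.

    counts       -- reads assigned to each gene (C)
    lengths_bp   -- gene length in base pairs (L), e.g. summed exon lengths
    total_mapped -- total number of mapped reads in the library (N)
    """
    return {
        gene: counts[gene] * 1e9 / (lengths_bp[gene] * total_mapped)
        for gene in counts
    }


# Toy, made-up numbers: 1 million mapped reads in total.
print(rpkm({"geneA": 500, "geneB": 1500}, {"geneA": 2000, "geneB": 1000}, 1_000_000))
# {'geneA': 250.0, 'geneB': 1500.0}
```

Note how geneB, with three times the reads of geneA but half its length, ends up with six times its RPKM.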
TABLE OF CONTENTS:
[ETC: 20 mins]
(1) Open the UCSC genome browser (human, hg19)
Open the ENCODE REGULATION track (Regulation block)
Read the Description section of this track
We will work initially with the H3K4me3 subtrack
Click on the H3K4me3 super-track link to show each cell line
Change the Overlay method to display the set of ChIP-seq experiments
Visually analyze the position of H3K4me3 within the genes in different cell lines
Focus on tissue-specific genes (e.g. NANOG in ESCs, HNF1A in liver, and so on)
Repeat the analysis on H3K4me1 and H3K27ac
Compare the patterns of each histone mark across these genes and cell lines
(2) Now, open the ENCODE home page and explore these links (Human, Left menu)
[http://genome.ucsc.edu/ENCODE/index.html]
Check which Cell Types are available
Now open the Experiment Matrix
Open the ChIP-seq experiments submatrix
Search for H3K36me3 in H1-hESC and add it to our session
Repeat with H3K27me3 in the same cell line
Switch on the visibility of these tracks from the search box
Explore these profiles for the NANOG, SOX2 and POU5F1 genes
Explore the profiles in developmental genes such as PAX6 or the HOXA cluster
[ETC: 20 mins]
(1) Open the UCSC genome browser (human, hg19)
Open the ENCODE REGULATION track (Regulation block)
Open the Transcription super-track link to show each cell line
Change the Overlay method to display the set of ChIP-seq and RNA-seq experiments
Analyze the correlation between H3K4me3/H3K36me3/H3K27me3 and the RNA-seq levels in these genes in different cell lines
Focus on cell line-specific genes (e.g. NANOG in ESCs) and examine their profiles
(2) Now open the Experiment Matrix
Select the H1-hESC RNA-seq experiments
Switch on the visibility of the plus and minus strand tracks (sense and antisense)
Explore this profile for the genes we are studying in this session
Check in each case whether the strand-specific RNA-seq works well, i.e. whether the signal falls on the same strand as the annotated gene
Now integrate the ENCODE DNase-seq data for the same cell line
(3) Explore the new ENCODE web site at: [https://www.encodeproject.org/]
[ETC: 20 mins]
Open the SeqCode toolkit
Search the PCAplotter application
Download the expression data for the ESC-MES-CM differentiation
Generate the PCA plot to explore the relationship between the three RNA-seq samples
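PCAplotter performs this analysis interactively. To make the idea concrete, the sketch below shows one common way to compute a PCA of samples with NumPy: each sample is a row and each gene a column, values are log2(x + 1) transformed (an assumption; the tool may transform the data differently), each gene is centred across samples, and the centred matrix is decomposed with an SVD. The function `pca_samples` and the toy matrix are hypothetical, not SeqCode code or real ESC/MES/CM data. With only three samples, at most two components carry variance, so PC1 and PC2 together capture all of it.

```python
import numpy as np


def pca_samples(expr, n_components=2):
    """Hypothetical helper: PCA of samples (rows) described by genes (columns).

    Returns the sample coordinates on the first principal components and
    the fraction of total variance explained by each component.
    """
    # Log-transform to reduce the weight of very highly expressed genes (assumption).
    x = np.log2(np.asarray(expr, dtype=float) + 1.0)
    # Centre each gene across samples so the PCs describe differences between samples.
    x = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s[:n_components] ** 2 / np.sum(s ** 2)
    return scores, explained


# Made-up expression values: 3 samples x 5 genes.
samples = ["ESC", "MES", "CM"]
toy = [
    [120.0, 80.0, 2.0, 1.0, 50.0],
    [10.0, 60.0, 40.0, 3.0, 55.0],
    [1.0, 5.0, 30.0, 90.0, 52.0],
]
scores, explained = pca_samples(toy)
for name, (pc1, pc2) in zip(samples, scores):
    print(f"{name}: PC1={pc1:.2f} PC2={pc2:.2f}")
print("variance explained:", explained)
```

Samples that land close together on the PC1/PC2 plane have similar overall expression profiles.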
Now, open the Scatterplotter application
Download the expression data for the ESC-MES differentiation
Activate the "Show the diagonal line" option and set both axes to 5
Generate the scatterplot of both conditions
In another tab, repeat the analysis on the expression data for the ESC-CM differentiation
Compare the correlation and the slopes of both comparisons
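One way to put numbers on this comparison is Pearson's correlation coefficient r and the slope of the least-squares regression line y = a + b*x: if the two conditions behaved identically, the points would lie on the diagonal (r = 1, slope = 1). The pure-Python sketch below assumes both vectors hold the same genes in the same order and are already on the scale being plotted (for example, log-transformed values). The helper name is hypothetical and this is an illustration, not what Scatterplotter computes internally.

```python
import math


def correlation_and_slope(x, y):
    """Pearson r and least-squares slope of y on x (hypothetical helper)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (>= 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)
    slope = sxy / sxx  # regression of y (second condition) on x (first condition)
    return r, slope


# Made-up log-scale values for five genes in two conditions.
esc = [1.0, 2.0, 3.0, 4.0, 2.5]
mes = [1.2, 1.9, 3.3, 3.8, 2.4]
print(correlation_and_slope(esc, mes))
```

A lower r in one comparison than in the other indicates that more genes change expression between those two conditions.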
Use the FCanalysis tool to identify the genes up-regulated and down-regulated in ESC-CM (FC = 2)
Save both files and add them to the ESC-CM scatterplot to see their distribution
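The logic behind a fold-change filter such as FCanalysis can be sketched as follows. This is an assumption about how the tool works, not its documented behaviour: a gene counts as up-regulated when its expression in the second condition (here CM) is at least FC times that in the first (ESC), as down-regulated when it is at most 1/FC times, and genes whose expression stays below the minimum value in both conditions are ignored. A pseudocount of 1 avoids dividing by zero. The function name, parameters and gene values are hypothetical.

```python
def fold_change_split(first, second, fc=2.0, min_value=0.0, pseudocount=1.0):
    """Hypothetical FC filter: returns (up, down) gene lists for second vs first.

    first, second -- dicts mapping gene name to expression value
    fc            -- fold-change threshold (e.g. 2 or 10)
    min_value     -- genes below this value in both conditions are skipped
    """
    up, down = [], []
    for gene in sorted(first.keys() & second.keys()):
        a, b = first[gene], second[gene]
        if max(a, b) < min_value:
            continue  # too lowly expressed in both conditions to be trusted
        ratio = (b + pseudocount) / (a + pseudocount)
        if ratio >= fc:
            up.append(gene)
        elif ratio <= 1.0 / fc:
            down.append(gene)
    return up, down


# Made-up values for four hypothetical genes in ESC (first) and CM (second).
esc = {"g1": 100.0, "g2": 1.0, "g3": 500.0, "g4": 3.0}
cm = {"g1": 2.0, "g2": 200.0, "g3": 480.0, "g4": 9.0}
print(fold_change_split(esc, cm, fc=2, min_value=0))    # (['g2', 'g4'], ['g1'])
print(fold_change_split(esc, cm, fc=10, min_value=10))  # (['g2'], ['g1'])
```

Raising both thresholds, as in the later step with FC = 10 and minimum value = 10, keeps only genes that change strongly and are reasonably expressed, which is why the second call drops g4.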
Open GALAXY and extract the first column (gene names) from both the up- and down-regulated lists
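In GALAXY this step is normally done with a column-cutting tool such as Cut (keeping column c1 of a tab-separated file). For readers who prefer a script, a minimal Python equivalent follows. It assumes the FCanalysis output is tab-separated with the gene name in the first column and no header line (adjust if your files differ); the helper names and file names are hypothetical.

```python
import csv
import io


def first_column(tsv_text):
    """Return the first field of every non-empty line of tab-separated text."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    return [row[0] for row in reader if row]


def extract_gene_names(in_path, out_path):
    """Write one gene name per line, the format Enrichr accepts as a gene list."""
    with open(in_path, encoding="utf-8") as handle:
        names = first_column(handle.read())
    with open(out_path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(names) + "\n")


# Hypothetical file names for the two FCanalysis outputs.
# extract_gene_names("ESC-CM_up.txt", "up_genes.txt")
# extract_gene_names("ESC-CM_down.txt", "down_genes.txt")
```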
Analyze both lists of genes with the Enrichr tool to characterize biological differences
In the FCanalysis application, change the FC to 10 and the Minimum value to 10
Repeat the scatterplot and the Enrichr analysis
Open the Boxplotter3 tool to study the expression of the up- and down-regulated gene lists
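A box plot summarizes each gene list by its quartiles. The sketch below computes these summaries with Python's standard `statistics` module, using the common Tukey convention of whiskers at 1.5 times the interquartile range; whether Boxplotter3 uses exactly this convention and quantile method is an assumption. The helper name and example values are hypothetical.

```python
import statistics


def box_summary(values):
    """Quartiles, Tukey whiskers (1.5 x IQR) and outliers for one gene list."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }


# Made-up CM expression values for the genes in the up- and down-regulated lists.
up_cm = [45.0, 60.0, 80.0, 120.0, 900.0]
down_cm = [0.5, 1.0, 2.0, 3.5, 4.0]
for label, vals in (("up", up_cm), ("down", down_cm)):
    print(label, box_summary(vals))
```

Comparing the medians and boxes of the two lists in each sample shows at a glance how clearly the FC filter separated them.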
Bibliography
An integrated encyclopedia of DNA elements in the human genome. The ENCODE Project Consortium. Nature 489:57-74 (2012). http://www.nature.com/encode/
The ENCODE (ENCyclopedia Of DNA Elements) Project. The ENCODE Project Consortium. Science 306:636-640 (2004).
Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. The ENCODE Project Consortium. Nature 447:799-816 (2007).
Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Goecks J, Nekrutenko A, Taylor J and The Galaxy Team. Genome Biology 11(8):R86 (2010).
Practical Bioinformatics. Michael Agostino. Garland Science (2012). ISBN: 978-0815344568.