SESSION 7

SUMMARY

With the advent of RNA-seq technologies, it is now feasible to monitor and quantify the expression of the full transcriptome of a genome. Such a huge volume of information is invaluable for identifying novel transcribed regions or unveiling new alternatively spliced forms of known genes. Here we will briefly see how RNA-seq reads that have previously been aligned to a genome are processed to quantify the expression of all its genes. As RPKM quantification is time-consuming, we will examine the output of these programs in a real scenario rather than running them. Next, we will use the UCSC Genome Browser to display RNA-seq data from the ENCODE project, integrating this information with ChIP-seq epigenetic profiles.

TABLE OF CONTENTS:

Step 1. Working with ENCODE ChIP-seq data

[ETC: 20 mins]

(1) Open the UCSC genome browser (human, hg19)

- Open the ENCODE REGULATION track (Regulation block)
- Read the track Description to learn what this data represents
- We will work initially with the H3K4me3 subtrack

- Click on the H3K4me3 super-track link to show each cell line
- Change the Overlay method to display the set of ChIP-seq experiments

- Visually inspect the position of H3K4me3 within genes in different cell lines
- Look at tissue-specific genes (e.g. NANOG in ESCs, HNF1A in liver, and so on)
- Repeat the analysis for H3K4me1 and H3K27ac
- Compare the pattern of each histone mark across these genes and cell lines

(2) Now, open the ENCODE home page and explore these links (Human, Left menu)

[http://genome.ucsc.edu/ENCODE/index.html]

- Check which Cell Types are available
- Now open the Experiment Matrix

- Open the ChIP-seq experiments submatrix
- Search for H3K36me3 in H1-hESC to include it in our session
- Repeat with H3K27me3 in the same cell line
- Set the Visibility on for these tracks in the search box

- Explore these profiles for the NANOG, SOX2 and POU5F1 genes

- Explore the profiles in developmental genes such as PAX6 or the HOXA cluster

Step 2. Working with ENCODE RNA-seq data

[ETC: 20 mins]

(1) Open the UCSC genome browser (human, hg19)

- Open the ENCODE REGULATION track (Regulation block)
- Open the Transcription super-track links to show each cell line
- Change the Overlay method to display the set of ChIP-seq and RNA-seq experiments

- Analyze the correlation between H3K4me3/H3K36me3/H3K27me3 and the RNA-seq levels in these genes in different cell lines
- Look at cell line-specific genes (e.g. NANOG in ESCs) and examine the profiles

(2) Now open the Experiment Matrix

- Select the RNA-seq experiments for H1-hESC
- Set the Visibility on for the plus and minus strand tracks (sense and antisense)
- Explore this profile for the genes we are studying in this session

- Check in each case whether the strand-specific RNA-seq works well, i.e. whether the signal falls on the strand of the annotated gene

- Now integrate the ENCODE DNase-seq data for the same cell line

(3) Examine the new ENCODE website at: [https://www.encodeproject.org/]

Step 3. Working with lists of RPKMs

[ETC: 20 mins]
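For reference, the RPKM values used in this step follow the standard definition: reads mapped to a gene, divided by the gene length in kilobases and by the number of mapped reads in millions. The following is a minimal Python sketch of that formula; the function name and the example numbers are hypothetical and are not taken from the course data.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads Per Kilobase of transcript per Million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and library size must be positive")
    # 1e9 = 1e3 (bp -> kb) * 1e6 (reads -> millions of reads)
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical example: 500 reads on a 2,000 bp gene,
# in a library of 20 million mapped reads.
print(rpkm(500, 2000, 20_000_000))  # 12.5
```

Normalizing by both gene length and library size is what makes values comparable between genes and between samples.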

Open the SeqCode toolkit

- Find the PCAplotter application
- Download the expression data for the ESC-MES-CM differentiation
- Perform the PCA plot to explore the relationship between the 3 RNAseq samples
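To illustrate what a PCA of three samples computes, here is a minimal pure-Python sketch. It is not the PCAplotter implementation, whose internals are not described here; the log2(RPKM + 1) transform, the function names and the RPKM values are all assumptions made for illustration. In practice a numerical library such as NumPy or scikit-learn would normally be used instead.

```python
import math


def center_by_gene(samples):
    """Subtract each gene's mean across samples (rows = samples, columns = genes)."""
    n_samples, n_genes = len(samples), len(samples[0])
    means = [sum(row[g] for row in samples) / n_samples for g in range(n_genes)]
    return [[row[g] - means[g] for g in range(n_genes)] for row in samples]


def top_eigenpair(matrix, iterations=1000):
    """Largest eigenvalue and unit eigenvector of a symmetric PSD matrix (power iteration)."""
    n = len(matrix)
    vec = [1.0 / (i + 1) for i in range(n)]  # arbitrary non-zero start vector
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm < 1e-12:  # no variance left to explain
            return 0.0, [0.0] * n
        vec = [x / norm for x in nxt]
    mv = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
    return sum(v * m for v, m in zip(vec, mv)), vec


def pca_scores(samples, n_components=2):
    """Return (scores, ratios); scores[k][i] is the position of sample i on component k."""
    centered = center_by_gene(samples)
    n = len(centered)
    # Sample-by-sample Gram matrix: stays small even with thousands of genes.
    gram = [[sum(a * b for a, b in zip(centered[i], centered[j]))
             for j in range(n)] for i in range(n)]
    total = sum(gram[i][i] for i in range(n))
    scores, ratios = [], []
    for _ in range(n_components):
        value, vec = top_eigenpair(gram)
        scores.append([math.sqrt(max(value, 0.0)) * x for x in vec])
        ratios.append(value / total if total else 0.0)
        # Deflation: remove the component just found before searching for the next.
        gram = [[gram[i][j] - value * vec[i] * vec[j] for j in range(n)]
                for i in range(n)]
    return scores, ratios


# Hypothetical RPKM values for five genes in the three samples.
rpkm_table = {
    "ESC": [50.0, 40.0, 0.5, 1.0, 10.0],
    "MES": [20.0, 15.0, 8.0, 2.0, 11.0],
    "CM": [0.5, 1.0, 60.0, 45.0, 9.0],
}
log_samples = [[math.log2(v + 1) for v in values] for values in rpkm_table.values()]
scores, ratios = pca_scores(log_samples)
for name, pc1, pc2 in zip(rpkm_table, scores[0], scores[1]):
    print(f"{name}\tPC1={pc1:.2f}\tPC2={pc2:.2f}")
print("explained variance:", [round(r, 3) for r in ratios])
```

With only three samples there are at most two informative components, so a two-dimensional plot captures all the variance between them. The sign of each component is arbitrary; only the relative distances between samples are meaningful.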

- Now, open the Scatterplotter application
- Download the expression data for the ESC-MES differentiation
- Activate the Show the diagonal line option and set both axes to 5
- Perform the scatterplot of both conditions

- In another tab, repeat the analysis on the expression data for the ESC-CM differentiation
- Compare the correlation and the slopes of both comparisons
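The comparison of correlations and slopes can also be made numerically. The sketch below computes the Pearson correlation and the least-squares slope between two conditions on log2(RPKM + 1) values. The pseudocount, the function names and the expression values are assumptions for illustration and do not describe how Scatterplotter works internally.

```python
import math


def log2_rpkm(values, pseudocount=1.0):
    """Log-transform RPKM values; the pseudocount avoids log2(0)."""
    return [math.log2(v + pseudocount) for v in values]


def correlation_and_slope(x_values, y_values):
    """Pearson correlation and least-squares slope of y regressed on x."""
    n = len(x_values)
    mean_x = sum(x_values) / n
    mean_y = sum(y_values) / n
    sxx = sum((x - mean_x) ** 2 for x in x_values)
    syy = sum((y - mean_y) ** 2 for y in y_values)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(x_values, y_values))
    return sxy / math.sqrt(sxx * syy), sxy / sxx


# Hypothetical RPKM values for the same six genes in each condition.
esc = log2_rpkm([50.0, 40.0, 0.5, 1.0, 10.0, 25.0])
mes = log2_rpkm([20.0, 15.0, 8.0, 2.0, 11.0, 22.0])
cm = log2_rpkm([0.5, 1.0, 60.0, 45.0, 9.0, 3.0])

r_mes, slope_mes = correlation_and_slope(esc, mes)
r_cm, slope_cm = correlation_and_slope(esc, cm)
print(f"ESC vs MES: r={r_mes:.2f}, slope={slope_mes:.2f}")
print(f"ESC vs CM:  r={r_cm:.2f}, slope={slope_cm:.2f}")
```

A correlation close to 1 with a slope close to 1 means most genes lie near the diagonal, so the two conditions have similar expression profiles; lower values indicate a larger transcriptional change.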

- Use the FCanalysis tool to identify the genes up-regulated and down-regulated in ESC-CM (fold change of 2, FC2)
- Save both files and add them to the ESC-CM scatterplot to see their distribution
- Open GALAXY and extract the first column (gene names) of both the up and down lists
- Analyze both lists of genes with the Enrichr tool to characterize biological differences
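The logic of a fold-change selection, and of extracting the gene-name column, can be sketched as follows. This is an illustrative assumption of how such a filter may work, not the actual FCanalysis or Galaxy implementation: the pseudocount, the way the minimum value is applied, the function names and the RPKM values are all hypothetical.

```python
def fold_change_split(expression, fold_change=2.0, min_value=1.0, pseudocount=1.0):
    """Split genes into up- and down-regulated lists.

    expression maps gene name -> (rpkm_before, rpkm_after).
    Genes whose expression is below min_value in both conditions are skipped.
    """
    up, down = [], []
    for gene, (before, after) in expression.items():
        if max(before, after) < min_value:
            continue  # too lowly expressed to give a reliable ratio
        ratio = (after + pseudocount) / (before + pseudocount)
        if ratio >= fold_change:
            up.append(gene)
        elif ratio <= 1.0 / fold_change:
            down.append(gene)
    return up, down


def first_column(lines, sep="\t"):
    """Return the first field (gene name) of each non-empty line."""
    return [line.split(sep)[0] for line in lines if line.strip()]


# Hypothetical (ESC, CM) RPKM values; "LOWGENE" is a made-up placeholder name.
esc_cm = {
    "NANOG": (45.0, 0.5),
    "POU5F1": (60.0, 1.0),
    "TNNT2": (0.8, 55.0),
    "MYH6": (0.2, 30.0),
    "GATA4": (5.0, 14.0),
    "GAPDH": (120.0, 110.0),
    "LOWGENE": (0.1, 0.6),
}
up, down = fold_change_split(esc_cm, fold_change=2.0, min_value=1.0)
print("up:", up)
print("down:", down)

# Hypothetical tab-separated output lines: gene, ESC, CM.
table = [f"{gene}\t{esc_cm[gene][0]}\t{esc_cm[gene][1]}" for gene in up]
print(first_column(table))
```

Calling `fold_change_split(esc_cm, fold_change=10.0, min_value=10.0)` applies a stricter selection, which keeps only the genes with the largest and most robust changes.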

- In the FCanalysis application, change the FC to 10 and the Minimum value to 10
- Repeat the scatterplot and the Enrichr analysis

- Open the Boxplotter3 tool to study the expression of Up and Down gene lists
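A boxplot summarizes each gene list by its median, quartiles and extremes. The sketch below computes these values with the Python standard library; the expression values are hypothetical and the quartile method (inclusive interpolation) is an assumption that may differ from the one used by Boxplotter3.

```python
import statistics


def five_number_summary(values):
    """Minimum, first quartile, median, third quartile and maximum of a list."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": median, "q3": q3, "max": max(values)}


# Hypothetical log2(RPKM + 1) values in CM for genes of the Up and Down lists.
up_in_cm = [5.8, 4.9, 3.9, 6.2, 5.1]
down_in_cm = [0.6, 1.0, 0.3, 1.4, 0.8]

for label, values in (("Up", up_in_cm), ("Down", down_in_cm)):
    summary = five_number_summary(values)
    print(label, {key: round(val, 2) for key, val in summary.items()})
```

If the two lists were selected correctly, the boxes for the Up and Down genes should barely overlap in the condition where they were called.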

Bibliography

An integrated encyclopedia of DNA elements in the human genome. The ENCODE Project Consortium. Nature 489: 57-74 (2012). http://www.nature.com/encode/
The ENCODE (ENCyclopedia Of DNA Elements) Project. The ENCODE Project Consortium. Science 306:636-640 (2004).
Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. The ENCODE Project Consortium. Nature 447: 799-816 (2007).
Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Goecks J, Nekrutenko A, Taylor J and The Galaxy Team. Genome Biol. 11(8): R86 (2010).
Practical Bioinformatics. Michael Agostino. Garland Science (2012). ISBN: 978-0815344568.
