SUMMARY
ChIP-seq bioinformatics analysis is essential for extracting knowledge from massive sequencing experiments that search for protein binding sites. Here, we follow a full protocol to characterize the peaks of two independent ChIP-seq experiments which, in combination, help to elucidate which genes in mouse ESCs belong to the bivalent class (H3K4me3+H3K27me3). Using the SeqCode toolkit, we first characterize the target genes associated with each set of ChIP-seq peaks to study how these regions are distributed along the genome and, in particular, around the TSS of genes. Next, we explore the overlap between both sets of target genes to define a putative set of bivalent genes. Using Enrichr, we characterize the biological role of these genes across multiple functional catalogs.
In the second part of this exercise, we study the meta-gene profiles of the three classes of genes (common, H3K4me3_only and H3K27me3_only) on several published ChIP-seq experiments. We will see how bivalent genes differ from active genes in terms of repressive and active marks of transcription. We first generate the metaplots of H3K4me3 and H3K27me3 to confirm the origin of each gene set, and then test other components of the transcription machinery such as RNA Polymerase II or PcG subunits. We also measure the strength of the ChIP-seq signal in each case. Finally, we generate two heatmaps, one of bivalent genes and one of active genes, to visualize the global differences observed above.
We also give a brief introduction to the large volume of information generated within the ENCODE project using high-throughput sequencing techniques and other large-scale approaches. These data can be very valuable before starting a particular experiment in the wet lab. We provide a few links to explore in detail the data available on the UCSC ENCODE browser.
[ETC: 5 mins]
Open the following publication in PubMed:
Read the Summary of the article
Use the GEO DataSets link in the Related information panel to find this ChIPseq data on the NCBI GEO website
Next, click on the ChIP_H3K4me3_WT link (GSM2645495) to open this experiment
Carefully read the information about the experimental and computational details
Explore this entry to find the processed data in the Supplementary files section
Download the peaks for H3K4me3 in mESCs
Repeat with the ChIP_H3K27me3_WT record to download the peaks of H3K27me3
Uncompress both peak files for the next steps
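The uncompress step can also be done from a script. The following is a minimal sketch, assuming the downloaded files are gzip-compressed, tab-separated and BED-like (chromosome, start and end in the first three columns); check the actual format described in the GEO record before relying on it. The function names `gunzip` and `read_peaks` are hypothetical helpers introduced here for illustration.

```python
import gzip
import shutil
from pathlib import Path


def gunzip(src, dst=None):
    """Decompress a .gz file and return the path of the uncompressed copy."""
    src = Path(src)
    # By default, drop the trailing ".gz" (peaks.bed.gz -> peaks.bed).
    dst = Path(dst) if dst is not None else src.with_suffix("")
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst


def read_peaks(path):
    """Read (chrom, start, end) tuples from a tab-separated, BED-like file.

    Assumption: the first three columns are chromosome, start and end.
    Blank lines, comments and track/browser header lines are skipped.
    """
    peaks = []
    with open(path) as handle:
        for line in handle:
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue
            chrom, start, end = line.rstrip("\n").split("\t")[:3]
            peaks.append((chrom, int(start), int(end)))
    return peaks
```

Calling `gunzip` on each downloaded file and then `len(read_peaks(...))` on the result gives a quick count of peaks per histone mark, which is a useful sanity check before uploading the files to SeqCode.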
[ETC: 25 mins]
Open the SeqCode platform here:
Explore the main menu of options
Find the PeakAnnotator tool
Annotate the H3K4me3 peaks to get the target genes
In a separate window, repeat the annotation for H3K27me3
Explore the information shown in both cases
First focus on the pie charts and next explore the lists
Save the two lists of target genes for each histone mark
Open the Compare2Genes tool
Perform the overlap between H3K4me3 and H3K27me3 targets
Can you interpret these results in biological terms?
Save the three lists of genes: common, H3K4me3_only and H3K27me3_only
Open the Enrichr tool and perform the analysis of the bivalent set
Focus on the following sections: Transcription, Pathways, Ontologies
Check the Legacy annotations for our bivalent genes as well
Can you use Enrichr to analyze the set of targets reported only for H3K27me3?
Generate the UpSet plot of the two lists of genes (H3K4me3 and H3K27me3 genes)
Interpret the results of this alternative representation
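The overlap between the two lists of target genes can be reproduced with plain set operations. The sketch below is not the Compare2Genes implementation; it only illustrates the logic, assuming each list contains one gene name per entry and that names are spelled identically in both lists. The gene names in the toy example are placeholders, not real results.

```python
def read_gene_list(path):
    """Read one gene name per line into a set, ignoring blank lines."""
    with open(path) as handle:
        return {line.strip() for line in handle if line.strip()}


def compare_gene_sets(k4_targets, k27_targets):
    """Split two collections of target genes into the three classes of the exercise."""
    k4, k27 = set(k4_targets), set(k27_targets)
    return {
        "common": k4 & k27,          # putative bivalent genes
        "H3K4me3_only": k4 - k27,    # targets of H3K4me3 but not H3K27me3
        "H3K27me3_only": k27 - k4,   # targets of H3K27me3 but not H3K4me3
    }


# Toy example with placeholder gene names.
k4_toy = ["GeneA", "GeneB", "GeneC"]
k27_toy = ["GeneB", "GeneC", "GeneD"]
classes = compare_gene_sets(k4_toy, k27_toy)
for name, genes in classes.items():
    print(name, len(genes), sorted(genes))
```

The sizes of the three sets are the same numbers that an UpSet plot or a Venn diagram of the two lists displays.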
[ETC: 25 mins]
Open the SeqCode platform here:
Open the ProduceTSSplots tool
Generate the metaplot of H3K4me3 (mESCs) for the three lists of target genes
Repeat the same analysis with H3K27me3 (mESCs) in another window/tab
Interpret the results of each experiment
Repeat the plots using Ser5P, Ring1b and H3K27ac
Again, interpret the results in terms of the bivalent/active genes
Now open the ProduceGENEplots tool
Run the tool on H3K4me3 and H3K27me3
Now run it on H3K36me3 and Ser2P
Compare all these results and define classes of sharp and broad marks
Finally, open the ComputeChIPlevels tool
Calculate the ChIP signal strength of the three gene sets on H3K4me3 and H3K27me3
Change to other marks or proteins in the available set
Discuss the results in biological terms
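Conceptually, a metaplot is the position-by-position average of the ChIP signal over all genes in a set, after aligning every gene on its TSS. The sketch below illustrates that idea only; it is not the SeqCode implementation, and it assumes the signal has already been extracted as equal-length vectors of bins around each TSS. Normalization by sequencing depth is deliberately left out. The helper names `orient` and `metaplot` and the toy values are hypothetical.

```python
def orient(profile, strand):
    """Flip the profile of minus-strand genes so upstream is always on the left."""
    return list(reversed(profile)) if strand == "-" else list(profile)


def metaplot(profiles):
    """Average equal-length signal vectors position by position."""
    if not profiles:
        raise ValueError("at least one profile is required")
    width = len(profiles[0])
    if any(len(p) != width for p in profiles):
        raise ValueError("all profiles must have the same length")
    return [sum(column) / len(profiles) for column in zip(*profiles)]


# Toy example: two genes, five bins centered on the TSS (placeholder values).
toy_profiles = [
    orient([0, 2, 4, 2, 0], "+"),
    orient([0, 4, 8, 4, 0], "-"),
]
print(metaplot(toy_profiles))  # [0.0, 3.0, 6.0, 3.0, 0.0]
```

Averaging each gene's vector instead of each column would give one value per gene, which is closer in spirit to comparing signal strength between gene sets than to drawing a profile.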
[ETC: 15 mins]
Open the SeqCode platform here:
Open the ProduceTSSmaps tool
Generate the heatmap of your list of bivalent genes
Use H3K4me3, H3K27me3 plus three more marks
Repeat the same heatmap with your list of active genes
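In a TSS heatmap every row is one gene and every column is one bin around the TSS, and rows are commonly ranked by signal intensity so that patterns become visible. The sketch below shows one such ranking, by total signal per gene. Whether ProduceTSSmaps uses this particular ordering is an assumption; the function name and the toy values are hypothetical.

```python
def order_heatmap_rows(signal_by_gene):
    """Return (gene, values) pairs sorted from strongest to weakest total signal."""
    return sorted(
        signal_by_gene.items(),
        key=lambda item: sum(item[1]),
        reverse=True,
    )


# Toy example: three genes, three bins around the TSS (placeholder values).
toy_signal = {
    "GeneA": [1, 1, 1],
    "GeneB": [5, 9, 5],
    "GeneC": [0, 2, 0],
}
for gene, values in order_heatmap_rows(toy_signal):
    print(gene, values)
```

Using the same row order for every mark makes it possible to compare the heatmaps of different marks gene by gene.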
[Moved to Session 7]
1. Open the UCSC genome browser (human, hg19)
Hide all the tracks
Show RefSeq genes
Zoom into the region chr20:22,777,964-23,706,257
Open the GENCODE track link (Genes and Gene Predictions block)
Read the Description about this track
Pack the GENCODE Genes version 17 subtrack
Visually compare the GENCODE and RefSeq annotated genes
2. Open the ENCODE REGULATION track (Regulation block)
Read the Description about this information
Show (full) the H3K4me3 subtrack
Click on the H3K4me3 super-track link to show each cell line
Change the Overlay method to deploy the set of ChIP-seq experiments
Analyze the position of H3K4me3 in these genes in different cell lines
Repeat the analysis on H3K4me1 and H3K27ac
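Positions typed into the browser, such as chr20:22,777,964-23,706,257, can be parsed in a script to check the size of the region being displayed. This is a minimal sketch, assuming the position string follows the chrom:start-end layout with optional thousands separators and 1-based inclusive coordinates, as shown in the browser position box. The function names are hypothetical.

```python
import re

# chrom:start-end, with optional commas inside the numbers.
_POSITION = re.compile(r"^(chr[\w.]+):([\d,]+)-([\d,]+)$")


def parse_position(text):
    """Parse a browser position string into (chrom, start, end)."""
    match = _POSITION.match(text.strip())
    if match is None:
        raise ValueError(f"not a valid position: {text!r}")
    chrom, start, end = match.groups()
    start = int(start.replace(",", ""))
    end = int(end.replace(",", ""))
    if start > end:
        raise ValueError("start must not be greater than end")
    return chrom, start, end


def region_length(start, end):
    """Length in bp of a 1-based, inclusive interval."""
    return end - start + 1


chrom, start, end = parse_position("chr20:22,777,964-23,706,257")
print(chrom, start, end, region_length(start, end))
```

For the region used in this exercise the length is 928,294 bp, slightly under 1 Mb.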
[ETC: 15 mins]
(1) Now open the ENCODE home page and explore these links (Human, Left menu)
[http://genome.ucsc.edu/ENCODE/index.html]
Check which Cell Types are available
Navigate through the Experiment List
Now open the Experiment Matrix
Open the ChIP-seq experiments submatrix
Click on the (H1-hESC, H3K4me3) cell and examine the search box
Set the Visibility on for these two tracks
Explore this profile for NANOG, SOX2 and POU5F1 genes
Interpret these results in biological terms
Now add the ChIPseq of H3K27me3/H3K27ac/H3K36me3 in H1
Explore these profiles at PAX6 and in the HoxA cluster
Interpret these results in biological terms
Search for information about Data standards
Check the information on Antibodies
Open the Education links
(2) Now explore the Mouse ENCODE data (tissues)
Examine the new ENCODE website at: [https://www.encodeproject.org/]
Promoter bivalency favors an open chromatin architecture in embryonic stem cells. G. Mas, E. Blanco, C. Ballare, M. Sanso, Y. Spill, D. Hu, Y. Aoi, F. Le Dily, A. Shilatifard, M. A. Marti-Renom and L. Di Croce. Nature Genetics 50: 1452-1462 (2018).
An integrated encyclopedia of DNA elements in the human genome. The ENCODE Project Consortium. Nature 489: 57-74 (2012). http://www.nature.com/encode/
The ENCODE (ENCyclopedia Of DNA Elements) Project. The ENCODE Project Consortium. Science 306:636-640 (2004).
Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. The ENCODE Project Consortium. Nature 447: 799-816 (2007).
Practical Bioinformatics. Michael Agostino. Garland Science (2012). ISBN: 978-0815344568.