SESSION 4

SUMMARY

ChIP-seq bioinformatics analysis is essential to extract knowledge from massive sequencing experiments that map protein binding sites and histone modifications along the genome. Here, we present a full protocol to characterize the peaks of two independent ChIPseq experiments that, in combination, can reveal which genes in mouse ESCs belong to the bivalent class (H3K4me3+H3K27me3). Using the SeqCode toolkit, we will first identify the target genes associated with each set of ChIPseq peaks and study how these regions are distributed along the genome, in particular around the TSS of genes. Next, we will explore the overlap between both sets of target genes to define a putative set of bivalent genes. Finally, using Enrichr, we will characterize the biological role of these genes across multiple functional catalogs.

In the second part of this exercise, we will study the meta-gene profiles of the three classes of genes using several published ChIPseq experiments. We will see that bivalent genes differ from active genes in their levels of repressive and active marks of transcription. We will first generate metaplots of H3K4me3 and H3K27me3 to confirm the origin of each gene set, and then examine other factors, such as RNA Polymerase II or PcG subunits. We will also measure the strength of the ChIPseq signal in each case. Finally, we will generate two heatmaps, one for bivalent and one for active genes, to understand the global differences described above.

In addition, we will give a brief introduction to the huge volume of information generated within the ENCODE project using high-throughput sequencing and other large-scale approaches. These data can be invaluable before starting a particular experiment in the wet lab. Here, we provide a few links to explore in detail the vast amount of data available on the UCSC ENCODE browser.

Step 1. Obtaining the ChIPseq peaks for H3K4me3 and H3K27me3 from GEO

[ETC: 5 mins]

Open the following publication in PubMed:

[Promoter bivalency favors an open chromatin architecture in embryonic stem cells. Nature Genetics 50:1452-1462 (2018).]

- Read the Summary of the article
- Use the GEO DataSets link in the Related information panel to find these ChIPseq data on the NCBI GEO website
- Next, click on the ChIP_H3K4me3_WT link (GSM2645495) to open this experiment
- Read carefully the experimental and computational details

- Explore this entry to find the processed data in the Supplementary files section
- Download the peaks for H3K4me3 in mESCs
- Repeat with the ChIP_H3K27me3_WT record to download the peaks of H3K27me3
- Uncompress both peak files for the next steps
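You can also do the decompression and a first sanity check outside the browser. The minimal Python sketch below assumes that the supplementary files are gzip-compressed, one-peak-per-line text files (BED-like). The function names are our own.

```python
import gzip
import shutil
from pathlib import Path


def gunzip(path_gz):
    """Decompress a .gz file next to the original and return the new path."""
    src = Path(path_gz)
    dst = src.with_suffix("")  # e.g. "peaks.bed.gz" -> "peaks.bed"
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst


def count_peaks(path):
    """Count data lines (one peak per line), skipping blank, comment and track/browser lines."""
    n = 0
    with open(path) as fh:
        for line in fh:
            if line.strip() and not line.startswith(("#", "track", "browser")):
                n += 1
    return n
```

For example, call gunzip("H3K4me3_peaks.bed.gz") and pass the returned path to count_peaks. The file name is hypothetical, so use the names of the files you actually downloaded from GEO. Comparing the two peak counts is a quick first contrast between both marks.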

Step 2. Open the SeqCode toolkit to characterize the peaks

[ETC: 25 mins]

Open the SeqCode platform here:

[http://ldicrocelab.crg.eu/]

- Explore the main menu of options
- Find the PeakAnnotator tool
- Annotate the H3K4me3 peaks to get the target genes
- In a separate window, repeat the annotation for H3K27me3
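The sketch below shows what "target gene" means in practice through a simplified peak-to-gene assignment. It is not PeakAnnotator's algorithm. It assumes that a gene is a target when any peak lies within a fixed window of its TSS (2,500 bp here, a hypothetical value). Peaks are given as (chrom, start, end) tuples and genes as a small hypothetical Gene record.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    chrom: str
    start: int   # leftmost coordinate of the gene
    end: int     # rightmost coordinate of the gene
    strand: str  # "+" or "-"

    @property
    def tss(self):
        # The TSS is the 5' end: start on the + strand, end on the - strand
        return self.start if self.strand == "+" else self.end


def target_genes(peaks, genes, window=2500):
    """Names of genes whose TSS lies within `window` bp of at least one peak."""
    by_chrom = {}
    for chrom, s, e in peaks:
        by_chrom.setdefault(chrom, []).append((s, e))
    targets = set()
    for g in genes:
        for s, e in by_chrom.get(g.chrom, []):
            # Distance from the TSS to the nearest edge of the peak (0 if the TSS is inside it)
            dist = max(s - g.tss, 0, g.tss - e)
            if dist <= window:
                targets.add(g.name)
                break
    return targets
```

The double loop is quadratic, which is acceptable for a teaching example. Real annotators typically use sorted intervals or interval trees.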

- Explore the information shown in both cases
- First focus on the pie charts, then explore the lists
- Save the list of target genes for each histone mark
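The pie charts summarize where the peaks fall relative to genes. The minimal sketch below computes that kind of summary. It assumes three hypothetical classes: promoter (peak midpoint within 2,500 bp of a TSS), genic (midpoint inside a gene body) and intergenic (elsewhere). SeqCode's own categories and distances may differ.

```python
from collections import Counter


def classify_peak(peak, genes, promoter=2500):
    """Label one peak as 'promoter', 'genic' or 'intergenic' using its midpoint.
    peak: (chrom, start, end); genes: list of (chrom, start, end, strand)."""
    chrom, s, e = peak
    mid = (s + e) // 2
    label = "intergenic"
    for g_chrom, g_start, g_end, strand in genes:
        if g_chrom != chrom:
            continue
        tss = g_start if strand == "+" else g_end
        if abs(mid - tss) <= promoter:
            return "promoter"  # promoter takes priority over gene body
        if g_start <= mid <= g_end:
            label = "genic"
    return label


def peak_distribution(peaks, genes, promoter=2500):
    """Fraction of peaks in each class: the numbers behind a pie chart."""
    counts = Counter(classify_peak(p, genes, promoter) for p in peaks)
    total = sum(counts.values()) or 1  # avoid division by zero for an empty list
    return {k: counts[k] / total for k in ("promoter", "genic", "intergenic")}
```

A sharp promoter mark such as H3K4me3 would be expected to give a large "promoter" slice. Comparing both pie charts with this in mind helps when reading the SeqCode output.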

- Open the Compare2Genes tool
- Perform the overlap between H3K4me3 and H3K27me3 targets
- Can you interpret these results in biological terms?
- Save the three lists of genes: common, H3K4me3_only and H3K27me3_only
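The overlap performed by Compare2Genes amounts to three set operations. Here is a sketch with plain Python sets. The dictionary keys mirror the three lists you are asked to save, and the "putative" labels in the comments follow the interpretation used in this session.

```python
def compare_two_lists(k4_genes, k27_genes):
    """Split two gene lists into the three Venn-diagram classes (sorted for saving)."""
    a, b = set(k4_genes), set(k27_genes)
    return {
        "common": sorted(a & b),        # putative bivalent genes
        "H3K4me3_only": sorted(a - b),  # putative active genes
        "H3K27me3_only": sorted(b - a), # targets of the repressive mark only
    }
```

Take the toy lists compare_two_lists(["Pax6", "Nanog", "Hoxa1"], ["Pax6", "Hoxa1", "Gata4"]). Hoxa1 and Pax6 go to "common", Nanog to "H3K4me3_only" and Gata4 to "H3K27me3_only".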

- Open the Enrichr tool and analyze the bivalent gene set
- Focus on the following sections: Transcription, Pathways, Ontologies
- Check the Legacy annotations for our bivalent genes as well
- Can you also analyze with Enrichr the set of targets reported only for H3K27me3?
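Enrichment tools such as Enrichr ask whether a functional term is over-represented in your list compared with chance. Among other scores, Enrichr reports a Fisher exact test p-value, and the one-sided form of that test equals the hypergeometric upper tail sketched below. The sketch only illustrates the statistic and is not Enrichr's code. You must choose the background size yourself (for instance, the number of annotated mouse genes).

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """P(X >= k) for X ~ Hypergeometric: population M, n term genes, N genes drawn."""
    total = comb(M, N)
    return sum(comb(n, i) * comb(M - n, N - i) for i in range(k, min(n, N) + 1)) / total


def enrichment_pvalue(query, term_genes, background_size):
    """One-sided over-representation p-value of a gene-set term in a query list."""
    query, term_genes = set(query), set(term_genes)
    k = len(query & term_genes)  # overlap between your list and the term
    return hypergeom_sf(k, background_size, len(term_genes), len(query))
```

A small p-value means that the overlap between your genes and the term is larger than expected by drawing the same number of genes at random from the background.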

- Generate the UpSet plot of the two lists of target genes (H3K4me3 and H3K27me3)
- Interpret the results of this alternative representation
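An UpSet plot draws one bar per exclusive intersection, that is, per set of genes present in exactly one combination of lists. The sketch below computes the counts behind the bars. It is written for any number of lists, so it also covers comparisons larger than the two-list case here.

```python
from itertools import combinations


def upset_counts(sets):
    """Exclusive intersection sizes (the bars of an UpSet plot).
    sets: dict mapping a list name to an iterable of gene names."""
    sets = {k: set(v) for k, v in sets.items()}
    names = list(sets)
    counts = {}
    for r in range(1, len(names) + 1):
        for combo in combinations(names, r):
            inside = set.intersection(*(sets[n] for n in combo))
            outside = set().union(*(sets[n] for n in names if n not in combo))
            size = len(inside - outside)
            if size:
                counts[combo] = size
    # Largest intersections first, one common ordering for UpSet bars
    return dict(sorted(counts.items(), key=lambda kv: kv[1], reverse=True))
```

With only two lists, the three bars carry the same numbers as the Venn diagram. The UpSet layout stays readable as more lists are added, whereas Venn diagrams quickly do not.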

Step 3. Open the SeqCode toolkit to generate the meta-gene plots

[ETC: 25 mins]

Open the SeqCode platform here:

[http://ldicrocelab.crg.eu/]

- Open the ProduceTSSplots tool
- Generate the metaplot of H3K4me3 (mESCs) for the three lists of target genes
- Repeat the same analysis with H3K27me3 (mESCs) in another window/tab
- Interpret the results of each experiment
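A TSS metaplot is the average signal of a gene set in bins around the TSS, with every gene oriented 5' to 3'. The sketch below assumes a per-base signal already loaded as Python lists (one per chromosome, 0-based positions) and genes given as (chrom, tss, strand). It uses hypothetical defaults of ±5 kb and 500 bp bins, with flank a multiple of bin_size. It illustrates the idea and is not the implementation behind ProduceTSSplots.

```python
def tss_profile(signal, genes, flank=5000, bin_size=500):
    """Average binned signal around the TSS of a gene set (a metaplot).
    signal: dict chrom -> list of per-base values; genes: list of (chrom, tss, strand).
    Genes whose window falls off the chromosome are skipped."""
    n_bins = 2 * flank // bin_size
    sums = [0.0] * n_bins
    used = 0
    for chrom, tss, strand in genes:
        track = signal.get(chrom)
        # Window chosen so the TSS lands at index `flank` after orientation
        lo, hi = (tss - flank, tss + flank) if strand == "+" else (tss - flank + 1, tss + flank + 1)
        if track is None or lo < 0 or hi > len(track):
            continue
        window = track[lo:hi]
        if strand == "-":
            window = window[::-1]  # orient upstream -> downstream
        for b in range(n_bins):
            sums[b] += sum(window[b * bin_size:(b + 1) * bin_size]) / bin_size
        used += 1
    return [s / used for s in sums] if used else sums
```

Running the same function on the bivalent, active and H3K27me3-only lists gives three curves that can be drawn on one plot, which is the comparison this step asks you to interpret.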

- Repeat the plots using RNA Polymerase II Ser5P, Ring1b and H3K27ac
- Again, interpret the results in terms of the bivalent/active genes

- Now open the ProduceGENEplots tool
- Run the tool on H3K4me3 and H3K27me3
- Now run it on H3K36me3 and RNA Polymerase II Ser2P
- Compare all these results and define classes of sharp and broad marks
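ProduceGENEplots looks along the whole gene rather than around the TSS. Because genes differ in length, a common approach is to rescale every gene body to the same number of bins, and the sketch below does that. It also adds a hypothetical breadth score (the fraction of bins above half the maximum) as one way to put a number on "sharp" versus "broad". It assumes a per-base signal per chromosome as Python lists and genes as (chrom, start, end, strand) in 0-based half-open coordinates. This is not the tool's code.

```python
def scaled_gene_profile(signal, genes, n_bins=20):
    """Average signal along gene bodies rescaled to n_bins, oriented TSS -> TES."""
    sums = [0.0] * n_bins
    used = 0
    for chrom, start, end, strand in genes:
        track = signal.get(chrom)
        length = end - start
        if track is None or length < n_bins or end > len(track):
            continue  # skip genes shorter than the number of bins or off the track
        body = track[start:end]
        if strand == "-":
            body = body[::-1]
        for b in range(n_bins):
            lo = b * length // n_bins
            hi = (b + 1) * length // n_bins  # always > lo because length >= n_bins
            sums[b] += sum(body[lo:hi]) / (hi - lo)
        used += 1
    return [s / used for s in sums] if used else sums


def breadth(profile):
    """Hypothetical heuristic: fraction of bins above half the maximum.
    Low values suggest a sharp, TSS-restricted mark; high values a broad one."""
    peak = max(profile)
    if peak <= 0:
        return 0.0
    return sum(v > peak / 2 for v in profile) / len(profile)
```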

- Finally, open the ComputeChIPlevels tool
- Calculate the ChIP signal strength of the three gene sets on H3K4me3 and H3K27me3
- Change to other marks or proteins in the available set
- Discuss the results in biological terms
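ComputeChIPlevels compares signal strength between gene sets. The sketch below shows one way to quantify it, under two assumptions: the level of a gene is the mean signal within a hypothetical ±1 kb of its TSS, and each set is summarized by its median (the value a boxplot would mark). The tool itself may define the level differently.

```python
from statistics import median


def chip_level(track, tss, flank=1000):
    """Mean per-base signal in [tss - flank, tss + flank], clipped to the track."""
    lo, hi = max(0, tss - flank), min(len(track), tss + flank + 1)
    window = track[lo:hi]
    return sum(window) / len(window) if window else 0.0


def levels_by_set(signal, gene_sets, flank=1000):
    """Per-set summary of ChIP levels.
    signal: dict chrom -> per-base list; gene_sets: dict set name -> list of (chrom, tss)."""
    summary = {}
    for name, genes in gene_sets.items():
        values = [chip_level(signal[c], t, flank) for c, t in genes if c in signal]
        summary[name] = {"n": len(values), "median": median(values) if values else None}
    return summary
```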

Step 4. Open the SeqCode toolkit to generate the heatmaps

[ETC: 15 mins]

Open the SeqCode platform here:

[http://ldicrocelab.crg.eu/]

- Open the ProduceTSSmaps tool
- Generate the heatmap of your list of bivalent genes
- Use H3K4me3, H3K27me3 and three additional marks of your choice

- Repeat the same heatmap with your list of active genes
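In a TSS heatmap, each row is a gene and each column a bin around its TSS. All marks are usually drawn with the same row order so they can be read side by side. The sketch below builds such matrices under the following assumptions:

- the signal is a per-base Python list per chromosome
- genes are given as (name, chrom, tss, strand)
- rows are sorted by the total signal of the first mark supplied (ProduceTSSmaps may sort differently)

Averaging the columns of any of these matrices gives back that mark's metaplot.

```python
def bin_window(track, tss, strand, flank, bin_size):
    """Binned signal around one TSS, oriented 5' -> 3'; None if the window falls off the track."""
    if track is None:
        return None
    lo, hi = (tss - flank, tss + flank) if strand == "+" else (tss - flank + 1, tss + flank + 1)
    if lo < 0 or hi > len(track):
        return None
    window = track[lo:hi]
    if strand == "-":
        window = window[::-1]
    n_bins = 2 * flank // bin_size
    return [sum(window[b * bin_size:(b + 1) * bin_size]) / bin_size for b in range(n_bins)]


def tss_heatmaps(signals, genes, flank=5000, bin_size=500):
    """signals: dict mark -> {chrom: per-base list}; the first mark sets the row order.
    Returns (ordered gene names, {mark: list of rows in that order})."""
    marks = list(signals)
    kept = []
    for name, chrom, tss, strand in genes:
        rows = {}
        for m in marks:
            row = bin_window(signals[m].get(chrom), tss, strand, flank, bin_size)
            if row is None:
                break  # drop genes not covered by every mark
            rows[m] = row
        else:
            kept.append((name, rows))
    kept.sort(key=lambda g: sum(g[1][marks[0]]), reverse=True)  # strongest genes on top
    names = [name for name, _ in kept]
    return names, {m: [rows[m] for _, rows in kept] for m in marks}
```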

Step 5. UCSC and ENCODE tracks

[Moved to Session 7]

1. Open the UCSC genome browser (human, hg19)

- Hide all the tracks
- Show the RefSeq genes track
- Zoom into the region chr20:22,777,964-23,706,257
- Open the GENCODE track link (Genes and Gene Predictions block)
- Read the Description of this track
- Pack the GENCODE Genes version 17 subtrack
- Visually compare the GENCODE and RefSeq annotated genes
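A region can also be opened directly through a URL, which is handy for sharing views. The small sketch below parses a UCSC-style position string and builds an hgTracks link with the db and position parameters. It also computes the span of the region: UCSC displays 1-based, fully closed coordinates, so the length is end - start + 1.

```python
import re
from urllib.parse import urlencode


def parse_region(region):
    """Parse 'chr20:22,777,964-23,706,257' into ('chr20', 22777964, 23706257)."""
    m = re.fullmatch(r"(\w+):([\d,]+)-([\d,]+)", region.strip())
    if not m:
        raise ValueError(f"not a UCSC-style region: {region!r}")
    start, end = (int(g.replace(",", "")) for g in m.group(2, 3))
    if end < start:
        raise ValueError("end is before start")
    return m.group(1), start, end


def region_length(region):
    """Length in bp of a 1-based, fully closed region."""
    _, start, end = parse_region(region)
    return end - start + 1


def ucsc_link(region, db="hg19"):
    """hgTracks URL that opens the given assembly at the given position."""
    chrom, start, end = parse_region(region)
    return "https://genome.ucsc.edu/cgi-bin/hgTracks?" + urlencode(
        {"db": db, "position": f"{chrom}:{start}-{end}"})
```

For the region above, region_length returns 928,294 bp, that is, a window of roughly 0.93 Mb.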

2. Open the ENCODE REGULATION track (Regulation block)

- Read the Description of this track
- Show (full) the H3K4me3 subtrack
- Click on the H3K4me3 super-track link to show each cell line
- Change the Overlay method to display each ChIP-seq experiment separately
- Analyze the position of H3K4me3 at these genes in the different cell lines
- Repeat the analysis for H3K4me1 and H3K27ac

Step 6. Browsing the ENCODE website

[ETC: 15 mins]

(1) Now, open the ENCODE home page and explore these links (Human, Left menu)

[http://genome.ucsc.edu/ENCODE/index.html]

- Check which Cell Types are available
- Navigate through the Experiment List
- Now open the Experiment Matrix
- Open the ChIP-seq experiments submatrix

- Click on the (H1-hESC, H3K4me3) cell of the matrix and examine the tracks returned by the search
- Set the Visibility on for these two tracks
- Explore this profile at the NANOG, SOX2 and POU5F1 genes
- Interpret these results in biological terms

- Now add the H3K27me3, H3K27ac and H3K36me3 ChIP-seq tracks for H1
- Explore these profiles at PAX6 and across the HoxA cluster
- Interpret these results in biological terms

- Look for information about the Data standards
- Check the information on Antibodies
- Open the Education links

(2) Now explore the Mouse ENCODE data (tissues)

Examine the new ENCODE website at: [https://www.encodeproject.org/]
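The ENCODE portal can also be queried programmatically. The minimal sketch below only builds a search URL and performs no network call. The filter names (type, assay_title, target.label, biosample_ontology.term_name) and the format=json option are assumptions based on the portal's public search pages, so check them against the portal's REST API documentation before relying on them. Without format=json, opening the URL in a browser shows the same search as a web page.

```python
from urllib.parse import urlencode

ENCODE_SEARCH = "https://www.encodeproject.org/search/"


def encode_search_url(target, biosample, assay="Histone ChIP-seq", as_json=True):
    """Build an ENCODE portal search URL for experiments of one target in one biosample.
    The facet names used here are assumptions, not a verified API contract."""
    params = {
        "type": "Experiment",
        "assay_title": assay,
        "target.label": target,
        "biosample_ontology.term_name": biosample,
    }
    if as_json:
        params["format"] = "json"  # machine-readable results instead of HTML
    return ENCODE_SEARCH + "?" + urlencode(params)
```

For example, encode_search_url("H3K4me3", "H1") targets the same H1 experiments explored through the matrix above. The biosample name "H1" is itself an assumption about how the portal labels this cell line.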

Bibliography

- Promoter bivalency favors an open chromatin architecture in embryonic stem cells. G. Mas, E. Blanco, C. Ballare, M. Sanso, Y. Spill, D. Hu, Y. Aoi, F. Le Dily, A. Shilatifard, M. A. Marti-Renom and L. Di Croce. Nature Genetics 50: 1452-1462 (2018).
- An integrated encyclopedia of DNA elements in the human genome. The ENCODE Project Consortium. Nature 489: 57-74 (2012). http://www.nature.com/encode/
- The ENCODE (ENCyclopedia Of DNA Elements) Project. The ENCODE Project Consortium. Science 306: 636-640 (2004).
- Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. The ENCODE Project Consortium. Nature 447: 799-816 (2007).
- Practical Bioinformatics. Michael Agostino. Garland Science (2012). ISBN: 978-0815344568.