To explore the tools available for characterizing gene regulatory regions in a realistic scenario, we will study a case in which a colleague suggested that we examine the promoter region of the Leptin gene. Our collaborators suspect that the (core) promoter of Leptin contains binding sites for Sp1, C/EBP and the TATA-box binding protein (TBP). First, we will use the promoter sequence to find more information about Leptin. Next, we will explore a catalog of predictive regulatory models (JASPAR) to find the weight matrices that characterize these TFs. We will then use these numerical representations of the binding sites to find putative hits in the Leptin promoter, and later confirm them using phylogenetic footprinting. An alternative is to run motif-finding methods on the promoters of Leptin and its orthologs in mouse and rat, to highlight words that are overrepresented in a set of genes presumably sharing the same regulatory mechanisms. Finally, we will evaluate each set of predictions using the experimental annotations available in the ABS database for this gene in human and mouse.
[ETC: 20 mins]
(1) Open the UCSC Genome Browser:
Download the Leptin human gene promoter [LINK]
Using BLAT (hg19), map the promoter to find its location in the genome:
Click on one RefSeq Leptin exon
Go to the Entrez Gene entry
Check the Gene ontology information
Explore information about this gene in OMIM
Turn the Conservation track on (phastCons)
Explore the conservation around the TSS
(2) Still working in the UCSC Genome Browser to learn more about Leptin:
Extract 500 bp from the mouse promoter using UCSC
Now open the CLUSTAL Omega server:
[http://www.ebi.ac.uk/Tools/msa/clustalo/]
Copy and paste both human and mouse promoters
Select DNA for the format of sequences
Analyze the global alignment focusing on the TSS regions
Open the NCBI BLAST website at http://blast.ncbi.nlm.nih.gov/Blast.cgi
Select the Nucleotide BLAST box
Switch on the option Align two or more sequences
Select the Somewhat similar sequences (blastn) mode
Run the comparison between both promoters
Analyze the resulting local alignment
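To make the idea of a global alignment more concrete, the following is a minimal sketch of the Needleman-Wunsch dynamic programming approach in plain Python. It is not the method used by CLUSTAL Omega or BLAST; the scoring values (match=1, mismatch=-1, gap=-2) and the toy sequences are assumptions chosen only for illustration.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of two sequences; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        # Substitution score for aligning a[i-1] with b[j-1]
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align the two bases
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aligned_a, aligned_b):
    """Percentage of alignment columns in which both sequences carry the same base."""
    same = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * same / len(aligned_a)


if __name__ == "__main__":
    # Toy sequences, not the real Leptin promoters
    s, x, y = needleman_wunsch("ACGT", "AGT")
    print(s, x, y, percent_identity(x, y))
```

A local alignment, as produced by BLAST, differs in that it reports only the best-matching subregions instead of forcing both sequences to align end to end.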
[ETC: 15 mins]
(1) Open the JASPAR database:
Press the JASPAR CORE Vertebrata button
Search for the predictive model of TBP
Click on the Sequence Logo image
Analyze the information of the TBP record
Repeat this search with CEBPB
Repeat this search with SP1
Show the binding sites of SP1 on your screen
(2) Now open the CLUSTAL Omega server:
[http://www.ebi.ac.uk/Tools/msa/clustalo/]
Copy and paste this set of TBP binding sites [LINK]
Choose DNA, CLUSTALW format
Press the Submit button
Examine the resulting alignment in search of a core sequence
(3) Open the WebLogo server:
[http://weblogo.berkeley.edu/logo.cgi]
Paste the CLUSTAL Omega alignment into the corresponding box
Activate DNA/RNA in the Sequence type box
Submit the query (Create logo)
Examine the sequence logo
Generate the frequency logo as well
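The height of each column in a sequence logo reflects its information content. The sketch below shows one way this could be computed from a set of aligned binding sites, assuming a uniform background and no small-sample correction (logo tools may apply such a correction, so their values can differ slightly). The example sites are invented for illustration and are not the TBP sites linked above.

```python
import math
from collections import Counter


def column_information(sites, alphabet="ACGT"):
    """Information content (bits) of each column of equal-length aligned sites."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    info = []
    for col in range(length):
        counts = Counter(s[col].upper() for s in sites)
        total = sum(counts[b] for b in alphabet)  # gaps and other symbols are ignored
        entropy = 0.0
        for b in alphabet:
            if counts[b]:
                p = counts[b] / total
                entropy -= p * math.log2(p)
        # Maximum is log2(4) = 2 bits for a perfectly conserved DNA column
        info.append(math.log2(len(alphabet)) - entropy)
    return info


if __name__ == "__main__":
    toy_sites = ["TATAAA", "TATAAA", "TATATA", "TATAAG"]  # hypothetical sites
    for position, bits in enumerate(column_information(toy_sites), start=1):
        print(position, round(bits, 3))
```

Fully conserved columns reach 2 bits, which is where the core sequence of the motif stands out in the logo; the frequency logo instead plots the raw base proportions per column.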
[ETC: 15 mins]
(1) Open the JASPAR 2016 database
[http://jaspar2016.genereg.net/]
Enter into the Vertebrates collection
Switch on the option box for TBP/SP1/CEBPB matrices
Copy and paste the Leptin promoters (human and mouse)
Run the scan to build a map of regulatory sites
Test several cutoff values
Repeat the procedure with the latest release of JASPAR (http://jaspar.genereg.net/; use Add to cart to select the models and the Scan box on the right)
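The effect of the cutoff can be illustrated with a small position weight matrix scanner. This is a sketch under stated assumptions, not the JASPAR implementation: counts are converted to log-odds scores with a pseudocount of 1 and a uniform background, only the forward strand is scanned, and the cutoff is a relative score between 0 and 1. The counts are rows 02 to 07 of the TATA matrix listed later in this practical; the test sequence is invented.

```python
import math

BASES = "ACGT"

# Rows 02-07 of the TATA count matrix from this practical (columns: A, C, G, T)
TATA_COUNTS = [
    (16, 46, 18, 309),
    (352, 0, 2, 35),
    (3, 10, 2, 374),
    (354, 0, 5, 30),
    (268, 0, 0, 121),
    (360, 3, 20, 6),
]


def log_odds(counts, pseudocount=1.0, background=0.25):
    """Convert count rows into log2-odds scores against a uniform background."""
    pwm = []
    for row in counts:
        total = sum(row) + pseudocount * len(BASES)
        pwm.append({
            base: math.log2(((count + pseudocount) / total) / background)
            for base, count in zip(BASES, row)
        })
    return pwm


def scan(sequence, pwm, cutoff=0.8):
    """Return (position, site, relative_score) for forward-strand hits >= cutoff."""
    max_score = sum(max(col.values()) for col in pwm)
    min_score = sum(min(col.values()) for col in pwm)
    width = len(pwm)
    seq = sequence.upper()
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(base not in BASES for base in window):
            continue  # skip windows containing N or other symbols
        raw = sum(col[base] for col, base in zip(pwm, window))
        relative = (raw - min_score) / (max_score - min_score)
        if relative >= cutoff:
            hits.append((start, window, round(relative, 3)))
    return hits


if __name__ == "__main__":
    pwm = log_odds(TATA_COUNTS)
    toy_promoter = "GGGCTATAAAGGC"  # hypothetical sequence
    for cutoff in (0.9, 0.7, 0.5):
        print(cutoff, scan(toy_promoter, pwm, cutoff))
```

Lowering the cutoff can only add hits, which is why permissive thresholds produce dense maps with many likely false positives.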
(2) Alternative: Open the MATCH server to analyze promoter regions with TRANSFAC matrices:
[http://www.gene-regulation.com/cgi-bin/pub/programs/match/bin/match.cgi]
Copy and paste both Leptin promoters
Select Group of matrices: vertebrates
Select the option to minimize the sum of both error rates
Switch off the option use high quality matrices only
Submit the form to obtain the map of predictions
Repeat this procedure with the option use high quality matrices only switched on
(3) Alternative: Open the RSA tools
Select Pattern matching -> matrix-scan (quick)
Copy and paste both promoters
Copy and paste the TRANSFAC matrices for TATA/SP1/CEBP (select Transfac)
MATRIX FORMAT:
AC TATA
XX
P0 A C G T
01 61 145 152 31 S
02 16 46 18 309 T
03 352 0 2 35 A
(...)
XX
//
AC SP1
(...)
Press the Go button
Press the Feature map button (next screen: GO button)
Analyze the positions of the predicted sites in each case
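The TRANSFAC-style layout shown in the matrix format above can be read with a few lines of Python. The parser below is a minimal sketch that handles only the line types visible in this practical (AC, XX, P0, numbered rows and //); the function name parse_transfac is hypothetical, and real TRANSFAC files contain additional fields that this sketch ignores.

```python
def parse_transfac(text):
    """Parse TRANSFAC-style matrices into {accession: [ {base: count}, ... ]}."""
    matrices = {}
    name, bases, rows = None, None, []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        tag = fields[0]
        if tag == "AC":
            # Accession (used here as the matrix name)
            name = fields[1] if len(fields) > 1 else None
        elif tag in ("P0", "PO"):
            # Header row giving the column order of the bases
            bases = fields[1:5]
        elif tag.isdigit() and bases:
            # Numbered position row: four counts, optionally a consensus letter
            rows.append(dict(zip(bases, map(float, fields[1:5]))))
        elif tag == "//":
            # End of record
            if name and rows:
                matrices[name] = rows
            name, bases, rows = None, None, []
    if name and rows:  # tolerate a final record without a closing //
        matrices[name] = rows
    return matrices


if __name__ == "__main__":
    example = """AC TATA
XX
P0 A C G T
01 61 145 152 31 S
02 16 46 18 309 T
03 352 0 2 35 A
XX
//
"""
    for accession, matrix in parse_transfac(example).items():
        print(accession, matrix)
```

Each parsed row maps a base to its count, so the matrix can be passed directly to a scoring routine or converted into frequencies.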
(4) Alternative: Open the CBS website
[http://compfly.bio.ub.es/CBS/index_matscan.php]
Copy and paste both promoters
Copy and paste the TRANSFAC matrices for TATA/SP1/CEBP:
MATRIX FORMAT:
TATA
01 61 145 152 31 S
02 16 46 18 309 T
03 352 0 2 35 A
04 3 10 2 374 T
05 354 0 5 30 A
06 268 0 0 121 A
07 360 3 20 6 A
08 222 2 44 121 W
09 155 44 157 33 R
10 56 135 150 48 N
11 83 147 128 31 N
12 82 127 128 52 N
13 82 118 128 61 N
14 68 107 139 75 N
15 77 101 140 71 N
//
SP1
01 2 1 6 2 G
02 3 1 6 1 G
03 0 0 11 0 G
04 0 0 11 0 G
05 0 8 2 1 C
06 3 0 6 2 G
07 0 1 7 3 G
08 1 0 8 2 G
09 1 2 7 1 G
10 3 2 0 6 T
//
CEBP
01 5 3 4 10 N
02 7 6 7 2 N
03 5 0 3 14 T
04 3 0 8 11 K
05 2 0 2 18 T
06 0 0 22 0 G
07 2 5 12 3 G
08 8 1 2 11 W
09 10 2 4 6 N
10 16 0 0 6 A
11 6 5 7 4 N
12 4 4 7 7 N
13 4 7 4 7 N
Press the Submit button
Analyze the results using different threshold ranges
[ETC: 15 mins]
Open the MEME suite:
Copy and paste these orthologous Leptin promoters (500 bp) from human, mouse and rat [LINK]
Set the optimum motif width between 5 and 15 bp
Set the maximum number of motifs to 10
Enter your e-mail address to receive the results
Explore the output of the program
Analyze the resulting motifs using the TOMTOM application
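The intuition behind searching for overrepresented words in orthologous promoters can be sketched with simple k-mer counting. This is deliberately much cruder than MEME, which fits probabilistic motif models; the function name shared_words, the word length k=6 and the toy sequences are assumptions made for illustration only.

```python
from collections import Counter


def shared_words(sequences, k=6):
    """Return k-mers found in every sequence as (word, total_count), most frequent first."""
    per_sequence = []
    for seq in sequences:
        seq = seq.upper()
        per_sequence.append(Counter(seq[i:i + k] for i in range(len(seq) - k + 1)))
    # Keep only the words that occur at least once in each sequence
    common = set(per_sequence[0])
    for counts in per_sequence[1:]:
        common &= set(counts)
    totals = {word: sum(counts[word] for counts in per_sequence) for word in common}
    # Sort by decreasing total count, then alphabetically for a stable order
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Toy sequences standing in for the human, mouse and rat promoters
    toy_promoters = ["ACGTATAAAGG", "TTTATAAACC", "GGTATAAATT"]
    print(shared_words(toy_promoters, k=6))
```

Exact words miss degenerate sites, which is one reason why motif finders work with weight matrices and why their output is worth comparing against known models with TOMTOM.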
[ETC: 10 mins]
Open the ABS database
[http://genome.imim.es/datasets/abs2005/]
In ABS, find the annotation of the Leptin gene
Explore the A0010 record to identify three annotated TFBSs
Find these annotations in the original publication [PUBMED]
Analyze the dotplot and the global/local alignments
Evaluate the predictions obtained in Step 3 and Step 4
Using Galaxy, think about how to map the human predictions onto UCSC custom tracks
Once you've got the custom tracks, open the ENCODE regulation data to study this locus
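One possible way to turn predictions into a custom track is to write them in BED format, which uses 0-based, half-open genomic coordinates. The sketch below assumes that each prediction is given as an offset within the promoter and that the genomic start of the promoter is known from the BLAT step; the function name to_bed_track, the start coordinate 1000 and the hits are placeholders, not real Leptin coordinates.

```python
def to_bed_track(chrom, region_start, hits, track_name="TFBS_predictions"):
    """Build a UCSC custom track (BED6) from promoter-relative predictions.

    hits: iterable of (offset, length, name, strand), with offset 0-based
    relative to region_start (the 0-based genomic start of the promoter).
    """
    lines = [
        f'track name="{track_name}" description="Predicted binding sites" visibility=2'
    ]
    for offset, length, name, strand in sorted(hits):
        start = region_start + offset  # 0-based start
        end = start + length           # end is exclusive in BED
        lines.append(f"{chrom}\t{start}\t{end}\t{name}\t0\t{strand}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder values: replace 1000 with the promoter start reported by BLAT
    example_hits = [(120, 10, "SP1", "+"), (20, 6, "TBP", "+")]
    print(to_bed_track("chr7", 1000, example_hits))
```

The resulting text can be pasted into the Add Custom Tracks form of the UCSC Genome Browser and viewed alongside the conservation and ENCODE regulation tracks.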
Practical Bioinformatics. Michael Agostino. Garland Science (2012). ISBN: 978-0815344568.
Genomes, Browsers and Databases: Data-mining Tools for Integrated Genomic Databases. Peter Schattner. Cambridge University Press (2008). ISBN: 978-0521884433.
Understanding Bioinformatics. M.J. Zvelebil and J.O. Baum. Garland Publishing Inc., USA (2007). ISBN-10: 0815340249.
E. Blanco. Computational characterization of regulatory regions. In Mourad Elloumi and Albert Y. Zomaya (editors): Algorithms in Computational Molecular Biology: Techniques, Approaches and Applications. Wiley-Blackwell/John Wiley & Sons Ltd (2010). ISBN-13: 978-0470505199.
E. Blanco and R. Guigo. Predictive Methods Using DNA Sequences. In A. D. Baxevanis and B. F. Francis Ouellette, chief editors: Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins, Third Edition (pgs 115-142). John Wiley & Sons Inc., New York (2005). ISBN: 0-471-47878-4.