Recent advances in single-cell analysis have been key to the development of scRNAseq approaches. Despite their current limitations, these methods clearly promise to revolutionize the way in which we study gene expression. Comparisons against previously obtained bulk RNAseq results will also be important to understand the pros and cons of each alternative. Microarrays, in turn, revolutionized genome-wide analysis in the late 1990s, allowing expression to be measured in distinct cellular conditions (wild-type, knock-out) along several time points. This technology can be used to build expression profiles for the genes encoded in a genome and to determine which promoters are bound by a given protein (ChIP-on-chip). Here, we will learn how to explore and use microarray data deposited in the standard repositories (NCBI GEO).
[ETC: 15 mins]
Open the following entry in GEO:
http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE75790
Ziegenhain C, Vieth B, Parekh S, Reinius B et al. Comparative Analysis of Single-Cell RNA Sequencing Methods. Mol Cell 2017 65:631-643.
Explore the entry and find the raw data of each cell (using each technique)
Find the file that contains the expression of all genes in all cells
Open this web server to explore the expression of mouse genes as measured by these scRNAseq methods (mESCs, 2i+LIF):
http://ldicrocelab.crg.eu/others/scRNAseq2i/
Run the demo example included in the application (one gene, several genes)
Explore the expression values of one gene in a single platform
Study the different expression patterns in each cell
Repeat the analysis exploring each boxplot
Change the genes and try with housekeeping genes (Rpl10) or differentiation genes (Pax6, Hand1,...)
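To make the boxplots easier to read, the sketch below computes the five numbers a boxplot is usually drawn from (minimum, quartiles, maximum) for one gene in each method. It is only an illustration: the method names and expression values are invented, and the web server may compute or scale its plots differently.

```python
import statistics

# Hypothetical expression values of ONE gene across five cells per method.
# Both the method labels and the numbers are invented for illustration.
expression_by_method = {
    "MethodA": [0.0, 1.0, 2.0, 3.0, 4.0],
    "MethodB": [0.0, 0.0, 0.0, 5.0, 10.0],
}


def boxplot_summary(values):
    """Return the five-number summary that a boxplot displays."""
    # n=4 gives the three quartile cut points; "inclusive" treats the
    # values as the whole population rather than a sample of it.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
    }


for method, values in expression_by_method.items():
    print(method, boxplot_summary(values))
```

A gene with many zero values in one method (as in the invented "MethodB" above) shows a box collapsed towards zero, which is the kind of difference between platforms worth looking for.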
[ETC: 30 mins]
(1) Open this article:
E. Blanco*, M. Pignatelli*, S.Beltran, A. Punset, S. Perez-Lluch, F. Serras, R. Guigo and M. Corominas.
Read carefully the Abstract
Read the beginning of the Whole-genome expression analysis of trx mutants section
Go to Materials and methods to identify the GEO accession for these microarrays
(2) Now, open the NCBI-GEO repository:
[http://www.ncbi.nlm.nih.gov/geo]
Use the GEO accession number GSE8783 to access the microarrays
Read carefully this page: distinguish the Platform and the Samples links
Access the Platform GPL3797 information
Read about the design of this microarray in this page
Find out in how many other works this platform has been used
Go to the Data table and examine the Full table
Go back to the main trx regulatory gene network in D. melanogaster record
Open the GSM217239 trxE3/trxB11_larvae_(Replicate1) microarray
Read the description of each microarray channel
Go to the Data table and open the Full table
Find out which column represents the final result of this comparison for each spot
What does it mean that a spot presents a null value in the previous column?
You can see an overview of the GPR format here
Open Galaxy to join the Platform with the Expression values in the first replicate
You can find the platform and the expression files here
Upload both files into Galaxy (Upload icon on top, left)
Use the Join two datasets function to combine the platform and the expression values
Use the Cut column option to show just the gene name and the expression value
Finally, filter the null values and sort the results alphabetically
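The Galaxy steps above (join, cut, filter, sort) can also be sketched in a few lines of Python, which may help to see what each step does to the data. This is a minimal sketch under stated assumptions: the rows are invented, the `GENE_NAME` column is a hypothetical name for the gene column of the platform, and nulls are assumed to appear as an empty field or the word `null`. `ID`, `ID_REF` and `VALUE` are the usual GEO column names.

```python
import csv
import io

# Toy tab-separated tables. The rows are invented, and GENE_NAME is a
# hypothetical column name; check the real header of the platform file.
platform_tsv = "ID\tGENE_NAME\n1\ttrx\n2\tash1\n3\tPc\n"
sample_tsv = "ID_REF\tVALUE\n1\t-1.25\n2\tnull\n3\t0.40\n"


def read_tsv(text):
    """Parse tab-separated text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def join_cut_filter_sort(platform_rows, sample_rows):
    # Join: index the platform by spot ID so each sample row can be matched.
    gene_by_id = {row["ID"]: row["GENE_NAME"] for row in platform_rows}
    result = []
    for row in sample_rows:
        value = row["VALUE"].strip()
        # Filter: skip null values (assumed to be empty or the word "null").
        if value in ("", "null"):
            continue
        gene = gene_by_id.get(row["ID_REF"])
        if gene is None:
            continue  # spot not described in the platform table
        # Cut: keep only the gene name and the expression value.
        result.append((gene, float(value)))
    # Sort: alphabetical by gene name, ignoring case.
    return sorted(result, key=lambda pair: pair[0].lower())


table = join_cut_filter_sort(read_tsv(platform_tsv), read_tsv(sample_tsv))
for gene, value in table:
    print(f"{gene}\t{value}")
```

With the real files, the two toy strings would be replaced by the contents of the downloaded platform and expression tables.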
(3) Now, let's work with the up/down gene lists in the article:
From the article, get the lists of 260 over-expressed and 275 under-expressed genes in trx mutant larvae
Open Microsoft Excel to save each sheet (upregulated and downregulated) in a distinct tab-separated text file
Open Galaxy to upload both files (Get data -> Upload File, Convert spaces to tabs: Yes)
Repeat the following step on both lists of genes:
Use Text Manipulation -> Cut columns to get only the 1st column (RefSeq NM)
Edit the name of this list (up or down)
Open the DAVID functional annotation tool: [http://david.abcc.ncifcrf.gov/]
Select Functional Annotation (link on the left menu)
Upload your Upregulated gene list (identifier: REFSEQ_MRNA, type: Gene list)
Select Drosophila melanogaster species
Press the Functional Annotation Chart button
Explore this ranking of GO enrichments
Click on humoral immune response to explore its GO record
Click on the blue Genes bar for humoral immune response to see the genes
Going back to the main results screen, press the Functional Annotation Clustering button
Going back to the main results screen, press the Functional Annotation Table button
Repeat this procedure with your Downregulated gene list
Find differences between GO enrichments in up/down gene sets
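To get an intuition for what a GO enrichment p-value means, the sketch below computes a plain one-sided hypergeometric test: the probability of seeing at least a given number of annotated genes in a list drawn at random from the background. This is a generic textbook test, not DAVID's exact calculation (DAVID reports its own modified score), and all the numbers in the usage example except the list size of 260 are invented.

```python
from math import comb


def hypergeom_enrichment_pvalue(population, annotated, list_size, hits):
    """P(X >= hits) for a gene list drawn without replacement.

    population: number of genes in the background
    annotated:  background genes carrying the GO term
    list_size:  number of genes in the submitted list
    hits:       genes in the list carrying the GO term
    """
    max_hits = min(annotated, list_size)
    total = comb(population, list_size)
    # Sum the probabilities of observing hits, hits + 1, ..., max_hits.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, list_size - k)
        for k in range(hits, max_hits + 1)
    )
    return tail / total


# Hypothetical example: 260 upregulated genes, 12 of them annotated with a
# term that covers 100 genes in an assumed background of 13000 genes.
p_value = hypergeom_enrichment_pvalue(13000, 100, 260, 12)
print(f"p = {p_value:.3g}")
```

A small p-value says that the overlap is larger than expected by chance; comparing such values between the up and down lists is one way to describe their different enrichments.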
[ETC: 25 mins]
(1) Open this record in GEO:
http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE6710
Reischl J et al. Increased expression of Wnt5a in psoriatic plaques. J Invest Dermatol 2007 127(1):163-9.
Read carefully the record to identify the platform and the samples
Open the platform (GPL96) and examine the list of available experiments
Go to the Data table (please, do not Download the full table this time)
Analyze the content of the Data table (you can use the header line information above)
You can get more info on the CEL format here
Go back to the main GSE6710 record
Open the GSM154768-Patient B, lesional skin sample
Go to the Data table and open the Full table
Analyze the content of the Full table (you can use the header line information above)
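The header lines above a GEO table follow a simple convention: each line starting with `#` gives a column name and its description, and the tab-separated table follows. The sketch below reads such text into column descriptions and rows. It is a minimal illustration: the descriptions and values are invented (only the probe set ID 202360_at comes from this practical), and the real sample may have other columns.

```python
# Toy text in the layout of a GEO sample table. The descriptions and the
# values are invented; the real record may list different columns.
geo_text = (
    "#ID_REF = probe set identifier\n"
    "#VALUE = signal intensity\n"
    "#ABS_CALL = detection call\n"
    "ID_REF\tVALUE\tABS_CALL\n"
    "202360_at\t512.3\tP\n"
    "1007_s_at\t88.1\tA\n"
)


def parse_geo_table(text):
    """Split GEO table text into column descriptions and data rows."""
    descriptions = {}
    header = None
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith("#"):
            # "#NAME = description" lines document each column.
            name, _, description = line[1:].partition("=")
            descriptions[name.strip()] = description.strip()
        elif line.startswith(("!", "^")):
            continue  # other SOFT metadata lines are ignored here
        elif header is None:
            header = line.split("\t")  # first plain line: column names
        else:
            rows.append(dict(zip(header, line.split("\t"))))
    return descriptions, rows


descriptions, rows = parse_geo_table(geo_text)
for column, description in descriptions.items():
    print(f"{column}: {description}")
print(rows[0])
```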
(2) Go to the Affymetrix home page
Click on Data analysis
Now, open the NetAffx Analysis Center
Click on NetAffx Query - Search probe sets for a term or identifier
Please take five minutes to register in the system
Log into the intranet using your account information
Select the same GeneChip described in the GEO Platform information (GPL96)
Search for the probe set ID 202360_at: which gene does it correspond to?
Expand the query result and browse through this record
Press the View on UCSC Browser to see the probes
Explore the location of the probes within the gene
Use UCSC BLAT with the first probe sequence (5'-3')
(3) Functional annotation with DAVID:
Open the publication and download Table S2 and Table S3
Use GALAXY to preprocess this information and extract the gene symbol names
Use DAVID to perform a GO term enrichment analysis
[ETC: 25 mins]
(1) Open BABELOMICS:
[http://babelomics.bioinfo.cipf.es]
Analyze the file formats for the Upload data link
Analyze the options for Processing data
Analyze the services for measuring Expression and Functional analysis
(2) Open the Open Source Clustering Software home page:
[http://bonsai.hgc.jp/~mdehoon/software/cluster]
Investigate the available clustering and tree view programs
(3) Open the MIAME format web site:
[http://www.ncbi.nlm.nih.gov/geo/info/MIAME.html]
Explore the characteristics of this standard format for microarrays
Comparative Analysis of Single-Cell RNA Sequencing Methods. Ziegenhain C, Vieth B, Parekh S, Reinius B et al. Mol Cell 65:631-643 (2017).
Microarray technology in practice. Steve Russell, Lisa A. Meadows and Roslin R. Russell. Elsevier (2008). ISBN: 9780123725165.
Practical Bioinformatics. Michael Agostino. Garland Science (2012). ISBN: 978-0815344568.
Understanding Bioinformatics. M.J. Zvelebil and J.O. Baum. Garland Publishing Inc., USA (2007). ISBN-10: 0815340249.
Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins, 3rd Edition. A.D. Baxevanis and B. F. Francis Ouellette, chief editors. John Wiley & Sons Inc., New York (2005). ISBN: 0-471-47878-4.