iGEAK uses several interactive control widgets (sliders, radio buttons, text/area inputs, select boxes, etc.) to give users convenient ways to adjust parameters quickly and intuitively. For example, an expression heatmap in the "Heatmap" tab can be easily re-sized, re-colored, and re-clustered just by moving sliders. You can use a mouse in most cases, but you may want to use the mouse AND keyboard for fine adjustment.
Each page tab in the iGEAK interface corresponds to one task (e.g. "GSEA"). You can start or update a task just by clicking its tab, so switching between tasks and updating parameters takes only a few mouse clicks.
Two species are currently supported: (1) Human and (2) Mouse. The three well-formatted input files described on the "Input Data" page are required to run iGEAK.
For microarray data:
Your data matrix should already be summarized and normalized (e.g. by the RMA (Robust Multi-array Average) method). Please check the sample boxplot to confirm that your data matrix is properly normalized. The mean processed signals represent the normalized, background-corrected mean signal intensities, and the mean values should be well aligned across samples.
Open data folders and upload (#1) an annotation file, (#2) a sample-group definition file, then (#3) a log2-transformed mRNA expression (microarray) or a raw count (RNA-seq) matrix.
You can keep only probesets that have gene symbols [#4], and you can also remove "sub-optimal" Affymetrix probesets from the downstream analyses [#5]. In most cases, Affymetrix (U133 and similar platforms) probesets whose IDs do not have a plain "_at" ending (for example, IDs ending in "_s_at" or "_x_at") are sub-optimal.
You may check the quality of each probeset at the GeneAnnot server: https://genecards.weizmann.ac.il/geneannot/index.shtml
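The suffix rule above can be sketched in a few lines of Python. This is only an illustration, not iGEAK's actual filter: it assumes that "sub-optimal" means an extra single-letter tag before the final "_at" (such as "_s_at", "_x_at", "_a_at", or "_g_at"), and the function names are hypothetical.

```python
import re

# Assumption: a probeset is "sub-optimal" when its ID carries an extra
# single-letter tag before the final "_at" (e.g. "_s_at", "_x_at").
SUBOPTIMAL_SUFFIX = re.compile(r"_[a-z]_at$")


def is_suboptimal(probeset_id: str) -> bool:
    """Return True if the probeset ID ends in a tagged suffix such as '_s_at'."""
    return bool(SUBOPTIMAL_SUFFIX.search(probeset_id))


def keep_optimal(probeset_ids):
    """Keep only probeset IDs with a plain '_at' ending, preserving order."""
    return [p for p in probeset_ids if p.endswith("_at") and not is_suboptimal(p)]


if __name__ == "__main__":
    # Example IDs used purely for illustration.
    example_ids = ["1007_s_at", "1053_at", "117_at", "121_at", "1255_g_at"]
    print(keep_optimal(example_ids))
```

A suffix check like this is only a quick first pass; the GeneAnnot server remains the better source for per-probeset quality.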
Once all three input files are uploaded, choose the sample groups (at least two) you want to analyze and move them from the left panel to the right panel [#6]. Finally, click the "Submit" button [#7]. This action subsets the original gene expression matrix and sends the subset to the iGEAK engine.
For RNA-seq data:
Upload a raw count matrix. iGEAK normalizes these counts on the fly using edgeR's TMM (Trimmed Mean of M-values) normalization method (see the edgeR paper). You can choose one of two differentially expressed gene (DEG) prediction methods: edgeR or voom-limma. You can filter out lowly expressed genes by changing the "minimum CPM value" and/or the "minimum sample size" [#6]. The normalized gene expression matrix is displayed below the raw count matrix.
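To make the two filter settings concrete, here is a minimal Python sketch of a CPM-based filter: a gene is kept if its counts-per-million reach the minimum CPM in at least the minimum number of samples. It is an assumption that iGEAK applies the thresholds in exactly this way; the function name, the default values, and the toy counts are all hypothetical, and no TMM scaling is applied here.

```python
def cpm_filter(counts, min_cpm=1.0, min_samples=2):
    """Keep genes whose CPM is >= min_cpm in at least min_samples samples.

    counts: dict mapping gene name -> list of raw counts (one per sample).
    Library sizes are plain column sums (no TMM scaling in this sketch).
    """
    rows = list(counts.values())
    n_samples = len(rows[0])
    lib_sizes = [sum(row[j] for row in rows) for j in range(n_samples)]

    kept = {}
    for gene, row in counts.items():
        # Counts per million for this gene in each sample.
        cpms = [c * 1e6 / lib if lib > 0 else 0.0 for c, lib in zip(row, lib_sizes)]
        if sum(v >= min_cpm for v in cpms) >= min_samples:
            kept[gene] = row
    return kept


if __name__ == "__main__":
    toy_counts = {
        "GeneA": [500000, 600000, 550000],
        "GeneB": [500000, 399998, 450000],
        "GeneC": [0, 2, 0],  # expressed in a single sample only
    }
    print(sorted(cpm_filter(toy_counts, min_cpm=1.0, min_samples=2)))
```

Raising either threshold removes more genes; lowering "minimum sample size" to 1 would keep a gene that is expressed in a single sample only.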
You can quickly check whether there are outliers in your sample groups. This tab provides a principal component analysis (PCA) plot and a sample-correlation plot.
If you decide to remove some samples, edit your sample-group definition file (metadata) and reload the updated file.
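As a rough illustration of what a sample-correlation check looks for, the sketch below computes pairwise Pearson correlations between samples and reports each sample's mean correlation with the others; a sample with a clearly lower mean is a candidate outlier. Whether iGEAK uses Pearson correlation is an assumption, and the function names and toy values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(samples):
    """samples: dict mapping sample name -> list of expression values."""
    names = list(samples)
    return {a: {b: pearson(samples[a], samples[b]) for b in names} for a in names}


def mean_correlation(samples):
    """Mean correlation of each sample with all *other* samples."""
    corr = correlation_matrix(samples)
    return {
        a: sum(r for b, r in row.items() if b != a) / (len(row) - 1)
        for a, row in corr.items()
    }


if __name__ == "__main__":
    toy = {"S1": [1, 2, 3, 4], "S2": [2, 4, 6, 8], "S3": [4, 3, 2, 1]}
    means = mean_correlation(toy)
    print(min(means, key=means.get))  # sample least correlated with the rest
```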
You may be interested in only a subset of the genes in your list. Copy and paste, or type, your genes of interest (gene symbols, case-sensitive) into the text area; the gene expression matrix, heatmap, and boxplots are then updated to show only those genes.
You can choose parametric (ANOVA & post-hoc pairwise Tukey's test) or non-parametric (Kruskal-Wallis & post-hoc pairwise Mann-Whitney U-test) tests based on (1) the Shapiro-Wilk normality test and (2) the group dispersion test.
If the Shapiro-Wilk test p-value is > 0.05, you may choose the parametric tests (ANOVA & Tukey's test), since your data do not appear to violate the normality assumption. Even when continuous data are slightly non-normal, the parametric tests can still perform well if each group's sample size is > 15 and you have 2-9 groups in total.
You may choose the non-parametric tests (Kruskal-Wallis & post-hoc pairwise Mann-Whitney U-test) if your data violate the normality assumption and/or you have a very small sample size, provided that the data for all groups have the same dispersion. If your groups have different dispersions, the non-parametric tests might not provide valid results.
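The guidance above can be summarized as a small decision helper. This is a sketch of the rules of thumb as written here, not a function provided by iGEAK: the function name, arguments, and return strings are hypothetical, and the code cannot judge whether data are only "slightly" non-normal, so that case is returned with a caveat.

```python
def suggest_test(shapiro_p, same_dispersion, group_sizes, alpha=0.05):
    """Suggest a test family from the rules of thumb described above.

    shapiro_p:       p-value of the Shapiro-Wilk normality test
    same_dispersion: True if the groups have the same dispersion
    group_sizes:     list with the number of samples in each group
    """
    n_groups = len(group_sizes)
    large_groups = 2 <= n_groups <= 9 and all(n > 15 for n in group_sizes)

    if shapiro_p > alpha:
        # Normality assumption does not seem to be violated.
        return "parametric"
    if large_groups:
        # Acceptable only when the departure from normality is slight.
        return "parametric (if only slightly non-normal)"
    if same_dispersion:
        return "non-parametric"
    return "neither (non-parametric results may be invalid)"


if __name__ == "__main__":
    print(suggest_test(shapiro_p=0.01, same_dispersion=True, group_sizes=[5, 6, 5]))
```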
iGEAK provides two independent pathway analyses based on the Reactome database (http://www.reactome.org): the "ReactomePA" (Reactome-Based Pathway Analysis) tab launches a modified version of the ReactomePA (Yu and He, 2016) analysis.
The "ORA" tab provides a simple Over-Representation Analysis (ORA) function based on Reactome.
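For readers unfamiliar with ORA, the usual statistic is a one-sided hypergeometric test: given a gene universe, a pathway, and a list of selected genes, how likely is an overlap at least as large as the one observed? The sketch below shows that calculation in plain Python. It is a generic illustration and is not taken from iGEAK's code; the function names and the toy gene sets are hypothetical, and no multiple-testing correction is included.

```python
from math import comb


def ora_pvalue(n_universe, n_pathway, n_selected, n_overlap):
    """One-sided hypergeometric p-value: P(overlap >= n_overlap)."""
    total = comb(n_universe, n_selected)
    upper = min(n_pathway, n_selected)
    tail = sum(
        comb(n_pathway, i) * comb(n_universe - n_pathway, n_selected - i)
        for i in range(n_overlap, upper + 1)
    )
    return tail / total


def ora_from_sets(universe, pathway, selected):
    """Same test, starting from gene sets restricted to the universe."""
    pathway = set(pathway) & set(universe)
    selected = set(selected) & set(universe)
    overlap = pathway & selected
    return ora_pvalue(len(set(universe)), len(pathway), len(selected), len(overlap))


if __name__ == "__main__":
    universe = [f"G{i}" for i in range(1, 101)]    # 100 toy genes
    pathway = ["G1", "G2", "G3", "G4", "G5"]       # toy pathway
    selected = ["G1", "G2", "G3", "G50", "G60"]    # toy DEG list
    print(ora_from_sets(universe, pathway, selected))
```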
The "GSEA" tab provides a simplified GSEA algorithm (http://software.broadinstitute.org/gsea) as implemented in the ReactomePA package. The Reactome database is used as the reference gene set database. The whole analysis can be slow.
If you prefer to use the Broad GSEA program (http://software.broadinstitute.org/gsea), download the three GSEA input files from the "Broad-GSEA" tab.
Human gene symbols are written in all upper-case, whereas mouse gene symbols are lower-case except for the first character.
You may use the following Excel function for quick conversion.
However, this approach only works when the human and mouse symbols are the same apart from letter case. In many cases, they are different.
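For reference, a case-only conversion of this kind can be sketched in Python as follows. The function names are hypothetical, and the same caveat applies: changing the letter case does not find orthologs, so the result is only correct when the two species share the same symbol.

```python
def human_to_mouse_case(symbol: str) -> str:
    """Upper-case first character, lower-case the rest (e.g. 'TP53' -> 'Tp53')."""
    return symbol[:1].upper() + symbol[1:].lower()


def mouse_to_human_case(symbol: str) -> str:
    """Upper-case every character (e.g. 'Brca1' -> 'BRCA1')."""
    return symbol.upper()


if __name__ == "__main__":
    for s in ["TP53", "BRCA1"]:
        print(s, human_to_mouse_case(s))
```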
I recommend using the HUGO Gene Nomenclature Committee (HGNC)'s HCOP web service to find the correct orthologous genes between human and mouse.
However, the easiest way to convert human/mouse gene symbols is to use iGEAK's symbolConversion tool. Currently, iGEAK uses Ensembl v92 to retrieve human/mouse gene orthologs.
Kwangmin Choi @ Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, USA