Input Format

Input Files

iGEAK uses simple tab-delimited text files (file extension: .csv) as input and output files.

iGEAK uses three input files that you need to prepare: an annotation file, a group-definition file (metadata), and a gene expression matrix.
You may ask a bioinformatician to make the input files, but they are easy to prepare yourself.

Common mistake

Please check that no hidden space characters are included in the columns or headers (especially after gene IDs/names, sample names, and group names). This is the most common mistake when preparing input files with spreadsheet programs (e.g. Excel).
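
One way to catch hidden spaces before uploading is to scan every cell of the file. The following is a minimal Python sketch, not part of iGEAK; the function name find_padded_cells and the example content are hypothetical, and it assumes the file is tab-delimited as described above.

```python
import csv
import io


def find_padded_cells(text, delimiter="\t"):
    """Return (row, column, value) for every cell with leading or trailing whitespace.

    Row and column numbers are 1-based so they match what a spreadsheet shows.
    """
    problems = []
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    for row_number, row in enumerate(reader, start=1):
        for column_number, cell in enumerate(row, start=1):
            if cell != cell.strip():
                problems.append((row_number, column_number, cell))
    return problems


# Hypothetical metadata content: "S2 " carries a hidden trailing space.
example = "sample\tgroup\nS1\tcontrol\nS2 \ttreated\n"
for row_number, column_number, cell in find_padded_cells(example):
    print(f"row {row_number}, column {column_number}: {cell!r}")
# prints: row 3, column 1: 'S2 '
```

To check a real file, read its content first (for example with open(path, newline="").read()) and pass the text to the function.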

Annotation File

An annotation file is a simple two-column table.
The first column holds unique probe IDs (microarray) or gene IDs or symbols (RNA-seq), and the second column always holds the matched gene symbol.
Gene symbols are not stable and often change. If you want to use newer symbols mapped to the unique IDs in the first column, you may use DAVID's conversion tool [link].
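
The two requirements above (exactly two columns, unique IDs in the first column) can be checked with a short script. This is a sketch under assumptions: the function name check_annotation and the placeholder IDs are hypothetical, and a header row, if present, is treated like any other row.

```python
import csv
import io
from collections import Counter


def check_annotation(text):
    """Check a tab-delimited annotation table: two columns, unique first column.

    Returns a list of human-readable problems; an empty list means none were found.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    problems = []
    for line_number, row in enumerate(rows, start=1):
        if len(row) != 2:
            problems.append(f"line {line_number}: expected 2 columns, found {len(row)}")
    # Count how often each first-column ID occurs (blank lines are skipped).
    id_counts = Counter(row[0] for row in rows if row)
    for identifier, count in id_counts.items():
        if count > 1:
            problems.append(f"ID {identifier!r} appears {count} times")
    return problems


# Placeholder IDs and symbols, for illustration only.
example = "ID\tSymbol\nprobe_A\tGENE1\nprobe_B\tGENE2\nprobe_B\tGENE2\n"
print(check_annotation(example))
```

An empty list means the table passed both checks; otherwise each entry describes one problem.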
Examples

Microarray (Affymetrix probeset ID)

RNA-seq (Ensembl Gene ID)

Group-Definition File (metadata)

The group-definition file is a simple multi-column table.
The first column holds sample IDs and all other columns hold sample groups. Users can add alternative group definitions in multiple columns and choose one of them during the first step ("Data Upload").
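
To illustrate the layout, the sketch below reads a group-definition table and returns the sample-to-group mapping for one chosen column. It is not iGEAK code: the function name read_group_definition, the column names, and the sample IDs are hypothetical, and it assumes the first row is a header that names each group definition.

```python
import csv
import io


def read_group_definition(text, group_column):
    """Map each sample ID (first column) to its group in the chosen column."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    if group_column not in header[1:]:
        raise ValueError(f"unknown group definition: {group_column!r}")
    # Search from position 1 so the sample-ID column is never selected.
    index = header.index(group_column, 1)
    return {row[0]: row[index] for row in reader if row}


# Hypothetical metadata with two alternative group definitions.
example = "sample\ttreatment\tbatch\nS1\tcontrol\tA\nS2\ttreated\tB\n"
print(read_group_definition(example, "treatment"))
print(read_group_definition(example, "batch"))
```

Each additional column is simply another way of grouping the same samples, which is why only one of them is selected at upload time.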

Microarray

RNA-seq

Gene Expression Matrix

iGEAK-microarray uses a tab-delimited, normalized (e.g. by RMA) expression matrix (txt/csv format) as an input file. If you want to prepare input files from raw CEL files, you may try ArrayAnalysis.org (http://arrayanalysis.org).
iGEAK-RNAseq uses a raw count matrix generated by sequencing read counting programs such as featureCounts or HTSeq-count. If you want to prepare input files from scratch (FASTQ or BAM), you may try the Galaxy platform (https://usegalaxy.org).
The first column holds unique identifiers such as probeset IDs. These IDs should be the same as the unique IDs in your annotation file.
The first row is a header that includes the sample IDs. These IDs should be the same as the sample IDs in your metadata file.
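
The two matching rules above can be verified before upload. The following sketch is an illustration only: all function names and example values are hypothetical, and it assumes that the annotation and metadata files each start with a header row.

```python
import csv
import io


def first_column(text):
    """Return the set of first-column values, skipping the header row."""
    rows = csv.reader(io.StringIO(text), delimiter="\t")
    next(rows)  # skip the header row
    return {row[0] for row in rows if row}


def header_samples(text):
    """Return the sample IDs from the header row (every column but the first)."""
    header = next(csv.reader(io.StringIO(text), delimiter="\t"))
    return set(header[1:])


def check_consistency(matrix_text, annotation_text, metadata_text):
    """Return matrix IDs missing from the annotation and samples missing from the metadata."""
    missing_annotation = first_column(matrix_text) - first_column(annotation_text)
    missing_metadata = header_samples(matrix_text) - first_column(metadata_text)
    return sorted(missing_annotation), sorted(missing_metadata)


# Placeholder content: probe_C has no annotation and S2 has no metadata row.
matrix = "ID\tS1\tS2\nprobe_A\t7.1\t6.8\nprobe_C\t5.0\t5.2\n"
annotation = "ID\tSymbol\nprobe_A\tGENE1\nprobe_B\tGENE2\n"
metadata = "sample\tgroup\nS1\tcontrol\n"
print(check_consistency(matrix, annotation, metadata))
# prints: (['probe_C'], ['S2'])
```

Two empty lists mean that every matrix ID is annotated and every sample has a metadata row.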

Microarray: log2-normalized gene expression matrix

RNA-seq: raw gene count matrix

Output Files

GEATPbox exports a data table as a tab-delimited text file with the .csv extension. Microsoft Excel does not open such files correctly by default, so if you use Excel, please follow the simple steps below to open them.

Open a new Excel window
Choose the Data tab
Choose the From Text option
Choose your *.csv file
Choose the Delimited radio box, then click Next
Choose Tab
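
As an alternative to the import steps above, an exported table can be rewritten with commas as the delimiter. This is a sketch, not an iGEAK feature: the function name tab_to_comma and the example values are hypothetical, and whether Excel then opens the file directly depends on its regional list-separator setting.

```python
import csv
import io


def tab_to_comma(text):
    """Rewrite tab-delimited text as comma-separated text.

    Cells that contain commas are quoted by the csv writer.
    """
    rows = csv.reader(io.StringIO(text), delimiter="\t")
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


# Placeholder export content.
exported = "gene\tvalue\nGENE1\t1.5\n"
print(tab_to_comma(exported))
```

To convert a real file, read its content, pass the text to the function, and write the result to a new file.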

For detailed information, please see the following link (Excel 2007, 2010, 2013, 2016):

https://support.office.com/en-us/article/Import-or-export-text-txt-or-csv-files-5250ac4c-663c-47ce-937b-339e391393ba

Kwangmin Choi @ Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, USA