AutoCompare_SES

AutoCompare_SES is software that scores the enrichment of a gene set in a single transcriptome (for a subset of gene names, use AutoCompare ZE). Using an algorithmically optimized method of robust statistical analysis, it takes two inputs:

1- the transcriptome(s) (the first column contains gene names and the following columns contain the expression values of these genes)

2- the gene set whose enrichment in the transcriptome will be measured as a Sample Enrichment Score (SES)
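To make the shape of the two inputs concrete, here is a minimal Python sketch that parses a small TAB-separated transcriptome table and a gene set, then lists the genes they share. It is not part of AutoCompare_SES: the function names, the presence of a header row, the one-gene-per-line gene set layout, and the sample values are all assumptions made for illustration.

```python
import csv
import io


def read_transcriptome(handle):
    """Parse a TAB-separated table: gene names first, then one column per sample."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)          # assumption: the first row holds column titles
    samples = header[1:]
    expression = {}
    for row in reader:
        if not row:
            continue               # skip empty lines
        expression[row[0]] = [float(value) for value in row[1:]]
    return samples, expression


def read_gene_set(handle):
    """Parse a gene set, assumed here to hold one gene name per line."""
    return {line.strip() for line in handle if line.strip()}


# Made-up example data, for illustration only.
table = io.StringIO(
    "gene\tsample1\tsample2\n"
    "CD3E\t8.1\t7.9\n"
    "CD19\t2.3\t9.4\n"
    "GAPDH\t12.0\t11.8\n"
)
genes = io.StringIO("CD3E\nCD19\nFOXP3\n")

samples, expression = read_transcriptome(table)
gene_set = read_gene_set(genes)
shared = sorted(gene_set & set(expression))

print(samples)
print(shared)
```

Only the genes present in both inputs can contribute to a score, which is why the number of shared genes matters for the normalization described next.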

AutoCompare_SES_norm computes SES scores and normalized SES scores. The number of genes shared by the gene set and the biological function in the database influences the SES. Normalized SES scores are expressed as a percentage of the maximal theoretical score for that number of shared genes. They range from 0 to 100 and can be compared with one another.
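The normalization can be illustrated with a short Python sketch. How AutoCompare_SES_norm derives the maximal theoretical score is not described here, so the sketch simply takes it as an input; the function name `normalize_ses` and the example values are hypothetical.

```python
def normalize_ses(ses, max_ses):
    """Express an SES as a percentage of the maximal theoretical score.

    max_ses is assumed to be the maximal theoretical score for the same
    number of shared genes; its computation is outside this sketch.
    """
    if max_ses <= 0:
        raise ValueError("max_ses must be positive")
    return 100.0 * ses / max_ses


# Made-up values: a score of 12.5 out of a theoretical maximum of 50.
print(normalize_ses(12.5, 50.0))
```

Because every score is rescaled by its own maximum, two normalized scores obtained with different numbers of shared genes sit on the same 0-100 scale.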

Citation :

When using AutoCompare_SES to process data for publication, please cite:

Large scale microarray profiling reveals four stages of immune escape in Non-Hodgkin Lymphomas. Marie Tosolini, Christelle Algans, Frédéric Pont, Bernard Ycart & Jean-Jacques Fournié. OncoImmunology. 2016.

Example of input transcriptome table (left) and AutoCompare results table (right)

P-value calculation speed on a dual Xeon server, measured with 1447 transcriptomes

Manual :

AutoCompare SES is multithreaded software for GNU/Linux and Unix. It can also be used on Windows.

1- If Perl is not already installed, install it (Perl is a free programming language).

2- Install the R programming language.

3- Install the snowfall R package.

4- Install GNU parallel (GNU/Linux only).

5- Unzip the software.

6- Data must be normalized. For RNA-seq, log2 normalisation can be used. Copy the data TSV files into the “data” directory: the first column contains gene names and the following columns contain the expression values, separated by TAB.

7- Copy your databases into the databases directory. Databases are made of text files: each file contains a list of genes, and the file name is the name of the biological function. The files must be placed in a directory whose name is the name of the database. No subfolders are allowed.
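The database layout described in step 7 can be sketched in Python as follows. This is only an illustration, not the software's own loader: the function `load_database`, the `.txt` extension, the one-gene-per-line file layout, and the example names are assumptions.

```python
import os
import tempfile


def load_database(db_dir):
    """Return (database name, {function name: set of genes}) for a flat directory."""
    db_name = os.path.basename(os.path.normpath(db_dir))
    functions = {}
    for entry in sorted(os.listdir(db_dir)):
        path = os.path.join(db_dir, entry)
        if os.path.isdir(path):
            raise ValueError(f"subfolders are not allowed: {entry}")
        # The file title names the biological function; dropping the
        # extension is an assumption of this sketch.
        name = os.path.splitext(entry)[0]
        with open(path, encoding="utf-8") as handle:
            functions[name] = {line.strip() for line in handle if line.strip()}
    return db_name, functions


# Build a throwaway database with one hypothetical function file.
with tempfile.TemporaryDirectory() as root:
    db_dir = os.path.join(root, "my_database")
    os.mkdir(db_dir)
    with open(os.path.join(db_dir, "T_cell_activation.txt"), "w", encoding="utf-8") as f:
        f.write("CD3E\nCD28\nIL2\n")
    db_name, functions = load_database(db_dir)

print(db_name)
print(sorted(functions["T_cell_activation"]))
```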

8- Edit the file AutoCompare_SES.conf to set your default parameters. In our experience, it is better to set gene frequency correction to false.

For Windows users, it is very important to set the path of Rscript.exe in AutoCompare_SES.conf. For example: r_path=C:\Program Files\R\R-3.2.3\bin\Rscript.exe

9- Run the software with the command perl AutoCompare_SES_xxx.pl, or double-click the .pl file.

Adjust the software parameters if necessary.

The software has two levels of parallelism:

a- files/databases can be processed in parallel (GNU/Linux only)

b- P-value calculation

For many small files (<50 samples), set a to the maximum number of threads and b=1.

For medium-sized files (<500 samples), set a=x and b=4, with x=max_threads/b.

For very large files (>>1000 samples), set a=x and b=20, with x=max_threads/b.
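The recommendations above can be turned into a small helper. The Python sketch below is hypothetical and not part of the software: the function name `split_threads` is invented, integer division is assumed for x=max_threads/b, and files with 500 or more samples are all treated as "very large", which is a guess for the range the recommendations leave open.

```python
def split_threads(max_threads, n_samples):
    """Return (a, b): files processed in parallel and P-value threads."""
    if n_samples < 50:
        b = 1        # many small files: give every thread to level a
    elif n_samples < 500:
        b = 4        # medium-sized files
    else:
        b = 20       # assumption: anything from 500 samples up counts as very large
    b = min(b, max_threads)          # never request more threads than available
    a = max(1, max_threads // b)     # x = max_threads / b, rounded down
    return a, b


for n_samples in (30, 200, 1447):
    print(n_samples, split_threads(40, n_samples))
```

On a machine with 40 threads this gives a=40, b=1 for 30 samples, a=10, b=4 for 200 samples, and a=2, b=20 for 1447 samples.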

10- SES scores are written to the “results” directory. Normalized SES scores are written to the “results_norm” directory.

Here we provide databases of ~30000 biological functions, pathways, etc. We also provide random samples and random databases to help establish score thresholds.

download