Cancer In Silico Drug Discovery (cidd) is a command-line based tool for analyzing TCGA data and other cancer data sets for tumor molecular profiling and candidate drug discovery.
When you identify candidate drugs using cidd, you are analyzing cancer data that has been produced and shared by other cancer research groups. Please therefore adhere to the TCGA publication guidelines, and cite the CMap, CCLE and drug databases (for any annotations used), when using cidd results in your publications.
tcga_util and cidd run on Mac OS X and Linux operating systems. Installation involves three major steps: prerequisite software installation, data resource installation and R library installation, as described below.
Python 2.7 or greater
Python libraries: numpy, scipy, lxml
R 3.0 or greater
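The Python prerequisites listed above can be checked before installation with a short script. This is a minimal sketch: the helper names `missing_modules` and `python_is_recent_enough` are hypothetical and are not part of cidd or tcga_util. It does not check the R installation.

```python
import importlib
import sys

# Python libraries listed as prerequisites above.
REQUIRED_MODULES = ["numpy", "scipy", "lxml"]


def missing_modules(names):
    """Return the subset of module names that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


def python_is_recent_enough(version_info=None, minimum=(2, 7)):
    """Compare (major, minor) against the documented minimum version."""
    info = sys.version_info if version_info is None else version_info
    return tuple(info[:2]) >= minimum


if not python_is_recent_enough():
    print("Python 2.7 or greater is required")
for name in missing_modules(REQUIRED_MODULES):
    print("missing Python library: %s" % name)
```

The script prints nothing when the interpreter version and all three libraries are in place.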
Please register at http://cidd.houstonbioinformatics.org and download the most recent versions of tcga_util_{version}.tgz and cidd_{version}.tgz. Extract each archive and run the following commands in its base directory to install the software packages:
For tcga_util (you may need to add sudo to the beginning of the python setup.py install command):
tar -xvzf tcga_util_{version}.tgz
cd tcga_util_{version}
python setup.py install
For cidd (you may need to add sudo to the beginning of the python setup.py install command):
tar -xvzf cidd_{version}.tgz
cd cidd_{version}
python setup.py install
The command below checks that the prerequisite data are available. Required data include data from the CMap, CCLE and MSigDB. Other data sources that can be downloaded directly from the web, and those that have been customized for cidd use, will be installed by cidd check if they have not been downloaded previously. See below for details on these data. If cidd check is successful, you will get a "resources verified" message. Otherwise, a message describing what is missing will be displayed.
cidd check
This command will set up the directory structure for a data store and download several data resources automatically from the web. Some data sets require registration at their websites before download; these are listed below. Note that $DATA_STORE refers to the location of the data_store directory created by cidd check. Once you have manually installed these data resources, run cidd check again to confirm that everything is set up for use by cidd. Once set up, a data store can be reused for additional projects and analyses.
Connectivity Map (http://www.broadinstitute.org/cmap)
requires registration: yes
install directory: $DATA_STORE/cmap
data files:
instance inventory: cmap_instances_02.xls (1.6 MB)
data matrix: rankMatrix.txt.zip (309 MB)
MSigDB (http://www.broadinstitute.org/gsea/msigdb/collections.jsp)
requires registration: yes
install directory: $DATA_STORE/msigdb
data files:
C2 curated gene sets by gene symbols: c2.cp.kegg.v4.0.symbols.gmt (87.6 KB)
C2 curated gene sets by entrez ids: c2.cp.kegg.v4.0.entrez.gmt (87.7 KB)
Cancer Cell Line Encyclopedia (http://www.broadinstitute.org/ccle/data/browseData)
requires registration: yes
install directory: $DATA_STORE/ccle
data files:
mRNA expression: CCLE_Expression_Entrez_2012-09-29.gct (167.2 MB)
Cell Line Annotations: CCLE_sample_info_file_2012-10-18.txt (196 KB)
Oncomap mutations: CCLE_Oncomap3_2012-04-09.maf (318 KB)
Hybrid capture sequencing mutations: CCLE_hybrid_capture1650_hg19_NoCommonSNPs_NoNeutralVariants_CDS_2012.05.07.maf (56.5 MB)
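Before running cidd check again, it can be handy to see which of the manually installed files listed above are still missing. The sketch below is not part of cidd: `missing_resources` is a hypothetical helper, and it assumes each file sits directly inside its install directory under the name shown above (whether archives such as rankMatrix.txt.zip must be unzipped first is not stated here, so the names are used as listed).

```python
import os

# Directory and file names copied from the lists above. Placing each file
# directly inside its install directory is an assumption of this sketch.
MANUAL_RESOURCES = {
    "cmap": ["cmap_instances_02.xls", "rankMatrix.txt.zip"],
    "msigdb": ["c2.cp.kegg.v4.0.symbols.gmt", "c2.cp.kegg.v4.0.entrez.gmt"],
    "ccle": [
        "CCLE_Expression_Entrez_2012-09-29.gct",
        "CCLE_sample_info_file_2012-10-18.txt",
        "CCLE_Oncomap3_2012-04-09.maf",
        "CCLE_hybrid_capture1650_hg19_NoCommonSNPs_NoNeutralVariants"
        "_CDS_2012.05.07.maf",
    ],
}


def missing_resources(data_store, resources=MANUAL_RESOURCES):
    """Return the expected file paths that do not exist under data_store."""
    missing = []
    for subdir in sorted(resources):
        for filename in resources[subdir]:
            path = os.path.join(data_store, subdir, filename)
            if not os.path.isfile(path):
                missing.append(path)
    return missing
```

For example, `missing_resources(os.environ.get("DATA_STORE", "data_store"))` returns an empty list once every file is in place. cidd check remains the authoritative verification step.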
Below are drug annotation resources that are automatically downloaded by cidd. If you use the drug annotations, please cite the following resources. These websites make these data sources freely downloadable without the need for user registration. See the specific websites for inquiries regarding non-academic use.
DrugBank (http://www.drugbank.ca): free for non-commercial uses; please visit their website for commercial license inquiries.
Law V, Knox C, Djoumbou Y, Jewison T, Guo AC, Liu Y, Maciejewski A, Arndt D, Wilson M, Neveu V, Tang A, Gabriel G, Ly C, Adamjee S, Dame ZT, Han B, Zhou Y, Wishart DS. DrugBank 4.0: shedding new light on drug metabolism. Nucleic Acids Res. 2014 Jan 1;42(1):D1091-7.
MATADOR (http://matador.embl.de): free for non-commercial uses; please visit their website for commercial license inquiries.
Günther S, Kuhn M, Dunkel M, Campillos M, Senger C, Petsalaki E, Ahmed J, Urdiales EG, Gewiess A, Jensen LJ, Schneider R, Skoblo R, Russell RB, Bourne PE, Bork P, Preissner R. SuperTarget and Matador: resources for exploring drug-target relationships. Nucleic Acids Res. 2008 Jan;36(Database issue):D919-22.
KEGG Medicus (ftp://ftp.genome.jp/pub/kegg/medicus/): free for academic users at the GenomeNet FTP site; please visit http://www.kegg.jp/kegg/download for non-academic users. KEGG Medicus is a subset of KEGG. Any other KEGG data (besides KEGG Medicus) requires a data subscription. See http://www.kegg.jp/kegg/download (KEGG FTP Academic Subscription) if interested in these data sets.
Kanehisa, M., Goto, S., Sato, Y., Kawashima, M., Furumichi, M., and Tanabe, M.; Data, information, knowledge and principle: back to metabolism in KEGG. Nucleic Acids Res. 42, D199–D205 (2014).
Kanehisa, M. and Goto, S.; KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 28, 27-30 (2000).
Start an R console and install the following packages:
mirror = "http://cran.us.r-project.org"
# "graphics" ships with base R, so it does not need to be installed from CRAN
install.packages("amap", repos=mirror)
install.packages("gplots", repos=mirror)
install.packages("data.table", repos=mirror)
install.packages("snowfall", repos=mirror)
source("http://bioconductor.org/biocLite.R")
biocLite("edgeR", dependencies=TRUE)
biocLite("piano", dependencies=TRUE)
biocLite("ktspair", dependencies=TRUE)
Quickstart example
In this example, we will set up a cidd data store, create a gene expression signature and identify candidate drugs for BRAF V600E colorectal cancer. Run all of these commands in the same directory: by default, the data store is created in this directory, and cidd commands look for a local data_store directory in the location where they are run. Output from the commands below is placed in a project directory named after the project specified in the cidd setup command (i.e., crc_brafv600e). A log file is created at crc_brafv600e/crc_brafv600e.log, and output reports can be found in crc_brafv600e/reports. The -v2 verbosity parameter displays highly verbose log messages; remove it if you want to minimize the console output. Output report filenames are prefixed with the value of the -n (analysis name) parameter, which in this simple example happens to be the same as the project name.
# [1] check if data dependencies have been installed in expected locations;
# manually install any required dependencies that might be missing;
# these should have already been downloaded in the installation step above
cidd check
# [2] download and install colorectal cancer data
cidd setup -c coadread \
--data_run_date 2014_07_15 \
--analysis_run_date 2014_07_15 \
crc_brafv600e -v2
# [3] generate a gene expression signature
cidd mutation_signature \
--data_run_date 2014_07_15 \
--analysis_run_date 2014_07_15 \
-c coadread \
-g BRAF \
-aac V600E \
--gsa_num_perm 100 \
--gsa_num_cpus 20 \
-lfc 2 -n crc_brafv600e -v2
# [4] identify candidate drugs
cidd drugs \
-np 100 \
-nt 20 \
-n crc_brafv600e -v2
# [5] generate a gene expression classifier
cidd classifier generate \
-n crc_brafv600e -v2
# [6] identify candidate cell lines
cidd cell_lines \
-g BRAF \
-aac V600E \
-t LARGE_INTESTINE \
-n crc_brafv600e -v2
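The six quickstart steps can also be driven from a script. The following is a sketch only: `quickstart_commands` and `run_all` are hypothetical helper names, and the argument lists use only the flags that appear in the quickstart above. Nothing is executed until `run_all` is called.

```python
import subprocess


def quickstart_commands(project="crc_brafv600e", cohort="coadread",
                        gene="BRAF", aa_change="V600E",
                        tissue="LARGE_INTESTINE", run_date="2014_07_15"):
    """Build the quickstart command lines as argument lists."""
    dates = ["--data_run_date", run_date, "--analysis_run_date", run_date]
    name = ["-n", project, "-v2"]
    return [
        ["cidd", "check"],
        ["cidd", "setup", "-c", cohort] + dates + [project, "-v2"],
        ["cidd", "mutation_signature"] + dates + [
            "-c", cohort, "-g", gene, "-aac", aa_change,
            "--gsa_num_perm", "100", "--gsa_num_cpus", "20",
            "-lfc", "2"] + name,
        ["cidd", "drugs", "-np", "100", "-nt", "20"] + name,
        ["cidd", "classifier", "generate"] + name,
        ["cidd", "cell_lines", "-g", gene, "-aac", aa_change,
         "-t", tissue] + name,
    ]


def run_all(commands):
    """Run each command in order, stopping at the first failure."""
    for args in commands:
        print("running: " + " ".join(args))
        # check_call raises CalledProcessError on a non-zero exit status
        subprocess.check_call(args)
```

Calling `run_all(quickstart_commands())` from the directory that holds the data store mirrors the shell session above. Stopping at the first failure matters because each later step reads the output of an earlier one.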
cidd check -h
usage: cidd check [-h] [-d DATA_STORE] [-v {0,1,2}]
This will check for a data store in the specified location. If one doesn't
exist, it will create an empty one for manual population. If resources are
missing, please install them manually and then run check again until it
succeeds before proceeding with running cidd analyses.
optional arguments:
-h, --help show this help message and exit
-d DATA_STORE, --data_store DATA_STORE
name of directory where data resources are, or will
be, stored. Defaults to environment variable
$DATA_STORE.
-v {0,1,2}, --verbosity {0,1,2}
output error and warning (0), info (1) and debug (2)
information to standard output (default to 1)
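The help text above says that -d defaults to the $DATA_STORE environment variable, and the quickstart notes that cidd looks for a local data_store directory by default. One way to reason about which directory will be used is sketched below. The precedence order (command-line value, then environment variable, then ./data_store) is an assumption of this sketch, and `resolve_data_store` is a hypothetical helper, not cidd's own code.

```python
import os


def resolve_data_store(cli_value=None, environ=None, cwd=None):
    """Guess the data store location (the precedence is an assumption)."""
    environ = os.environ if environ is None else environ
    cwd = os.getcwd() if cwd is None else cwd
    if cli_value:                      # explicit -d / --data_store value
        return os.path.abspath(cli_value)
    if environ.get("DATA_STORE"):      # $DATA_STORE environment variable
        return os.path.abspath(environ["DATA_STORE"])
    return os.path.join(cwd, "data_store")  # local default directory


print(resolve_data_store())
```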
If a cidd project (a project_name.cidd file and a project_name folder) does not exist in the current directory, this command will create one. The command can also be used to download the necessary TCGA data for a given TCGA project. For example, specifying coadread for the --cohort parameter will result in cidd downloading clinical, gene expression microarray, RNA-seq and mutation data for the TCGA colorectal cancer project into a TCGA directory in your local data store.
cidd setup -h
usage: cidd setup [-h] -c COHORT [-ar ANALYSIS_RUN_DATE] [-dr DATA_RUN_DATE]
[-f] [-d DATA_STORE] [-v {0,1,2}]
project
This will create a project directory for storing project artifacts like
expression signatures, classifiers, reports, etc. It will also create a
data_store directory if one doesn't exist already with an appropriate data
structure for cidd projects. If a project directory already exists, the
command simply adds data to the existing directory.
positional arguments:
project name of a new project
optional arguments:
-h, --help show this help message and exit
-c COHORT, --cohort COHORT
disease cohort to setup data for (for a list of
possible disease cohorts run "tcga_util desc cohorts".
-ar ANALYSIS_RUN_DATE, --analysis_run_date ANALYSIS_RUN_DATE
run date for analyses to describe (defaults to
"latest")
-dr DATA_RUN_DATE, --data_run_date DATA_RUN_DATE
run date for data to describe (defaults to "latest")
-f, --force force replace a project if it already exists
-d DATA_STORE, --data_store DATA_STORE
name of directory where data resources are, or will
be, stored. Defaults to environment variable
$DATA_STORE.
-v {0,1,2}, --verbosity {0,1,2}
output error and warning (0), info (1) and debug (2)
information to standard output (default to 1)
cidd signature -h
usage: cidd signature [-h] [-ar ANALYSIS_RUN_DATE] [-dr DATA_RUN_DATE] -c
COHORT
[-et {rnaseq,rnaseq_illuminaga,rnaseq_illuminahiseq,agilent}]
[--cases CASES] [--controls CONTROLS]
[-cg CANDIDATE_GENES] -n NAME
[-lcm {euclidean,maximum,manhattan,canberra,binary,pearson,abspearson,correlation,abscorrelation,spearman,kendall}]
[-lam {none,BH,BY,holm}] [-lp LIMMA_ADJ_PVAL_THRESH]
[-lfc LIMMA_FC_THRESH] [-lperm LIMMA_PERMUTATIONS]
[-gsa {fisher,stouffer,reporter,tailStrength,wilcoxon,mean,median,sum,maxmean,gsea,page}]
[-gsm {geneSampling,samplePermutation}]
[-gam {holm,hochberg,hommel,bonferroni,BH,BY,fdr,none}]
[-ggp GSA_GSEA_PARAM] [-gperm GSA_NUM_PERM]
[-gnc GSA_NUM_CPUS] [-gs GENE_SETS] [-d DATA_STORE]
[-v {0,1,2}]
This command generates an expression signature that represents a class of
samples. In addition, a classifier will be generated to be used in subsequent
class prediction analyses. A heatmap illustrating clustering of samples using
the signature can also be generated.
optional arguments:
-h, --help show this help message and exit
-ar ANALYSIS_RUN_DATE, --analysis_run_date ANALYSIS_RUN_DATE
run date for analyses to describe (defaults to
"latest")
-dr DATA_RUN_DATE, --data_run_date DATA_RUN_DATE
run date for data to describe (defaults to "latest")
-c COHORT, --cohort COHORT
disease cohort to setup data for (for a list of
possible disease cohorts run "tcga_util desc cohorts"
-et {rnaseq,rnaseq_illuminaga,rnaseq_illuminahiseq,agilent}, --expression_type {rnaseq,rnaseq_illuminaga,rnaseq_illuminahiseq,agilent}
the TCGA data type to be analyzed. By default,
"rnaseq" is selected and the platform (IlluminaGA or
IlluminaHiSeq) that provides the most case samples is
selected for analysis.
--cases CASES name of collection or file with case patient or sample
IDs
--controls CONTROLS name of collection or file with control patient or
sample IDs
-cg CANDIDATE_GENES, --candidate_genes CANDIDATE_GENES
filename containing a list of genes to limit the
signature to (e.g., a set of pathway genes or a set of
genes with some prior evidence suggesting that they
are related to the phenotype of interest, etc). By
default, all genes are considered for inclusion in the
signature.
-n NAME, --name NAME name of signature - used to prefix output filenames
-lcm {euclidean,maximum,manhattan,canberra,binary,pearson,abspearson,correlation,abscorrelation,spearman,kendall}, --limma_clust_method {euclidean,maximum,manhattan,canberra,binary,pearson,abspearson,correlation,abscorrelation,spearman,kendall}
hierarchical clustering distance method to be used
with the R function hcluster {amap}.
-lam {none,BH,BY,holm}, --limma_adjust_method {none,BH,BY,holm}
method used to adjust the differential expression
p-values for multiple testing using the R function
toptable {limma}. Options, in increasing conservatism,
include "none", "BH", "BY" and "holm"
-lp LIMMA_ADJ_PVAL_THRESH, --limma_adj_pval_thresh LIMMA_ADJ_PVAL_THRESH
adjusted p-value threshold at which to define
differentially expressed genes for inclusion in the
gene signature
-lfc LIMMA_FC_THRESH, --limma_fc_thresh LIMMA_FC_THRESH
fold change threshold at which to define
differentially expressed genes for inclusion in the
gene signature
-lperm LIMMA_PERMUTATIONS, --limma_permutations LIMMA_PERMUTATIONS
number of results to generate with permuted sample
labels (these results can be used downstream for
assessing gene set analysis significance)
-gsa {fisher,stouffer,reporter,tailStrength,wilcoxon,mean,median,sum,maxmean,gsea,page}, --gsa_stat {fisher,stouffer,reporter,tailStrength,wilcoxon,mean,median,sum,maxmean,gsea,page}
statistical gene set method to use to identify gene
sets associated with cases using the R function runGSA
{piano}.
-gsm {geneSampling,samplePermutation}, --gsa_sig_method {geneSampling,samplePermutation}
the method for significance assessment of gene sets as
defined by the R function runGSA {piano}. geneSampling
permutes gene labels and samplePermutation permutes
sample status labels
-gam {holm,hochberg,hommel,bonferroni,BH,BY,fdr,none}, --gsa_adj_method {holm,hochberg,hommel,bonferroni,BH,BY,fdr,none}
the method for adjusting for multiple testing. Can be
any of the methods supported by p.adjust, i.e. "holm",
"hochberg", "hommel", "bonferroni", "BH", "BY", "fdr"
or "none". The exception is for --gsa_stat=gsea, where
only the options "fdr" and "none" can be used.
-ggp GSA_GSEA_PARAM, --gsa_gsea_param GSA_GSEA_PARAM
parameter as defined by gsea - recommended to be 1 by
http://www.broadinstitute.org/gsea/index.jsp. This
parameter is only used if the "gsa_sig_method" is
"gsea"
-gperm GSA_NUM_PERM, --gsa_num_perm GSA_NUM_PERM
number permutations for assessing significance of gene
set associations with the case status
-gnc GSA_NUM_CPUS, --gsa_num_cpus GSA_NUM_CPUS
number of cpus available for gene set analyses
-gs GENE_SETS, --gene_sets GENE_SETS
gene sets to use for gene set analyses
-d DATA_STORE, --data_store DATA_STORE
name of directory where data resources are, or will
be, stored. Defaults to environment variable
$DATA_STORE.
-v {0,1,2}, --verbosity {0,1,2}
output error and warning (0), info (1) and debug (2)
information to standard output (default to 1)
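Several of the options above select a multiple-testing adjustment such as "BH". For readers unfamiliar with it, here is a plain-Python sketch of the Benjamini-Hochberg adjustment. cidd delegates this step to R (limma and p.adjust, per the help text), so this block is illustrative only and `benjamini_hochberg` is a hypothetical name.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the same order as the input."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so that each adjusted value is the
    # minimum of p * n / rank over all ranks at or above its own rank.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / float(rank))
        adjusted[idx] = running_min
    return adjusted


# Made-up p-values, for illustration only.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Genes whose adjusted p-value falls below the -lp threshold (and whose fold change passes -lfc) are the ones the help text describes as included in the signature.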
cidd drugs -h
usage: cidd drugs [-h] [--up UP] [--down DOWN] [--rank_matrix RANK_MATRIX]
[--instances INSTANCES]
[--background_enrichment_scores BACKGROUND_ENRICHMENT_SCORES]
[-np NUM_PERMUTATIONS] [-nt NUM_THREADS]
[-cg CANDIDATE_GENES] -n NAME [-d DATA_STORE] [-v {0,1,2}]
This command identifies candidate drugs that, when compared to the provided
gene expression signature, induces a complementary gene expression signature
on cell lines.
optional arguments:
-h, --help show this help message and exit
--up UP a list of up regulated genes (Entrez IDs or gene symbols)
--down DOWN a list of down regulated genes (Entrez IDs or gene symbols)
--rank_matrix RANK_MATRIX
matrix for perturbagen instance effects on cell line
gene expression ranks (defaults to an Entrez gene
version of the CMAP rank matrix)
--instances INSTANCES
details for the perturbagen instances represented in
the rank matrix (defaults to instance details provided
by the CMAP)
--background_enrichment_scores BACKGROUND_ENRICHMENT_SCORES
list of enrichment scores obtained by applying random
gene signatures to the rank matrix - used to calculate
an empirical p-value for the enrichment scores of user
signatures (defaults to scores obtained by applying
MSigDB gene signatures to the CMAP rank matrix)
-np NUM_PERMUTATIONS, --num_permutations NUM_PERMUTATIONS
number permutations for assessing significance of gene
set associations with the case status
-nt NUM_THREADS, --num_threads NUM_THREADS
number of threads for parallel processing
-cg CANDIDATE_GENES, --candidate_genes CANDIDATE_GENES
filename containing a list of genes to limit the
signature to (e.g., a set of genes with some prior
evidence suggesting that they are differentially
expressed between the classes of interest). By
default, all signature genes are used in the drug
search.
-n NAME, --name NAME name of analysis - used to prefix output files
-d DATA_STORE, --data_store DATA_STORE
name of directory where data resources are, or will
be, stored. Defaults to environment variable
$DATA_STORE.
-v {0,1,2}, --verbosity {0,1,2}
output error and warning (0), info (1) and debug (2)
information to standard output (default to 1)
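The --background_enrichment_scores option above is described as the basis for an empirical p-value. The sketch below shows one common convention for such a p-value (two-sided, with add-one smoothing). It is an assumption that cidd uses this exact form, and `empirical_pvalue` is a hypothetical name.

```python
def empirical_pvalue(observed, background):
    """Fraction of background scores at least as extreme as the observed one.

    Uses absolute values (two-sided) and adds one to the numerator and the
    denominator so that the result is never exactly zero.
    """
    if not background:
        raise ValueError("background must not be empty")
    extreme = sum(1 for score in background if abs(score) >= abs(observed))
    return (extreme + 1.0) / (len(background) + 1.0)


# Made-up scores, for illustration only.
background_scores = [-0.5, -0.2, 0.1, 0.3, 0.9]
print(empirical_pvalue(-0.6, background_scores))
```

With a background of N scores the smallest attainable p-value is 1 / (N + 1), so a larger background gives finer resolution.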
By default, the cidd drugs command uses the output of the cidd signature command and screens that signature against the drug-induced signatures provided by the Connectivity Map. If you have your own lists of up- and down-regulated genes, you can specify them explicitly (without running a cidd signature command) with the --up and --down parameters. The command will fail, however, if no cidd project exists, for example because you did not download any TCGA data and have not run a cidd setup command. In that case, establish a cidd project first:
cidd setup project_name
This creates an empty cidd project called project_name in the local directory; cidd drugs reports will then be generated in the file structure of this project.
cidd cell_lines -h
usage: cidd cell_lines [-h] [-t TISSUE] [-r RUN_DATE] [-g GENES]
[-aac AMINO_ACID_CHANGES] [-cod CODONS]
[-vc VARIANT_CLASSIFICATIONS] [-cg CANDIDATE_GENES]
[-c CLASSIFIER] -n NAME [-d DATA_STORE] [-v {0,1,2}]
This command queries ccle data to try to identify cell lines that are most
similar to the samples used to generate a signature.
optional arguments:
-h, --help show this help message and exit
-t TISSUE, --tissue TISSUE
cell line tissue of interest
-r RUN_DATE, --run_date RUN_DATE
run date for data to describe (defaults to "latest")
-g GENES, --genes GENES
A quoted list of genes that should contain the
mutations to retrieve (e.g., "BRCA2|BRAF|KRAS"). If a
sample has mutations in any of these, genes, they will
be reported.
-aac AMINO_ACID_CHANGES, --amino_acid_changes AMINO_ACID_CHANGES
A quoted list of amino acid substitutions to search
for (e.g., "V600E|G12D"). Each substitution will be
searched for within all genes specified through
--genes. If a sample has one of these amino acid
changes in any of the genes (in --genes), that sample
will be reported.
-cod CODONS, --codons CODONS
A quoted list of codon numbers of mutations to search
for (e.g., "12|13|146"). Each substitution will be
searched for within all genes specified through
--genes.
-vc VARIANT_CLASSIFICATIONS, --variant_classifications VARIANT_CLASSIFICATIONS
A quoted list of classifications to search for (e.g.,
"Missense_Mutation|Nonstop_Mutation"). Each will be
searched for within all genes specified through
--genes. Possible values for -vc include: 3'UTR,
5'Flank, 5'UTR, De_novo_Start_InFrame,
De_novo_Start_OutOfFrame, Frame_Shift_Del,
Frame_Shift_Ins, In_Frame_Del,In_Frame_Ins, Intron,
Missense_Mutation, Nonsense_Mutation,
Nonstop_Mutation, RNA, Silent, Splice_Site and
Translation_Start_Site
-cg CANDIDATE_GENES, --candidate_genes CANDIDATE_GENES
filename containing a list of genes to limit the
signature to (e.g., a set of pathway genes or a set of
genes with some prior evidence suggesting that they
are related to the phenotype of interest, etc). By
default, all genes are considered for inclusion in the
signature.
-c CLASSIFIER, --classifier CLASSIFIER
a signature to identify candidate for
-n NAME, --name NAME name of analysis - used to prefix output files
-d DATA_STORE, --data_store DATA_STORE
name of directory where data resources are, or will
be, stored. Defaults to environment variable
$DATA_STORE.
-v {0,1,2}, --verbosity {0,1,2}
output error and warning (0), info (1) and debug (2)
information to standard output (default to 1)
Check the available data for colorectal cancer (e.g., the coadread project):
tcga_util desc data -c coadread
running: firehose_get -tasks stddata latest coadread
Clinical_Pick_Tier1
Merge_Clinical
Merge_cna__illuminahiseq_dnaseqc__hms_harvard_edu__Level_3__segmentation__seg
Merge_methylation__humanmethylation27__jhu_usc_edu__Level_3__within_bioassay_data_set_function__data
Merge_methylation__humanmethylation450__jhu_usc_edu__Level_3__within_bioassay_data_set_function__data
Merge_mirnaseq__illuminaga_mirnaseq__bcgsc_ca__Level_3__miR_gene_expression__data
Merge_mirnaseq__illuminaga_mirnaseq__bcgsc_ca__Level_3__miR_isoform_expression__data
Merge_mirnaseq__illuminahiseq_mirnaseq__bcgsc_ca__Level_3__miR_gene_expression__data
Merge_mirnaseq__illuminahiseq_mirnaseq__bcgsc_ca__Level_3__miR_isoform_expression__data
Merge_protein_exp__mda_rppa_core__mdanderson_org__Level_3__protein_normalization__data
Merge_rnaseqv2__illuminaga_rnaseqv2__unc_edu__Level_3__exon_quantification__data
Merge_rnaseqv2__illuminaga_rnaseqv2__unc_edu__Level_3__junction_quantification__data
Merge_rnaseqv2__illuminaga_rnaseqv2__unc_edu__Level_3__RSEM_genes_normalized__data
Merge_rnaseqv2__illuminaga_rnaseqv2__unc_edu__Level_3__RSEM_genes__data
Merge_rnaseqv2__illuminaga_rnaseqv2__unc_edu__Level_3__RSEM_isoforms_normalized__data
Merge_rnaseqv2__illuminahiseq_rnaseqv2__unc_edu__Level_3__exon_quantification__data
Merge_rnaseqv2__illuminahiseq_rnaseqv2__unc_edu__Level_3__junction_quantification__data
Merge_rnaseqv2__illuminahiseq_rnaseqv2__unc_edu__Level_3__RSEM_genes_normalized__data
Merge_rnaseqv2__illuminahiseq_rnaseqv2__unc_edu__Level_3__RSEM_genes__data
Merge_rnaseqv2__illuminahiseq_rnaseqv2__unc_edu__Level_3__RSEM_isoforms_normalized__data
Merge_rnaseq__illuminaga_rnaseq__unc_edu__Level_3__exon_expression__data
Merge_rnaseq__illuminaga_rnaseq__unc_edu__Level_3__gene_expression__data
Merge_rnaseq__illuminaga_rnaseq__unc_edu__Level_3__splice_junction_expression__data
Merge_snp__genome_wide_snp_6__broad_mit_edu__Level_3__segmented_scna_hg18__seg
Merge_snp__genome_wide_snp_6__broad_mit_edu__Level_3__segmented_scna_hg19__seg
Merge_snp__genome_wide_snp_6__broad_mit_edu__Level_3__segmented_scna_minus_germline_cnv_hg18__seg
Merge_snp__genome_wide_snp_6__broad_mit_edu__Level_3__segmented_scna_minus_germline_cnv_hg19__seg
Merge_transcriptome__agilentg4502a_07_3__unc_edu__Level_3__unc_lowess_normalization_gene_level__data
Methylation_Preprocess
miRseq_Mature_Preprocess
miRseq_Preprocess
mRNAseq_Preprocess
mRNA_Preprocess_Median
Mutation_Packager_Calls
Mutation_Packager_Coverage
RPPA_AnnotateWithGene
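The Merge_* names in the listing above appear to follow a pattern of fields separated by double underscores (data type, platform, center, level, description, extension). Assuming that reading of the names is correct, the listing can be summarized by data type and platform as sketched below; `group_merge_archives` is a hypothetical helper and not part of tcga_util.

```python
from collections import defaultdict


def group_merge_archives(names):
    """Map each data type to the sorted platforms found in Merge_* names."""
    groups = defaultdict(set)
    for name in names:
        # Skip entries such as Clinical_Pick_Tier1 or Merge_Clinical that
        # do not carry the double-underscore field structure.
        if not name.startswith("Merge_") or "__" not in name:
            continue
        fields = name[len("Merge_"):].split("__")
        data_type, platform = fields[0], fields[1]
        groups[data_type].add(platform)
    return {key: sorted(value) for key, value in groups.items()}


# A few names copied from the listing above.
listing = [
    "Clinical_Pick_Tier1",
    "Merge_Clinical",
    "Merge_rnaseqv2__illuminaga_rnaseqv2__unc_edu__Level_3__RSEM_genes__data",
    "Merge_rnaseqv2__illuminahiseq_rnaseqv2__unc_edu__Level_3__RSEM_genes__data",
    "Merge_transcriptome__agilentg4502a_07_3__unc_edu__Level_3"
    "__unc_lowess_normalization_gene_level__data",
]
for data_type, platforms in sorted(group_merge_archives(listing).items()):
    print(data_type, platforms)
```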
Check the available analysis data:
tcga_util desc analyses -c coadread
running: firehose_get -tasks analyses latest coadread
Aggregate_Molecular_Subtype_Clusters
CopyNumberLowPass_Gistic2
CopyNumber_Clustering_CNMF
CopyNumber_Clustering_CNMF_thresholded
CopyNumber_Gistic2
Correlate_Clinical_vs_CopyNumber_Arm
Correlate_Clinical_vs_CopyNumber_Focal
Correlate_Clinical_vs_Methylation
Correlate_Clinical_vs_miRseq
Correlate_Clinical_vs_Molecular_Subtypes
Correlate_Clinical_vs_mRNA
Correlate_Clinical_vs_mRNAseq
Correlate_Clinical_vs_Mutation
Correlate_Clinical_vs_RPPA
Correlate_CopyNumber_vs_mRNA
Correlate_CopyNumber_vs_mRNAseq
Correlate_Methylation_vs_mRNA
Correlate_molecularSubtype_vs_CopyNumber_Arm
Correlate_molecularSubtype_vs_CopyNumber_Focal
Correlate_molecularSubtype_vs_Mutation
Methylation_Clustering_CNMF
miRseq_Clustering_CNMF
miRseq_Clustering_Consensus
miRseq_Mature_Clustering_CNMF
miRseq_Mature_Clustering_Consensus
mRNAseq_Clustering_CNMF
mRNAseq_Clustering_Consensus
mRNA_Clustering_CNMF
mRNA_Clustering_Consensus
Mutation_Assessor
Mutation_CHASM
MutSigNozzleReport1
MutSigNozzleReport2
MutSigNozzleReportCV
MutSigNozzleReportMerged
Pathway_FindEnrichedGenes
Pathway_Hotnet
Pathway_Paradigm_mRNA
Pathway_Paradigm_mRNA_And_Copy_Number
Pathway_Paradigm_RNASeq
Pathway_Paradigm_RNASeq_And_Copy_Number
RPPA_Clustering_CNMF
RPPA_Clustering_Consensus
cidd check
cidd setup -c coadread crc_brafv600e -v2
You can specify mutations explicitly, as in the example below:
cidd mutation_signature -c coadread -g BRAF -aac V600E --gsa_num_perm 100 --gsa_num_cpus 20 -lfc 2 -n crc_brafv600e -v2
This method is useful for identifying signatures that might be associated with clinical data or non-mutation molecular data. In such cases, you can define your own lists of case and control samples (from your own analyses) and run the cidd signature command. In this example, we regenerate the BRAF V600E signature but specify the lists of case and control IDs explicitly. These lists were generated by the cidd mutation_signature command described previously.
cidd signature -c coadread --cases crc_brafv600e/reports/crc_brafv600e_cases.samples --controls crc_brafv600e/reports/crc_brafv600e_controls.samples --gsa_num_perm 100 --gsa_num_cpus 20 -lfc 2 -n crc_brafv600e -v2
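When you build case and control lists yourself, it is worth confirming that no patient appears in both. The sketch below assumes one ID per line in each file (the file format is not specified here) and standard TCGA barcodes, whose first three dash-separated fields identify the patient. All helper names are hypothetical, and the barcodes in the example are made up.

```python
def read_ids(lines):
    """Collect non-empty, non-comment lines as IDs (one per line assumed)."""
    ids = []
    for line in lines:
        line = line.strip()
        if line and not line.startswith("#"):
            ids.append(line)
    return ids


def patient_id(barcode):
    """Truncate a TCGA barcode to its first three dash-separated fields."""
    return "-".join(barcode.strip().split("-")[:3])


def overlapping_patients(cases, controls):
    """Return patient IDs that occur in both the case and control lists."""
    case_patients = set(patient_id(b) for b in cases)
    control_patients = set(patient_id(b) for b in controls)
    return sorted(case_patients & control_patients)


# Made-up barcodes, for illustration only.
cases = read_ids(["TCGA-AA-0001-01\n", "TCGA-AA-0002-01\n"])
controls = read_ids(["TCGA-AA-0001-11\n", "TCGA-AA-0003-01\n"])
print(overlapping_patients(cases, controls))
```

In practice you would pass open file handles, for example `read_ids(open("crc_brafv600e/reports/crc_brafv600e_cases.samples"))`.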
In this command, we supply a gene expression signature that was identified outside of cidd. A list of up-regulated Entrez gene IDs and a list of down-regulated Entrez gene IDs are input to cidd.
cidd drugs --up entrez_ids.up --down entrez_ids.down -n test_analysis -v2
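If your external analysis yields per-gene fold changes, the two input files can be produced with a few lines of Python. This is a sketch under stated assumptions: the one-ID-per-line file layout and the log2 fold-change threshold are not specified here, the helper names are hypothetical, and the fold-change values are made up.

```python
def split_by_fold_change(log2_fold_changes, threshold=1.0):
    """Split genes into up- and down-regulated lists by log2 fold change."""
    up = sorted(gene for gene, lfc in log2_fold_changes.items()
                if lfc >= threshold)
    down = sorted(gene for gene, lfc in log2_fold_changes.items()
                  if lfc <= -threshold)
    return up, down


def write_gene_list(path, gene_ids):
    """Write one gene ID per line (assumed layout for --up / --down)."""
    with open(path, "w") as handle:
        for gene_id in gene_ids:
            handle.write("%s\n" % gene_id)


# Keys are Entrez gene IDs; the fold changes are invented for illustration.
fold_changes = {"673": 2.3, "3845": -1.7, "7157": 0.2, "1956": 1.0}
up_ids, down_ids = split_by_fold_change(fold_changes)
print(up_ids, down_ids)
```

`write_gene_list("entrez_ids.up", up_ids)` and `write_gene_list("entrez_ids.down", down_ids)` would then create the two files named in the command above.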
To screen a signature generated by cidd, simply specify the analysis name used when generating the signature. cidd will automatically retrieve the gene expression signature.
cidd drugs -np 100 -n crc_brafv600e -v2
After generating a gene expression signature in cidd, you can generate a kTSP (k-Top Scoring Pairs) classifier using these signature genes. This classifier is needed if you want to identify candidate cell lines that resemble your tumor of interest.
cidd classifier generate -n crc_brafv600e -v2
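To give a feel for what a kTSP classifier does, the following is a simplified plain-Python sketch of the idea: pick gene pairs whose relative ordering differs most between cases and controls, then classify new samples by majority vote over those pairs. It is not the implementation cidd uses (the installation steps above install the R ktspair package); the function names are hypothetical and the toy expression values are made up.

```python
from itertools import combinations


def pair_score(expr, labels, gene_a, gene_b):
    """Case-minus-control frequency of the ordering gene_a < gene_b."""
    freq = {}
    for cls in (0, 1):
        samples = [i for i, lab in enumerate(labels) if lab == cls]
        hits = sum(1 for i in samples if expr[gene_a][i] < expr[gene_b][i])
        freq[cls] = hits / float(len(samples))
    return freq[1] - freq[0]


def train_ktsp(expr, labels, k=1):
    """Pick up to k disjoint gene pairs with the largest absolute score."""
    scored = []
    for gene_a, gene_b in combinations(sorted(expr), 2):
        score = pair_score(expr, labels, gene_a, gene_b)
        scored.append((abs(score), gene_a, gene_b, score > 0))
    scored.sort(key=lambda item: -item[0])
    chosen, used = [], set()
    for _, gene_a, gene_b, case_when_less in scored:
        if gene_a in used or gene_b in used:
            continue
        chosen.append((gene_a, gene_b, case_when_less))
        used.update([gene_a, gene_b])
        if len(chosen) == k:
            break
    return chosen


def predict_ktsp(pairs, sample):
    """Majority vote over pairs: 1 (case) or 0 (control); ties give 0."""
    votes = sum(1 for gene_a, gene_b, case_when_less in pairs
                if (sample[gene_a] < sample[gene_b]) == case_when_less)
    return 1 if votes * 2 > len(pairs) else 0


# Toy data: rows are genes, columns are samples; labels mark cases as 1.
expr = {"g1": [1, 2, 9, 8], "g2": [5, 6, 3, 2], "g3": [0, 4, 4, 1]}
labels = [1, 1, 0, 0]
pairs = train_ktsp(expr, labels, k=1)
print(pairs, predict_ktsp(pairs, {"g1": 1, "g2": 7, "g3": 4}))
```

Because the rule depends only on the relative order of two genes within a sample, it can carry over between platforms (for example from tumor RNA-seq to cell line microarrays) more readily than absolute expression cutoffs.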
In this command, cell lines are retrieved to represent tumors characterized by cidd. The -n parameter tells cidd to use the gene expression classifier generated in the analysis named crc_brafv600e. Additionally, the results are filtered for large intestine cell lines, and the candidate cell lines are required to have a BRAF V600E mutation.
cidd cell_lines -g BRAF -aac V600E -t LARGE_INTESTINE -n crc_brafv600e
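The mutation and tissue filters in this command can be pictured as a simple record filter. The sketch below is illustrative only: the record field names (`cell_line`, `gene`, `aa_change`) are hypothetical, the tissue match assumes CCLE-style names that end in the tissue label, and the records stand in for rows of the CCLE mutation files. The pipe-delimited option syntax follows the help text for -g and -aac.

```python
def parse_choice_list(value):
    """Split a quoted, pipe-delimited option value such as "V600E|G12D"."""
    return [item.strip() for item in value.split("|") if item.strip()]


def matching_cell_lines(mutations, genes, amino_acid_changes=None,
                        tissue=None):
    """Return cell lines with a listed change in any of the listed genes."""
    wanted_genes = set(parse_choice_list(genes))
    wanted_changes = set(parse_choice_list(amino_acid_changes or ""))
    hits = set()
    for record in mutations:
        if record["gene"] not in wanted_genes:
            continue
        if wanted_changes and record["aa_change"] not in wanted_changes:
            continue
        if tissue and not record["cell_line"].endswith("_" + tissue):
            continue
        hits.add(record["cell_line"])
    return sorted(hits)


# Illustrative records only; field names are assumptions of this sketch.
example_records = [
    {"cell_line": "HT29_LARGE_INTESTINE", "gene": "BRAF", "aa_change": "V600E"},
    {"cell_line": "A375_SKIN", "gene": "BRAF", "aa_change": "V600E"},
    {"cell_line": "HCT116_LARGE_INTESTINE", "gene": "KRAS", "aa_change": "G13D"},
]
print(matching_cell_lines(example_records, "BRAF", "V600E", "LARGE_INTESTINE"))
```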