Software

Software Releases: The computational tools we have developed are available for download and free use by the academic community. 

KNOWENG (Knowledge Engine for Genomics)

This is the cloud-based software for genomics spreadsheet analysis created by KnowEnG, a BD2K Center of Excellence led by Sinha and others. Here you can perform a variety of tasks, such as gene prioritization, sample clustering, gene set characterization and signature analysis. You can use standard tools as well as novel tools we developed for "knowledge-guided" analysis, in which prior knowledge from a number of databases is used as part of the analysis of your data. Try it out!
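As a rough illustration of what "knowledge-guided" analysis can involve, the sketch below ranks genes in a small gene network by a random walk with restart from a set of query genes. This is a common network-propagation technique. The network, the gene names and the `restart` value are all hypothetical, and this is not KnowEnG's code. It only shows the kind of computation such a pipeline may perform.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Rank nodes by their stationary visiting probability.

    adjacency: dict mapping each node to a list of its neighbours (hypothetical network)
    seeds:     set of query nodes the walker jumps back to
    restart:   probability of jumping back to the seeds at each step
    """
    nodes = list(adjacency)
    restart_vec = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(restart_vec)
    for _ in range(max_iter):
        # restart part: a fixed share of probability returns to the seeds
        nxt = {n: restart * restart_vec[n] for n in nodes}
        for n in nodes:
            walk_mass = (1.0 - restart) * p[n]
            targets = adjacency[n] or list(seeds)  # dangling node: send mass back to seeds
            for m in targets:
                nxt[m] += walk_mass / len(targets)
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Usage: a four-gene chain A - B - C - D, querying from gene A.
network = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
scores = random_walk_with_restart(network, {"A"})
ranking = sorted(scores, key=scores.get, reverse=True)  # ['A', 'B', 'C', 'D']
```

Genes close to the query in the network receive higher scores. This is the sense in which prior knowledge, encoded as network edges, can reshape a ranking.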

GEMSTAT (Thermodynamics-based modeling of gene expression from regulatory sequences)

This Linux-based program is meant for sequence-to-expression modeling. The inputs to this modeling software include the DNA sequence(s) of one or more enhancers, their expression readouts in multiple conditions, as well as data on relevant transcription factors (TFs): their concentration levels in those conditions and their binding motifs. The model then learns to explain the gene expression driven by an enhancer as a function of its sequence and the TF data. 
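GEMSTAT's published model has more ingredients than shown here, for example cooperative interactions between TFs. The sketch below illustrates only the core thermodynamic idea, under the simplifying assumptions stated in its docstring and with made-up parameter values. Each site's occupancy weight is its affinity times the TF concentration, and expression is read out as the equilibrium probability that the basal transcriptional machinery (BTM) is bound.

```python
def expression_level(sites, q_btm):
    """Probability that the basal transcriptional machinery (BTM) is bound.

    sites: list of (affinity K, TF concentration c, alpha) tuples, one per binding site;
           alpha > 1 means the bound TF activates, alpha < 1 means it represses.
    q_btm: statistical weight of BTM binding on its own.
    Assumes independent, non-overlapping sites with no TF-TF cooperativity.
    """
    z_btm_off = 1.0     # sum of weights over all site states with the BTM unbound
    z_btm_on = q_btm    # the same sum with the BTM bound
    for k, conc, alpha in sites:
        q = k * conc                      # weight of this site being occupied
        z_btm_off *= 1.0 + q
        z_btm_on *= 1.0 + q * alpha       # an occupied site also interacts with the BTM
    return z_btm_on / (z_btm_off + z_btm_on)


# Usage: the same enhancer (one activator site, one repressor site) in two conditions
# that differ only in repressor concentration (all values hypothetical).
low_repressor = expression_level([(1.0, 2.0, 20.0), (1.0, 0.1, 0.05)], q_btm=0.05)
high_repressor = expression_level([(1.0, 2.0, 20.0), (1.0, 5.0, 0.05)], q_btm=0.05)
```

Fitting such a model amounts to choosing the affinities, interaction terms and `q_btm` so that the predicted levels across conditions match the measured expression readouts.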

STAP (Motif-based prediction of transcription factor binding in a sequence window)

A biophysical model for analysis of transcription factor interaction and binding site arrangement from genome-wide binding data.
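The following sketch shows the kind of biophysical calculation that underlies this type of model. It computes a partition function over all ways of placing non-overlapping sites of one TF in a sequence window, using dynamic programming. It is a simplified illustration with a single TF, no cooperativity and hypothetical site weights, not STAP's implementation.

```python
def window_partition_function(site_weights, width):
    """Sum of statistical weights over all non-overlapping site configurations.

    site_weights[j]: weight (affinity * TF concentration) of a site starting at position j,
                     so len(site_weights) == window_length - width + 1.
    """
    window_length = len(site_weights) + width - 1
    z = [1.0] * (window_length + 1)   # z[i]: partition function of the first i bases
    for i in range(1, window_length + 1):
        z[i] = z[i - 1]               # base i-1 is not the last base of a bound site
        start = i - width
        if start >= 0:
            # a site occupies bases start..i-1
            z[i] += z[start] * site_weights[start]
    return z[window_length]


def occupancy(site_weights, width):
    """Probability that at least one site in the window is bound.

    The empty configuration has weight 1, so P(unbound) = 1 / Z.
    """
    return 1.0 - 1.0 / window_partition_function(site_weights, width)


# Usage: a 3-bp window with a single possible 3-bp site of weight 2.
p_bound = occupancy([2.0], width=3)   # 2 / 3
```

The dynamic program avoids enumerating configurations explicitly, so the cost grows linearly with window length.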

SCRMshaw (Genome-wide Supervised CRM Prediction)

SCRMshaw is a genome-wide prediction program for cis-regulatory modules (CRMs). It learns parameters from a list of known CRMs that regulate the same spatial/temporal gene expression pattern, and then predicts CRMs with similar functionality across the genome. SCRMshaw is designed to predict CRMs in the genome sequence of model and non-model species, given known CRMs provided by the user. Users can choose among three scoring schemes to predict CRMs: IMM, PAC and HexMCD.
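To illustrate the general idea behind a hexamer-based scoring scheme, the sketch below trains two order-5 Markov chains (counted from 6-mers), one on training CRMs and one on background sequence. It then scores a window by its log-likelihood ratio. This is written in the spirit of HexMCD but is not SCRMshaw's code. The pseudocount, the single-strand counting and the toy sequences are our own assumptions.

```python
import math
from collections import Counter


def train_markov_chain(seqs, k):
    """Count k-mers and their (k-1)-mer prefixes: an order-(k-1) Markov chain."""
    kmers, prefixes = Counter(), Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            word = s[i:i + k]
            kmers[word] += 1
            prefixes[word[:-1]] += 1
    return kmers, prefixes


def log_transition(model, word, pseudo=1.0):
    """log P(last base of word | preceding k-1 bases), with pseudocounts over 4 bases."""
    kmers, prefixes = model
    return math.log((kmers[word] + pseudo) / (prefixes[word[:-1]] + 4 * pseudo))


def mcd_score(window, crm_model, bg_model, k=6):
    """Log-likelihood ratio of a window under the CRM chain versus the background chain."""
    return sum(
        log_transition(crm_model, window[i:i + k]) - log_transition(bg_model, window[i:i + k])
        for i in range(len(window) - k + 1)
    )


# Usage with toy training data (hypothetical sequences).
crm_model = train_markov_chain(["ACGTACGTACGTACGTACGT"], k=6)
bg_model = train_markov_chain(["AAAAAAAAAAAAAAAAAAAA"], k=6)
crm_like = mcd_score("ACGTACGTACGT", crm_model, bg_model)   # positive
bg_like = mcd_score("AAAAAAAAAAAA", crm_model, bg_model)    # negative
```

In a genome-wide scan, a score like this would be computed for sliding windows, and the top-scoring windows reported as candidate CRMs.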

METALYSIS and CIS-METALYSIS (Meta-analysis of gene expression data sets)

This Linux-based program is meant for revealing higher-level insights from multiple gene expression data sets. In particular, if you have up- and down-regulated gene sets from several different conditions and want to know what those gene sets have in common, you can use the Metalysis program. A special version of this program, called "cis-Metalysis", identifies cis-elements (TF binding sites) common to those gene sets.
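One simple way to ask whether an annotation, for example a TF motif, is shared across several gene sets is to combine its per-set enrichment p-values. The sketch below uses Fisher's method for this. It is an illustrative stand-in, not necessarily the statistic Metalysis uses, and the annotation names and p-values are made up.

```python
import math


def fisher_combined_pvalue(pvalues):
    """Fisher's method: X = -2 * sum(ln p_i) follows chi-square with 2k df under H0."""
    k = len(pvalues)
    x = -2.0 * sum(math.log(p) for p in pvalues)
    # For even degrees of freedom 2k the chi-square survival function has a closed form:
    # P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Usage: per-gene-set enrichment p-values for two hypothetical motifs.
per_set_pvalues = {
    "motif_X": [0.01, 0.03, 0.02],   # moderately enriched in all three gene sets
    "motif_Y": [0.01, 0.60, 0.90],   # enriched in one gene set only
}
combined = {name: fisher_combined_pvalue(ps) for name, ps in per_set_pvalues.items()}
```

An annotation that is consistently enriched across many sets gets a much smaller combined p-value than one that is strongly enriched in just a single set.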

SWAN (Prediction of binding targets of a transcription factor, characterized by a position weight matrix)

This Linux-based program is meant for genome-wide prediction of the regulatory targets of a motif using a Hidden Markov Model. It differs from Stubb in the question it asks. Instead of "Does the sequence have more sites than expected from a random (background) model of sequences?", it asks "Does the sequence have more sites than the average genome-wide frequency of sites?" We have found that this approach leads to more accurate motif target predictions overall.
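SWAN itself uses a Hidden Markov Model. The sketch below replaces it with a plain Poisson count test, purely to illustrate the second question above: whether a window holds more motif sites than the genome-wide average site density predicts. The site counts and genome size in the usage line are hypothetical.

```python
import math


def poisson_upper_tail(n, lam):
    """P(N >= n) for N ~ Poisson(lam)."""
    if n <= 0:
        return 1.0
    term = math.exp(-lam)   # P(N = 0)
    cdf = term
    for i in range(1, n):
        term *= lam / i     # P(N = i) from P(N = i - 1)
        cdf += term         # cdf accumulates P(N <= n - 1)
    return max(0.0, 1.0 - cdf)


def window_enrichment(sites_in_window, window_len, genome_sites, genome_len):
    """Compare a window's site count with the genome-wide average site density."""
    density = genome_sites / genome_len          # sites per bp, averaged over the genome
    expected = density * window_len
    return expected, poisson_upper_tail(sites_in_window, expected)


# Usage: a 1 kb window with 5 sites, in a 1e9 bp genome with 1e6 sites overall.
expected, p_value = window_enrichment(5, 1_000, 1_000_000, 1_000_000_000)
```

The expected count here comes from the genome itself rather than from a background sequence model. That is the distinction the paragraph above draws between SWAN and Stubb.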

CRM discovery benchmark (Data sets from D. melanogaster)

Morph software (Probabilistic alignment of cis-regulatory modules) 

D2Z software (Alignment-free comparison of regulatory sequences)

Indelign software (Probabilistically annotating indels in multiple alignments) 

DIPS software (For finding discriminative PWM motifs) 

Stubb software (For finding cis-regulatory modules) 

PhyME software (Motif finding in orthologous sequences) 
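The D2Z entry above refers to alignment-free comparison of sequences. As its name suggests, it builds on the D2 statistic (the number of matching k-letter word pairs between two sequences) standardized into a z-score. The sketch below computes D2 directly. As a simplifying assumption, it estimates the null mean and standard deviation by shuffling one sequence; the published method may use analytically derived moments instead.

```python
import random
import statistics
from collections import Counter


def kmer_counts(seq, k):
    """Occurrences of every k-letter word in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(a, b, k):
    """D2 statistic: number of matching k-word pairs between sequences a and b."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    return sum(n * cb[w] for w, n in ca.items())


def d2_zscore(a, b, k, n_shuffles=200, seed=0):
    """Standardize D2 against shuffled copies of b (assumption: shuffling-based null)."""
    rng = random.Random(seed)
    observed = d2(a, b, k)
    letters = list(b)
    null = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)
        null.append(d2(a, "".join(letters), k))
    sd = statistics.stdev(null)
    return (observed - statistics.mean(null)) / sd if sd > 0 else 0.0


# Usage: compare two short toy sequences with 3-letter words.
z = d2_zscore("ACGTACGTTTGACA", "TTACGTACGAGGCT", k=3)
```

No alignment is needed at any point. Only word counts are compared, which is what makes this family of methods attractive for regulatory sequences that align poorly.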

Retired software

MET (Motif Enrichment Tool)

This web interface takes a user-defined gene set and identifies sets of genes that share a regulatory motif and are significantly associated with the user's set. It covers a dozen model organisms.
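A standard way to judge such an association is a hypergeometric test on the overlap between the user's gene set and the set of predicted targets of a motif. The sketch below is a minimal version of that test. Whether MET uses exactly this statistic is an assumption, and the gene names are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(overlap, set_size, targets, universe):
    """P(X >= overlap) when set_size genes are drawn at random from a universe
    containing `targets` motif targets; X is the number of targets drawn."""
    upper = min(set_size, targets)
    hits = sum(
        comb(targets, x) * comb(universe - targets, set_size - x)
        for x in range(overlap, upper + 1)
    )
    return hits / comb(universe, set_size)


def motif_enrichment(user_genes, motif_targets, universe_genes):
    """Overlap between a user gene set and a motif's targets, and its p-value."""
    universe = set(universe_genes)
    user = set(user_genes) & universe
    targets = set(motif_targets) & universe
    overlap = len(user & targets)
    return overlap, hypergeom_upper_tail(overlap, len(user), len(targets), len(universe))


# Usage: 20 genes in total, 5 of which are (hypothetical) targets of one motif.
universe = [f"gene{i}" for i in range(20)]
motif_targets = universe[:5]
user_set = ["gene0", "gene1", "gene2", "gene3"]
overlap, p_value = motif_enrichment(user_set, motif_targets, universe)
```

Here all four user genes are motif targets, giving an overlap of 4 and a p-value of 5/4845. Repeating the test for every motif (and correcting for multiple testing) would give a ranked list of associated motif target sets.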

Genome Surveyor (Prediction of motif targets in D. melanogaster)

This web-based Genome Browser allows you to find regulatory targets of a large collection of transcription factors in the Drosophila genome. You may use cross-species comparison among 12 genomes to see conserved targets.

EMMA (Prediction and alignment of cis-regulatory modules)

This Linux-based program is meant for predicting the regulatory targets of a motif using two-species comparison. If you have a sequence window of about 100 to 2,000 bp and its orthologous window from another species, you can use EMMA to score the window for matches to a given motif. EMMA is also useful for aligning cis-regulatory modules (enhancers) between two species, if you know the relevant transcription factor motifs.
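EMMA's actual scoring, which also aligns the two windows, is more involved. The sketch below illustrates only the basic ingredient of scoring a window for matches to a motif: a log-odds position weight matrix scanned over each of the two orthologous windows. Combining the species by taking the weaker of the two best hits is our own simplifying assumption, as are the toy motif counts.

```python
import math

BASES = "ACGT"


def log_odds_matrix(counts, background=None, pseudo=0.5):
    """Convert per-column base counts into log-odds scores against a background."""
    background = background or {b: 0.25 for b in BASES}
    matrix = []
    for col in counts:
        total = sum(col.get(b, 0) for b in BASES) + 4 * pseudo
        matrix.append({
            b: math.log((col.get(b, 0) + pseudo) / total / background[b]) for b in BASES
        })
    return matrix


def best_hit(seq, matrix):
    """Highest log-odds score over all motif-length positions (forward strand only;
    seq must be uppercase ACGT and at least as long as the motif)."""
    w = len(matrix)
    return max(
        sum(matrix[j][seq[i + j]] for j in range(w))
        for i in range(len(seq) - w + 1)
    )


def two_species_score(window, ortholog, matrix):
    """Assumption: a conserved site should score well in both windows,
    so report the weaker of the two best hits."""
    return min(best_hit(window, matrix), best_hit(ortholog, matrix))


# Usage: a toy "GATA" motif scored in two orthologous windows.
gata = log_odds_matrix([{"G": 10}, {"A": 10}, {"T": 10}, {"A": 10}])
conserved = two_species_score("CCGATACC", "TTGATATT", gata)   # positive
lost = two_species_score("CCGATACC", "CCCCCCCC", gata)        # negative
```

A site that is present in only one species scores poorly under this rule. This mirrors, in a much cruder way, the benefit of two-species comparison described above.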

YMF (Motif Finding)