Research Areas


Feature Selection

Feature selection identifies the genes, proteins, and environmental factors that contribute to an individual's increased susceptibility to disease or to another biological endpoint. We are developing feature selection methods for gene and protein expression microarrays as well as for SNP genotype data. These approaches draw on heuristic, information-theoretic, and machine learning methods for data mining.
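As an illustration of the information-theoretic idea only, the sketch below ranks SNPs by their mutual information with a case/control phenotype. It is a simple univariate filter, not the methods under development here. The function names, the 0/1/2 genotype coding, and the toy data are all assumptions made for the example.

```python
import math
from collections import Counter

def mutual_information(genotypes, phenotypes):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(genotypes)
    joint = Counter(zip(genotypes, phenotypes))
    g_counts = Counter(genotypes)
    p_counts = Counter(phenotypes)
    mi = 0.0
    for (g, p), c in joint.items():
        # p(g, p) * log2( p(g, p) / (p(g) * p(p)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (g_counts[g] * p_counts[p]))
    return mi

def rank_snps(snp_table, phenotypes):
    """Return SNP names sorted by decreasing mutual information with the phenotype."""
    scores = {name: mutual_information(col, phenotypes)
              for name, col in snp_table.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical toy data: genotypes coded 0/1/2, phenotype 0 = control, 1 = case.
phenotypes = [0, 0, 0, 0, 1, 1, 1, 1]
snp_table = {
    "snp_a": [0, 0, 0, 0, 2, 2, 2, 2],  # perfectly associated with the phenotype
    "snp_b": [0, 1, 0, 1, 0, 1, 0, 1],  # independent of the phenotype
}
ranking = rank_snps(snp_table, phenotypes)
```

A filter like this scores each SNP on its own, so it cannot detect factors that matter only in combination. That limitation is one reason to also consider heuristic and machine learning approaches.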

 


Interaction Network Discovery from Time Series


One approach we have taken to discovering interaction networks from time series is to pose the problem as nonlinear system identification. We developed a hybrid approach that automatically discovers the network topology and parameters of a coupled differential equation system describing the kinetics of a system of interacting biomolecules. This method combines a grammar-based evolutionary algorithm with a Kalman particle filter. Another approach we are investigating is an agent-based and genetic programming method for modeling time series.
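To make the system identification framing concrete, the sketch below fits the rate constants of a small coupled ODE system to a time series. It is deliberately much simpler than the hybrid method described above: the topology is fixed in advance (a hypothetical two-species chain, x converted to y, y degraded), a plain (1+1) evolution strategy stands in for the grammar-based evolutionary algorithm, and no Kalman particle filter is involved. All names, rate constants, and settings are assumptions for illustration.

```python
import random

def simulate(k1, k2, x0=1.0, y0=0.0, dt=0.01, steps=500):
    """Forward-Euler integration of dx/dt = -k1*x, dy/dt = k1*x - k2*y."""
    x, y = x0, y0
    traj = [(x, y)]
    for _ in range(steps):
        dx = -k1 * x
        dy = k1 * x - k2 * y
        x, y = x + dt * dx, y + dt * dy
        traj.append((x, y))
    return traj

def sse(params, observed):
    """Sum of squared errors between a simulated and an observed trajectory."""
    sim = simulate(*params)
    return sum((sx - ox) ** 2 + (sy - oy) ** 2
               for (sx, sy), (ox, oy) in zip(sim, observed))

def fit(observed, start=(0.1, 0.1), generations=300, sigma=0.1, seed=0):
    """(1+1) evolution strategy: keep a mutated candidate only if it lowers the error."""
    rng = random.Random(seed)
    best, best_err = start, sse(start, observed)
    for _ in range(generations):
        # Gaussian mutation of each rate constant, kept strictly positive
        cand = tuple(max(1e-6, p + rng.gauss(0.0, sigma)) for p in best)
        err = sse(cand, observed)
        if err < best_err:
            best, best_err = cand, err
    return best, best_err

# Synthetic "data" generated from known rate constants (k1 = 1.0, k2 = 0.5).
observed = simulate(1.0, 0.5)
best, best_err = fit(observed)
```

Discovering the topology as well would mean searching over the form of the right-hand sides, not just over `k1` and `k2`. That is the role a grammar over candidate equations could play.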

 
Antibody-Antigen Docking

We are integrating binding affinity data from antibody somatic mutations into an improved scoring function for identifying antiviral binding sites. The eventual goal is to achieve in silico antibody design and automatic identification of antiviral targets. We are developing these methods for human antibodies to rotavirus VP6.

 


Simulation of Experimental Methods


 

We have used Monte Carlo methods to simulate sonication and PCR for a new high-resolution chromatin immunoprecipitation (ChIP) assay that can identify protein binding site locations with high precision. The goal of this work is to use computation to gain insight into the effect of tuning experimental parameters. Feedback between computation and experiment can accelerate the development of improved experimental techniques.
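A minimal sketch of the Monte Carlo idea, covering the sonication step only, might look like the following. It assumes that sonication breaks a DNA region at uniformly random positions and that the fragment containing the binding site is the one recovered. PCR is not modeled. The breakage model, function names, and parameter values are assumptions for illustration, not the simulation used in this work.

```python
import bisect
import random

def fragment_containing(site, length, n_breaks, rng):
    """Break [0, length) at n_breaks uniform random positions and
    return the half-open fragment [start, end) that contains `site`."""
    breaks = sorted(rng.randrange(1, length) for _ in range(n_breaks))
    i = bisect.bisect_right(breaks, site)
    start = breaks[i - 1] if i > 0 else 0
    end = breaks[i] if i < len(breaks) else length
    return start, end

def mean_fragment_width(site, length, n_breaks, trials=2000, seed=0):
    """Monte Carlo estimate of the mean width of the fragment covering `site`."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        start, end = fragment_containing(site, length, n_breaks, rng)
        total += end - start
    return total / trials

# Hypothetical 10 kb region with a binding site in the middle.
light = mean_fragment_width(site=5000, length=10000, n_breaks=5)
heavy = mean_fragment_width(site=5000, length=10000, n_breaks=50)
```

Comparing `light` and `heavy` shows how one tunable parameter, the amount of fragmentation, changes the width of the recovered fragments and therefore how tightly the site can be localized.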

 
Clustering

We have applied hierarchical and dynamic programming-based clustering methods to calcium signaling time-series data generated by a novel nanophysiometer to discover T-cell types based on their signaling profiles. We are also working on a probabilistic clustering approach for genetic buffering data and flow cytometry data.
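The hierarchical part of this can be sketched as average-linkage agglomerative clustering of short signaling traces under Euclidean distance. The linkage, the distance measure, and the four toy traces below are assumptions chosen for illustration. The source does not specify them, and the dynamic programming-based and probabilistic methods are not shown.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length time series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_linkage(c1, c2, series):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(euclidean(series[i], series[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))

def agglomerate(series, n_clusters):
    """Merge the two closest clusters repeatedly until n_clusters remain."""
    clusters = [[i] for i in range(len(series))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b], series)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

# Hypothetical calcium traces: two fast transients and two slow sustained rises.
traces = [
    [0.0, 1.0, 0.2, 0.1],
    [0.0, 0.9, 0.3, 0.1],
    [0.0, 0.3, 0.6, 0.9],
    [0.1, 0.2, 0.7, 1.0],
]
clusters = agglomerate(traces, 2)
```

On this toy input the two transient traces end up in one cluster and the two sustained traces in the other. Real traces would typically need alignment or normalization first.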

 
