| Feature Selection |
|
Feature selection is the determination of the genes, proteins, and/or
environmental factors that lead to an individual's increased
susceptibility to disease or to another biological endpoint. We are
developing feature selection methods for gene and protein expression
microarrays as well as SNP genotypic data. These approaches involve
heuristic, information-theoretic, and machine learning methods for data
mining.
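
As an illustration of the information-theoretic side, the following minimal sketch ranks SNPs by their mutual information with a case/control phenotype. It is not the method developed here; the function names, the 0/1/2 genotype coding, and the toy data are all assumptions made for the example.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    px = Counter(xs)
    py = Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi

def rank_features(genotypes, phenotype):
    """Sort features (name -> column of values) by mutual information with the phenotype."""
    scores = {name: mutual_information(col, phenotype)
              for name, col in genotypes.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy data (made up): genotypes coded as minor-allele counts 0/1/2,
# phenotype coded 0 = control, 1 = case.
genotypes = {
    "snp_a": [0, 0, 1, 1, 2, 2, 0, 2],
    "snp_b": [0, 1, 2, 0, 1, 2, 0, 1],
}
phenotype = [0, 0, 0, 0, 1, 1, 0, 1]
ranking = rank_features(genotypes, phenotype)
```

A univariate ranking like this scores each feature on its own, so it cannot detect features that matter only in combination with others.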
|
| Interaction Network Discovery from Time Series |
|
One approach we have taken to discovering interaction networks from
time series is to pose the problem as one of nonlinear system
identification. We developed a hybrid approach that automatically
discovers the network topology and parameters of a coupled differential
equation system describing the kinetics of a system of interacting
biomolecules. This method combines a grammar-based evolutionary
algorithm with a Kalman particle filter. Another approach we are
investigating is an agent-based and genetic programming method for
modeling time series.
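
To make the system-identification framing concrete, here is a minimal sketch that fits the rate constants of a fixed, nonlinear mass-action model to a synthetic time series. It covers parameter fitting only, for one hand-picked topology, and it substitutes a simple (1+1) evolution strategy for the grammar-based evolutionary algorithm and Kalman particle filter described above. The reaction A + B <-> C, the rate names `k_on` and `k_off`, and all numerical values are assumptions chosen for illustration.

```python
import random

def simulate(k_on, k_off, state, dt=0.01, steps=200):
    """Forward-Euler integration of A + B <-> C under mass-action kinetics."""
    a, b, c = state
    series = [(a, b, c)]
    for _ in range(steps):
        flux = k_on * a * b - k_off * c  # net rate of complex formation
        a, b, c = a - flux * dt, b - flux * dt, c + flux * dt
        series.append((a, b, c))
    return series

def squared_error(candidate, observed, state):
    """Sum of squared differences between a candidate trajectory and the data."""
    predicted = simulate(candidate[0], candidate[1], state,
                         steps=len(observed) - 1)
    return sum((p - o) ** 2
               for p_row, o_row in zip(predicted, observed)
               for p, o in zip(p_row, o_row))

def fit_rates(observed, state, start=(1.0, 1.0), iterations=500, seed=0):
    """(1+1) evolution strategy: keep a mutated candidate only if it fits better."""
    rng = random.Random(seed)
    best, best_err = start, squared_error(start, observed, state)
    for _ in range(iterations):
        # Gaussian mutation of both rates, clipped so rates stay non-negative.
        trial = tuple(max(0.0, k + rng.gauss(0.0, 0.1)) for k in best)
        err = squared_error(trial, observed, state)
        if err < best_err:
            best, best_err = trial, err
    return best, best_err

initial_state = (1.0, 0.8, 0.0)   # hypothetical starting concentrations
true_rates = (2.0, 0.5)           # hypothetical rates used to make synthetic data
observed = simulate(true_rates[0], true_rates[1], initial_state)
fitted, fitted_err = fit_rates(observed, initial_state)
```

Discovering the topology as well would mean searching over the form of the rate equations themselves, with the same kind of error measure serving as the fitness.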
|
| Antibody-Antigen Docking |
|
We
are integrating binding affinity data from antibody somatic mutations
into an improved scoring function for identifying antiviral binding
sites. The eventual goal is to achieve in silico antibody design and
automatic identification of antiviral targets. We are developing these
methods for human antibodies to rotavirus VP6.
|
| Simulation of Experimental Methods |
|
For example, we have used Monte Carlo methods to simulate sonication and
PCR for a new high-resolution chromatin immunoprecipitation (ChIP)
assay that can identify protein binding site locations with high
precision. The goal of this work is to use computation to gain insight
into the effect of tuning experimental parameters. Feedback between
computation and experiment can accelerate the development of improved
experimental techniques.
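
The idea of tuning an experimental parameter in simulation can be sketched with a toy Monte Carlo model of the sonication step alone. This is not the simulation used for the assay: it assumes that breaks occur independently at every base with equal probability, that only fragments containing the bound site are recovered, and it omits PCR entirely. All names and default values are hypothetical.

```python
import random

def simulate_chip_coverage(region_len=2000, site=1000, mean_fragment=300,
                           n_cells=1000, seed=0):
    """Monte Carlo sketch: random shearing, then recovery of site-containing fragments."""
    rng = random.Random(seed)
    coverage = [0] * region_len
    break_prob = 1.0 / mean_fragment  # per-base chance of a sonication break
    for _ in range(n_cells):
        # Extend left and right from the bound site until a break occurs.
        left = site
        while left > 0 and rng.random() > break_prob:
            left -= 1
        right = site
        while right < region_len - 1 and rng.random() > break_prob:
            right += 1
        # The recovered fragment covers every base between the two breaks.
        for pos in range(left, right + 1):
            coverage[pos] += 1
    return coverage

def width_at_half_max(coverage):
    """Number of positions whose coverage is at least half of the peak value."""
    half = max(coverage) / 2
    return sum(1 for c in coverage if c >= half)

# Compare two sonication settings: shorter fragments give a narrower peak.
short_fragments = simulate_chip_coverage(mean_fragment=100)
long_fragments = simulate_chip_coverage(mean_fragment=500)
narrow = width_at_half_max(short_fragments)
wide = width_at_half_max(long_fragments)
```

Under these assumptions the coverage peak is always centered on the binding site, and its width shrinks as the mean fragment length decreases, which is the kind of parameter effect a simulation can expose before any bench work.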
|
| Clustering |
|
We
have applied hierarchical and dynamic programming-based clustering
methods to calcium signaling time-series data generated by a novel
nanophysiometer to discover T-cell types based on their signaling
profiles. We are also working on a probabilistic clustering approach
for genetic buffering data and flow cytometry data.
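
A minimal sketch of the hierarchical part, assuming average linkage and Euclidean distance between equal-length traces. The traces below are made-up values rather than nanophysiometer data, and the linkage and distance choices are assumptions, not necessarily the ones used in this work.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length time series."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def agglomerative(series, n_clusters):
    """Average-linkage agglomerative clustering; returns lists of series indices."""
    clusters = [[i] for i in range(len(series))]

    def linkage(c1, c2):
        # Mean pairwise distance between the members of two clusters.
        total = sum(euclidean(series[i], series[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge them.
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Toy calcium traces (made up): two sustained and two transient responders.
traces = [
    [0.1, 0.8, 0.9, 0.9, 0.8],
    [0.0, 0.7, 0.9, 0.8, 0.9],
    [0.1, 0.9, 0.3, 0.1, 0.0],
    [0.0, 0.8, 0.2, 0.1, 0.1],
]
groups = agglomerative(traces, n_clusters=2)
```

Here the sustained and transient traces end up in separate groups. Traces that differ mainly by a time shift would need an alignment-aware distance instead of a pointwise one.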
|