Research

High throughput morphological profiling of cells

I am currently developing image analysis methods for high-throughput morphological profiling of cells.

Publications

DNA copy number estimation using machine learning

DNA copy number variations (CNVs) have been shown to correlate with susceptibility to infectious diseases and other disorders. Identifying CNVs in next-generation sequencing (NGS) data is difficult because noise is introduced at various stages of sequencing. Targeted sequencing based on Molecular Inversion Probes can mitigate some of this noise, but identifying patterns in sequencing data that correspond to CNVs remains a challenge.

I developed an algorithm that combines several smoothing, clustering and machine learning methods, such as LULU, fast search and t-SNE, to remove noise from the sequencing data. The algorithm also analyzes multiple patient samples concurrently, which aids in resolving the structural features at a high level. Using Gaussian mixture models in conjunction with Bayesian inference, the algorithm can identify the chromosomal copy state that gives rise to the genomic copy state observed in the data.
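The Bayesian step can be illustrated with a minimal sketch. It is not the algorithm described above: it assumes the mixture has already been fitted, that each copy-number state is a Gaussian component centered at copy number / 2 (a diploid baseline), and that all components share one standard deviation. The function names, the prior values and the `sd` default are hypothetical, and the sketch assigns only the total genomic copy number, not the per-chromosome state.

```python
import math


def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def copy_state_posterior(depth_ratio, priors, sd=0.15):
    """Posterior probability of each copy number given a normalized depth ratio.

    priors: dict mapping copy number -> prior probability (hypothetical values).
    Assumption: the component mean for copy number cn is cn / 2.
    """
    unnormalized = {
        cn: prior * gaussian_pdf(depth_ratio, cn / 2.0, sd)
        for cn, prior in priors.items()
    }
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("depth ratio is too far from every component")
    return {cn: weight / total for cn, weight in unnormalized.items()}


# Hypothetical priors that favor the diploid state.
priors = {0: 0.01, 1: 0.05, 2: 0.88, 3: 0.05, 4: 0.01}
posterior = copy_state_posterior(1.48, priors)
best_state = max(posterior, key=posterior.get)
print(best_state, round(posterior[best_state], 3))
```

With these illustrative numbers, a depth ratio near 1.5 is assigned to three copies even though the prior strongly favors two, because the likelihood under the two-copy component is much smaller.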

Protein conformational change pathway

Conformational changes are crucial to protein function. Many proteins undergo large conformational changes when they interact with ligands or other macromolecules. Traditionally, these changes are studied using computationally expensive molecular dynamics simulations, which are not well suited to high-throughput studies. Methods like MinActionPath were therefore developed to simulate large-scale conformational changes rapidly. MinActionPath assumes a simplified potential, such as the Anisotropic Network Model, and then identifies the most probable path connecting two conformations of a macromolecule. However, it relies on several parameters that it has no obvious way to estimate.

We developed PyPath on top of MinActionPath. PyPath estimates the optimal operational parameters directly from the system that it simulates. It also efficiently identifies the transition state, which occurs at the maximum of the energy surface along the path connecting the two stable conformations.

Even though PyPath approximates the energy surface with a simplified energy function, its estimates of the rate and energy parameters are highly correlated with experimental results. We tested the algorithm by designing several mutants of Tryptophanyl-tRNA synthetase and estimating the rate of conformational change and the associated free energy change. These values correlated with experimental kinetic and thermodynamic data with a correlation coefficient of over 0.9.

The PyPath program is available to download on GitHub.

Publications

Aminoacyl-tRNA synthetase evolution

The Rodin-Ohno hypothesis states that the two classes of aminoacyl-tRNA synthetases (aaRS) arose from opposite strands of the same ancestral gene. Rodin and Ohno had previously shown that the cores of the two classes of enzymes, when lined up in a sense-antisense manner, showed increased complementarity compared to the null hypothesis (25%). We expanded this research by constructing a larger core (94 amino acids), also called an urzyme, that is still functional and is possibly an ancestor of the present-day enzymes.

We used Tryptophanyl-tRNA synthetase (TrpRS) as a representative of class I aaRS and Histidyl-tRNA synthetase (HisRS) as a representative of class II aaRS. We formulated a scoring function called Middle codon-Base Pairing Frequency (<MBP>), which is the frequency of middle codon-bases that are complementary when the two sequences are aligned. The score is based only on the middle base for two reasons. According to the Wobble hypothesis, the third codon-base is the least conserved. Because we aligned the sequences in a sense-antisense manner, the first codon-base of one sequence pairs with the third codon-base of the other, so neither the first nor the third position is conserved. Any information left over from the ancestral gene is therefore probably present only in the middle base.
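A minimal sketch of how such a score could be computed is shown here. It assumes two ungapped, in-frame DNA coding sequences of equal length, which is a simplification of a real alignment, and the function names are hypothetical rather than taken from the published work. Codon i of the first sequence is paired with codon n-1-i of the second, which is what a sense-antisense alignment implies.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def middle_bases(cds):
    """Return the middle (second) base of each codon in a coding sequence."""
    if len(cds) == 0 or len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a non-zero multiple of 3")
    return [cds[i + 1] for i in range(0, len(cds), 3)]


def mbp(sense_cds, antisense_cds):
    """Fraction of middle codon-bases that pair in a sense-antisense alignment.

    Assumption: both sequences are ungapped, in frame and of equal length.
    """
    sense_mid = middle_bases(sense_cds.upper())
    anti_mid = middle_bases(antisense_cds.upper())
    if len(sense_mid) != len(anti_mid):
        raise ValueError("sequences must contain the same number of codons")
    # The opposite strand runs in the reverse direction, so reverse the codon order.
    paired = zip(sense_mid, reversed(anti_mid))
    matches = sum(1 for x, y in paired if COMPLEMENT.get(x) == y)
    return matches / len(sense_mid)


# A sequence scored against its own reverse complement pairs at every middle base.
print(mbp("ATGGCTAAA", "TTTAGCCAT"))
```

For two unrelated sequences with uniformly distributed bases, the expected value of this score is 0.25, which matches the 25% null baseline mentioned above.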

We built maximum likelihood phylogenetic trees and also reconstructed the ancestral middle codon-base gene at each node of the tree to show that <MBP> increased as we moved closer to the root of the tree. 

We also found that ancestral bacterial sequences had a larger <MBP> than those of Archaea and Eukarya, indicating that bacteria might be closer to the origin of translation than the other domains.

Publications

Engineering a kinase biosensor

Kinases are enzymes that transfer the phosphate group from ATP to a substrate in a variety of cellular functions such as metabolism, signaling and transport. Spatial and temporal control of kinases can therefore provide control over a wide range of cellular functions. Previously, a biosensor based on the FKBP-Rapamycin-FRB system was designed in which the FKBP molecule was inserted into a highly dynamic, functionally relevant region of a kinase. The binding of Rapamycin and the subsequent recruitment of FRB changed the dynamics of this region, thereby activating or deactivating the kinase. Because this biosensor required the expression of two different proteins to affect the state of the kinase, a more efficient design was desired. We engineered a chimeric biosensor, uniRapR, that combines FKBP and FRB in a single domain inserted into the kinase. The presence or absence of Rapamycin in the cell then controls the state of the kinase. The dynamics of the system were studied in silico using Discrete Molecular Dynamics simulations, and the performance of the biosensor was assessed both in vitro and in vivo.

Publications

Dynamics of Lipases

Gastric lipases function in acidic environments while pancreatic lipases do not. Acid-stable pancreatic lipases could find application in the treatment of pancreatic exocrine insufficiency. The goal of this project was to understand the structural features that make gastric lipases more stable than pancreatic lipases in acidic environments.

By comparing the dynamics of human pancreatic lipase with those of dog gastric lipase, we showed that, in addition to the well-known lid region present in both lipases, the gastric lipase contains another highly dynamic region that may be the source of its stability. The dynamics of the lipases were studied using long-timescale molecular dynamics simulations.

Publications