My PhD thesis addresses the multi-body assembly problem: given the structures of the n components of a complex, predict the relative positions of the components in the assembled complex. This problem generalizes many problems from different domains, for example 3D model construction based on spin images, stitching multiple images, reconstructing historic artifacts from broken fragments, modeling protein-protein and protein-RNA complexes, and predicting the assembly of cell membranes, viral capsids, etc.

The thesis comprises several sub-problems, some of which I have already addressed:

Molecular Modeling: I have developed efficient data structures to maintain molecular models. A molecule is usually defined as a collection of atoms in 3D. If the atoms are assumed to be hard spheres with fixed radii, the result is called the Union of Balls (UofB) model. However, the surface of the UofB has singularities and hence is not suitable for many numerical computations. A smooth surface can be generated from the UofB by rolling a ball over the UofB surface. This smooth surface is called the solvent excluded surface (SES) if the radius of the rolling ball equals that of the solvent atoms/molecules. My dynamic packing grid (DPG) data structure can represent both the UofB and the SES, and maintain them as the atoms move. DPG allows addition, deletion and movement of atoms in O(lg w) time and supports neighborhood queries in O(lg lg w) time with high probability. I have also used DPG to augment dynamic octree data structures to maintain the models at multiple resolutions. DPG is available here.
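To make the kind of operations concrete, here is a minimal sketch of a plain uniform hash grid over atom centers. It is not the DPG and does not achieve the bounds quoted above; the class name `AtomGrid`, its methods and the cell size are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from math import floor


class AtomGrid:
    """Hypothetical uniform hash grid over atom centers (not the DPG itself)."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(set)  # cell index -> ids of atoms in that cell
        self.atoms = {}                # atom id -> (x, y, z, radius)

    def _cell(self, x, y, z):
        s = self.cell_size
        return (floor(x / s), floor(y / s), floor(z / s))

    def add(self, atom_id, x, y, z, radius):
        self.atoms[atom_id] = (x, y, z, radius)
        self.cells[self._cell(x, y, z)].add(atom_id)

    def remove(self, atom_id):
        x, y, z, _ = self.atoms.pop(atom_id)
        key = self._cell(x, y, z)
        self.cells[key].discard(atom_id)
        if not self.cells[key]:
            del self.cells[key]  # keep the table free of empty cells

    def move(self, atom_id, x, y, z):
        radius = self.atoms[atom_id][3]
        self.remove(atom_id)
        self.add(atom_id, x, y, z, radius)

    def neighbors(self, x, y, z, cutoff):
        """Return sorted ids of atoms whose centers lie within `cutoff` of (x, y, z)."""
        reach = int(cutoff // self.cell_size) + 1  # cells to scan in each direction
        cx, cy, cz = self._cell(x, y, z)
        found = []
        for i in range(cx - reach, cx + reach + 1):
            for j in range(cy - reach, cy + reach + 1):
                for k in range(cz - reach, cz + reach + 1):
                    for atom_id in self.cells.get((i, j, k), ()):
                        ax, ay, az, _ = self.atoms[atom_id]
                        d2 = (ax - x) ** 2 + (ay - y) ** 2 + (az - z) ** 2
                        if d2 <= cutoff ** 2:
                            found.append(atom_id)
        return sorted(found)


grid = AtomGrid(cell_size=2.0)
grid.add("a", 0.0, 0.0, 0.0, 1.5)
grid.add("b", 1.5, 0.0, 0.0, 1.5)
grid.add("c", 10.0, 0.0, 0.0, 1.5)
near_origin = grid.neighbors(0.0, 0.0, 0.0, cutoff=2.0)
```

Choosing the cell size close to the largest atom diameter keeps the number of atoms per cell small, so each update or query touches only a handful of cells.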

Protein-protein docking: The docking problem is to predict the structure of the complex formed by two proteins, given the structures of the proteins in isolation. This problem can be translated into a geometric search and optimization problem. Part of the problem is to design an accurate scoring function that can discriminate between correct and incorrect structures of a complex. Given such a function, the problem reduces to searching the space of relative orientations of the two proteins, then ranking and reporting the relevant orientations. This exhaustive search over the R^6 space was sped up using parallel programming and FFT-based convolutions. The scoring function itself was evaluated efficiently using the DPG and the dynamic octree data structures. The software (F2Dock) is available here and was recently published in PLOS ONE. Another paper, describing the machine learning techniques used to optimize the scoring models, is in preparation.
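The idea behind FFT-based scoring can be illustrated with a toy one-dimensional example: by the correlation theorem, the overlap score for every translation of one grid against another is obtained from a few FFTs instead of one sum per shift. The sketch below is not F2Dock code. The function names, the 1D occupancy grids and the simple overlap score are assumptions for illustration; a real docking search uses 3D grids and repeats the translational scan for each sampled rotation.

```python
import cmath


def fft(values, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return [complex(values[0])]
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def correlation_scores(receptor, ligand):
    """scores[t] = sum over x of receptor[x] * ligand[(x - t) mod n], for all t."""
    n = len(receptor)
    fr = fft(receptor)
    fl = fft(ligand)
    # Correlation theorem: multiply one spectrum by the conjugate of the other.
    product = [a * b.conjugate() for a, b in zip(fr, fl)]
    return [(v / n).real for v in fft(product, inverse=True)]


def brute_force_scores(receptor, ligand):
    """Same scores computed directly, one sum per shift, for comparison."""
    n = len(receptor)
    return [
        sum(receptor[x] * ligand[(x - t) % n] for x in range(n))
        for t in range(n)
    ]


# Toy grids: 1 marks an occupied cell (hypothetical data).
receptor = [0, 0, 1, 1, 0, 0, 0, 0]
ligand = [1, 1, 0, 0, 0, 0, 0, 0]
scores = correlation_scores(receptor, ligand)
best_shift = max(range(len(scores)), key=lambda t: scores[t])
```

For a grid of n cells the direct method costs O(n^2) operations per rotation, while the FFT route costs O(n lg n), which is what makes an exhaustive translational scan practical.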

Analysis of Protein-Protein Interfaces: I have developed efficient algorithms to compute different properties of specific protein-protein interfaces. Given two proteins and their relative positions, my algorithms can compute the interface area, interface planarity, interface circularity, interface width, the residue-residue contacts on the interface, the hydrogen bonds, etc. in O(lg m) time, where m is the number of atoms on the interface. The algorithms are available as part of the MolSurf package.
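As a point of reference for one of these properties, residue-residue contacts can be defined by a simple distance test between atoms of the two proteins. The sketch below is a brute-force baseline, not the MolSurf algorithm, and it does not have the running time stated above. The function name, the residue labels and the 4.5 Å cutoff are assumptions for illustration.

```python
from math import dist


def residue_contacts(atoms_a, atoms_b, cutoff=4.5):
    """Return sorted residue pairs that have at least one atom pair within `cutoff`.

    Each atom is a (residue_label, (x, y, z)) tuple; the cutoff is an assumed value.
    """
    contacts = set()
    for res_a, pos_a in atoms_a:
        for res_b, pos_b in atoms_b:
            if dist(pos_a, pos_b) <= cutoff:
                contacts.add((res_a, res_b))
    return sorted(contacts)


# Hypothetical atoms from two proteins, coordinates in angstroms.
atoms_a = [("A:GLU10", (0.0, 0.0, 0.0)), ("A:LYS11", (8.0, 0.0, 0.0))]
atoms_b = [("B:ARG5", (3.0, 0.0, 0.0)), ("B:ASP6", (20.0, 0.0, 0.0))]
contacts = residue_contacts(atoms_a, atoms_b)
```

Replacing the inner loop with a spatial neighborhood query is the usual way to avoid testing every atom pair.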

Quantitative and interactive visualization of docking results: I have developed a client-server application with a visual front end for performing protein-protein docking with F2Dock. The software lets the user visually manipulate the parameters and inputs and submit a remote job. The server resides on a cluster in our lab and can process the docking efficiently. The results are automatically sent back to the client when they are ready. The front end also lets the user browse the docking poses while comparing scores, energies and interface details, and at the same time view an OpenGL rendering of the proteins in the predicted orientations. This software (TexMol) is available here (or view this short demo video). A web server and client with similar features are under construction.

Identity Risk Management: I have also been working with the Center for Identity at the University of Texas at Austin to develop a framework to model, manage and predict identity risks. We define the set of attributes related to the identity of an individual or a business, and the relationships among them, using a graphical model. Two nodes in the model are connected if the exposure of one attribute can cause or co-occur with the exposure of the other. The conditional probability distributions and the prior probabilities of exposure are learned from data collected from organizations dealing with identity management. I have also developed 3D visualization software to display, interact with and interrogate the network. A video demo is available here.
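A two-node example shows how priors and conditional probabilities combine in such a model. The sketch below assumes a directed edge from one attribute to another; the attribute names and every probability are invented for illustration and are not taken from the project's data.

```python
def marginal_exposure(prior_parent, p_child_given_parent, p_child_given_not_parent):
    """P(child exposed), summing over both states of the parent attribute."""
    return (prior_parent * p_child_given_parent
            + (1.0 - prior_parent) * p_child_given_not_parent)


def posterior_parent(prior_parent, p_child_given_parent, p_child_given_not_parent):
    """P(parent exposed | child exposed), by Bayes' rule."""
    evidence = marginal_exposure(
        prior_parent, p_child_given_parent, p_child_given_not_parent
    )
    return prior_parent * p_child_given_parent / evidence


# Hypothetical attributes: "email" (parent) -> "password" (child).
prior_email = 0.2               # assumed prior probability that the email is exposed
p_pw_given_email = 0.5          # assumed P(password exposed | email exposed)
p_pw_given_no_email = 0.1       # assumed P(password exposed | email not exposed)

p_password = marginal_exposure(prior_email, p_pw_given_email, p_pw_given_no_email)
p_email_given_password = posterior_parent(
    prior_email, p_pw_given_email, p_pw_given_no_email
)
```

With these made-up numbers, observing that the password is exposed raises the estimated probability that the email is exposed well above its prior, which is the kind of inference the network is meant to support.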