Research

Data structure for Neighborhood Queries in Dynamic Point Collections

I  developed the ‘Dynamic Packing Grid’ (DPG) data structure. It maintains a collection of weighted points under dynamic updates, providing quasi-constant time insertion, deletion and movement (i.e. updates), and constant time neighborhood queries from arbitrary points. DPG discretizes space into planes, lines and cells  such that each of them can be uniquely indexed using an integer. These entities are stored hierarchically as containment sets. To support efficient neighborhood queries, DPG uses an integer range reporting data structure which can report such ranges in quasi-constant time.

Chandrajit Bajaj, Rezaul Alam Chowdhury and Muhibur Rasheed, "A dynamic data structure for flexible molecular maintenance and informatics", ACM SPM 2009, SIAM/ACM Joint Conference on Geometric and Physical Modeling 2009:259-270. (link)






Efficient Computation of Molecular Properties

Many molecular properties of interest including the electrostatics (Coulombic), van der Waals (Lennard-Jones), Born radius of atoms, Generalized Born approximation of solvation energy etc., involve integrating functions with distance dependent decaying kernels, over a domain represented by a discrete point set. Naive computation of them would take O(m2) where m is the number of points. I showed that due to the decaying nature of the kernels, a hierarchically coarse summation can be performed over a gradually larger neighborhood to produce approximate values with bounded error.


Chandrajit Bajaj, Rezaul Alam Chowdhury and Muhibur Rasheed, A dynamic data structure for flexible molecular maintenance and informatics, Bioinformatics 2011, 27(1):55-62. (link)

Computing Molecular Surface Representations

I designed and implemented an algorithm which adaptively subdivides space to provide higher resolution sampling near the boundary (surface) of a molecule with n atoms in O(n log n) time. It can support any surface approximation as long as it is expressed as a level set of a volumetric function. This includes the van der Waals surface, the solvent accessible surface (SAS), and the solvent excluded surface (SES) all of which requires the evaluation of an analytical signed distance function at each gridpoint, and a faster Gaussian integration based surface approximation. This grid-based approach provides simultaneous maintenance of molecules in atomic, smooth surface and volumetric representations, making it applicable for a wide range of applications. Finally, when an atom is moved (added or removed), the surface can be dynamically and locally updated by our algorithm in O(log n) time and hence for many applications like docking and structure refinement, where only a fraction of the atoms need to be updated frequently, the dynamic algorithm is beneficial.

Muhibur Rasheed, Alex Rand, and Chandrajit Bajaj. Maintaining flexible molecular surfaces using augmented dynamic octrees. Technical Report 14-23, Institute of Computational Engineering and Sciences, The University of Texas at Austin, Austin, TX, USA (link)




Learning to Distinguish Correct and Incorrect Protein-protein Docking
:
The docking problem is defined as the prediction of the structure of a complex formed by two proteins, where the input is the structure of the proteins in isolation. This problem can be translated into a geometric search and optimization problem. Part of the problem is to design an accurate scoring function which can discriminate between correct and incorrect structures of a complex. Given such a function, the problem reduces to searching over the space of relative orientations of the two proteins and rank and report the relevant orientations. This exhaustive search over the R^6 spave was sped up using parallel programming and FFT based convolutions. The scoring function itself was efficiently evaluated using the DPG and the dynamic octree data structures. The software (F2Dock) is available
here.

Rezaul Chowdhury, Muhibur Rasheed, Donald Keidel, Maysam Moussalem, Arthur Olson, Michel Sanner, and Chandrajit Bajaj, Protein-Protein Docking with F2Dock 2.0 and GB-rerank, PLoS One, 2013.

Muhibur
Rasheed, Qiming Yuan, and Chandrajit Bajaj. Learning optimized scoring models for protein-protein docking. Technical Report 14-25, Institute of Computational Engineering and Sciences, The University of Texas at Austin, Austin, TX, USA




Integrative Data-driven Refinement of Multi-protein Complexes with Missing/Variable Domains

Structure of proteins with large variable regions are hard to determine using laboratory techniques. I developed an intergrative modeling technique which uses cryo-electron microscopy (EM) maps (providing coarse but complete model of the molecule/assembly, incomplete but atomic-resolution models and information of binding interactions to produce structures which are complete and high resolution with high confidence.

We applied the technique to model envelope glycoprotein gp120, widely accepted as the Achilles heel of HIV-1, which has several variable regions whose precise structure and configuration have been especially difficult to establish especially in its bound state with CD4 and 17b- a crucial interaction that enables HIV to infect a cell. My protocol produced a model which explains and agrees to all existing imaging data, contains complete atomic models for the variable regions, is stereochemically and energetically favorable, and has the expected binding interfaces with CD4 and 17b as observed in co-crystallized complexes.


Muhibur Rasheed, R. Bettadapura, and Chandrajit Bajaj. Computational refinement and validation protocol for proteins with large variable regions applied to model HIV Env spike in CD4 and 17b bound state. Structure 2015; 23(6):1138-1149. (link, pre-print and supplement)

Muhibur
Rasheed, R. Bettadapura, and Chandrajit Bajaj. X-ray, Cryo-EM, and computationaly predicted protein structures used in integrative modeling of HIV Env glycoprotein gp120 in complex with CD4 and 17b. Elsevier Data in Brief, 2016. DOI: 10.1016/j.dib.2016.01.001


Radhakrishna Bettadapura, Muhibur Rasheed, Antje Volrath, and Chandrajit Bajaj. PF2Fit: Polar fast Fourier alignment of atomistic structures with 3D electron microscopy. PLoS Computational Biology, 2015; 11(10):e1004289. doi: 10.1371/journal.pcbi.1004289. (link)


Enumerating the Space of All Possible Symmetric Polyhedra with Congruent Faces 

The 5 known regular polyhedra have the highest order of 3D symmetries, making them exceptionally attractive templates for (self)-assembly using minimal types of building blocks, from nanocages and virus capsids to large scale constructions like glass domes. However, the 5 polyhedra only represent a small number of possible spherical layouts formed by congruent tiles (faces). I showed that such layouts can be generated by extending the 5 regular polyhedra in a symmetry preserving way. The resulting family remains is face-congruent, edge-congruent and only have at most two types of vertices. This is a rich new class of symmetric polyhedra beyond the well-studied regular, semi-regular and quasi-regular classes and their duals Catalan solids and Johnson solids. Additionally, while this class is infinite, I proved that there exists a unique parametrization using only two integers, which uniquely determines the geometry and topology of any single member of the family.



Muhibur Rasheed and Chandrajit Bajaj, Symmetric and highly congruent tiled meshes for shells. Proceedings of the 24th International Meshing Roundtable 2015, Procedia Engineering, 123:213-225. (link)




Quantifying and Visualizing Uncertainties in Molecular Modeling

In all computational modeling and simulation of a physical process, errors propagate and compound, not only from one atage of the algorithm to the next, but also from the input itself. This is especially significant in molecular biology since the atomic resolution models, acquired through x-ray crystallography is ubiquitously used with the assumption that they are perfect. In truth even such high quality models contain errors/uncertainties that may lead to errors in computationally derived quantities of interest (QOI) (e.g. surface area, solvation energy etc.). We developed two complementary methods for estimating such errors. The first, applies theoretical analysis of the mathematical function that represent the QOI, and provide upper bounds on the error using Azuma-Hoeffding and McDiarmid like bounds. However, such bounds are typically not tight and in practice the errors are much lower. Hence, as a second protocol, we proposed a low discrepancy sampling for a n dimensional space (n = number of atoms), which requires only polynomial numbers of samples. This efficient sampling allowed us to derive approximate PDFs for the QOIs, given the PDFs of the individual sources of uncertainties (atoms). This technique can be used not only the provide a certificate of confidence for a model/QOI, but also used to rank and refine models. We additionally developed intuitive visualizations to display these PDFs and confidences using surface and volume rendering.

Muhibur Rasheed and Chandrajit Bajaj, Uncertainty Quantified Computational Analysis of the Energetics of Virus Capsid Assembly. IEEE conference on Bioinformatics and BioMedicine, 2016 (accepted).
Muhibur Rasheed, Nathan Clement, Abhishek Bhowmick and Chandrajit Bajaj, Statistical Framework for Uncertainty Quantification in Computational Molecular Modeling. ACM Conference on Bioinformatics, Computational Biology and Health Informatics 2016 (link). To appear in a special issue in IEEE Transactions in Computational Biology and Bioinformatics (TCBB).


Predicting Viral Capsid Self-Assembly
Symmetry plays a key role in making a structure amenable to easy (sometimes even spontaneous) assembly using simple and unique building blocks. Such structures are found in small scales in biological viruses and protein nano-cages used in cancer therapy, to macro scales of domes and other masonry structures. We proposed that all such structures have an underlying topology that uniquely maps to one of the almost-regular polyhedra (see above). This has great implications in reducing the computational complexity tasks like predicting the structure of such assemblies given the fundamental blocks, performing stress and normal mode analysis etc.
For example, while the space of possible geometric configuration of n rigid bodies in isomorphic to (R6(n-1)), with the symmetric layout principle the space reduces to only O(nR6), making it possible to analyze such structures on a commodity computer.
I
have successfully applied this procedure predict the structures of known and experimentally reconstructed protein shell structures.



Muhibur Rasheed and Chandrajit Bajaj. Predicting symmetric spherical shell assemblies. Technical Report 14-24, Institute of Computational Engineering and Sciences, The University of Texas at Austin, Austin, TX, USA (link)



Analysis of Protein-Protein Interfaces

I have developed efficient algorithms to compute different properties of specific protein-protein interfaces. Given two proteins and their relative positions, my algorithms can compute the interface area, interface planarity, interface circularity, interface width, the residue-residue contacts on the interface, the hydrogen bonds etc in O(lg m) time where m is the number of atoms on the interface. The algorithms is available as part of the MolSurf package.


Quantitative and interactive visualization of docking results
I have developed a client-server software with a visual front-end to easily perform protein-protein docking using F2Dock. The software allows the user to visually manipulate the different parameters and inputs and submit a remote job. The server resides on a cluster in our lab and can process the docking efficiently. The results are automatically sent back to client when it is ready. The front end also allows the user to browse the different docking poses while comparing different scores, energies and interface detail and at the same time visualize an openGL rendering of the proteins in the predicted orientations. This software (TexMol) is available here. A webserver and client with similar features are under construction.









Bayesian Analysis of Identity Risks/Costs
 
I have also been working with the Center for Identity at the University of Texas at Austin to develop a framework to model, manage and predict identity risks. We define the set of attributes related to an individual's or business's identity and their relationships using a graphical model. Two nodes in the model are connected if the exposure of one attribute can cause or co-occur with the exposure of the other. The conditional probability distributions as well as the prior probabilities for exposure are learned based on data collected from organizations dealing with identity management. I have also developed a 3D visualization software to display, interact and interrogate the network.





Visualization of Functional Relationships of Brain Regions from fMRI data


Working for the Imaging Research Center at UT Austin's Psychology/Neuroscience program, I implemented a python-based software for visualizing and generating publication ready figures for displaying complex interplay between different brain regions, as discovered through fMRI. From a computational perspective, the problem is to visualize annotated graphs, where both the nodes and edges of the graph are annotated with numerical and/or nominal attributes, and the layout of the nodes must satisfy user-defined constraints and quality criterion. The layout and design is very much similar to circos, but is tailored for brain region mapping (left/right hemispheres etc.). The code is available here.

Comments