Data structure for Neighborhood Queries in Dynamic Point Collections
quasi-constant time insertion, deletion and movement (i.e. updates), and constant time neighborhood queries from arbitrary points. DPG discretizes space into planes, lines and cells such that each of them can be uniquely indexed using an integer. These entities are stored hierarchically as containment sets. To support efficient neighborhood queries, DPG uses an integer range reporting data structure which can report such ranges in quasi-constant time.
Chandrajit Bajaj, Rezaul Alam Chowdhury and Muhibur Rasheed, "A dynamic data structure for flexible molecular maintenance and informatics", ACM SPM 2009, SIAM/ACM Joint Conference on Geometric and Physical Modeling 2009:259-270. (link)
Efficient Computation of Molecular Properties
Many molecular properties of interest including the electrostatics (Coulombic), van der Waals (Lennard-Jones), Born radius of atoms, Generalized Born approximation of solvation energy etc., involve integrating functions with distance dependent decaying kernels, over a domain represented by a discrete point set. Naive computation of them would take O(m2) where m is the number of points. I showed that due to
the decaying nature of the kernels, a hierarchically coarse summation can be performed over a gradually
larger neighborhood to produce approximate values with bounded error.
Chandrajit Bajaj, Rezaul Alam Chowdhury and Muhibur Rasheed, A dynamic data structure for flexible molecular maintenance and informatics, Bioinformatics 2011, 27(1):55-62. (link)
I designed and implemented an algorithm which adaptively subdivides space to provide higher resolution sampling near the boundary (surface) of a molecule with n atoms in O(n log n) time. It can support any surface approximation as long as it is expressed as a level set of a volumetric function. This includes the van der Waals surface, the solvent accessible surface (SAS), and the solvent excluded surface (SES) all of which requires the evaluation of an analytical signed distance function at each gridpoint, and a faster Gaussian integration based surface approximation. This grid-based approach provides simultaneous maintenance of molecules in atomic, smooth surface and volumetric representations, making it applicable for a wide range of applications. Finally, when an atom is moved (added or removed), the surface can be dynamically and locally updated by our algorithm in O(log n) time and hence for many applications like docking and structure refinement, where only a fraction of the atoms need to be updated frequently, the dynamic algorithm is beneficial.
Muhibur Rasheed, Alex Rand, and Chandrajit Bajaj. Maintaining flexible molecular surfaces using augmented dynamic octrees. Technical Report 14-23, Institute of Computational Engineering and Sciences, The University of Texas at Austin, Austin, TX, USA (link)
Learning to Distinguish Correct and Incorrect Protein-protein Docking:
Rezaul Chowdhury, Muhibur Rasheed, Donald Keidel, Maysam Moussalem, Arthur Olson, Michel Sanner, and Chandrajit Bajaj, Protein-Protein Docking with F2Dock 2.0 and GB-rerank, PLoS One, 2013.
Muhibur Rasheed, Qiming Yuan, and Chandrajit Bajaj. Learning optimized scoring models for protein-protein docking. Technical Report 14-25, Institute of Computational Engineering and Sciences, The University of Texas at Austin, Austin, TX, USA
Integrative Data-driven Refinement of Multi-protein Complexes with Missing/Variable Domains
Structure of proteins with large variable regions are hard to determine using laboratory techniques. I developed an intergrative modeling technique which uses cryo-electron microscopy (EM) maps (providing coarse but complete model of the molecule/assembly, incomplete but atomic-resolution models and information of binding interactions to produce structures which are complete and high resolution with high confidence.
We applied the technique to model envelope glycoprotein gp120, widely accepted as the Achilles heel of HIV-1, which has several variable regions whose precise structure and configuration have been especially difficult to establish especially in its bound state with CD4 and 17b- a crucial interaction that enables HIV to infect a cell. My protocol produced a model which explains and agrees to all existing imaging data, contains complete atomic models for the variable regions, is stereochemically and energetically favorable, and has the expected binding interfaces with CD4 and 17b as observed in co-crystallized complexes.
Muhibur Rasheed, R. Bettadapura, and Chandrajit Bajaj. Computational refinement and validation protocol for proteins with large variable regions applied to model HIV Env spike in CD4 and 17b bound state. Structure 2015; 23(6):1138-1149. (link, pre-print and supplement)
Muhibur Rasheed, R. Bettadapura, and Chandrajit Bajaj. X-ray, Cryo-EM, and computationaly predicted protein structures used in integrative modeling of HIV Env glycoprotein gp120 in complex with CD4 and 17b. Elsevier Data in Brief, 2016. DOI: 10.1016/j.dib.2016.01.001
Radhakrishna Bettadapura, Muhibur Rasheed, Antje Volrath, and Chandrajit Bajaj. PF2Fit: Polar fast Fourier alignment of atomistic structures with 3D electron microscopy. PLoS Computational Biology, 2015; 11(10):e1004289. doi: 10.1371/journal.pcbi.1004289. (link)
Enumerating the Space of All Possible Symmetric Polyhedra with Congruent Faces
The 5 known regular polyhedra have the highest order of 3D symmetries, making them exceptionally attractive templates for (self)-assembly using minimal types of building blocks, from nanocages and virus capsids to large scale constructions like glass domes. However, the 5 polyhedra only represent a small number of possible spherical layouts formed by congruent tiles (faces). I showed that such layouts can be generated by extending the 5 regular polyhedra in a symmetry preserving way. The resulting family remains is face-congruent, edge-congruent and only have at most two types of vertices. This is a rich new class of symmetric polyhedra beyond the well-studied regular, semi-regular and quasi-regular classes and their duals Catalan solids and Johnson solids. Additionally, while this class is infinite, I proved that there exists a unique parametrization using only two integers, which uniquely determines the geometry and topology of any single member of the family.
Muhibur Rasheed and Chandrajit Bajaj, Symmetric and highly congruent tiled meshes for shells. Proceedings of the 24th International Meshing Roundtable 2015, Procedia Engineering, 123:213-225. (link)
Quantifying and Visualizing Uncertainties in Molecular Modeling
In all computational modeling and simulation of a physical process, errors propagate and compound, not only from one atage of the algorithm to the next, but also from the input itself. This is especially significant in molecular biology since the atomic resolution models, acquired through x-ray crystallography is ubiquitously used with the assumption that they are perfect. In truth even such high quality models contain errors/uncertainties that may lead to errors in computationally derived quantities of interest (QOI) (e.g. surface area, solvation energy etc.). We developed two complementary methods for estimating such errors. The first, applies theoretical analysis of the mathematical function that represent the QOI, and provide upper bounds on the error using Azuma-Hoeffding and McDiarmid like bounds. However, such bounds are typically not tight and in practice the errors are much lower. Hence, as a second protocol, we proposed a low discrepancy sampling for a n dimensional space (n = number of atoms), which requires only polynomial numbers of samples. This efficient sampling allowed us to derive approximate PDFs for the QOIs, given the PDFs of the individual sources of uncertainties (atoms). This technique can be used not only the provide a certificate of confidence for a model/QOI, but also used to rank and refine models. We additionally developed intuitive visualizations to display these PDFs and confidences using surface and volume rendering.
Muhibur Rasheed, Nathan Clement, Abhishek Bhowmick and Chandrajit Bajaj, Quantification and Visualization of Uncertainties in Molecular Modeling (Under review at Bioinformatics). A slightly different manuscript.
Predicting Viral Capsid Self-Assembly
For example, while the space of possible geometric configuration of n rigid bodies in isomorphic to (R6(n-1)), with the symmetric layout principle the space reduces to only O(nR6), making it possible to analyze such structures on a commodity computer. I have successfully applied this procedure predict the structures of known and experimentally reconstructed protein shell structures.
Muhibur Rasheed and Chandrajit Bajaj. Predicting symmetric spherical shell assemblies. Technical Report 14-24, Institute of Computational Engineering and Sciences, The University of Texas at Austin, Austin, TX, USA (link)
Analysis of Protein-Protein Interfaces
Quantitative and interactive visualization of docking results
I have developed a client-server software with a visual front-end to easily perform protein-protein docking using F2Dock. The software allows the user to visually manipulate the different parameters and inputs and submit a remote job. The server resides on a cluster in our lab and can process the docking efficiently. The results are automatically sent back to client when it is ready. The front end also allows the user to browse the different docking poses while comparing different scores, energies and interface detail and at the same time visualize an openGL rendering of the proteins in the predicted orientations. This software (TexMol) is available here. A webserver and client with similar features are under construction.
Bayesian Analysis of Identity Risks/Costs
Visualization of Functional Relationships of Brain Regions from fMRI data
Working for the Imaging Research Center at UT Austin's Psychology/Neuroscience program, I implemented a python-based software for visualizing and generating publication ready figures for displaying complex interplay between different brain regions, as discovered through fMRI. From a computational perspective, the problem is to visualize annotated graphs, where both the nodes and edges of the graph are annotated with numerical and/or nominal attributes, and the layout of the nodes must satisfy user-defined constraints and quality criterion. The layout and design is very much similar to circos, but is tailored for brain region mapping (left/right hemispheres etc.). The code is available here.