Key Interests:
Integrative and Translational Genetics
Rare Disease and Variant Interpretation
Protein Dynamics and Intrinsically Disordered Proteins
Bioinformatics, Computational and Systems Biology
Machine Learning, Scientific Computing and Pattern Recognition
Evolutionary Algorithms and Metaheuristics Application
Data Science, Big Data and Data Mining
Proteome-scale mapping and characterization of amino acid substitutions (missense variants' sites) in 3D structures.
I led the genetic variant to protein structure effort at the Broad Institute and developed MISCAST (Missense variant to proteIn StruCture Analysis SuiTe): http://miscast.broadinstitute.org/. MISCAST (v1.0) covers 1,330 human genes with reported population variants in gnomAD (release 2.1.1) and patient variants in the ClinVar (February 2019 release) and HGMD (Professional release 2018.4) databases, together with >14k human protein 3D structures from the Protein Data Bank. Missense variants were mapped onto protein 3D structures with an automated pipeline built on the EMBL-EBI-supported SIFTS resource. The amino acid residues of the 1,330 genes were annotated with seven structural, physicochemical, and functional features comprising 40 subtypes, collected from multiple resources including DSSP, PDBsum, PhosphoSitePlus, PANTHER, and UniProt. For details, see the documentation page at MISCAST.
- Flagship paper out in Proceedings of the National Academy of Sciences, Oct 2020, 202002660; DOI: 10.1073/pnas.2002660117
- Web portal paper out in Nucleic Acids Research, May 2020; DOI: https://doi.org/10.1093/nar/gkaa361
- Platform talk at American Society of Human Genetics, 2019
- Invited Young Scientist talk at EMBO Workshop: Synergy of experiment and computation in quantitative systems biology, 2019
- Abstract and poster at Biophysical Society, 2019; DOI: 10.1016/j.bpj.2018.11.2266
- Abstract at American Society of Human Genetics, 2018
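The variant-to-structure mapping step described above can be illustrated with a minimal sketch. It assumes each SIFTS-style record is a contiguous segment aligning a UniProt position range to a PDB chain; the `Segment` type, the `map_variant` function, and the example identifiers are hypothetical and are not taken from the MISCAST pipeline. Real PDB numbering can contain gaps and insertion codes that this simple offset arithmetic ignores.

```python
from typing import List, NamedTuple, Tuple


class Segment(NamedTuple):
    """One contiguous aligned stretch between a UniProt sequence and a PDB chain.

    Hypothetical record layout, loosely modelled on segment-level mappings.
    """
    pdb_id: str
    chain: str
    unp_start: int  # first UniProt position covered (1-based, inclusive)
    unp_end: int    # last UniProt position covered (inclusive)
    pdb_start: int  # PDB residue number aligned to unp_start


def map_variant(position: int, segments: List[Segment]) -> List[Tuple[str, str, int]]:
    """Return (pdb_id, chain, pdb_residue_number) for every segment covering position."""
    hits = []
    for seg in segments:
        if seg.unp_start <= position <= seg.unp_end:
            # Assumes numbering runs in lockstep inside a segment (no gaps).
            offset = position - seg.unp_start
            hits.append((seg.pdb_id, seg.chain, seg.pdb_start + offset))
    return hits


# Made-up example segments for a single protein.
example_segments = [
    Segment("1ABC", "A", 10, 120, 5),
    Segment("2XYZ", "B", 100, 300, 100),
]
mapped = map_variant(105, example_segments)
```

A missense variant at UniProt position 105 is covered by both example segments, so it maps to one residue in each structure; a position outside every segment returns an empty list.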
Identification of burden of human genetic variants and functional features in intrinsically disordered proteins.
Several physicochemical and structural properties of intrinsically disordered proteins (IDPs) and disordered regions (IDRs) are now well established (e.g., high net charge, low hydrophobicity). Functional characterization of IDPs and IDRs, however, is challenging because of their highly dynamic character, promiscuous interactions, and coupled folding and binding. Current open-source databases enable data-driven inference of the functional features of disordered residues, which may link to their role in disease etiology. In this study, I evaluated the burden of human genetic variation in human IDPs (DisProt) and characterized the IDRs under constraint using functional sites of interest from the UniProt database.
- Platform talk at Biophysical Society annual meeting, 2020
Identification of mutational hotspots in 3D structures encoded by 109 neurodevelopmental disorder (NDD) associated genes.
Here we systematically collected experimentally solved, modeled, and predicted protein structures for NDD-associated genes. We mapped pathogenic variants (ClinVar and HGMD databases) and population variants (gnomAD database) onto the 3D structures and identified pathogenic-variant-enriched amino acids (3D-hotspots) for which no patient variant has yet been reported.
- Platform talk at American Society of Human Genetics, 2019
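One way to picture the 3D-hotspot idea is the toy sketch below. It is an illustration only, not the published method: the distance radius, the minimum count, the "more pathogenic than population" rule, and all function names are assumptions made for this example. Residues are represented by one coordinate each (for instance a C-alpha atom).

```python
import math


def neighbors_within(coords, center, radius):
    """Residues (including center itself) whose coordinate lies within radius of center."""
    origin = coords[center]
    return [res for res, xyz in coords.items() if math.dist(xyz, origin) <= radius]


def hotspot_candidates(coords, pathogenic, population, radius=8.0, min_pathogenic=2):
    """Residues with no reported patient variant whose 3D neighbourhood is
    dominated by pathogenic-variant positions.

    radius, min_pathogenic and the comparison rule are illustrative assumptions.
    """
    candidates = []
    for res in coords:
        if res in pathogenic:
            continue  # only look at positions without a known patient variant
        near = neighbors_within(coords, res, radius)
        n_pathogenic = sum(1 for r in near if r in pathogenic)
        n_population = sum(1 for r in near if r in population)
        if n_pathogenic >= min_pathogenic and n_pathogenic > n_population:
            candidates.append(res)
    return sorted(candidates)


# Toy structure: five residues placed along the x axis (coordinates are made up).
toy_coords = {1: (0.0, 0.0, 0.0), 2: (4.0, 0.0, 0.0), 3: (8.0, 0.0, 0.0),
              4: (12.0, 0.0, 0.0), 5: (40.0, 0.0, 0.0)}
toy_hotspots = hotspot_candidates(toy_coords, pathogenic={1, 3}, population={5})
```

In the toy structure, residue 2 sits between two pathogenic positions and has no population variant nearby, so it is the only candidate returned.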
Identification of Peptide Binding-Sites in Peptide-Protein Complex.
- Poster at Biophysical Society Annual Meeting, 2017
- Abstract in Biophysical Journal, DOI: http://dx.doi.org/10.1016/j.bpj.2016.11.1153
Evolutionary Computing Algorithm for Engineering Applications.
- IEEE article [link]
PSEE: A Novel Feature for Protein Bioinformatics Research.
- GLBIO/CCBC paper, an ISCB conference [pdf]
- PloS One article [pdf]
- Software in C: PSEE [GitHub]
RBF Kernel SVM to identify Intrinsically Disordered Protein.
- ICCIT paper, an IEEE conference [pdf]
- PloS One articles: DisPredict [pdf] and DisPredict2 [pdf]
- CASP 11 proceedings with DisPredict method (p. 215) [pdf]
- LA annual conference on Computational Biology and Bioinformatics [pdf]
- Software in C: DisPredict [GitHub] and DisPredict2 [GitHub]
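For readers unfamiliar with the RBF kernel used by such an SVM, here is a minimal sketch of the kernel and of the decision value of an already trained classifier. The support vectors, coefficients, and `gamma` below are made-up numbers for illustration; they are not DisPredict parameters, and training is not shown.

```python
import math


def rbf_kernel(x, y, gamma):
    """Radial basis function kernel: exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def decision_value(x, support_vectors, dual_coefs, bias, gamma):
    """SVM decision function f(x) = sum_i c_i * k(sv_i, x) + bias.

    Each dual coefficient c_i is alpha_i * y_i; the sign of f(x) is the predicted class.
    """
    total = sum(c * rbf_kernel(sv, x, gamma) for sv, c in zip(support_vectors, dual_coefs))
    return total + bias


# Hypothetical trained model with one support vector per class.
svs = [(0.0, 0.0), (2.0, 2.0)]
coefs = [1.0, -1.0]
score_near_first = decision_value((0.0, 0.0), svs, coefs, bias=0.0, gamma=0.5)
score_near_second = decision_value((2.0, 2.0), svs, coefs, bias=0.0, gamma=0.5)
```

A point close to the first support vector receives a positive score and a point close to the second a negative one; a point equidistant from both scores zero with this symmetric toy model.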
Regression with Reinforcement Learning to predict Protein Accessible Surface Area and its Applications in Bioinformatics.
- Journal of Theoretical Biology (JTB) article: REGAd3p and 3DIGARS2.0 [pdf]
- Journal of Theoretical Biology (JTB) article: 3DIGARS3.0 [pdf]
- Software in C: REGAd3p [GitHub]
Ensemble Learning framework for Protein Secondary Structure Prediction.
- Journal of Theoretical Biology (JTB) article: MetaSSpred [pdf]
- Software in C: MetaSSpred [GitHub]
Evolutionary Algorithms for Reinforcement Learning.
- SWEVO article: hGRGA [pdf]
- IJBIC article: KGA [pdf]
- GECCO paper of hGRGA, an ACM-SIGEVO conference [pdf]
- Tech-Reports: hGRGA [pdf] and AMLGA [pdf]
- Software: KGA [GitHub] and hGRGA [GitHub]
Global Optimization to predict 3D Protein Structure.
- Computational Biology and Chemistry article: MH_GA [pdf]
- CASP 12 proceedings with DisPredict method (p. 180) [pdf]
Hybrid Evolutionary Algorithm to solve Vehicle Routing Problem.
- ICECE paper, an IEEE conference [pdf]
- Swarm and Evolutionary Computation (SWEVO) article [pdf]
Computing Longest Common Palindromic Subsequence.
- IWOCA workshop paper and Book Chapter [link]
- Fundamenta Informaticae article [pdf]
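The problem itself can be stated compactly in code. The sketch below is a straightforward memoised dynamic program over index ranges of both strings, which needs on the order of n^2 * m^2 states; it is meant only to define the problem precisely and is not necessarily the algorithm presented in the publications listed above. The function name `lcps_length` is chosen for this example.

```python
from functools import lru_cache


def lcps_length(x: str, y: str) -> int:
    """Length of a longest palindrome that is a subsequence of both x and y."""

    @lru_cache(maxsize=None)
    def solve(i: int, j: int, k: int, l: int) -> int:
        # Considers x[i..j] and y[k..l], both ranges inclusive.
        if i > j or k > l:
            return 0
        if x[i] == x[j] == y[k] == y[l]:
            if i == j or k == l:
                return 1  # one of the ranges is a single character
            # Use the matching end characters as the palindrome's outer pair.
            return 2 + solve(i + 1, j - 1, k + 1, l - 1)
        # Otherwise drop one end character from one of the two ranges.
        return max(
            solve(i + 1, j, k, l),
            solve(i, j - 1, k, l),
            solve(i, j, k + 1, l),
            solve(i, j, k, l - 1),
        )

    return solve(0, len(x) - 1, 0, len(y) - 1)


example_length = lcps_length("abac", "caba")  # "aba" is common to both
```

Because the recursion depth grows with the combined string length, this version suits short inputs; an iterative table would be the natural next step for longer ones.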
Window-based Selection Strategy to analyze Time Varying Stochastic Data.
- Report [pdf]
- Presentation [ppt]
Ensemble Machine Learning for Robust Image Classification.
- Report [pdf]
Big Data Project: Large-scale Weather Data analysis and visualization.
In this project, 50 years of weather data were collected from the National Climatic Data Center (NCDC) of the National Oceanic and Atmospheric Administration (NOAA), modeled (with k-means clustering and ARIMA), analyzed, and visualized. The data crawler is written in shell script and the data parser in Java. The back end uses MySQL and the front end uses PHP, HTML, and JavaScript. Visualization is performed with R and the Google API.
- Presentation [ppt]
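As a small illustration of the clustering step named above, the sketch below runs k-means (Lloyd's algorithm) on one-dimensional values such as daily temperatures. It is written in Python for brevity and is not the project's code; the function name, the sample readings, and the starting centroids are assumptions made for this example.

```python
def kmeans_1d(values, initial_centroids, max_iterations=50):
    """Plain k-means on 1-D data.

    Returns the final centroids and the clusters from the last assignment step.
    """
    centroids = list(initial_centroids)
    clusters = [[] for _ in centroids]
    for _ in range(max_iterations):
        # Assignment step: attach each value to its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        updated = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if updated == centroids:
            break  # converged
        centroids = updated
    return centroids, clusters


# Made-up temperature readings forming a cold group and a warm group.
readings = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
final_centroids, final_clusters = kmeans_1d(readings, initial_centroids=[1.0, 12.0])
```

With these readings the centroids settle at the means of the cold and warm groups after a single update.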
Wireless Vehicular Communication System development.
A protocol for a Vehicular Communication and Network System (VCNS) was developed in this project. Using this protocol, moving vehicles and any stationary roadside unit can communicate wirelessly. Potential applications include reducing traffic congestion and road accidents. The system is based on a wireless transceiver and a microcontroller, and the microcontroller program is written in AVR C.
- IWCMC conference paper [pdf]
- Presentation [ppt]
Virtual BUET Cafeteria.
This is a graphics project in which the BUET cafeteria is fully modeled using the OpenGL libraries (gl, glu, glut). To make the visualization realistic, we use lighting, shading, and texturing. We also include a scene graph and some external components created using Google SketchUp and Adobe Photoshop.
Online Movie Club
In this project, an online Movie Club management system was developed. The system was first analyzed and its requirements were described using an E-R diagram. It was then implemented using ASP.NET (front end) and Oracle (back end).
- Presentation [ppt]
Automated Sales and Distribution Systems.
In this project, a complete automated sales and distribution system was developed for RanksTel Bangladesh Ltd. Along with requirement analysis, requirement modelling, and control specification, all the steps of UML modelling were performed. The software was then implemented using PHP for the front end and MySQL for the back end.
- Report [pdf]
Mail-Client System.
In this project, an email client was built for sending and receiving emails. It is implemented in J2SE and includes a graphical user interface (GUI) for writing emails.