Protein 3D Representation Learning and Interaction Site Prediction Using Geometric Deep Learning and Protein Language Models (📆 May '20 - Present)
Supervisor: Prof. Md. Shamsuzzoha Bayzid (BUET)
Keywords: Protein-Protein Interaction, 3D Representation Learning, Geometric Deep Learning, Graph Neural Network
We developed EGRET, a novel method for effective protein 3D representation learning that outperforms previous state-of-the-art methods for protein-protein interaction site (PPIS) prediction from unbound proteins, a crucial problem in drug and vaccine design. (first-authored publication in Briefings in Bioinformatics [doi] [github])
Currently, we are developing a Siamese architecture (with an EGRET backbone) to address the pairwise PPIS prediction problem, where the unbound structures of both partner proteins are known. To improve 3D molecular representation learning, we are also exploring Hypergraph Convolutional Networks and their variants. In this project, I am mentoring one undergraduate student.
In these projects, I took the lead role by designing and developing the methods, running the experiments, and contributing to writing the EGRET paper.
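To make the idea concrete, below is a minimal, self-contained sketch of an edge-featured graph attention layer in the spirit of EGRET's edge aggregation: residues are graph nodes, spatially neighbouring residues are connected by edges, and geometric edge features enter both the attention scores and the aggregated messages. This is an illustration under assumed tensor shapes and feature choices, not the published EGRET implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EdgeAwareGraphAttention(nn.Module):
    """Single-head graph attention that folds edge features (e.g.,
    inter-residue distance/orientation) into attention scores and messages."""

    def __init__(self, node_dim, edge_dim, out_dim):
        super().__init__()
        self.node_proj = nn.Linear(node_dim, out_dim, bias=False)
        self.edge_proj = nn.Linear(edge_dim, out_dim, bias=False)
        # attention score from [source node ; target node ; edge] features
        self.attn = nn.Linear(3 * out_dim, 1, bias=False)

    def forward(self, h, edge_index, edge_attr):
        # h: (N, node_dim) residue embeddings (e.g., from a protein language model)
        # edge_index: (2, E) pairs of spatially neighbouring residues
        # edge_attr: (E, edge_dim) geometric features of each edge
        src, dst = edge_index
        hn = self.node_proj(h)                      # (N, out_dim)
        he = self.edge_proj(edge_attr)              # (E, out_dim)
        score = F.leaky_relu(
            self.attn(torch.cat([hn[src], hn[dst], he], dim=-1))
        ).squeeze(-1)                               # (E,)

        # softmax over each destination residue's incoming edges
        w = torch.exp(score - score.max())
        denom = torch.zeros(h.size(0), device=h.device).index_add_(0, dst, w) + 1e-9
        alpha = w / denom[dst]                      # (E,)

        # aggregate neighbour + edge messages into each destination residue
        msg = alpha.unsqueeze(-1) * (hn[src] + he)
        return torch.zeros_like(hn).index_add_(0, dst, msg)


# toy usage: 5 residues, 6 spatial edges
h = torch.randn(5, 32)
edge_index = torch.tensor([[0, 1, 2, 3, 4, 0], [1, 0, 3, 2, 0, 4]])
edge_attr = torch.randn(6, 8)
layer = EdgeAwareGraphAttention(node_dim=32, edge_dim=8, out_dim=16)
print(layer(h, edge_index, edge_attr).shape)  # torch.Size([5, 16])
```

In a full pipeline of this kind, per-residue node features would typically come from a protein language model, and a per-node classifier on top of the aggregated embeddings would predict which residues form interaction sites.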
Missing Data Imputation in Phylogenomic Datasets and Improving Species Tree Estimation (📆 May '20 - April '22)
Supervisor: Prof. Md. Shamsuzzoha Bayzid (BUET)
Keywords: Computational Phylogenomics, Data Imputation, Self-supervised Learning
We developed QT-GILD, an automated and specially tailored unsupervised deep learning technique that draws on ideas from natural language processing (NLP): it learns the data distribution in a given set of incomplete gene trees and effectively imputes the dataset by generating a complete set of quartets. QT-GILD led to two first-co-authored publications, one at the RECOMB 2022 conference [doi] [github] and one in the Journal of Computational Biology [doi].
In this project, I took the lead by ideating and developing the framework, contributing to the experiments, and writing the paper.
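As an illustration of the self-supervised imputation idea, the toy sketch below represents each gene tree as a vector of quartet topologies (one of three resolutions per four-taxon set, or missing) and trains a small denoising autoencoder to reconstruct the full vector from the observed entries. The encoding, model size, and training loop are assumptions for illustration only, not the published QT-GILD architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy setup: each gene tree is summarised over Q four-taxon sets, each
# quartet taking one of 3 topologies (one-hot), zeroed out when "missing".
Q = 100                                                # hypothetical quartet count
topologies = torch.randint(0, 3, (64, Q))              # 64 toy gene trees
x = F.one_hot(topologies, 3).float().view(64, Q * 3)
mask = (torch.rand(64, Q) < 0.3).repeat_interleave(3, dim=1)  # 30% quartets missing
x_obs = x * (~mask).float()

model = nn.Sequential(                                 # simple denoising autoencoder
    nn.Linear(Q * 3, 256), nn.ReLU(),
    nn.Linear(256, Q * 3),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(200):
    logits = model(x_obs).view(64, Q, 3)
    # reconstruct the topology of every quartet, including the masked ones
    loss = F.cross_entropy(logits.reshape(-1, 3), topologies.reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()

# imputed topologies for missing quartets = argmax of the reconstruction
imputed = model(x_obs).view(64, Q, 3).argmax(-1)
```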
Prediction and Analysis of Alzheimer’s Disease and Related Dementias Using Interpretable Machine Learning on Multimodal Data (📆 January '21 - Present)
Supervisors: Prof. Md. Shamsuzzoha Bayzid (BUET), Prof. Clara Li (Icahn School of Med), and Dr. Yuichiro Miyaoka (TMIMS)
Keywords: Alzheimer’s Disease, Genomics, Cognitive data, Multimodal Machine Learning, Interpretability
In this project, we are developing a machine learning (ML) method for Alzheimer’s Disease prediction and analysis, experimenting with both genomic and cognitive features.
We propose using polygenic risk scores (PRS) as features, which yields 72-74% accuracy, an improvement of around 17% over baseline ML models that use cognitive features only. In addition, we leverage GradientSHAP, a widely used gradient-based attribution algorithm, to interpret the decision process of our neural network models.
I contributed to this project by developing an interpretable multimodal ML framework and running the necessary experiments. I am currently mentoring two undergraduate students on a voluntary basis so that they can take the lead in future stages of this project.
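As a concrete example of the interpretability step, the sketch below computes gradient-based SHAP attributions for a trained tabular classifier using Captum's GradientShap. The toy model, feature count, and baseline distribution are illustrative assumptions, not our actual pipeline.

```python
import torch
import torch.nn as nn
from captum.attr import GradientShap

# Hypothetical tabular classifier over concatenated PRS + cognitive features.
n_features = 12                               # illustrative feature count
model = nn.Sequential(
    nn.Linear(n_features, 32), nn.ReLU(),
    nn.Linear(32, 2),                         # 2 classes: AD vs. control
)
model.eval()

x = torch.randn(8, n_features)                # a small batch of participants
baselines = torch.randn(50, n_features)       # reference distribution (e.g., cohort sample)

explainer = GradientShap(model)
# per-feature attributions toward the "AD" class (index 1)
attr = explainer.attribute(x, baselines=baselines, target=1, n_samples=20)
print(attr.shape)                             # torch.Size([8, 12])

# rank features by mean absolute attribution across the batch
importance = attr.abs().mean(dim=0)
print(importance.argsort(descending=True)[:5])
```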
Protein Structure Prediction by Leveraging Self-Attention Mechanism (📆 May '18 - Present)
Supervisors: Prof. Md. Shamsuzzoha Bayzid (BUET), Prof. Mohammad Saifur Rahman (BUET)
Keywords: Protein Secondary Structure, Deep Learning, Self-Attention
We built SAINT, a highly accurate method for eight-state (Q8) protein secondary structure prediction, which incorporates the self-attention mechanism (a concept from natural language processing) into the Deep Inception-Inside-Inception network to effectively capture both short- and long-range interactions among amino acid residues. SAINT also offers a more interpretable framework than typical black-box deep neural network methods.
I designed the SAINT framework and led the project by developing the method, contributing to running the experiments, and helping write the paper. (published in Bioinformatics [doi] [github])
Inspired by the success of SAINT, we also developed SAINT-Angle, a protein torsion angle prediction framework that achieved state-of-the-art accuracy on this task. I am mentoring two undergraduate students on this project. (published in Bioinformatics Advances [doi] [github])
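To illustrate the core idea, the sketch below applies one self-attention block over per-residue features and classifies each residue into one of eight secondary-structure states. The Inception-Inside-Inception convolutions of SAINT are omitted, and all dimensions are illustrative.

```python
import torch
import torch.nn as nn

class ResidueSelfAttentionBlock(nn.Module):
    """One self-attention block over per-residue features, followed by an
    eight-state (Q8) classifier. A toy stand-in, not the SAINT architecture."""

    def __init__(self, feat_dim=64, n_heads=4, n_states=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(feat_dim, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(feat_dim)
        self.classifier = nn.Linear(feat_dim, n_states)

    def forward(self, x, pad_mask=None):
        # x: (batch, seq_len, feat_dim) per-residue profile/embedding features
        attn_out, attn_weights = self.attn(x, x, x, key_padding_mask=pad_mask)
        h = self.norm(x + attn_out)          # residual connection
        logits = self.classifier(h)          # (batch, seq_len, 8)
        # attn_weights (batch, seq_len, seq_len) indicate which residues
        # influenced each prediction, which supports interpretability
        return logits, attn_weights


x = torch.randn(2, 120, 64)                  # 2 proteins, 120 residues each
block = ResidueSelfAttentionBlock()
logits, weights = block(x)
print(logits.shape, weights.shape)           # (2, 120, 8) (2, 120, 120)
```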
HirePreter: A Framework for Providing Fine-grained Interpretation for Automated Job Interview Analysis (📆 June ’20 - January ’21)
Supervisor: Prof. Ehsan Hoque (University of Rochester)
Keywords: Interpretable Deep Learning, Fairness in AI, Multimodal Learning
We built an ensemble model, combining multiple-instance-learning and language-modeling based models, that predicts whether an interviewee should be hired. Using both model-specific and model-agnostic interpretation techniques, we can identify the most informative time segments and features driving the model’s decisions. (first-co-authored publication in the ACII Workshop on Applied Multimodal Affect Recognition 2021 [doi])
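The sketch below illustrates the multiple-instance-learning side of such a setup with attention-based pooling: interview time segments are instances, and the learned attention weights indicate which segments drive the hire/no-hire prediction. It is a toy illustration with assumed feature sizes, not the published HirePreter model; in an ensemble, its output could then be combined with a language-modeling-based prediction over the transcript, e.g., by averaging probabilities.

```python
import torch
import torch.nn as nn

class AttentionMILPooling(nn.Module):
    """Attention-based multiple-instance pooling over interview segments."""

    def __init__(self, inst_dim=128, hidden=64):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(inst_dim, hidden), nn.Tanh(),
                                   nn.Linear(hidden, 1))
        self.classifier = nn.Linear(inst_dim, 1)

    def forward(self, segments):
        # segments: (n_segments, inst_dim) features for one interview
        w = torch.softmax(self.score(segments), dim=0)   # (n_segments, 1)
        bag = (w * segments).sum(dim=0)                  # weighted bag embedding
        logit = self.classifier(bag)                     # hire/no-hire logit
        return logit, w.squeeze(-1)                      # weights ~ segment importance


segments = torch.randn(30, 128)        # 30 time segments of one toy interview
model = AttentionMILPooling()
logit, seg_importance = model(segments)
print(torch.sigmoid(logit).item(), seg_importance.topk(3).indices)
```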
Automatic Smile Detection From Speech Data using End-to-end Deep Learning Framework (📆 November ’19 - May ’20)
Supervisor: Prof. Ehsan Hoque (University of Rochester)
Keywords: End-to-end Deep Learning Framework, Smile Detection, Speech Data
Here, I contributed to the dataset preparation and to designing a deep learning architecture. (In collaboration with a research group at the MIT Media Lab)
Natural Language Processing based Automated Code-Repair conditioning on Code-review (📆 March ’19 - January ’20)
Supervisor: Prof. Anindya Iqbal (BUET)
Keywords: Deep Learning, Natural Language Processing, Automated Code Repair, Code Review
In this project, we leveraged a state-of-the-art text-summarization method and code reviews to develop an automated code-repair tool. We also built a large dataset of 55,060 code reviews and their associated code changes. (co-authored publication in the Information and Software Technology journal [doi] [github])
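One way to frame the task, shown below as an assumption-laden sketch, is to concatenate the buggy code and the reviewer comment into a single input sequence for a pretrained encoder-decoder (here a T5 model from Hugging Face transformers, which may differ from the backbone used in the published tool) and fine-tune it on the review/code-change pairs.

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

# Illustrative framing only: without fine-tuning on the code-review dataset,
# the generic checkpoint will not produce meaningful repairs.
tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

buggy_code = "if (x = 5) { return true; }"
review = "assignment used instead of comparison"
inputs = tokenizer(f"repair: {buggy_code} review: {review}", return_tensors="pt")

out = model.generate(**inputs, max_length=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```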
Automated Textual Abuse Detection in Online Platforms (📆 January ’19 - October ’19)
Supervisor: Prof. A. B. M. Alim Al Islam (BUET)
Keywords: Natural Language Processing, Textual Abuse, Automated System
In this project, we built an automated system for detecting and preventing textual abuse in emails. (co-authored publication at the NSysS 2021 conference)
Improving Live-Cell Imaging through Computationally Correcting Meniscus-induced Aberrations (📆 January ’22 - Present)
Supervisor: Prof. Christopher Metzler (UMD)
Keywords: Live-Cell Imaging, Phase-Contrast Microscopy, Physics-based Modeling, Bioimage Informatics, Deep Learning
We aim to develop a computational framework that corrects the aberrations induced by the meniscus of the liquid buffer (the meniscus effect) in live-cell imaging with widely used phase-contrast microscopy. (In collaboration with a research group at NIST)
Here, I developed a physics-based (Fourier optics) model and a simulator to generate synthetic datasets under meniscus-effect aberrations. I am currently designing a spatially varying deconvolution network that leverages this dataset. To sidestep the domain shift that arises from combining real and synthetic data, I plan to use adversarial domain adaptation.
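The sketch below illustrates the Fourier-optics ingredient of such a simulator: an aberration phase added to a circular pupil (here a quadratic, defocus-like term standing in for the meniscus-induced phase) yields a point spread function that can blur a synthetic image. All parameter values are illustrative; in the actual simulator the aberration varies with field position, producing a spatially varying PSF.

```python
import numpy as np

N = 256
y, x = np.mgrid[-1:1:N * 1j, -1:1:N * 1j]
r2 = x**2 + y**2
pupil = (r2 <= 0.5**2).astype(float)            # circular aperture (NA cutoff)

defocus_strength = 6.0                           # proxy for local meniscus curvature
phase = defocus_strength * np.pi * r2            # quadratic (defocus-like) phase error

# PSF = squared magnitude of the Fourier transform of the aberrated pupil
field = pupil * np.exp(1j * phase)
psf = np.abs(np.fft.fftshift(np.fft.fft2(np.fft.ifftshift(field))))**2
psf /= psf.sum()                                 # normalise energy

# blur a toy "cell" image with this PSF via FFT-based (circular) convolution
img = np.zeros((N, N))
img[100:140, 110:150] = 1.0
blurred = np.real(np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(np.fft.ifftshift(psf))))
```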
Neural Field-based Reconstruction from Multimodal Signals (📆 January ’22 - Present)
Supervisor: Prof. Christopher Metzler (UMD)
Keywords: Neural Fields, Multimodal Learning, Meta Learning, 2D and 3D Reconstruction
We are developing neural-field-based frameworks for reconstructing 3D structures, as well as 2D images and videos, from multimodal signals.
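A minimal example of the underlying building block, under assumed sizes and frequencies, is a coordinate MLP with Fourier-feature positional encoding fit to pixel samples of a 2D signal:

```python
import torch
import torch.nn as nn

class NeuralField(nn.Module):
    """Coordinate MLP mapping (x, y) -> RGB; Fourier features let the small
    MLP represent high-frequency content. Sizes are illustrative."""

    def __init__(self, in_dim=2, n_freqs=8, hidden=128, out_dim=3):
        super().__init__()
        self.n_freqs = n_freqs
        enc_dim = in_dim * 2 * n_freqs
        self.mlp = nn.Sequential(
            nn.Linear(enc_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, out_dim),
        )

    def encode(self, coords):
        # coords in [-1, 1]; concatenate sin/cos at octave frequencies
        freqs = 2.0 ** torch.arange(self.n_freqs, device=coords.device,
                                    dtype=coords.dtype) * torch.pi
        ang = coords.unsqueeze(-1) * freqs          # (..., in_dim, n_freqs)
        return torch.cat([ang.sin(), ang.cos()], dim=-1).flatten(start_dim=-2)

    def forward(self, coords):
        return self.mlp(self.encode(coords))


# fit the field to (coordinate, colour) samples of a target image
field = NeuralField()
coords = torch.rand(1024, 2) * 2 - 1          # random pixel locations in [-1, 1]^2
target = torch.rand(1024, 3)                  # their RGB values (stand-in for real data)
opt = torch.optim.Adam(field.parameters(), lr=1e-3)
for _ in range(100):
    loss = ((field(coords) - target) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```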
Robust 3D Human Pose Forecasting (📆 October ’21 - December '21)
Supervisors: Prof. Abhinav Shrivastava (UMD), Dr. Max Ehrlich (Nvidia AI; UMD)
Keywords: Adversarial Learning, 3D Pose Estimation and Forecasting, Domain Generalization
We introduced an adversarial-learning-based framework that improves the generalization ability and performance of existing state-of-the-art methods for 3D human pose forecasting. We started this project as part of a graduate-level course taught by Prof. Shrivastava and received a 100% score for it. The work led to a first-authored technical report [technical report] [github] [video demonstration].
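The sketch below shows the adversarial training idea in miniature: a pose forecaster receives an extra loss from a discriminator that tries to separate predicted future pose sequences from real ones. The models, dimensions, and loss weights are illustrative, not those of the technical report.

```python
import torch
import torch.nn as nn

J, D = 17, 3                       # joints, coordinates per joint
T_in, T_out = 10, 25               # observed frames / forecast horizon

forecaster = nn.GRU(J * D, 128, batch_first=True)
head = nn.Linear(128, T_out * J * D)
disc = nn.Sequential(nn.Flatten(), nn.Linear(T_out * J * D, 128),
                     nn.ReLU(), nn.Linear(128, 1))

opt_g = torch.optim.Adam(list(forecaster.parameters()) + list(head.parameters()), lr=1e-4)
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

past = torch.randn(8, T_in, J * D)            # toy batch of observed motion
future = torch.randn(8, T_out, J * D)         # ground-truth future motion

for _ in range(5):
    _, h = forecaster(past)
    pred = head(h[-1]).view(8, T_out, J * D)

    # discriminator: real futures -> 1, predicted futures -> 0
    d_loss = bce(disc(future), torch.ones(8, 1)) + bce(disc(pred.detach()), torch.zeros(8, 1))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # forecaster: reconstruction loss plus a fool-the-discriminator term
    g_loss = (pred - future).pow(2).mean() + 0.1 * bce(disc(pred), torch.ones(8, 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
```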