Research

Research Interests

My research is focused on a wide range of areas in applied mathematics including numerical analysis, mathematical/computational biology, machine learning, and drug design. My current areas of interest are as follows:

Mathematical/Computational Biology

Mathematical models for biosciences
Differential geometry representations of biomolecular data
Molecular surface representation
Graph theory for biomolecules
Mathematical models for aquatic ecosystems
Spatio-temporal models incorporating ecological stoichiometry
Reaction-diffusion partial differential equations for aquatic ecosystems

Machine Learning for Drug Design

Gradient boosting trees (GBT)
Convolutional neural network (CNN)
Graph convolutional neural network (GNN)

Numerical Analysis

Numerical methods for partial differential equations (PDEs)
Implicit Runge-Kutta methods for time-dependent PDEs
Preconditioning for large-scale linear systems
Reduced-order models of nonlinear PDEs and their applications to biological systems

Geometric Graph Learning for Biomolecules

In biomolecular studies, graph theory is widely applied because graphs model molecules and molecular complexes in a natural manner. We extend graph-based learners for the study of protein-ligand interactions by integrating extended atom types, such as SYBYL atom types and extended connectivity interaction features (ECIF), into multiscale weighted colored graphs (MWCG).

The molecular complex with PDBID: 5bwc is used as an example to illustrate the geometric graph learning approach, as shown in the first column. In the second column, specific protein-ligand atom-type pairs (CA-O.3, OE1-N.pl3, and NE1-C.2) are depicted from top to bottom. The third column shows the corresponding weighted colored geometric subgraphs. In the fourth column, statistical information about the rigidity of the subgraphs is presented. Finally, the fifth column demonstrates the integration of these statistical features through advanced machine learning models, such as gradient boosting trees, for training and prediction purposes.
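
The rigidity statistics described for the fourth column can be sketched in a few lines of Python. This is an illustration only, not the project's implementation: the generalized exponential kernel, the 12 Å cutoff, the function names, and the toy coordinates are all assumptions made for the example.

```python
import math
from statistics import mean, median, pstdev


def exponential_kernel(r, eta, kappa=2.0):
    """Generalized exponential weight; eta sets the scale, kappa the decay rate."""
    return math.exp(-((r / eta) ** kappa))


def subgraph_rigidity(ligand_atoms, protein_atoms, eta, cutoff=12.0):
    """Rigidity index of each ligand atom in one atom-type-pair subgraph.

    Edges connect only ligand atoms to protein atoms (a bipartite, "colored"
    subgraph); each edge is weighted by the kernel of its length.
    """
    rigidity = []
    for lig in ligand_atoms:
        total = 0.0
        for pro in protein_atoms:
            r = math.dist(lig, pro)
            if r <= cutoff:  # assumed cutoff; pairs farther apart are ignored
                total += exponential_kernel(r, eta)
        rigidity.append(total)
    return rigidity


def rigidity_statistics(values):
    """Fixed-length statistical summary usable as machine learning features."""
    if not values:
        return {"sum": 0.0, "mean": 0.0, "median": 0.0,
                "std": 0.0, "min": 0.0, "max": 0.0}
    return {
        "sum": sum(values),
        "mean": mean(values),
        "median": median(values),
        "std": pstdev(values),
        "min": min(values),
        "max": max(values),
    }


# Toy coordinates (in angstroms) for one hypothetical atom-type pair, e.g. CA-O.3.
protein_ca = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (30.0, 0.0, 0.0)]
ligand_o3 = [(0.0, 0.0, 0.0), (0.0, 4.0, 0.0)]

features = rigidity_statistics(subgraph_rigidity(ligand_o3, protein_ca, eta=3.0))
print(features)
```

Repeating this for every atom-type pair and for several values of eta would give a fixed-length feature vector per complex, which is the kind of input a gradient boosting tree model can be trained on.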

EISA-Score: Element Interactive Surface Area Score

Molecular surface representations are a valuable tool for studying protein structure and function, including protein-ligand binding affinity prediction. In this project, we present molecular surface representations embedded in element interactive manifolds at different scales. These representations dramatically reduce dimensionality while accurately encoding physical and biological properties.

We propose to construct the molecular surface at the pairwise element level. The element-wise surfaces effectively capture specific types of non-covalent interactions, such as van der Waals interactions, hydrophobicity, and hydrogen bonds. We construct a class of multiscale surfaces by varying the kernel parameters and level set values in the multiscale discrete-to-continuum mapping.

In this project, we define two types of element interactive surfaces.

1. Local surface: formed by a single atom with a given element type.

2. Global surface: formed by element interactive density with a restraint on the element interactive region.
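
As a rough illustration of the discrete-to-continuum mapping behind these surfaces, the sketch below sums a kernel centered at each atom and treats a surface as a level set of the resulting density. The Gaussian kernel, the function names, and the parameter values are assumptions made for the example. Only the local (single-atom) case has the closed-form area used here; a global surface would need numerical surface extraction and the restraint on the interactive region, neither of which is shown.

```python
import math


def gaussian_kernel(r, eta):
    """Gaussian weight of an atom at distance r; eta is the scale parameter."""
    return math.exp(-((r / eta) ** 2))


def density(point, atoms, eta):
    """Discrete-to-continuum mapping: sum of kernels centered at the atoms."""
    return sum(gaussian_kernel(math.dist(point, atom), eta) for atom in atoms)


def inside_surface(point, atoms, eta, level):
    """True if the point lies on or inside the level set density = level."""
    return density(point, atoms, eta) >= level


def local_surface_area(eta, level):
    """Area of a single-atom (local) surface under the assumed Gaussian kernel.

    exp(-(r / eta)^2) = level gives the radius r = eta * sqrt(ln(1 / level)),
    so the level set is a sphere of area 4 * pi * r^2.
    """
    if not 0.0 < level < 1.0:
        raise ValueError("level must lie strictly between 0 and 1")
    return 4.0 * math.pi * eta ** 2 * math.log(1.0 / level)


# Varying eta and the level set value produces surfaces at different scales.
for eta in (1.0, 2.0):
    for level in (0.25, 0.5):
        print(eta, level, round(local_surface_area(eta, level), 3))
```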

Machine Learning Strategy with EISA

The element interactive surface area (EISA) descriptors of a molecule or molecular complex provide robust and scalable features from which machine learning and deep learning models can learn diverse biomolecular datasets. The EISA representations can be integrated with a wide variety of machine learning algorithms, such as support vector machines, random forests, gradient boosting trees, artificial neural networks, and convolutional neural networks.

Preconditioning IRK methods for parabolic PDEs

Implicit Runge-Kutta (IRK) methods offer an appealing combination of stability and high order; however, these methods are not widely used for the solution of partial differential equations (PDEs) because they lead to large, strongly coupled linear systems. An s-stage IRK system has s times as many degrees of freedom as the systems resulting from backward Euler or implicit trapezoidal rule discretization applied to the same equation set.
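
The growth in system size can be seen by assembling the coupled stage matrix for a small model problem. The sketch below is illustrative and rests on assumptions: a finite-difference 1D heat equation written as M u' = -K u with M the identity, dense list-of-rows matrices, and the 2-stage Radau IIA Butcher matrix. The helper names are hypothetical.

```python
def kron(A, B):
    """Kronecker product of dense matrices stored as lists of rows."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]


def identity(n):
    """n-by-n identity matrix."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def laplacian_1d(n):
    """Second-difference matrix on n interior points of the unit interval."""
    h2 = (1.0 / (n + 1)) ** 2
    return [[(2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0) / h2
             for j in range(n)] for i in range(n)]


def irk_stage_matrix(butcher_a, mass, stiffness, dt):
    """Coupled stage matrix I_s (x) M + dt * A (x) K for M u' = -K u."""
    s = len(butcher_a)
    left = kron(identity(s), mass)
    right = kron(butcher_a, stiffness)
    return [[x + dt * y for x, y in zip(row_l, row_r)]
            for row_l, row_r in zip(left, right)]


# Butcher matrix of the 2-stage Radau IIA method.
RADAU_IIA_2 = [[5.0 / 12.0, -1.0 / 12.0],
               [3.0 / 4.0, 1.0 / 4.0]]

n = 8
system = irk_stage_matrix(RADAU_IIA_2, identity(n), laplacian_1d(n), dt=0.01)
print(len(system), "unknowns per step, versus", n, "for backward Euler")
```

Because the Butcher matrix is full, every stage block is coupled to every other, which is what makes the system harder to solve than s independent backward Euler systems.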

In this project, we introduce a new block preconditioner for IRK methods, based on an LDU factorization. Solves on individual blocks are accomplished using a multigrid algorithm. We demonstrate the effectiveness of this preconditioner on two test problems: a simple heat equation and a model advection-diffusion problem known as the double-glazing problem. We find that our preconditioner is scalable (independent of mesh size and time step) and yields better timing results than two preconditioners currently in the literature, block Jacobi and block Gauss-Seidel.

Our preconditioner is also robust with respect to varying time step sizes for a fixed spatial resolution. We ran experiments with up to 7 IRK stages and found that the new preconditioner outperforms the others, with the improvement becoming more pronounced as the spatial discretization is refined and the temporal order is increased.
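
The factorization step can be illustrated on a small Butcher matrix. The sketch below assumes that the LDU factorization is applied to the Butcher matrix A and that the lower triangular factor LD replaces A in the stage matrix, giving a block lower triangular preconditioner of the form I_s (x) M + dt * (LD) (x) K. That reading is an assumption made for illustration, and the function names are hypothetical.

```python
def ldu(matrix):
    """Factor a small square matrix as L * D * U without pivoting.

    L is unit lower triangular, D is the diagonal (returned as a list),
    and U is unit upper triangular. Raises if a zero pivot appears.
    """
    s = len(matrix)
    work = [row[:] for row in matrix]
    lower = [[1.0 if i == j else 0.0 for j in range(s)] for i in range(s)]
    for k in range(s):
        if work[k][k] == 0.0:
            raise ZeroDivisionError("zero pivot; pivoting would be required")
        for i in range(k + 1, s):
            lower[i][k] = work[i][k] / work[k][k]
            for j in range(k, s):
                work[i][j] -= lower[i][k] * work[k][j]
    diag = [work[k][k] for k in range(s)]
    upper = [[work[i][j] / diag[i] if j >= i else 0.0 for j in range(s)]
             for i in range(s)]
    return lower, diag, upper


def lower_factor(lower, diag):
    """Product L * D: column j of L scaled by diag[j]."""
    s = len(diag)
    return [[lower[i][j] * diag[j] for j in range(s)] for i in range(s)]


# Butcher matrix of the 2-stage Radau IIA method.
RADAU_IIA_2 = [[5.0 / 12.0, -1.0 / 12.0],
               [3.0 / 4.0, 1.0 / 4.0]]

L, D, U = ldu(RADAU_IIA_2)
LD = lower_factor(L, D)
print(LD)  # lower triangular, so the stages decouple by forward substitution
```

With a lower triangular coefficient matrix, applying the preconditioner needs only one solve per stage with a diagonal block of the form M + dt * d * K, which is where a multigrid solver would be used.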

Figure: Eigenvalue distributions of the original and preconditioned matrices for the two-dimensional double-glazing problem using the 5-stage Radau IIA method. The eigenvalues of the original discretization matrix are shown in blue, those of the matrix preconditioned with the Gauss-Seidel preconditioner in green, and those of the matrix preconditioned with the LDU-based lower triangular preconditioner in red. The LDU-based lower triangular preconditioner (red) clusters the eigenvalues away from zero and near unity.
