Current projects (ML/DL, Computational Biology, Biostatistics)
I have played key roles in several projects spanning machine learning and deep learning model development, bioinformatics, computational biology, and transcriptomics data analysis:
Objectives
To address a primary limitation of the Xenium platform: it profiles only a limited gene panel (~500 genes), which restricts its capacity for exploratory analysis.
To develop a gene imputation method that closes a key gap in existing approaches by incorporating the cellular morphology and spatial location information available in Xenium's pathology images.
To create a multimodal AI framework that uses both spatial gene expression profiles and their corresponding pathology images to improve gene imputation accuracy.
Results
Developed xMINT, a compact and efficient transformer-based model for gene imputation.
Demonstrated that xMINT achieves higher imputation accuracy than existing methods.
Enabled more robust and accurate exploratory analysis of Xenium data through high-quality gene imputation.
(Note: This work was published as a conference paper at ICML: https://icml.cc/virtual/2024/35914)
Objectives
Develop a pathway analysis approach that better reflects the structural organization of molecular interactions.
Improve sensitivity and robustness in detecting pathway activation across diverse single-cell and tissue contexts.
Enable biologically interpretable insights linking cellular states to underlying molecular mechanisms.
Results
Topology-aware modeling: Integrates graph-based features of gene co-expression networks into pathway activity inference.
Broad applicability: Compatible with single-cell and spatial transcriptomics data across multiple tissues.
Robust and interpretable: Provides stable pathway signatures that are resilient to batch effects and technical noise.
Objectives
Profile senescence-associated programs in fibro-adipogenic progenitors (FAPs) from normal skeletal muscle and from muscle affected by chronic limb-threatening ischemia (CLTI).
Compare senescence pathway activities between distal and proximal regions to assess spatial heterogeneity.
Examine age- and disease-related trajectories of cellular senescence using pathway-level inference.
Results
CLTI samples showed elevated senescence signatures in distal muscle FAPs, suggesting vascular-stress–induced aging.
Age-dependent activation of oxidative stress, DNA damage, and SASP-related pathways revealed progressive senescence dynamics.
Uncovered robust, interpretable pathway signatures distinguishing healthy, aging, and ischemic muscle states.
Objectives
Develop a spatially resolved molecular signature framework (xSCOPE) integrating cancer and immune-related pathways to characterize tumor microenvironments (TMEs).
Systematically quantify and visualize key functional signatures (e.g., tertiary lymphoid structures (TLS), immunosuppressive, activated CD8+ T cell) across spatial domains and tissue regions.
Relate molecular signatures to histological features (H&E/DAPI) to uncover spatial gradients and microenvironmental organization within tumors.
Results
Identified distinct spatial patterns of immune and cancer pathway activity across tumor, stromal, and boundary regions.
Revealed spatial correlations between TLS and immunosuppressive signatures, suggesting context-dependent immune balance in the TME.
Demonstrated that xSCOPE enables integrative mapping of molecular and histological features, enhancing interpretation of cancer spatial transcriptomics data.
Past projects (ML/DL, Large-Scale Systems, Imaging)
I have played key roles in several data analysis and software development projects:
This project focused on anomaly detection and failure prediction in a big data platform, in collaboration with a major telecommunications carrier.
Used regular expressions to strip variable fields from raw log data before log parsing.
Vectorized the logs with a bag-of-words model to obtain structured features.
Devised a clustering algorithm that automatically extracts log templates from nearly 50 million log entries.
Mined relations among anomalies through association rule mining and stored them in an anomaly knowledge base.
Built a neural-network-based anomaly prediction model to locate failures and give operators timely feedback.
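The first steps above (masking variables with regular expressions, grouping logs into templates, and bag-of-words vectorization) can be illustrated with a minimal sketch. The masking rules, the sample log lines, and the helper names are hypothetical, and the exact-match grouping shown here is a simple stand-in for the project's clustering algorithm, not a reproduction of it.

```python
import re
from collections import Counter, defaultdict

# Hypothetical masking rules; a real system would tune these per log source.
# Order matters: more specific patterns come before the generic number rule.
MASKS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}(?::\d+)?\b"), "<IP>"),
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<HEX>"),
    (re.compile(r"\b\d+\b"), "<NUM>"),
]


def mask_variables(line):
    """Replace variable fields (addresses, hex ids, numbers) with placeholders."""
    for pattern, token in MASKS:
        line = pattern.sub(token, line)
    return line


def extract_templates(lines):
    """Group log lines by their masked form; each key is a candidate template."""
    groups = defaultdict(list)
    for index, line in enumerate(lines):
        groups[mask_variables(line)].append(index)
    return dict(groups)


def bag_of_words(text, vocabulary):
    """Count how often each vocabulary word occurs in the text."""
    counts = Counter(text.split())
    return [counts[word] for word in vocabulary]


# Illustrative log lines (not taken from the project data).
logs = [
    "Connection from 10.0.0.1:5050 closed after 120 ms",
    "Connection from 10.0.0.2:6060 closed after 87 ms",
    "Block 0x1f3a replicated to 3 nodes",
]

templates = extract_templates(logs)
vocabulary = sorted({word for template in templates for word in template.split()})
vectors = {template: bag_of_words(template, vocabulary) for template in templates}
```

Here the two connection logs collapse into one template because they differ only in masked fields. At the scale of tens of millions of logs, a similarity-based clustering over the resulting vectors would typically replace the exact-match grouping.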
This project addressed anomaly detection in industrial Internet settings, particularly for IoT device data. It was a basic scientific research project.
Collected and cleaned data for the downstream tasks.
Built a knowledge graph from system anomalies and their relevant solutions.
Improved anomaly detection performance by incorporating knowledge graph embeddings as additional features.
This project focused on creating an automated tool for analyzing log data, time series data, and trace data. It was funded by an industry-university collaborative research innovation fund.
Parsed the log data with an open-source platform to obtain structured log data.
Devised an end-to-end neural network model for feature extraction and anomaly detection on sequential logs.
Integrated the log processing and anomaly detection model into a software product.
Objectives
Develop a robust multi-organ segmentation method for 3D thoracic CT images (esophagus, heart, trachea, aorta).
Simplify and adapt existing Dense V-net architectures to perform well given relatively small training datasets and reduce segmentation artifacts (fragmentation, disconnected components).
Introduce postprocessing steps and loss functions tailored to medical segmentation challenges (class imbalance, spatial coherence) to improve both accuracy and structural consistency.
Results
Proposed a simplified Dense V-net architecture (smaller input size, removal of spatial prior block) to reduce overfitting and computational complexity.
Employed data augmentation, balanced patch sampling, and multi-scale prediction to counter class imbalance across organs of different sizes.
Designed a postprocessing denoising step (axis-wise connected-component removal) to eliminate small, fragmented predictions.
Used a Dice + Cross Entropy loss (DicePlusXEnt) weighted toward under-represented organs to stabilize training.
Combined model fusion (an ensemble of several trained models) with 3D denoising to further refine the segmentation masks.
(Note: This work was published as a conference paper at ISBI)
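As an illustration of the combined loss mentioned above, the following is a minimal NumPy sketch of a class-weighted soft Dice plus cross-entropy loss. It is a generic formulation under stated assumptions (flattened voxels, softmax probabilities as input, per-class weights normalized to sum to one), not necessarily the exact DicePlusXEnt implementation used in this work. The function name and the weights are hypothetical.

```python
import numpy as np


def dice_plus_xent(probs, labels, class_weights, eps=1e-6):
    """Class-weighted soft Dice loss plus weighted cross entropy.

    probs:         (N, C) array of softmax probabilities, one row per voxel
    labels:        (N,) array of integer class ids
    class_weights: (C,) array; larger values emphasize rarer classes
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=int)
    n_classes = probs.shape[1]
    one_hot = np.eye(n_classes)[labels]  # (N, C)

    weights = np.asarray(class_weights, dtype=float)
    weights = weights / weights.sum()  # assumption: normalize to sum to 1

    # Soft Dice per class, then a weighted average of (1 - Dice).
    intersection = (probs * one_hot).sum(axis=0)
    denominator = probs.sum(axis=0) + one_hot.sum(axis=0)
    dice = (2.0 * intersection + eps) / (denominator + eps)
    dice_loss = float((weights * (1.0 - dice)).sum())

    # Cross entropy, weighting each voxel by the weight of its true class.
    p_true = np.clip(probs[np.arange(len(labels)), labels], eps, 1.0)
    voxel_weights = weights[labels]
    xent = float((voxel_weights * -np.log(p_true)).sum() / voxel_weights.sum())

    return dice_loss + xent


# Toy example: three background voxels and one voxel of a rare class.
labels = np.array([0, 0, 0, 1])
perfect = np.eye(2)[labels]          # predictions identical to the labels
uniform = np.full((4, 2), 0.5)       # uninformative predictions
loss_perfect = dice_plus_xent(perfect, labels, class_weights=[1.0, 3.0])
loss_uniform = dice_plus_xent(uniform, labels, class_weights=[1.0, 3.0])
```

The Dice term is computed per class, so a small organ contributes as much as a large one before weighting. The cross-entropy term supplies smoother per-voxel gradients, which is one common reason for combining the two.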