Machine learning & deep learning

Enhance Knowledge Graph Embedding by Mixup

This project addresses the challenge of effectively leveraging relational information in knowledge graphs, which are increasingly used to represent product relationships in e-commerce and other digital platforms. The abstract nature of nodes and edges in these graphs makes them difficult to model using conventional embedding methods. To overcome this, we proposed a triplet mixup technique that enhances the informativeness of knowledge graph embeddings, guided by influence functions to identify impactful relational patterns. Published in IEEE TKDE, 2023.
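As a rough illustration of what mixing triplets can look like, the sketch below applies standard mixup (linear interpolation with a Beta-distributed weight) to the embeddings of two knowledge graph triples. It is not the published procedure: the function name `mixup_triplets`, the `alpha` value, and the toy embeddings are assumptions made for this example, and the influence-function step that decides which triples to mix is left out entirely.

```python
import random


def mixup_triplets(triple_a, triple_b, alpha=0.4, rng=None):
    """Interpolate two embedded triples (head, relation, tail) component-wise.

    Hypothetical helper for illustration only. Each triple is a tuple of
    three equal-length lists of floats. Pair selection (which the project
    guides with influence functions) is NOT modeled here.
    """
    rng = rng or random.Random()
    # Standard mixup draws the mixing weight from Beta(alpha, alpha).
    lam = rng.betavariate(alpha, alpha)
    mixed = tuple(
        [lam * x + (1.0 - lam) * y for x, y in zip(vec_a, vec_b)]
        for vec_a, vec_b in zip(triple_a, triple_b)
    )
    return mixed, lam


# Toy 2-dimensional embeddings for two triples (assumed values).
triple_a = ([1.0, 0.0], [0.5, 0.5], [0.0, 1.0])
triple_b = ([0.0, 1.0], [0.5, 0.5], [1.0, 0.0])

mixed_triple, weight = mixup_triplets(triple_a, triple_b, rng=random.Random(0))
print("mixing weight:", weight)
print("mixed head/relation/tail:", mixed_triple)
```

The mixed triple would then be added to the training set as a synthetic example alongside the original triples.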

Accuracy, Fairness, Diversity All at Once: An Influence Function Guided Data Enhancement Approach for Recommender System

This project addresses the joint optimization of accuracy, fairness, and diversity in recommender systems, which are central to e-commerce and other digital platforms. Although prior work has explored fairness or diversity independently, few approaches attempt to balance all three objectives simultaneously, despite their importance in real-world applications. To fill this gap, I developed a data enhancement framework guided by influence functions that is designed to improve all three objectives at once. Forthcoming in ACM Transactions on Knowledge Discovery from Data (TKDD).

Fairness in Survival Analysis: A Novel Conditional Mutual Information Augmentation Approach

This project tackles demographic disparity in survival analysis, a widely used ML technique for estimating event occurrence risk in domains such as healthcare, finance, and criminal justice. While fairness-aware methods have been developed for classification and regression tasks, they cannot be directly applied to survival analysis due to its unique time-to-event structure. Existing fairness approaches for survival analysis often rely on impractical or overly simplistic definitions, limiting their real-world applicability. In this study, I propose a new fairness measure that better reflects practical needs and introduce a statistically rigorous method for mitigating disparity in survival predictions. Under first-round major revision at MIS Quarterly (MISQ).

High-dimensional counterfactual analysis

Cryptocurrency Airdrop Success Blueprint: A High-dimensional Causal Study Using Double Machine Learning

This study examines the cryptocurrency and blockchain domain, a rapidly evolving segment of financial technology (FinTech) characterized by fully digital and highly complex design structures. Evaluating interventions in this context is challenging, and ML-based counterfactual frameworks are particularly well suited to the task. I investigate airdrop campaigns, a common marketing mechanism used to promote new tokens, and develop a robust framework leveraging double machine learning (DML) to estimate the causal effect of airdrops on post-listing financial performance. To capture token characteristics, I employ large language models (LLMs) to extract functional insights from token whitepapers and incorporate them as control variables in the causal estimation. This work contributes to FinTech research by demonstrating how high-dimensional causal inference can rigorously evaluate token design strategies, and it provides practical design guidance for token developers. Under first-round major revision at INFORMS Information Systems Research (ISR).
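For readers unfamiliar with DML, the sketch below shows the generic partialling-out estimator with cross-fitting on synthetic data. It is only an illustration under stated assumptions: the ridge nuisance models, the simulated continuous treatment, the true effect of 2.0, and the function name `dml_partial_out` are all choices made for this example and do not reflect the study's data, learners, or LLM-derived controls.

```python
import numpy as np


def dml_partial_out(y, d, X, n_folds=2, ridge=1e-3, seed=0):
    """Cross-fitted partialling-out estimate of the effect of d on y.

    Hypothetical helper for illustration. Nuisance functions E[y|X] and
    E[d|X] are fit with ridge regression (an assumption; any ML learner
    could be substituted) on the training folds and used to residualize
    the held-out fold.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    folds = np.array_split(rng.permutation(n), n_folds)
    Xb = np.column_stack([np.ones(n), X])  # add intercept column
    penalty = ridge * np.eye(Xb.shape[1])
    y_res = np.empty(n)
    d_res = np.empty(n)
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(n), test_idx)
        gram = Xb[train_idx].T @ Xb[train_idx] + penalty
        beta_y = np.linalg.solve(gram, Xb[train_idx].T @ y[train_idx])
        beta_d = np.linalg.solve(gram, Xb[train_idx].T @ d[train_idx])
        # Out-of-fold residuals: the part of y and d not explained by X.
        y_res[test_idx] = y[test_idx] - Xb[test_idx] @ beta_y
        d_res[test_idx] = d[test_idx] - Xb[test_idx] @ beta_d
    # Final stage: regress outcome residuals on treatment residuals.
    return float(d_res @ y_res / (d_res @ d_res))


# Synthetic data (assumed): controls confound both treatment and outcome.
sim = np.random.default_rng(0)
n_obs, n_controls = 2000, 5
X = sim.normal(size=(n_obs, n_controls))
d = X @ np.array([0.5, -0.3, 0.2, 0.0, 0.1]) + sim.normal(size=n_obs)
y = 2.0 * d + X @ np.array([1.0, 0.5, -0.5, 0.3, 0.0]) + sim.normal(size=n_obs)

theta_hat = dml_partial_out(y, d, X)
print("estimated effect:", theta_hat)
```

Cross-fitting keeps the nuisance fits and the residuals they produce on separate folds, which is what lets flexible learners be used for the controls without biasing the final-stage estimate.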

Airbnb Cover Image Selection through High-Dimensional Visual Causal Inference

This study explores how Airbnb listing cover images affect booking rates and develops a framework that provides data-driven advice for selecting optimal cover images. We propose a transformer-based information processing module that supports both propensity score estimation and heterogeneous treatment effect modeling. To address generalizability across geographic markets, we enhance the model with meta-learning techniques, allowing it to adapt to diverse environments. To tackle the challenge of interpretability, we integrate Contrastive Language-Image Pre-training (CLIP) embeddings with Compositional Soft Prompting (CSP) to identify visual attributes driving heterogeneous effects. This framework enables rigorous model selection and evaluation, and its findings offer actionable recommendations for hosts to optimize booking performance. Work in progress.
