Data-Driven Markov Decision Processes with Heterogeneous Groups: Regret Guarantees and Fairness Improvement via Robust Optimization, with Sarah Yini Gao and Zhichao Zheng
Presented at 2026 POMS-HK International Conference
Presented at 2025 INFORMS International Meeting
Presented at 2025 Robust Optimization Methods and Applications Workshop (Shenzhen)
Abstract
Healthcare datasets are often imbalanced and underrepresent certain demographic subgroups, such as those defined by ethnicity, gender, or region. This phenomenon, often described as data poverty, poses challenges for designing fair and effective policies, particularly in dynamic decision-making settings like patient interventions or resource allocation. We study this problem under a finite-horizon Markov Decision Process (MDP) framework where population groups differ in both transition dynamics and reward functions. We first establish regret bounds and show that transition uncertainty dominates the performance gap for minority-group policies learned from limited data. To mitigate the impact of data poverty, we examine the effectiveness of pooling data across groups.
While naïve pooling can exacerbate bias, we show that combining data pooling with distributionally robust optimization, using a type-1 Wasserstein ambiguity set, can yield substantial fairness and performance gains. In particular, we find that moderate robustness levels can reduce regret for minority groups without significantly degrading majority outcomes, protecting collective performance. More importantly, we show that robustness amplifies the fairness benefits of data pooling and achieves its maximum effect at certain ratios of group data sizes, effectively acting as a fairness regularizer. Numerical experiments on an ICU extubation task validate our theoretical findings, demonstrating that robust pooling improves both overall outcomes and fairness.
Distributionally Robust Multi-item Newsvendor Problem: A Lifting-and-Splitting Approach, with Sarah Yini Gao, Lianmin Zhang and Zhichao Zheng
Abstract
We propose a lifting-and-splitting framework for the distributionally robust multi-item newsvendor problem with mean-covariance demand ambiguity and a conditional value-at-risk (CVaR) objective. We first lift the decision space and reformulate the problem as a semidefinite program (SDP) over the Constrained Boolean Quadric Polytope, reducing exponentially many positive semidefinite constraints to linear ones. To ensure tractability, we introduce a correlation splitting approximation that yields a polynomial-time solvable SDP lower bound. We establish a theoretical guarantee for the framework and show its tightness in a two-item setting with arbitrary correlation. Computational experiments demonstrate substantial improvements in both tractability and out-of-sample performance compared to several benchmarks.