Data-Driven Markov Decision Processes with Heterogeneous Groups: Regret Guarantees and Fairness Improvement via Robust Optimization, with Sarah Yini Gao and Zhichao Zheng
Presented at 2026 POMS-HK International Conference
Presented at 2025 INFORMS International Meeting
Presented at 2025 Robust Optimization Methods and Applications Workshop (Shenzhen)
Abstract
Healthcare datasets are often imbalanced and underrepresent certain demographic subgroups, such as those defined by ethnicity, gender, or region. This phenomenon, often described as data poverty, poses challenges for designing fair and effective policies, particularly in dynamic decision-making settings like patient interventions or resource allocation. We study this problem under a finite-horizon Markov Decision Process (MDP) framework where population groups differ in both transition dynamics and reward functions. We first establish regret bounds and show that transition uncertainty dominates the performance gap for minority-group policies learned from limited data. To mitigate the impact of data poverty, we examine the effectiveness of pooling data across groups.
While naïve pooling can exacerbate bias, we show that combining data pooling with distributionally robust optimization, using a type-1 Wasserstein ambiguity set, can yield substantial fairness and performance gains. In particular, we find that moderate robustness levels can reduce regret for minority groups without significantly degrading majority outcomes, protecting collective performance. More importantly, we show that robustness amplifies the fairness benefits of data pooling and achieves its maximum effect at certain ratios of group data sizes, effectively acting as a fairness regularizer. Numerical experiments on an ICU extubation task validate our theoretical findings, demonstrating that robust pooling improves both overall outcomes and fairness.
Distributionally Robust Multi-item Newsvendor Problem: A Lifting-and-Splitting Approach, with Sarah Yini Gao, Lianmin Zhang and Zhichao Zheng
Abstract
Distributionally robust multi-item newsvendor problem with mean–covariance demand ambiguity constitutes a fundamental model for inventory decision making under demand uncertainty. However, the resulting optimization problem is NP-hard and thus intratable for large instances. Moreover, existing approximation methods typically lack tractability guarantees with provable performance bounds.
In this paper, we propose a lifting-and-splitting framework for this problem under a conditional value-at-risk (CVaR) objective, which admits both polynomial-time solvability and theoretical performance guarantees. We first lift the decision space and reformulate the problem as a semidefinite program (SDP) over the Constrained Boolean Quadric Polytope, thereby transforming exponentially many positive semidefinite constraints to linear ones. To further enhance tractability, we introduce a correlation splitting approximation that yields a polynomial-time solvable SDP lower bound. We then establish a theoretical performance guarantee for the proposed framework. Moreover, in a general multi-item setting, we show that both the true and theoretical optimality gaps vanish when the variances of partitioned item subsets are sufficiently heterogeneous. Numerical experiments demonstrate superior performance in computaitonal tractability, solution quality and scalability compared to several benchmark methods.