Data-Driven Markov Decision Processes with Heterogeneous Groups: Regret Guarantees and Fairness Improvement via Robust Optimization, with Sarah Yini Gao and Zhichao Zheng
Presented at 2026 POMS-HK International Conference
Presented at 2025 INFORMS International Meeting
Presented at 2025 Robust Optimization Methods and Applications Workshop (Shenzhen)
Abstract
Healthcare datasets are often imbalanced and underrepresent certain demographic subgroups, such as those defined by ethnicity, gender, or region. This phenomenon, often described as data poverty, poses challenges for designing fair and effective policies, particularly in dynamic decision-making settings like patient interventions or resource allocation. We study this problem under a finite-horizon Markov Decision Process (MDP) framework where population groups differ in both transition dynamics and reward functions. We first establish regret bounds and show that transition uncertainty dominates the performance gap for minority-group policies learned from limited data. To mitigate the impact of data poverty, we examine the effectiveness of pooling data across groups.
While naïve pooling can exacerbate bias, we show that combining data pooling with distributionally robust optimization, using a type-1 Wasserstein ambiguity set, can yield substantial fairness and performance gains. In particular, we find that moderate robustness levels can reduce regret for minority groups without significantly degrading majority outcomes, protecting collective performance. More importantly, we show that robustness amplifies the fairness benefits of data pooling and achieves its maximum effect at certain ratios of group data sizes, effectively acting as a fairness regularizer. Numerical experiments on an ICU extubation task validate our theoretical findings, demonstrating that robust pooling improves both overall outcomes and fairness.
Distributionally Robust Risk-Averse Multi-Item Newsvendor Problems: A Lifting-and-Splitting Framework, with Sarah Yini Gao, Lianmin Zhang and Zhichao Zheng
Abstract
Distributionally robust multi-item newsvendor problems with mean–covariance demand ambiguity constitute a fundamental model for inventory decision making under uncertainty. However, the resulting problem is NP-hard and becomes computationally intractable for large instances, while existing approximation methods often lack provable performance guarantees.
In this paper, we develop a lifting-and-splitting framework under a conditional value-at-risk (CVaR) objective that combines exact reformulation, tractable approximation, and theoretical guarantees. We first lift the problem to a semidefinite program (SDP) over the Constrained Boolean Quadric Polytope, thereby replacing exponentially many positive semidefinite constraints with linear ones. Building on this reformulation, we introduce a correlation splitting approximation that partitions items into clusters and yields a polynomial-time solvable SDP relaxation.
We establish exactness of the approximation under identifiable structural conditions, and derive an explicit upper bound on the optimality gap under general problem structures. Moreover, in the two-cluster setting, we show that both the optimality gap and its theoretical upper bound vanish at rate O(σ) when the variances in one cluster are uniformly bounded by σ^2. This result reveals that approximation quality is fundamentally governed by variance heterogeneity and motivates a simple variance-based splitting rule for practical cluster assignment. Numerical experiments demonstrate that the proposed framework achieves strong computational tractability, high solution quality, and favorable scalability relative to benchmark methods.