Data-Driven Markov Decision Processes with Heterogeneous Groups: Regret Guarantees and Fairness Improvement via Robust Optimization, with Sarah Yini Gao and Zhichao Zheng
Presented at 2026 POMS-HK International Conference
Presented at 2025 INFORMS International Meeting
Presented at 2025 Robust Optimization Methods and Applications Workshop (Shenzhen)
Abstract
Healthcare datasets are often imbalanced and underrepresent certain demographic subgroups, such as those defined by ethnicity, gender, or region. This phenomenon, often described as data poverty, poses challenges for designing fair and effective policies, particularly in dynamic decision-making settings like patient interventions or resource allocation. We study this problem under a finite-horizon Markov Decision Process (MDP) framework where population groups differ in both transition dynamics and reward functions. We first establish regret bounds and show that transition uncertainty dominates the performance gap for minority-group policies learned from limited data. To mitigate the impact of data poverty, we examine the effectiveness of pooling data across groups.
While naïve pooling can exacerbate bias, we show that combining data pooling with distributionally robust optimization, using a type-1 Wasserstein ambiguity set, can yield substantial fairness and performance gains. In particular, we find that moderate robustness levels can reduce regret for minority groups without significantly degrading majority outcomes, protecting collective performance. More importantly, we show that robustness amplifies the fairness benefits of data pooling and achieves its maximum effect at certain ratios of group data sizes, effectively acting as a fairness regularizer. Numerical experiments on an ICU extubation task validate our theoretical findings, demonstrating that robust pooling improves both overall outcomes and fairness.
A Lifting-and-Splitting Framework for Risk-Averse Distributionally Robust Multi-Item Newsvendor Problems, Under Review, with Sarah Yini Gao, Lianmin Zhang and Zhichao Zheng
Abstract
Risk-averse distributionally robust multi-item newsvendor problems provide a fundamental model for inventory decisions under demand uncertainty, limited distributional information, and downside-risk concerns. We study this problem under mean--covariance demand ambiguity, where the decision maker maximizes the worst-case conditional value-at-risk of profit. While cross-item demand correlations are important for portfolio-level inventory decisions, they are difficult to estimate reliably in high dimensions. Moreover, preserving the full covariance information leads to an NP-hard formulation that is computationally challenging for large assortments.
We develop a lifting-and-splitting framework that combines exact reformulation, scalable approximation, and performance guarantees. First, we derive an exact semidefinite programming reformulation based on the constrained Boolean quadric polytope, replacing exponentially many positive semidefinite constraints in the classical dual formulation with exponentially many linear constraints. Second, we propose a correlation-splitting approximation that partitions items into clusters, preserves within-cluster covariance information, and relaxes cross-cluster dependence. We establish exactness conditions and derive an explicit cluster-wise upper bound on the optimality gap. We further show that, in the two-cluster case, the approximation becomes increasingly accurate when one cluster has uniformly small demand variability, motivating a simple variance-based splitting rule. Numerical experiments demonstrate improved exact-solution tractability, strong approximation quality, and scalability to large instances.