Data-Driven Markov Decision Processes with Heterogeneous Groups: Regret Guarantees and Fairness Improvement via Robust Optimization, with Sarah Yini Gao and Zhichao Zheng
Presented at 2026 POMS-HK International Conference
Presented at 2025 INFORMS International Meeting
Presented at 2025 Robust Optimization Methods and Applications Workshop (Shenzhen)
Abstract
Healthcare datasets are often imbalanced and underrepresent certain demographic subgroups, such as those defined by ethnicity, gender, or region. This phenomenon, often described as data poverty, poses challenges for designing fair and effective policies, particularly in dynamic decision-making settings like patient interventions or resource allocation. We study this problem under a finite-horizon Markov Decision Process (MDP) framework where population groups differ in both transition dynamics and reward functions. We first establish regret bounds and show that transition uncertainty dominates the performance gap for minority-group policies learned from limited data. To mitigate the impact of data poverty, we examine the effectiveness of pooling data across groups.
While naïve pooling can exacerbate bias, we show that combining data pooling with distributionally robust optimization, using a type-1 Wasserstein ambiguity set, can yield substantial fairness and performance gains. In particular, we find that moderate robustness levels can reduce regret for minority groups without significantly degrading majority outcomes, protecting collective performance. More importantly, we show that robustness amplifies the fairness benefits of data pooling and achieves its maximum effect at certain ratios of group data sizes, effectively acting as a fairness regularizer. Numerical experiments on an ICU extubation task validate our theoretical findings, demonstrating that robust pooling improves both overall outcomes and fairness.
A Lifting-and-Splitting Framework for Risk-Averse Distributionally Robust Multi-Item Newsvendor Problems, with Sarah Yini Gao, Lianmin Zhang and Zhichao Zheng
Abstract
Risk-averse distributionally robust multi-item newsvendor problems provide a fundamental model for inventory decision making under demand uncertainty, limited distributional information, and downside-risk concerns. In this paper, we study this problem under mean--covariance demand ambiguity, where only the first two moments of demand are assumed to be known. When downside risk is evaluated through conditional value-at-risk, preserving the full covariance structure leads to an NP-hard problem that becomes computationally intractable for large assortments. In addition, cross-item correlations are often difficult to estimate reliably in high-dimensional settings, motivating approximations that retain salient dependence information while simplifying less reliable parts of the covariance structure.
We develop a lifting-and-splitting framework that combines exact reformulation, scalable approximation, and theoretical error analysis. We first derive an exact semidefinite programming reformulation over the constrained Boolean quadric polytope, which replaces exponentially many positive semidefinite constraints in the classical dual formulation with linear constraints. Building on this reformulation, we propose a correlation-splitting approximation that partitions items into clusters, preserves within-cluster dependence, and relaxes cross-cluster dependence. We establish exactness conditions and derive an explicit upper bound on the optimality gap, with a cluster-wise decomposition that explains how approximation error arises from discarded cross-cluster correlations. In the two-cluster setting, we further show that the approximation becomes increasingly accurate when one cluster has uniformly small demand variability, which identifies variance heterogeneity as a key driver of approximation quality and motivates a simple variance-based splitting rule. Numerical experiments show that the exact reformulation solves substantially larger instances than the classical semidefinite formulation, while the correlation-splitting approximation consistently delivers higher solution quality than benchmark approximations and solves large instances in a fraction of the time.