Simulation Optimization, Sequential Learning
Ranking and Selection (R&S), also known as Pure Exploration and Best Arm Identification
Bayesian Learning
Generative AI, Large Language Models
Applications to Healthcare Management
Healthcare Management
Improving the Health Innovation Pipeline
Adaptive Clinical Trial Design, Precision Medicine
Value-based Clinical Trials
Target: the best, a good one, the optimal subset, a good set, ranking, etc.
Evaluator: large language models, stochastic simulation, input uncertainty in simulation, etc.
Learning Algorithm: greedy, UCB, expected value of information, etc.
Formulation: fixed-budget, multi-phase, portfolio, etc.
Economics: value-based clinical trials
Auxiliary Information: Bayesian priors
TOPIC 1: Improving the Health Innovation Pipeline for Childhood Cancer
Zaile Li, Stephen E. Chick, Sam Daems, and Shane G. Henderson (2025). Portfolios of Biomedical Innovations: Do Response Adaptiveness Across Clinical Trials and Value-Based Pricing Matter? Working Paper
Healthcare technologies must pass several risky and costly hurdles, such as multiple phases of clinical trials, before reaching market access and reimbursement, making innovation a high-stakes gamble. The challenge is even greater for rare diseases such as pediatric cancers, where limited patient populations, and thus limited investment returns for innovation projects, create a funding gap, and effective new treatments remain scarce. To address this challenge, a portfolio approach (developing multiple projects in parallel) is expected to improve the chance of success and better balance the risk–return profile. Developing levers to improve the outcomes and economic viability of such portfolios is of ongoing interest. In this work, we explore two such levers. The first is response-adaptive portfolio management, which adaptively allocates the available budget across projects as clinical trial data accumulate. The second is value-based pricing, which links post-approval reimbursement to the health value a new technology delivers. We develop a parsimonious model of a portfolio of healthcare innovation projects facing clinical trials, adopting a value-based perspective and incorporating common observation delays. Using ideas from Bayesian sequential optimization, we propose expected value of information heuristics for optimal portfolio management. Numerical studies calibrated to pediatric oncology show that response-adaptiveness improves both portfolio value and health outcomes, especially when the budget is limited relative to the number of projects in the portfolio, which is often the case in rare-disease contexts. Mild observation delays have little effect, while large delays diminish the benefit. Value-based pricing also plays a central role, with appropriate base pricing strongly shaping portfolio outcomes.
Zaile Li, Stephen E. Chick, Sam Daems, and Shane G. Henderson (2025). Explore Then Confirm: Investment Portfolios for New Drug Therapies. Accepted to 2025 Winter Simulation Conference
New medical technologies must pass several risky hurdles, such as multiple phases of clinical trials, before market access and reimbursement. A portfolio of technologies pools these risks, reducing the collective financial risk of such development while also improving the chances of identifying a successful technology. We propose a stylized model of a portfolio of technologies, each of which must pass two phases of clinical trials before market access is possible. Using ideas from Bayesian sequential optimization, we study the value of running response-adaptive clinical trials to flexibly allocate resources across technologies in a portfolio. We suggest heuristics for the response-adaptive policy and find evidence for their value relative to non-adaptive policies.
TOPIC 2: Selection Among Many Alternatives
Zaile Li, Weiwei Fan, and L. Jeff Hong (2025). The (Surprising) Sample Optimality of Greedy Procedures for Large-Scale Ranking and Selection. Management Science 71(2):1238-1259
🏆 Finalist, 2024 George Nicholson Student Paper Competition, INFORMS
🏆 First Prize, Best Student Paper Competition, the 14th POMS-HK International Conference
Recently, considerable attention in R&S has turned towards large-scale problems that involve a large number of alternatives. Ideal large-scale R&S procedures (algorithms) should be sample optimal, i.e., the total sample size required to deliver an asymptotically non-zero probability of correct selection grows at the minimal order (linear order) in the number of alternatives, k. Surprisingly, we discover that the naïve greedy algorithm, which keeps sampling the alternative with the largest running average, performs strikingly well and is sample optimal ...
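As a rough illustration of the greedy rule described above, the following Python sketch keeps sampling whichever alternative currently has the largest running average. Only that sampling rule comes from the abstract. The one-observation warm-up per alternative, the fixed total budget, the final choice of the largest running average, and the names `greedy_selection`, `sample`, and `n0` are assumptions made here for the sake of a runnable example, not the paper's exact procedure.

```python
def greedy_selection(sample, k, budget, n0=1):
    """Illustrative greedy sampling for ranking and selection.

    sample(i) is a user-supplied (hypothetical) function that returns one
    noisy observation of alternative i; larger means are assumed better.
    """
    counts = [0] * k
    sums = [0.0] * k

    # Assumed warm-up: n0 observations per alternative so that every
    # running average is defined before the greedy phase starts.
    for i in range(k):
        for _ in range(n0):
            sums[i] += sample(i)
            counts[i] += 1

    # Greedy phase: always sample the alternative whose running average
    # is currently the largest.
    for _ in range(budget - n0 * k):
        leader = max(range(k), key=lambda i: sums[i] / counts[i])
        sums[leader] += sample(leader)
        counts[leader] += 1

    # Assumed final rule: report the alternative with the largest
    # running average once the budget is exhausted.
    best = max(range(k), key=lambda i: sums[i] / counts[i])
    return best, counts
```

A call such as `greedy_selection(lambda i: random.gauss(mu[i], 1.0), k=len(mu), budget=10 * len(mu))`, with a hypothetical vector of means `mu` and `random` imported, would exercise the sketch on a Gaussian problem.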
Zaile Li, Weiwei Fan, and L. Jeff Hong (2025). UCB for Large-Scale Pure Exploration: Beyond Sub-Gaussianity. Under Review at Operations Research
Traditional approaches to pure exploration have predominantly relied on Gaussian or sub-Gaussian assumptions on the performance distributions of all alternatives, which limit their applicability to non-sub-Gaussian (especially heavy-tailed) problems. The need to move beyond sub-Gaussianity may become even more critical in large-scale problems, which tend to be especially sensitive to distributional specifications. In this paper, motivated by the widespread use of upper confidence bound (UCB) algorithms in pure exploration and beyond, we investigate their performance in large-scale, non-sub-Gaussian settings. We consider the simplest category of UCB algorithms, where the UCB value for each alternative is defined as the sample mean plus an exploration bonus that depends only on its own sample size. We abstract this into a meta-UCB algorithm and propose letting it select the alternative with the largest sample size as the best upon stopping. For this meta-UCB algorithm, we first derive a distribution-free lower bound on the probability of correct selection. Building on this bound, we analyze two general non-sub-Gaussian scenarios: (1) all alternatives follow a common location-scale structure and have bounded variance; and (2) when such a structure does not hold, each alternative has a bounded absolute moment of order q > 3. In both settings, we show that the meta-UCB algorithm, and therefore a broad class of UCB algorithms, can achieve sample optimality. These results demonstrate the applicability of UCB algorithms for solving large-scale pure exploration problems with non-sub-Gaussian distributions. Numerical experiments support our results and provide additional insights into the comparative behaviors of UCB algorithms within and beyond our meta-UCB framework.
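The meta-UCB idea described above can be sketched in a few lines of Python. Two ingredients are taken from the abstract: the UCB value is the sample mean plus a bonus that depends only on the alternative's own sample size, and the alternative with the largest sample size is selected upon stopping. Everything else is an assumption for illustration: the fixed-budget stopping rule, the single initial observation per alternative, and the names `meta_ucb`, `sample`, and `bonus`. The specific form of the bonus is left to the caller, since the abstract does not fix one.

```python
def meta_ucb(sample, k, budget, bonus):
    """Illustrative meta-UCB loop for pure exploration.

    sample(i): hypothetical function returning one observation of alternative i.
    bonus(n):  exploration bonus as a function of the alternative's own
               sample size n only (its form is an assumption of the caller).
    """
    counts = [0] * k
    sums = [0.0] * k

    # Assumed initialization: one observation per alternative.
    for i in range(k):
        sums[i] += sample(i)
        counts[i] += 1

    # Assumed stopping rule: sample until a fixed budget is spent.
    for _ in range(budget - k):
        # UCB value = sample mean + bonus depending only on own sample size.
        chosen = max(
            range(k),
            key=lambda j: sums[j] / counts[j] + bonus(counts[j]),
        )
        sums[chosen] += sample(chosen)
        counts[chosen] += 1

    # Upon stopping, select the alternative with the largest sample size.
    selected = max(range(k), key=lambda j: counts[j])
    return selected, counts
```

For example, `meta_ucb(sample, k, budget, bonus=lambda n: n ** -0.5)` uses a purely illustrative bonus that shrinks with the sample size; it is not a choice taken from the paper.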
Zaile Li, Weiwei Fan, and L. Jeff Hong (2025). Efficient Budget Allocation for Large-Scale Virtual Screening. To Be Submitted to INFORMS Journal on Computing
TOPIC 3: Simulation Optimization Under Input Uncertainty
Zaile Li, Yuchen Wan, and L. Jeff Hong (2025). Additive Distributionally Robust Ranking and Selection. Under Review at Operations Research
The practical value of R&S depends on accurate simulation input modeling, which often suffers from the curse of input uncertainty due to limited data. Distributionally robust ranking and selection (DRR&S) addresses this challenge by modeling input uncertainty via an ambiguity set of m > 1 plausible input distributions, resulting in km scenarios in total. Recent DRR&S studies suggest a key structural insight: additivity in budget allocation is essential for efficiency. However, existing justifications are heuristic, and fundamental properties such as consistency and the precise allocation pattern induced by additivity remain poorly understood. In this paper, we propose a simple additive allocation (AA) procedure that aims to exclusively sample the k + m - 1 previously hypothesized critical scenarios. Leveraging boundary-crossing arguments, we establish a lower bound on the probability of correct selection and characterize the procedure’s budget allocation behavior. We then prove that AA is consistent and, surprisingly, achieves additivity in the strongest sense: as the total budget increases, only k + m - 1 scenarios are sampled infinitely often. Notably, the worst-case scenarios of non-best alternatives may not be among them, challenging prior beliefs about their criticality. These results offer new and counterintuitive insights into the additive structure of DRR&S. To improve practical performance while preserving this structure, we introduce a general additive allocation (GAA) framework that flexibly incorporates sampling rules from traditional R&S procedures in a modular fashion. We also prove the consistency and additivity of GAA procedures. Numerical experiments support our theoretical findings and demonstrate the competitive performance of the proposed GAA procedures.
Yuchen Wan, Zaile Li*, and L. Jeff Hong (2025). New Additive OCBA Procedures for Robust Ranking and Selection. Asia-Pacific Journal of Operational Research 42(06):2540003
Distributionally robust ranking and selection (DRR&S) is an important and challenging variation of conventional R&S, which seeks to select the best alternative among a finite set of alternatives. It captures the common input uncertainty in the simulation model by using an ambiguity set that includes multiple possible input distributions, and it shifts the goal to selecting the alternative with the smallest worst-case mean performance over the ambiguity set. In this project, we aim to develop new fixed-budget DRR&S algorithms to minimize the probability of incorrect selection under a limited sampling budget.
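To make the selection target concrete, the short Python sketch below picks the robust best from a table of scenario means. It assumes that smaller mean performance is preferred, so that the worst case for an alternative is its largest mean over the ambiguity set, and it assumes the means are known. In an actual DRR&S procedure they would be estimated by simulation under a sampling budget. The function name `robust_best` and the `means[i][j]` layout (alternative i, input distribution j) are hypothetical.

```python
def robust_best(means):
    """Return the index of the alternative with the smallest worst-case mean.

    means[i][j] is the (assumed known) mean performance of alternative i
    under the j-th input distribution in the ambiguity set.
    Smaller is assumed better, so the worst case is the row maximum.
    """
    worst_case = [max(row) for row in means]
    best = min(range(len(means)), key=lambda i: worst_case[i])
    return best, worst_case
```

With k alternatives and m candidate input distributions, the table has km entries, which is why budget allocation across scenarios becomes the central difficulty.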
TOPIC 4: Diverse Selection Objectives
Yuchen Wan, Zaile Li, and L. Jeff Hong. Seeking the Best Extreme. In Progress
In traditional selection problems, the goal is to identify the alternative with the best mean performance. Motivated by drug discovery applications, this project instead focuses on selecting the alternative with the largest upper bound. We derive a closed-form expression for the probability of correct selection under this objective and develop efficient sampling algorithms based on Bayesian learning.