Evelyn Xiao-Yue Gong, Ph.D. Candidate, MIT Operations Research Center
Title: Bandits atop Reinforcement Learning: Tackling Online Inventory Models with Cyclic Demands
Abstract: Motivated by a long-standing gap between inventory theory and practice, we study online inventory models with unknown cyclic demand distributions. We design provably efficient reinforcement learning (RL) algorithms that leverage the structure of inventory problems. We apply the standard performance measure in the online learning literature, regret, which is defined as the difference between the total expected cost of our policy and the total expected cost of the clairvoyant optimal policy that has full knowledge of the demand distributions a priori. This paper analyzes, in the presence of unknown cyclic demands, the lost-sales model with zero lead time, and the multi-product backlogging model with positive lead times, fixed joint-ordering costs, and order limits. For both models, we first introduce episodic models where inventory is discarded at the end of every cycle, and then build upon these results to analyze non-discarding models. Our RL policies, HQL and FQL, achieve \tilde{\mathcal{O}}(\sqrt{T}) regret for the episodic lost-sales model and the episodic multi-product backlogging model, matching the regret lower bound that we prove in this paper. For the non-discarding models, we construct a bandit learning algorithm on top of the previous RL algorithms, named Meta-HQL. Meta-HQL achieves \tilde{\mathcal{O}}(\sqrt{T}) regret for the non-discarding lost-sales model with zero lead time, again matching the regret lower bound. For the non-discarding multi-product backlogging model, our policy Mimic-QL achieves \tilde{\mathcal{O}}(T^{5/6}) regret. Our policies remove the regret dependence on the cardinality of the state-action space for inventory problems, which is an improvement over existing RL algorithms. We conducted experiments with a real sales dataset from Rossmann, one of the largest drugstore chains in Europe, and also with a synthetic dataset.
For both sets of experiments, our policy converges rapidly to the optimal policy and dramatically outperforms the best policy that models demand as i.i.d. instead of cyclic.
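The regret notion used above can be made concrete with a toy example. The sketch below is purely illustrative (it is not HQL, FQL, or any of the talk's algorithms): a learner faces cyclic newsvendor-style demand with period 2, orders the empirical newsvendor quantile of past same-phase observations, and its cumulative expected cost is compared against a clairvoyant that knows both demand distributions. All parameters (Poisson demands, costs h and p) are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy cyclic setting: demand has period 2; a clairvoyant knows both
# demand distributions, the learner only sees past observations.
T = 2000
means = [5.0, 12.0]          # hypothetical cyclic demand means (period 2)
h, p = 1.0, 2.0              # holding cost and lost-sales penalty

def expected_cost(q, mean):
    # Expected newsvendor cost under Poisson demand, via Monte Carlo.
    d = rng.poisson(mean, 5000)
    return np.mean(h * np.maximum(q - d, 0) + p * np.maximum(d - q, 0))

# Clairvoyant benchmark: best order quantity per phase under known means.
qs = np.arange(0, 30)
opt_cost = {i: min(expected_cost(q, m) for q in qs) for i, m in enumerate(means)}

# Learner: orders the empirical p/(p+h) quantile of past same-phase demands.
obs = {0: [], 1: []}
regret = 0.0
for t in range(T):
    phase = t % 2
    q = np.quantile(obs[phase], p / (p + h)) if obs[phase] else 0
    regret += expected_cost(q, means[phase]) - opt_cost[phase]
    obs[phase].append(rng.poisson(means[phase]))

avg_regret = regret / T   # shrinks as the learner's estimates converge
```

Sublinear regret means exactly this per-round average gap vanishing as T grows; the talk's results pin down how fast it can vanish.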
Bio: Evelyn Xiao-Yue Gong is a PhD candidate about to graduate from the Operations Research Center at MIT, where she is advised by Professors David Simchi-Levi and Jim Orlin. Her main research interest is in developing provable artificial intelligence for supply chain and sustainability. She is also broadly interested in data-driven decision making for business operations in online settings where information is limited and data can only be collected as decisions are made. She is a recipient of the Accenture Fellowship and has held multiple internships at Google, Microsoft Research, and other firms. Prior to MIT, she studied mathematics at New York University (in New York and Shanghai) and double-majored in Interactive Media Arts.
Prof. Guihua Wang, Naveen Jindal School of Management, University of Texas at Dallas
Title: The Spillover Effect of Suspending Non-essential Surgery: Evidence from Kidney Transplantation
Abstract: The COVID-19 pandemic has posed an epic challenge to the U.S. healthcare industry. Between March and April 2020, multiple state governors issued orders to temporarily suspend non-essential surgical procedures. The suspensions caused the healthcare industry to shed millions of jobs, raising concerns about the availability of essential procedures. In this paper, we estimate the potential spillover effect of suspending non-essential surgery on patient access to essential health services, using deceased-donor kidney transplantation as the clinical setting. Analyzing a dataset of all U.S. kidney transplantation procedures, we observe a steep reduction in the volume of deceased-donor kidney transplantation across nearly all states during the initial months of the pandemic. However, states that suspended non-essential surgery experienced far steeper reductions than those that did not. Using a difference-in-differences approach, we estimate that a state-level suspension of non-essential surgery led to a 23.6% reduction in transplant volume. Our study reveals the spillover effect of state-level health policies on patient access to essential services such as deceased-donor kidney transplantation. Our mediation analysis shows that 38.7% of the spillover effect can be attributed to the change in healthcare employment, indicating that these suspensions caused hospitals to reduce the size of the workforce required for all procedures, which ultimately had a negative impact on access to essential procedures. Instead of suspending all non-essential surgery in the event of a future pandemic, policymakers should consider more granular approaches to safeguarding the healthcare workforce critical to supporting essential services.
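The difference-in-differences logic behind the estimate above can be illustrated with made-up numbers (these are hypothetical volumes, not the paper's data): both groups of states decline during the pandemic, and the effect of the suspension is the treated group's decline net of the common trend.

```python
import math

# Hypothetical transplant volumes, pre- and post-pandemic onset.
# "Treated" = states that suspended non-essential surgery.
pre_treated, post_treated = 100.0, 60.0
pre_control, post_control = 100.0, 84.0

# Each group's change over time in log points (approximate % change).
change_treated = math.log(post_treated) - math.log(pre_treated)
change_control = math.log(post_control) - math.log(pre_control)

# Difference-in-differences: the treated change net of the common trend.
did = change_treated - change_control
effect_pct = (math.exp(did) - 1) * 100   # effect as a percentage
```

Here both groups fall, but the control group's decline absorbs the pandemic-wide trend, so only the excess decline in suspending states is attributed to the policy.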
Bio: Guihua is an Assistant Professor of Operations Management at the Naveen Jindal School of Management, University of Texas at Dallas. He obtained his PhD from the University of Michigan, an MSc from the Georgia Institute of Technology, and an MSc and BEng from the National University of Singapore. Prior to his PhD study, Guihua worked as a supervisor in the industrial engineering department at UPS Asia, headquartered in Singapore. Guihua’s research focuses on the intersection of empirical econometrics and machine learning, with applications to personalized healthcare. More specifically, Guihua has developed new causal machine learning techniques, such as the instrumental variable forest and the first-difference causal forest, for heterogeneous treatment effect analyses using observational healthcare data.
Guihua’s research has been published in Management Science, Manufacturing & Service Operations Management, Production and Operations Management, Advances in Applied Probability, Surgery, and Annals of Thoracic Surgery, and has received media coverage from the Associated Press, Crain’s Detroit, Houston Chronicle, Medical Xpress, National Interest, NPR, PRI, Science Daily, Simply Flying, The Conversation, and Yahoo! News. His research on organ transplants was named a runner-up for the Responsible Business Education Award by the Financial Times. He was the winner of the INFORMS Health Application Society Student Paper Competition and a two-time finalist in the MSOM Student Paper Competition and the INFORMS Service Section Best Paper Competition.
Yunbei Xu, Ph.D. Candidate, Graduate School of Business at Columbia University
Title: Bayesian Design Principles for Frequentist Sequential Learning
Abstract: We propose a general framework to study frequentist sequential learning, where concepts from Bayesian inference are essential for algorithm design and regret analysis. We develop general algorithm design principles and study related complexity measures that apply in various bandit and reinforcement learning problems. In particular, we propose to design adaptive “algorithmic beliefs” instead of frequentist estimators, and to make use of posterior updates instead of traditional frequentist decision rules. These posterior-based algorithms automatically adapt to non-stochastic environments, and their regret behavior can often be shown to be best possible. Moreover, the algorithms are simple and often efficient to implement. In particular, we derive a novel algorithm for the multi-armed bandit that achieves “best-of-all-worlds” empirical performance in stochastic, adversarial, and non-stationary environments. We also provide some unification and improved guarantees for infinite-armed bandits and reinforcement learning.
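As a point of reference for the posterior-based idea, the textbook Thompson sampling algorithm for a Bernoulli bandit maintains a Beta posterior per arm and acts greedily on a posterior draw. This sketch shows that classical baseline only (it is not the speaker's new "algorithmic beliefs" construction); the arm means are made up.

```python
import numpy as np

rng = np.random.default_rng(4)

# Bernoulli bandit with unknown success probabilities (hypothetical values).
true_p = [0.3, 0.5, 0.7]
alpha = np.ones(3)   # Beta(alpha, beta) posterior parameters per arm
beta = np.ones(3)

pulls = np.zeros(3, dtype=int)
for t in range(5000):
    # Draw one belief about each arm from its posterior; act greedily on it.
    theta = rng.beta(alpha, beta)
    arm = int(np.argmax(theta))
    reward = rng.random() < true_p[arm]
    alpha[arm] += reward          # posterior update on the observed reward
    beta[arm] += 1 - reward
    pulls[arm] += 1

best_share = pulls[2] / 5000      # fraction of pulls on the best arm
```

The posterior draw randomizes exploration automatically; the talk's framework generalizes this style of posterior-driven play with frequentist (and even adversarial) guarantees.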
Bio: Yunbei Xu is a fifth-year PhD student in the Decision, Risk, and Operations Division at Columbia University Graduate School of Business, where he works with Professor Assaf Zeevi. His research interests lie at the intersection of machine learning, operations research, and statistics. His research vision is to make fundamental contributions to the science of learning and decision making, and to identify new problems and algorithms for practitioners to make data-driven decisions and improve real-world systems. On the theory side, he has worked in statistical learning theory, bandit and reinforcement learning, causal inference, and optimization. On the application side, he has experience in designing and implementing new algorithms for dynamic pricing, personalized medicine, and medical imaging.
Prof. Zeyu Zheng, UC Berkeley
Title: Non-stationary A/B Tests: Optimal Variance Reduction, Bias Correction, and Valid Inference
Abstract: A/B tests have been used at scale by data-driven enterprises to test innovative ideas to improve core business metrics. Non-stationarities, such as the time-of-day effect and the day-of-week effect, can often arise nonparametrically in business metrics involving purchases, revenue, conversions, customer experiences, etc. We show that ignoring or inadequately addressing non-stationarities can cause standard A/B test estimators to have sub-optimal variance and non-vanishing bias, leading to a loss of statistical efficiency and accuracy. We build a model to analyze non-stationary A/B tests. We provide new estimators, prove central limit theorems, and show that the estimators achieve optimal variance and asymptotically zero bias. This is joint work with Yuhang Wu from UC Berkeley, and Guangyu Zhang, Zuohua Zhang, and Chu Wang from Amazon.
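A minimal simulation illustrates the kind of bias at stake (a hypothetical sketch, not the talk's estimators): when the metric's baseline and the traffic composition both vary by time of day, a naive pooled difference of means is confounded, while stratifying by time period recovers the true effect. All numbers below are invented.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical non-stationary metric: the baseline differs by time of day,
# and the fraction of treated traffic also drifts by time of day.
n = 200_000
hour = rng.integers(0, 2, n)                 # 0 = day, 1 = night
base = np.where(hour == 0, 10.0, 4.0)        # time-of-day effect on the metric
p_treat = np.where(hour == 0, 0.7, 0.3)      # unbalanced assignment over time
treated = rng.random(n) < p_treat
true_effect = 1.0
y = base + true_effect * treated + rng.normal(0, 1, n)

# Naive pooled estimator: confounded, because treated units are
# over-represented in the high-baseline daytime period.
naive = y[treated].mean() - y[~treated].mean()

# Post-stratified estimator: difference of means within each period,
# averaged with weights proportional to that period's traffic share.
strata_effects, weights = [], []
for hr in (0, 1):
    m = hour == hr
    strata_effects.append(y[m & treated].mean() - y[m & ~treated].mean())
    weights.append(m.mean())
stratified = np.average(strata_effects, weights=weights)
```

Here the naive estimate is badly inflated while the stratified one is near the true effect of 1.0; the talk addresses the harder nonparametric version of this problem with optimal-variance estimators and valid inference.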
Bio: Zeyu Zheng is an Assistant Professor in the Department of Industrial Engineering and Operations Research at UC Berkeley. He received a PhD in Operations Research from Stanford University in 2018, an MS in Economics from Stanford University in 2016, and a Bachelor's degree in Mathematics from Peking University in 2012. He has done research in Monte Carlo simulation theory and simulation optimization. He is also interested in non-stationary stochastic modeling.
Omar Mouchtaki, Ph.D. Candidate, Graduate School of Business at Columbia University
Title: Beyond IID: Data-Driven Decision-Making in Heterogeneous Environments
Abstract: How should one leverage historical data when past observations are not perfectly indicative of the future, e.g., due to the presence of unobserved confounders which one cannot "correct" for? Motivated by this question, we study a data-driven decision-making framework in which historical samples are generated from unknown and different distributions assumed to lie in a heterogeneity ball with known radius, centered around the (also) unknown future (out-of-sample) distribution on which the performance of a decision will be evaluated. This work analyzes the performance of central data-driven policies, as well as near-optimal ones, in these heterogeneous environments.
We first establish, for a general class of policies, a new connection between data-driven decision-making and distributionally robust optimization with a regret objective. We then leverage this connection to quantify the performance achievable by Sample Average Approximation (SAA) as a function of the radius of the heterogeneity ball: for any integral probability metric, we derive bounds depending on the approximation parameter, a notion which quantifies how the interplay between the heterogeneity and the problem structure impacts the performance of SAA. When SAA is not rate-optimal, we design and analyze problem-dependent policies that achieve rate-optimality. We compare achievable guarantees for three widely studied problems (newsvendor, pricing, and ski rental) under heterogeneous environments. Our work shows that the type of achievable performance varies considerably across different combinations of problem classes and notions of heterogeneity.
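For readers unfamiliar with SAA, the newsvendor instance mentioned above makes it concrete: SAA optimizes the empirical cost over the historical samples, which for the newsvendor reduces to an empirical quantile. The sketch below is an illustration with invented parameters, and the "heterogeneity" is modeled crudely as a small drift in each past sample's mean; it is not the speakers' analysis.

```python
import numpy as np

rng = np.random.default_rng(2)

# Newsvendor: order q, pay h per unit left over and b per unit short.
h, b = 1.0, 3.0

def saa_order(samples):
    # The SAA minimizer of the empirical newsvendor cost is the
    # b/(b+h) empirical quantile of the observed demands.
    return np.quantile(samples, b / (b + h))

# Historical samples drawn from heterogeneous (shifted) distributions:
# each past observation's mean drifts within a small "heterogeneity ball"
# around the future demand mean of 10 (all values hypothetical).
shifts = rng.uniform(-0.5, 0.5, 500)
samples = rng.normal(10.0 + shifts, 2.0)

q = saa_order(samples)
```

The question the talk studies is how the out-of-sample regret of such a policy degrades as the radius of that ball grows, and when a problem-dependent policy can do strictly better than this plain quantile rule.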
Bio: Omar Mouchtaki is a fourth-year PhD student in the Decision, Risk and Operations Division of the Graduate School of Business at Columbia University. He is advised by Professor Omar Besbes and Professor Will Ma. Prior to joining Columbia, he earned an MS in Applied Mathematics and Computer Science from Ecole Polytechnique.
He is broadly interested in developing methodological tools to bridge the gap between the theory and practice of data-driven decision-making. His current research aims at characterizing the performance of algorithms in all data regimes, small and large, with possible applications to inventory management and pricing problems.
Nian Si, Ph.D. Candidate at Stanford University
Title: Distributionally Robust Policy Learning
Abstract: Policy learning using historical observational data is an important problem that has found widespread applications. Examples include selecting offers, prices, or advertisements to send to customers, as well as selecting which medication to prescribe to a patient. However, the existing literature rests on the crucial assumption that the future environment where the learned policy will be deployed is the same as the past environment that generated the data, an assumption that is often false or too coarse an approximation. In this paper, we lift this assumption and aim to learn a distributionally robust policy from incomplete observational data. We first present a policy evaluation procedure that allows us to assess how well the policy does under the worst-case environment shift. We then establish a central limit theorem type guarantee for this proposed policy evaluation scheme. Leveraging this evaluation scheme, we further propose a novel learning algorithm that is able to learn a policy that is robust to adversarial perturbations and unknown covariate shifts, with a performance guarantee based on the theory of uniform convergence. Finally, we empirically test the effectiveness of our proposed algorithm on synthetic datasets and demonstrate that it provides the robustness that is missing from standard policy learning algorithms. We conclude the paper by providing a comprehensive application of our methods in the context of a real-world voting dataset.
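The worst-case evaluation idea can be illustrated crudely with simulated logged bandit data (a hypothetical sketch over a finite set of candidate covariate shifts, not the paper's procedure, which optimizes over a full distributional uncertainty set): evaluate the policy's value under each perturbed context distribution and report the minimum.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical logged data: context x, uniformly randomized action a, reward r.
n = 50_000
x = rng.normal(0, 1, n)
a = rng.integers(0, 2, n)
# Taking the "right" action pays 1.0 when x > 0 and only 0.5 otherwise.
r = (a == (x > 0)) * np.where(x > 0, 1.0, 0.5) + rng.normal(0, 0.1, n)

def policy(x):
    # Policy being evaluated: play action 1 exactly when x > 0.
    return (x > 0).astype(int)

# Standard inverse-propensity value estimate (propensity = 1/2 per action).
match = a == policy(x)
value = np.mean(2.0 * match * r)

# Crude worst-case evaluation: reweight contexts so that a fraction w_pos
# of the mass sits on {x > 0}, and take the minimum value over candidates.
worst = value
for w_pos in (0.2, 0.5, 0.8):        # hypothetical candidate shifts
    wts = np.where(x > 0, w_pos / np.mean(x > 0), (1 - w_pos) / np.mean(x <= 0))
    worst = min(worst, np.mean(wts * 2.0 * match * r))
```

A shift that moves mass toward the low-reward region drags the value down, which is exactly the gap between standard and distributionally robust evaluation that the paper formalizes.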
Bio: Nian Si is a final-year PhD student in the Department of Management Science and Engineering (MS&E) at Stanford University, advised by Professor Jose Blanchet. He recently joined the University of Chicago Booth School of Business as a postdoctoral principal researcher, working with Professor Baris Ata. He received a B.A. in Economics and a B.S. in Mathematics from Peking University in 2017. His research lies at the interface of applied probability and machine learning, and he develops methods for trustworthy decision making. He is also interested in real-world problems arising from online platforms, and some of his proposed solutions have been deployed in industry.
Prof. Junyu Cao, McCombs School of Business, The University of Texas at Austin
Title: Cooperative Learning and Decision-Making: A Simple Framework
Abstract: We formulate a cooperative learning and decision-making problem with contextual information. Motivated by business practice, where a joint decision is often made by multiple teams in sequence, we propose a simple cooperation framework that integrates learning into decision-making in an unknown environment. We theoretically show that our two proposed algorithms achieve the optimal rate. Furthermore, we study the model misspecification issue. We test our algorithms using real data from a large e-commerce retailer. Numerical studies show that the two proposed frameworks are effective in terms of both regret performance and computational efficiency.
Bio: Junyu Cao is an assistant professor at McCombs School of Business, The University of Texas at Austin. Her research interests include data-driven decision making and stochastic modeling, with applications to smart-city operations and revenue management.