Princeton University
Date: Jan 24, 2022
Classical matrix concentration inequalities give nonasymptotic bounds on the spectral norm of sums of independent random matrices. These bounds are extremely versatile and are widely used in applications. On the other hand, it is well known that these inequalities fail to yield sharp results for even the simplest random matrix models. In this talk, I will describe a powerful new class of matrix concentration inequalities that are able to capture the sharp behavior of the spectrum of many random matrix models. These inequalities arise from an unexpected phenomenon: the spectrum of random matrices is accurately captured by certain predictions of free probability theory under surprisingly minimal assumptions. I aim to explain what these new inequalities look like and how to use them.
The talk is based on joint works with Afonso Bandeira and March Boedihardjo, and with Tatiana Brailovskaya.
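To fix ideas, one classical bound of the kind referred to above (my own illustration, not taken from the talk) is the noncommutative Khintchine estimate for a self-adjoint Gaussian series X = A_0 + Σ_i g_i A_i with deterministic d × d self-adjoint coefficients A_i and i.i.d. standard Gaussians g_i:

```latex
\mathbb{E}\,\| X - \mathbb{E}X \| \;\lesssim\; \sigma(X)\,\sqrt{\log d},
\qquad
\sigma(X) := \Big\| \sum_i A_i^2 \Big\|^{1/2}.
```

For a GUE-type matrix this overshoots the true norm, which is approximately 2σ(X), by the √(log d) factor; removing such spurious dimensional factors under minimal assumptions is the kind of sharpening the talk is about.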
The University of California, Berkeley
Date: Jan 31, 2022
In large service systems, such as cloud computing systems, there are different classes of jobs and of servers, and each job class can be served by only a subset of the server classes, due to data locality and other constraints. Similarly, there are often compatibility constraints in dynamic matching systems such as platforms for car sharing and waitlists for organ transplants. For Markovian FCFS systems, the steady-state distributions have been shown to have a simple “product-form” structure. I will describe a unified framework for these models that provides a common, simple proof of the product-form results at a detailed state level, and a simple, state-aggregated view for analyzing waiting time distributions. I will also discuss recent results comparing collaborative (cancel-on-complete) and noncollaborative (cancel-on-start) protocols, and the impacts of flexibility.
Joint work with Ivo Adan, Igor Kleiner, Kristen Gardner, and Gideon Weiss.
Columbia University
Date: Feb 07, 2022
The bootstrap is a versatile method for statistical uncertainty quantification, but when applied to large-scale or simulation-based models, it can face substantial computational demands from repeated data resampling and model refitting. We present a bootstrap methodology that uses minimal computation, namely a resampling effort as low as one Monte Carlo replication, while maintaining desirable statistical guarantees. We describe how this methodology can be used for fast inference across different estimation problems. We also discuss its particular relevance and generalizations to handling uncertainties in stochastic simulation analysis.
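As a rough illustration of how a minimal-resampling bootstrap interval might look (a sketch under my own assumptions, in particular that the spread of the B resampled estimates around the original point estimate is paired with a Student-t critical value with B degrees of freedom; this is not necessarily the speaker's exact procedure):

```python
import numpy as np
from scipy import stats

def cheap_bootstrap_ci(data, estimator, B=1, alpha=0.05, seed=None):
    """Confidence interval from as few as B = 1 bootstrap resamples.

    Sketch of a low-replication bootstrap: the spread of the B resampled
    estimates around the original point estimate is paired with a Student-t
    critical value with B degrees of freedom (an assumption of this sketch).
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    n = len(data)
    psi_hat = estimator(data)                             # estimate on the original data
    resamples = np.array([
        estimator(data[rng.integers(0, n, size=n)])       # estimate on one resample
        for _ in range(B)
    ])
    s = np.sqrt(np.mean((resamples - psi_hat) ** 2))
    t_crit = stats.t.ppf(1 - alpha / 2, df=B)
    return psi_hat - t_crit * s, psi_hat + t_crit * s

# Example: 95% interval for a mean using a single Monte Carlo replication.
x = np.random.default_rng(0).exponential(size=500)
print(cheap_bootstrap_ci(x, np.mean, B=1))
```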
Northwestern University
Date: Feb 21, 2022
The empirical success of deep learning is often attributed to SGD’s mysterious ability to avoid sharp local minima in the loss landscape, as sharp minima are known to lead to poor generalization. Recently, empirical evidence of heavy-tailed gradient noise was reported in many deep learning tasks, and it was argued that SGD can escape sharp local minima in the presence of such heavy-tailed gradient noise, providing a partial explanation of the mystery. This talk analyzes a popular variant of SGD in which gradients are truncated above a fixed threshold. We show that it achieves a stronger notion of avoiding sharp minima: it can effectively eliminate sharp local minima entirely from its training trajectory. Further, we rigorously characterize the first exit times from local minima and prove that, under some structural conditions, the dynamics of heavy-tailed truncated SGD with small learning rates closely resemble those of a continuous-time Markov chain that never visits any sharp minima. Real-data experiments on deep neural networks confirm our theoretical prediction that SGD with truncated heavy-tailed gradient noise finds flatter local minima and achieves better generalization.
This talk is based on joint work with Xingyu Wang and Sewoong Oh.
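At the level of a single update, the truncation scheme described above amounts to gradient clipping; a minimal sketch (generic notation and a fixed threshold b of my choosing, not the paper's exact formulation):

```python
import numpy as np

def truncated_sgd_step(w, grad, lr=0.01, b=1.0):
    """One step of SGD in which the stochastic gradient is truncated (clipped)
    whenever its norm exceeds a fixed threshold b."""
    g_norm = np.linalg.norm(grad)
    if g_norm > b:
        grad = grad * (b / g_norm)   # rescale so that ||grad|| equals b
    return w - lr * grad
```

The clipping itself is standard; the talk's results concern how this truncation interacts with heavy-tailed gradient noise along the training trajectory.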
Cornell University
Date: Feb 28, 2022
The practicality of reinforcement learning algorithms has been limited by poor scaling with respect to the problem size, as the sample complexity of learning an epsilon-optimal policy scales with |S| times |A| over worst-case instances of an MDP with state space S, action space A, and horizon H. A key question is: when can we design provably efficient RL algorithms that exploit nice structure? We consider a class of MDPs that exhibit low-rank structure, where the latent features are unknown. We argue that a natural combination of value iteration and low-rank matrix estimation results in an estimation error that grows doubly exponentially in the horizon length. We then provide a new algorithm, along with statistical guarantees, that efficiently exploits low-rank structure given access to a generative model, achieving a sample complexity that scales linearly in |S|+|A| and polynomially in the horizon and the rank. In contrast to the literature on linear and low-rank MDPs, we do not require a known feature mapping, our algorithm is computationally simple, and our results hold for long time horizons. Our results provide insights on the minimal low-rank structural assumptions required on the MDP with respect to the transition kernel versus the optimal action-value function.
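To make the “natural combination” above concrete, here is a schematic of the naive approach that the talk argues breaks down, not the proposed algorithm; treating the low-rank estimator as a truncated SVD of the Q-matrix is my own simplifying assumption:

```python
import numpy as np

def low_rank_project(Q, rank):
    """Best rank-r approximation of an |S| x |A| matrix via truncated SVD."""
    U, s, Vt = np.linalg.svd(Q, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]

def naive_low_rank_value_iteration(R_hat, P_hat, horizon, rank):
    """Finite-horizon value iteration in which the Q-matrix formed at each
    step is replaced by a rank-r approximation.

    R_hat:  |S| x |A| estimated rewards (e.g., from a generative model)
    P_hat:  |S| x |A| x |S| estimated transition probabilities
    """
    S, A = R_hat.shape
    V = np.zeros(S)
    for _ in range(horizon):
        Q = R_hat + P_hat @ V            # Bellman backup: Q(s,a) = r(s,a) + sum_s' P(s'|s,a) V(s')
        Q = low_rank_project(Q, rank)    # low-rank matrix estimation step
        V = Q.max(axis=1)                # act greedily with respect to the projected Q
    return V
```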
University of Illinois at Urbana-Champaign
Date: Mar 07, 2022
A parallel server system with so-called cancel-on-completion redundancy is considered. A key feature of this model, making its analysis challenging, is that it is in general non-work-conserving: the average amount of new workload added to the system by an arriving job depends on the system state at the time of arrival. Our main focus is on the mean-field asymptotic regime where the number of servers goes to infinity with the job arrival rate per server remaining constant. The main results address the following question: under which conditions does the steady-state asymptotic independence of server workloads hold? Our analysis relies almost exclusively on two fundamental properties of the model: monotonicity and the property that, on average, “new arriving workload prefers to go to servers with lower workloads.”
Kellogg School of Management, Northwestern University
Date: Mar 21, 2022
Parallel-server systems have received a lot of attention over the last several years and, in particular, a lot of effort has gone towards understanding the join-the-shortest-queue (JSQ) system. In this talk I will show that in the Halfin-Whitt regime, the steady-state distribution of the JSQ system converges to its diffusion limit at a rate of 1/sqrt(n), where n is the number of servers. The proof uses Stein's method and, specifically, the recently proposed prelimit generator approach. This particular application of Stein's method is nontrivial because both the JSQ model and its diffusion limit are multidimensional, and state-space collapse is involved.
Both this result and the underlying methodology are part of a broader agenda to make Stein's method more user friendly and expand the class of models for which it can be applied. Time permitting, I will give a brief overview of the current frontiers and limitations of the approach.
Columbia Business School
Date: Mar 28, 2022
This talk explores a new model of bandit experiments in which a potentially nonstationary sequence of contexts influences arms' performance. Context-unaware algorithms risk confounding, while those that perform correct inference face information delays. Our main insight is that an algorithm we call deconfounded Thompson sampling strikes a delicate balance between adaptivity and robustness. Its adaptivity leads to optimal efficiency properties in easy stationary instances, but it displays surprising resilience in hard nonstationary ones that cause other adaptive algorithms to fail.
California Institute of Technology
Date: Apr 04, 2022
Making use of modern black-box AI tools is potentially transformational for online optimization and control. However, such machine-learned algorithms typically do not have formal guarantees on their worst-case performance, stability, or safety. So, while their performance may improve upon traditional approaches in “typical” cases, they may perform arbitrarily badly in scenarios that the training data do not represent, e.g., due to distribution shift. This is a significant drawback when considering the use of AI tools for energy systems and autonomous cities, which are safety-critical. A challenging open question is thus: Is it possible to provide guarantees that allow black-box AI tools to be used in safety-critical applications? In this talk, I will introduce recent work that aims to develop algorithms that make use of black-box AI tools to provide good performance in the typical case while integrating the “untrusted advice” from these tools into traditional algorithms to ensure formal worst-case guarantees. Specifically, we will discuss the use of black-box untrusted advice in the context of online convex body chasing, online non-convex optimization, and linear quadratic control, identifying both novel algorithms and fundamental limits in each case.
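As a caricature of how untrusted advice can be combined with a traditional algorithm (a toy switching rule illustrating the consistency/robustness trade-off; the algorithms in the talk for body chasing, non-convex optimization, and LQR are substantially more sophisticated):

```python
def follow_untrusted_advice(advice_actions, robust_actions, cost, eps=0.1):
    """Toy switching rule for combining untrusted advice with a robust baseline
    (an illustration of the consistency/robustness trade-off, not the talk's
    algorithms): follow the black-box advice while its running cost stays
    within a (1 + eps) factor of the robust baseline's running cost, and fall
    back to the robust action otherwise.

    advice_actions, robust_actions: per-round actions from the two sources
    cost: function mapping (round, action) to a nonnegative cost
    """
    advice_cost = robust_cost = 0.0
    played = []
    for t, (a_ml, a_rob) in enumerate(zip(advice_actions, robust_actions)):
        advice_cost += cost(t, a_ml)
        robust_cost += cost(t, a_rob)
        if advice_cost <= (1 + eps) * robust_cost:
            played.append(a_ml)          # advice looks good so far: stay consistent with it
        else:
            played.append(a_rob)         # advice is underperforming: retain robustness
    return played
```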
Stanford University
Date: Apr 25, 2022
The genealogy process is typically the most time-consuming part of -- and a limiting factor in the success of -- forensic genetic genealogy, which is a new approach to solving violent crimes and identifying human remains. We formulate a stochastic dynamic program that -- given the list of matches and their genetic distances to the unknown target -- chooses the best decision at each point in time: which match to investigate (i.e., find its ancestors), which set of potential common ancestors to descend from (i.e., find its descendants), or whether to terminate the investigation. The objective is to maximize the probability of finding the target minus a cost on the expected size of the final family tree. We estimate the parameters of our model using data from 17 cases (eight solved, nine unsolved) from the DNA Doe Project. We assess the Proposed Strategy using simulated versions of the 17 DNA Doe Project cases, and compare it to a Benchmark Strategy that ranks matches by their genetic distance to the target and only descends from known common ancestors between a pair of matches. The Proposed Strategy solves cases 10-fold faster than the Benchmark Strategy, and does so by aggressively descending from a set of potential most recent common ancestors between the target and a match even when this set has a low probability of containing the correct most recent common ancestor.
Duke University
Date: May 02, 2022
In the multi-secretary problem, T nonnegative, independent random variables with a common distribution are sequentially presented to a decision maker who is allowed to make k ≤ T selections to maximize the expected total value of the selected elements. Assuming that the values are drawn from a known distribution with finite support, we develop an adaptive budget-ratio policy and prove that its regret—the expected gap between the budget-ratio policy and the optimal offline solution in which all T values are made visible at time 0—is bounded by a constant that depends on neither the time horizon T nor the budget k. When we generalize the arrival process of this multi-secretary problem to one in which the arrivals in each period can be of different types, characterized by their rewards and by the resources they consume, we recover a general resource allocation problem. This (multidimensional) generalization, however, does not hinder our ability to construct an adaptive budget-ratio policy that continues to exhibit a regret that does not depend on the time horizon T or on the initial budget of each resource. Our analysis focuses on the geometric evolution of the remaining resource budgets as a stochastic process, proving that it is attracted to “sticky” regions of the state space where the budget-ratio policy takes actions consistent with the optimal offline solution. (Based on joint work with A. Vera, I. Gurvich and E. Levin.)
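A stylized rendering of what a budget-ratio policy can look like (my own simplification: accept an arrival when its value clears the quantile of the known distribution implied by the remaining budget-to-horizon ratio; the paper's policy is specified more carefully):

```python
import numpy as np

def discrete_quantile(support, probs, q):
    """Smallest support point whose CDF reaches q (support assumed sorted)."""
    cdf = np.cumsum(probs)
    idx = np.searchsorted(cdf, q)
    return support[min(idx, len(support) - 1)]

def budget_ratio_policy(values, k, support, probs):
    """Stylized budget-ratio policy for the multi-secretary problem: with k
    selections left and T - t arrivals to come, accept the current value if it
    is at least the (1 - k/(T - t))-quantile of the known value distribution.
    Ties at the threshold would need randomized acceptance to use the budget
    exactly; that refinement is omitted here.
    """
    T = len(values)
    total = 0.0
    for t, v in enumerate(values):
        if k == 0:
            break
        ratio = min(k / (T - t), 1.0)
        if v >= discrete_quantile(support, probs, 1.0 - ratio):
            total += v
            k -= 1
    return total

# Example: values uniform on {1,...,5}, horizon 50, budget 10.
vals = np.random.default_rng(1).integers(1, 6, size=50)
print(budget_ratio_policy(vals, k=10, support=np.arange(1, 6), probs=np.full(5, 0.2)))
```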
Cornell University
Date: May 09, 2022
The fundamental problem of high-dimensional path-dependent optimal stopping is central to applied probability, financial engineering, operations research, and stochastic control. Modern approaches, often relying on approximate dynamic programming (ADP), simulation, and/or duality, typically come with limited rigorous guarantees, may scale poorly, and/or require prior knowledge of good basis functions. A key difficulty with many approaches is that to yield stronger guarantees, they would necessitate the computation of deeply nested conditional expectations, with the depth scaling with the time horizon T.
We overcome this fundamental obstacle by providing an algorithm that can trade off, in a principled manner, between the guaranteed quality of approximation and the level of nesting required. This leads to a representation of the optimal value as an infinite sum for which: (1) each term is the expectation of an elegant, recursively defined infimum; (2) the first k terms require only k levels of nesting; and (3) truncating at the first k terms yields a (normalized) error of 1/k. This enables us to devise simple randomized and data-driven algorithms that can trade off between accuracy and runtime through a parameter epsilon controlling the performance guarantee (analogous to the notion of a PTAS). The computational and sample complexity achieved are both polynomial in T (and effectively independent of the dimension) for any fixed epsilon, in contrast to past methods, which typically require a complexity that scales exponentially. Time permitting, we will also discuss connections to the theory of network flows and stochastic optimization, generalizations to dynamic pricing and online combinatorial optimization, and lower bounds for the trade-off between accuracy and the level of nesting required. Joint work with Yilun Chen.
We consider centralized dynamic matching markets with finitely many agent types and heterogeneous match values. A network topology determines the feasible matches in the market and the value generated from each match. An inherent trade-off arises between short- and long-term objectives: a social planner may delay match decisions to thicken the market and increase match opportunities, thereby generating higher value, but doing so inevitably compromises short-term value; alternatively, the planner may match greedily to maximize short-term objectives.
A matching policy is hindsight optimal if the policy can (nearly) maximize the total value simultaneously at all times. We first establish that in multi-way networks, where a match can include more than two agent types, acting greedily is suboptimal, and a periodic clearing policy with a carefully chosen period length is hindsight optimal. Interestingly, in two-way networks, where any match includes two agent types, suitably designed greedy policies also achieve hindsight optimality. This implies that there is essentially no positive externality from having agents waiting to form future matches.
Central to our results is the general position gap, ε, which quantifies the stability or the imbalance in the network. No policy can achieve a regret that is lower than the order of 1/ε at all times. This lower bound is achieved by the proposed policies.
The talk is based on joint work with Suleyman Kerimov and Itai Ashlagi.
Columbia University
Date: May 23, 2022
The square-root voting rule, first proposed by Lionel Penrose (Roger's father), stipulates that each member country of a world assembly (such as the UN) be given a number of votes equal to the square root of its population. The idea has become newly relevant in proof-of-stake (PoS) cryptocurrency schemes (e.g., Ethereum) and other decentralized voting mechanisms. Focusing on the processes representing the total wealth over time and each participating entity’s share, we study their limiting distributions and related concentration inequalities. (Joint work with Wenpin Tang, Columbia/IEOR.)
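For concreteness (the populations below are purely illustrative), the rule compresses population differences: a country with 100 times the population receives only 10 times the votes.

```python
import math

# Illustrative populations only (not data from the talk).
populations = {"A": 1_000_000, "B": 10_000_000, "C": 100_000_000}
votes = {c: math.sqrt(p) for c, p in populations.items()}   # Penrose: votes = sqrt(population)
total = sum(votes.values())
shares = {c: round(v / total, 3) for c, v in votes.items()}
print(shares)   # {'A': 0.071, 'B': 0.223, 'C': 0.706}
```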
Columbia University
Date: Jun 13, 2022
In this talk I will describe two vignettes of sequential decision making problems arising in the OR literature. The first is the classical “house selling” problem (or optimal stopping): given a random sequence of independent observations revealed one at a time over some finite horizon of play, the objective is to design an algorithm that “stops” this sequence to maximize the expected value of the “stopped” observation. The second vignette is the worker assignment problem (or sequential stochastic assignment): arriving items (“jobs”) with independent stochastic attributes need to be sequentially matched to a pool of waiting recipients (“workers”). Once a job and a worker are matched, they are no longer eligible for further assignment, and the objective is to maximize the expected cumulative value of the resulting matchings. Both problems date back more than 50 years and can be solved in a relatively straightforward manner using dynamic programming; both have been used as modeling constructs in numerous applications, including recent online marketplaces. Somewhat surprisingly, neither has been studied extensively when the underlying information on the stochastic primitives is unknown a priori. While it is possible to analyze this incomplete-information setting with generic multi-purpose learning-theoretic methods, these tend to be quite inefficient when further “structure” is present in the problem and can be exploited to construct more customized algorithms. The main focus of this talk is to broadly describe possible learning-theoretic formulations of the two vignettes, and to illustrate some “design principles” for constructing near-optimal policies.
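For the first vignette, the known-distribution dynamic program mentioned above is a one-line backward induction; a minimal sketch for a known discrete value distribution (notation mine):

```python
import numpy as np

def house_selling_values(support, probs, T):
    """Backward induction for the classical house-selling (optimal stopping)
    problem with a known discrete value distribution and T periods.

    Returns V, where V[t] is the optimal expected value with t periods to go.
    The optimal policy stops with t periods remaining iff the observed value
    is at least V[t-1], the value of continuing with one fewer period left.
    """
    support = np.asarray(support, dtype=float)
    probs = np.asarray(probs, dtype=float)
    V = np.zeros(T + 1)                                      # V[0] = 0: horizon exhausted
    for t in range(1, T + 1):
        V[t] = np.dot(probs, np.maximum(support, V[t - 1]))  # E[max(X, continuation value)]
    return V

# Example: values uniform on {1, ..., 10}, five periods.
print(house_selling_values(np.arange(1, 11), np.full(10, 0.1), T=5))
```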