Extending existing learning approaches to include queueing-based models.
Models of many real-life applications, such as queueing models of communication networks, have a countably infinite state space. Learning procedures developed to produce optimal policies mainly focus on finite-state settings and do not apply to these models. To overcome this lacuna, in arXiv:2306.02574 we study optimal control of a family of discrete-time Markov Decision Processes (MDPs) governed by an unknown parameter, defined on a countably infinite state space with a finite action space and an unbounded cost function, with the goal of designing a learning algorithm that minimizes the regret relative to the long-term average-cost optimal policy. We propose an algorithm based on Thompson sampling with dynamic episodes: at the beginning of each episode, a posterior distribution over the unknown parameter is formed using Bayes' rule, a parameter estimate is sampled from this posterior, and the corresponding policy is used during the episode. In contrast to finite-state MDPs, where stability and the existence of a stationary distribution are easily ensured, the countable state-space setting requires additional conditions to guarantee ergodicity. By imposing such stability conditions and using the solution of the average-cost Bellman equation, we prove a sub-linear upper bound on the Bayesian regret. Finally, for two queueing models with unknown dynamics, we argue that our algorithm can be applied to develop approximately optimal control algorithms.
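To make the episodic structure concrete, the following is a minimal Python sketch of how such a Thompson-sampling control loop could be organized. It is illustrative only: the posterior object, the planner solve_avg_cost_policy, the environment interface, and the geometrically growing episode length are hypothetical placeholders standing in for the problem-specific components of the paper, not the paper's actual algorithm or implementation.

```python
def thompson_sampling_control(posterior, solve_avg_cost_policy, env, horizon):
    """Sketch of Thompson sampling with dynamically sized episodes.

    posterior:  assumed object with sample() and update(transition),
                encapsulating the Bayesian posterior over the unknown
                MDP parameter
    solve_avg_cost_policy: assumed planner returning an average-cost
                optimal policy for a given sampled parameter
    env:        assumed simulator of the true (unknown-parameter) MDP
                with reset() and step(action) -> (next_state, cost)
    """
    state = env.reset()
    t, episode = 0, 0

    while t < horizon:
        # Start of episode: draw a parameter from the current posterior and
        # compute the policy that would be optimal if that draw were true.
        theta_hat = posterior.sample()
        policy = solve_avg_cost_policy(theta_hat)

        # Follow the sampled policy for the whole episode.  A geometrically
        # growing episode length is used here purely as a placeholder for
        # the paper's dynamic episode schedule.
        episode_len = 2 ** episode
        for _ in range(episode_len):
            if t >= horizon:
                break
            action = policy(state)
            next_state, cost = env.step(action)
            # Fold the observed transition into the posterior (Bayes' rule).
            posterior.update((state, action, next_state, cost))
            state = next_state
            t += 1

        episode += 1
```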