Brief Bio: Atilla Eryilmaz received his M.S. and Ph.D. degrees in Electrical and Computer Engineering from the University of Illinois at Urbana-Champaign in 2001 and 2005, respectively. Between 2005 and 2007, he worked as a Postdoctoral Associate at the Laboratory for Information and Decision Systems at the Massachusetts Institute of Technology. Since 2007, he has been at The Ohio State University, where he is currently a Professor and the Graduate Studies Chair of the Electrical and Computer Engineering Department. Dr. Eryilmaz's research interests span optimal control of stochastic networks, machine learning, optimization, and information theory.
Webpage: https://ece.osu.edu/people/eryilmaz.2
Title: Recurrent Natural Policy Gradient for POMDPs
Abstract: We study a natural policy gradient method based on recurrent neural networks (RNNs) for partially-observable Markov decision processes, in which RNNs are used for both policy parameterization and policy evaluation to address the curse of dimensionality in non-Markovian reinforcement learning. We present finite-time and finite-width analyses for both the critic (recurrent temporal difference learning) and the corresponding recurrent natural policy gradient method in the near-initialization regime. Our analysis demonstrates the efficiency of RNNs for problems with short-term memory, with explicit bounds on the required network widths and sample complexity, and points out the challenges that arise with long-term dependencies.
Joint work with Prof. Semih Cayci (https://www.mathc.rwth-aachen.de/en/~cayci)
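To make the recurrent critic in the abstract concrete, here is a minimal illustrative sketch of recurrent TD(0) policy evaluation (not the authors' implementation): an Elman-style RNN folds the observation history into a hidden state, and, as a crude stand-in for the near-initialization regime, only the linear value readout is trained while the recurrent weights stay at their random initialization. All names and dimensions below are assumptions for illustration.

import numpy as np

# Minimal sketch of recurrent TD(0) policy evaluation for a POMDP (illustrative only).
# An Elman-style RNN keeps a hidden state h_t summarizing the observation history;
# the value estimate is a linear readout of h_t.
rng = np.random.default_rng(0)
d_obs, d_hid = 4, 16                      # observation / hidden-state dimensions (assumed)
W_in = rng.normal(size=(d_hid, d_obs)) / np.sqrt(d_obs)
W_rec = rng.normal(size=(d_hid, d_hid)) / np.sqrt(d_hid)
w_out = np.zeros(d_hid)                   # critic readout weights, learned by TD(0)
gamma, lr = 0.95, 1e-2

def step_hidden(h, obs):
    """One RNN step: fold the new observation into the hidden state."""
    return np.tanh(W_in @ obs + W_rec @ h)

def td0_update(trajectory):
    """Semi-gradient TD(0) over one episode; trajectory is a list of (obs, reward, next_obs)."""
    global w_out
    h = np.zeros(d_hid)
    for obs, reward, next_obs in trajectory:
        h = step_hidden(h, obs)
        v = w_out @ h
        v_next = w_out @ step_hidden(h, next_obs)
        td_error = reward + gamma * v_next - v
        w_out = w_out + lr * td_error * h   # update only the readout; recurrent weights stay fixed

# Toy usage with a random trajectory (purely illustrative).
traj = [(rng.normal(size=d_obs), rng.normal(), rng.normal(size=d_obs)) for _ in range(50)]
td0_update(traj)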
Brief Bio: Rahul Vaze obtained his Ph.D. from The University of Texas at Austin in 2009. Since Oct. 2009, he has been with the Tata Institute of Fundamental Research, Mumbai, India, where he is currently an Associate Professor. His research interests are in communication networks, combinatorial resource allocation, and online algorithms. He is the author of "Random Wireless Networks" (Cambridge University Press, 2015) and "Online Algorithms" (Cambridge University Press, 2023). He has won multiple best paper awards, including the best paper award at WiOpt 2020, the best paper award at Performance 2020, and the runner-up best paper award at WiOpt 2022.
Webpage: https://www.tcs.tifr.res.in/~vaze/
Title: Optimal Bounds for Constrained Online Convex Optimization
Abstract: A well-studied generalization of standard online convex optimization (OCO) is constrained online convex optimization (COCO). In COCO, on every round, a convex cost function and a convex constraint function are revealed to the learner after the action for that round is chosen. The objective is to design an online policy that simultaneously achieves small regret and small cumulative constraint violation (CCV). A long-standing open question in COCO is whether an online policy can simultaneously achieve $O(\sqrt{T})$ regret and $O(\sqrt{T})$ CCV unconditionally. We answer this question in the affirmative for the first time; surprisingly, the analysis is short and elegant.
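For reference, the two quantities traded off in the abstract can be written as follows, using the notation commonly adopted for COCO (assumed here, since the abstract does not fix it): with actions $x_t$, cost functions $f_t$, and constraint functions $g_t$,

$\mathrm{Regret}_T = \sum_{t=1}^{T} f_t(x_t) - \min_{x \in \mathcal{X}^\star} \sum_{t=1}^{T} f_t(x)$ and $\mathrm{CCV}_T = \sum_{t=1}^{T} \max\big(g_t(x_t), 0\big)$,

where $\mathcal{X}^\star$ denotes the set of actions that satisfy all revealed constraints. The open question concerns achieving $O(\sqrt{T})$ bounds on both quantities simultaneously, without additional assumptions such as a Slater-type condition.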
Brief Bio: XYZ received Ph.D. degree in computer engineering from the ABC University in 2015. He is currently at the DEF. His research interests include machine learning, deep learning, and networks. His article was a runner-up for the best paper award in GHI.
Webpage: JKL
Title: Online Learning for Control
Abstract: OLC
Brief Bio: Krishna Jagannathan obtained his B.Tech. in Electrical Engineering from IIT Madras in 2004, and the S.M. and Ph.D. degrees in Electrical Engineering and Computer Science from the Massachusetts Institute of Technology (MIT) in 2006 and 2010, respectively. During 2010-2011, he was a visiting post-doctoral scholar in Computing and Mathematical Sciences at Caltech, and an off-campus post-doctoral fellow at MIT. Since November 2011, he has been with the Department of Electrical Engineering, IIT Madras, where he is currently a professor. His research interests lie in the stochastic modelling and analysis of communication networks, classical & quantum information theory, and queueing theory.
Dr. Jagannathan served on the editorial boards of the journals IEEE/ACM Transactions on Networking and Performance Evaluation, and is presently an Associate Editor of the IEEE Transactions on Information Theory.
Webpage: https://www.ee.iitm.ac.in/krishnaj/
Title: Classical Bandit Algorithms for Entanglement Detection in Parameterized Qubit States
Abstract: Entanglement is a key property of quantum states that acts as a resource for a wide range of tasks in quantum information and computing. Detecting whether an unknown quantum state is entangled or separable is an important problem, both conceptually and practically. We establish a novel correspondence between the problem of entanglement detection and a thresholding problem in Linear Multi-Armed Bandits. The correspondence to linear bandits hinges on the fact that quantum measurements are essentially noisy inner products of the unknown state with the measurement operator. We use a specialised set of measurement operators, called Witness Basis Measurements, that act as separating/supporting hyperplanes for the set of separable states. We propose a method to conclusively detect entanglement for a certain parametric family of lab-realistic two-qubit states, and derive the sample complexity of δ-correct entanglement detection.
More broadly, this work highlights the potential for using classical learning techniques for relevant problems in quantum information.
Collaborative work with my Ph.D. student Bharati K and Dr. Vikesh Siddhu (IBM).
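As a rough illustration of the bandit viewpoint in the abstract (not the authors' algorithm), the sketch below treats each witness-type measurement operator as an arm whose pulls return noisy estimates of an inner product with the unknown state, and declares entanglement once some witness expectation is confidently below zero. For simplicity it samples each witness separately rather than sharing a linear model across operators, and all names and constants are illustrative assumptions.

import numpy as np

# Illustrative thresholding-bandit sketch (not the authors' algorithm). Each "arm" is a
# witness-type measurement operator W_i; a pull returns a noisy estimate of the
# expectation <W_i, rho> for the unknown state rho, and a confidently negative
# witness expectation certifies entanglement.
rng = np.random.default_rng(1)

def measure(true_mean, noise=0.1):
    """Noisy outcome of one witness measurement (sub-Gaussian noise assumed)."""
    return true_mean + noise * rng.normal()

def detect_entanglement(witness_means, delta=0.05, noise=0.1, max_rounds=10_000):
    """Threshold each witness expectation against 0 with confidence 1 - delta."""
    k = len(witness_means)
    counts, sums = np.zeros(k), np.zeros(k)
    for t in range(1, max_rounds + 1):
        i = t % k                                   # round-robin sampling (simplification)
        sums[i] += measure(witness_means[i], noise)
        counts[i] += 1
        mean = sums[i] / counts[i]
        radius = noise * np.sqrt(2 * np.log(2 * k * t ** 2 / delta) / counts[i])
        if mean + radius < 0:
            return "entangled", i, t                # some witness is confidently negative
    return "undecided", None, max_rounds

# Toy usage: the second (hypothetical) witness has a negative true expectation.
print(detect_entanglement(witness_means=[0.2, -0.15, 0.1]))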
Brief Bio: Srinivas Shakkottai received his Ph.D. in Electrical and Computer Engineering from the University of Illinois at Urbana-Champaign in 2007, after which he was a postdoctoral scholar in Management Science and Engineering at Stanford University. He joined Texas A&M University in 2008, where he is currently a professor in the Dept. of Electrical and Computer Engineering and the Dept. of Computer Science and Engineering (by courtesy). His research interests include multi-agent learning and game theory, reinforcement learning, communication and information networks, networked markets, as well as data collection and analytics. He co-directs the Learning and Emerging Network Systems (LENS) Laboratory and the RELLIS Spectrum Innovation Laboratory (RSIL) at Texas A&M University. He has served as an Associate Editor of IEEE/ACM Transactions on Networking and the IEEE Transactions on Wireless Communications. Srinivas is the recipient of the Defense Threat Reduction Agency (DTRA) Young Investigator Award and the NSF CAREER Award, as well as research awards from Cisco and Google. His work has received honors at venues such as ACM MobiHoc, ACM e-Energy, and the International Conference on Learning Representations. He has also received an Outstanding Professor Award, the Select Young Faculty Fellowship, and the Engineering Genesis Award (twice) at Texas A&M University.
Webpage: https://cesg.tamu.edu/faculty/srinivas-shakkottai/
Title: Compressive Sensing for Zeroth-Order Gradient Estimation in Online Convex Optimization
Abstract: We tackle the problem of online convex optimization in scenarios where the gradient of the objective function is sparse, meaning that only a few dimensions have non-zero gradients. Our goal is to utilize this sparsity to derive accurate gradient estimates, even with access to only a limited set of function evaluations. This work is motivated by distributed queueing systems, such as those found in microservices-based applications that handle request-response workloads. In such systems, each request type passes through a series of microservices, and resource allocation across these microservices is managed to strike a balance between end-to-end latency and resource usage. Although the number of microservices can be large, the latency function is mainly sensitive to resource changes in just a few of them, leading to sparse gradients. We introduce a method that integrates simultaneous perturbation and compressive sensing to estimate gradients efficiently. We provide theoretical bounds on the number of compressive sensing samples needed per iteration to achieve bounded bias in gradient estimates, ensuring sub-linear regret. By leveraging sparsity, our approach significantly reduces the number of samples required per iteration to match the gradient’s sparsity rather than the full dimensionality of the problem. Through numerical simulations and tests on real-world microservices, we demonstrate its effectiveness, outperforming several stochastic gradient descent methods and converging rapidly to performance levels similar to policies pre-trained with workload information.
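A minimal sketch of the kind of estimator the abstract describes (an assumed implementation, not necessarily the authors'): random simultaneous-perturbation probes give measurements that are approximately inner products of the probes with the gradient, and a compressive-sensing step (here, plain ISTA for the LASSO) recovers the sparse gradient from far fewer measurements than the ambient dimension. Names, constants, and the toy objective are illustrative.

import numpy as np

# Sketch: estimate a sparse gradient from m << n function-value queries using
# simultaneous-perturbation measurements plus a compressive-sensing recovery step.
rng = np.random.default_rng(2)

def sparse_grad_estimate(f, x, m, delta=1e-3, lam=0.05, n_ista=200):
    """Estimate grad f(x), assumed sparse, from 2*m function evaluations."""
    n = x.size
    A = rng.choice([-1.0, 1.0], size=(m, n)) / np.sqrt(m)   # random +/-1 probe directions
    # Each measurement approximates the directional derivative along a probe row,
    # i.e., y ~= A @ grad f(x).
    y = np.array([(f(x + delta * a) - f(x - delta * a)) / (2 * delta) for a in A])
    # ISTA (iterative soft-thresholding) for the LASSO: min 0.5*||A g - y||^2 + lam*||g||_1.
    g = np.zeros(n)
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    for _ in range(n_ista):
        g = g - step * A.T @ (A @ g - y)
        g = np.sign(g) * np.maximum(np.abs(g) - step * lam, 0.0)
    return g

# Toy usage: a function of 100 variables whose gradient has only 3 nonzero entries.
f = lambda z: 3 * z[7] ** 2 + 2 * z[42] + z[99] ** 2
x0 = np.ones(100)
print(sparse_grad_estimate(f, x0, m=30).round(2))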
Brief Bio: Dr. Hou is a Professor in the ECE Department of Texas A&M University. His research interests include cloud/edge computing, wireless networks, and machine learning.
Dr. Hou received his B.S. in Electrical Engineering from National Taiwan University in 2004 and his M.S. and Ph.D. in Computer Science from the University of Illinois at Urbana-Champaign in 2008 and 2011, respectively. He received the Best Paper Awards at ACM MobiHoc 2020 and ACM MobiHoc 2017, the Best Student Paper Award at WiOpt 2017, and the C.W. Gear Outstanding Graduate Student Award from the University of Illinois at Urbana-Champaign.
Webpage: https://cesg.tamu.edu/faculty/i-hong-hou/
Title: Deep RL for Index Policy in Restless Bandit Problems
Abstract: The Whittle index policy is a powerful tool for obtaining asymptotically optimal solutions to the notoriously intractable restless bandit problem. However, finding the Whittle indices remains difficult for many practical restless bandits with convoluted or unknown transition kernels. In this talk, we discuss employing deep reinforcement learning to train a neural network that predicts the Whittle indices. We leverage the fundamental property that the Whittle index can be viewed as an optimal control policy for a set of infinite MDPs. We further show that, despite the need to consider infinite MDPs, the policy gradient admits a simple expression. Simulation results show that our algorithm learns the Whittle indices much faster than several recent approaches that learn them through indirect means.
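As a small illustration of how a learned index network is used at decision time (the training procedure discussed in the talk is not reproduced here), the sketch below predicts an index for each arm's current state and activates the arms with the largest predicted indices; the stand-in network and states are purely hypothetical.

import numpy as np

# Illustrative use of a (predicted) Whittle index: activate the M arms with the
# largest indices at each decision slot.
def whittle_index_policy(index_net, arm_states, budget_m):
    """index_net: callable state -> scalar index; returns indices of arms to activate."""
    indices = np.array([index_net(s) for s in arm_states])
    return np.argsort(indices)[-budget_m:]

# Toy stand-in for a trained index network (hypothetical): index grows with the state.
toy_net = lambda s: float(s)
print(whittle_index_policy(toy_net, arm_states=[0.1, 0.9, 0.4, 0.7], budget_m=2))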
Brief Bio: Gauri Joshi is a faculty member in the ECE department at Carnegie Mellon University. Gauri completed her Ph.D. from MIT EECS, and received her B.Tech and M.Tech from the Indian Institute of Technology (IIT) Bombay. Her awards include the MIT Technology Review 35 under 35 Award, ONR Young Investigator and NSF CAREER Award, Best Paper awards at MobiHoc 2022 and SIGMETRICS 2020, and the Institute Gold Medal of IIT Bombay (2010).
Webpage: https://www.andrew.cmu.edu/user/gaurij/
Title: Efficient Reinforcement Learning for Routing Jobs in Heterogeneous Queueing Systems
Abstract: We consider the problem of efficiently routing jobs that arrive into a central queue to a system of heterogeneous servers. Unlike in homogeneous systems, a threshold policy, which routes jobs to the slow server(s) only when the queue length exceeds a certain threshold, is known to be optimal for the one-fast-one-slow two-server system. However, an optimal policy for the multi-server system is unknown and non-trivial to find. While Reinforcement Learning (RL) has been recognized to have great potential for learning policies in such cases, our problem has an exponentially large state space, rendering standard RL inefficient. In this work, we propose ACHQ, an efficient policy-gradient-based algorithm with a low-dimensional soft threshold policy parameterization that leverages the underlying queueing structure. We provide stationary-point convergence guarantees for the general case and, despite the low-dimensional parameterization, prove that ACHQ converges to an approximate global optimum for the special case of two servers. Simulations demonstrate an improvement in expected response time of up to ~30% over the greedy policy that routes to the fastest available server. This is joint work with Neharika Jali, Guannan Qu, and Weina Wang.
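To illustrate the flavor of a low-dimensional soft threshold parameterization (the exact ACHQ parameterization may differ), the sketch below gives each slow server a learnable threshold and lets the probability of routing to it increase smoothly with the central queue length; the thresholds and temperature are illustrative parameters.

import numpy as np

# Soft threshold routing sketch: each slow server k has a learnable threshold theta_k,
# and the probability of routing an arriving job to it rises smoothly with the
# central queue length q.
def soft_threshold_probs(q, thetas, temperature=1.0):
    """Probability of using each slow server as a function of queue length q."""
    return 1.0 / (1.0 + np.exp(-(q - np.asarray(thetas)) / temperature))

# Example: two slow servers with (hypothetical) thresholds 5 and 12.
for q in (2, 6, 15):
    print(q, soft_threshold_probs(q, thetas=[5.0, 12.0]).round(2))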
Brief Bio: Jia (Kevin) Liu is an Associate Professor in the Dept. of Electrical and Computer Engineering at The Ohio State University (OSU) and an Amazon Visiting Academic (AVA) with Amazon.com. From Aug. 2017 to Aug. 2020, he was an Assistant Professor in the Dept. of Computer Science at Iowa State University (ISU). He currently serves as the Managing Director of the NSF AI Institute for Future Edge Networks and Distributed Intelligence (AI-EDGE) at OSU. He received his Ph.D. degree from the Dept. of Electrical and Computer Engineering at Virginia Tech in 2010. His research areas include theoretical machine learning, stochastic network optimization and control, and performance analysis for data analytics infrastructure and cyber-physical systems. Dr. Liu is a senior member of IEEE and a member of ACM. He has received numerous awards at top venues, including the IEEE INFOCOM'19 Best Paper Award, the IEEE INFOCOM'16 Best Paper Award, the IEEE INFOCOM'13 Best Paper Runner-up Award, the IEEE INFOCOM'11 Best Paper Runner-up Award, and the IEEE ICC'08 Best Paper Award. He has also received multiple long/spotlight presentation honors at top machine learning conferences, including ICML, NeurIPS, and ICLR. He received the NSF CAREER Award in 2020, the DARPA Young Faculty Award (YFA) in 2024, and the Google Faculty Research Award in 2020. He also received the LAS Award for Early Achievement in Research at Iowa State University in 2020, and the Bell Labs President Gold Award. Dr. Liu is an Associate Editor for IEEE Transactions on Cognitive Communications and Networking. He has served on the TPCs of numerous top conferences, including ICML, NeurIPS, ICLR, ACM SIGMETRICS, IEEE INFOCOM, and ACM MobiHoc. His research is supported by NSF, DARPA, AFOSR, AFRL, ONR, Google, and Cisco.
Webpage: https://kevinliu-osu.github.io/
Title: Finite-Time Convergence and Sample Complexity of Actor-Critic Multi-Objective Reinforcement Learning
Abstract: Reinforcement learning with multiple, potentially conflicting objectives is pervasive in real-world applications, yet this problem remains theoretically under-explored. In this research, we tackle the multi-objective reinforcement learning (MORL) problem and introduce an innovative actor-critic algorithm named MOAC, which finds a policy by iteratively making trade-offs among conflicting reward signals. Notably, we provide the first analysis of finite-time Pareto-stationary convergence and the corresponding sample complexity in both discounted and average-reward settings. Our approach has two salient features: (a) MOAC mitigates the cumulative estimation bias resulting from finding an optimal common gradient descent direction out of stochastic samples, which enables provable convergence-rate and sample-complexity guarantees independent of the number of objectives; (b) with a proper momentum coefficient, MOAC initializes the weights of the individual policy gradients using samples from the environment, instead of manual initialization, which enhances the practicality and robustness of our algorithm. Finally, experiments conducted on a real-world dataset validate the effectiveness of our proposed method.
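For intuition on the "common gradient descent direction" mentioned in (a), the sketch below shows one standard way to combine two objective gradients: the minimum-norm convex combination, which, when nonzero, yields a direction improving both objectives. MOAC's actual weighting and momentum scheme may differ; this is only an assumed illustration for the two-objective case.

import numpy as np

# Min-norm convex combination of two objective gradients (closed form for two objectives).
def common_descent_direction(g1, g2):
    """Return the minimum-norm point in the convex hull of {g1, g2}."""
    diff = g1 - g2
    denom = diff @ diff
    lam = 0.5 if denom == 0 else np.clip((g2 - g1) @ g2 / denom, 0.0, 1.0)
    # If the result is nonzero, stepping against it decreases both objectives
    # (in the loss-minimization convention).
    return lam * g1 + (1.0 - lam) * g2

print(common_descent_direction(np.array([1.0, 0.0]), np.array([0.0, 1.0])))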
Brief Bio: Jian Li is an Assistant Professor of Data Science at Stony Brook University. He received his Ph.D. in Computer Engineering from Texas A&M University in 2016, and his B.E. from Shanghai Jiao Tong University in 2012. His current research interests lie at the intersection of algorithms for reinforcement learning and multi-armed bandits, decentralized/federated learning, and stochastic optimization and control, with applications to next-generation networked systems. He is a recipient of the NSF CAREER Award (2024) and NSF CISE CRII Award (2021).
Webpage: https://sites.google.com/stonybrook.edu/jianli
Title: Direct Online Preference Learning for Restless Multi-Armed Bandits with Preference Feedback
Abstract: Restless multi-armed bandits (RMABs) have been widely used to model constrained sequential decision-making problems, where the state of each restless arm evolves according to a Markov chain and each state transition generates a scalar reward. However, the success of RMABs crucially relies on the availability and quality of reward signals. Unfortunately, specifying an exact reward function in practice can be challenging and even infeasible. In this work, we introduce Pref-RMAB, a new RMAB model in the presence of preference signals, where the decision maker only observes pairwise preference feedback, rather than scalar rewards, from the activated arms at each decision epoch. Preference feedback, however, arguably contains less information than scalar rewards, which makes Pref-RMAB seemingly more difficult. To address this challenge, we present a direct online preference learning (DOPL) algorithm for Pref-RMAB that efficiently explores the unknown environment, adaptively collects preference data in an online manner, and directly leverages the preference feedback for decision-making. We prove that DOPL yields the first-ever sublinear regret for RMABs with preference feedback. Experimental results further demonstrate the effectiveness of DOPL.
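To make the feedback model concrete, the sketch below shows one common way pairwise preferences can be generated from latent rewards (a Bradley–Terry-style logistic comparison); the Pref-RMAB feedback model itself may differ, and the function and parameters here are illustrative assumptions.

import numpy as np

# Illustrative preference-feedback model: instead of observing the rewards r_i, r_j of
# two activated arms, the learner only sees which arm was "preferred", with probability
# tied to the reward gap via a logistic link.
rng = np.random.default_rng(3)

def preference_feedback(r_i, r_j):
    """Return 1 if arm i is preferred to arm j, 0 otherwise."""
    p_i_preferred = 1.0 / (1.0 + np.exp(-(r_i - r_j)))
    return int(rng.random() < p_i_preferred)

# Example: arm i with latent reward 1.5 vs arm j with latent reward 0.5.
print([preference_feedback(1.5, 0.5) for _ in range(10)])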
Brief Bio: Ana Bušić is a Research Scientist at the Inria Paris Research Centre and the Computer Science Department at École Normale Supérieure, PSL University, France. She received the M.S. degree in Applied Mathematics and the Ph.D. degree in Computer Science from the University of Versailles, France, in 2003 and 2007, respectively. She was a Post-Doctoral Fellow at Inria Grenoble–Rhône-Alpes and at Université Paris Diderot (Paris 7). Her research interests include stochastic modeling, simulation, optimization, and reinforcement learning, with applications to communication networks and energy systems. She is a member of the Laboratory of Information, Networking and Communication Sciences, a joint lab between Inria, Institut Mines-Télécom, UPMC Sorbonne Universities, Nokia Bell Labs, and SystemX. She received a Google Faculty Research Award in 2015.
Webpage: https://www.di.ens.fr/~busic/
Title: Kullback–Leibler-Quadratic Cooperative Mean-Field Control
Abstract: We will discuss approaches to mean-field control, motivated by distributed control of multi-agent systems. Control solutions are based on a convex optimization problem whose domain is a convex set of probability mass functions (pmfs). Kullback–Leibler-quadratic (KLQ) optimal control is a special case in which the objective function is composed of a control cost in the form of the Kullback–Leibler divergence between a candidate pmf and the nominal one, plus a quadratic cost on the sequence of marginals. Transform techniques are introduced to reduce the complexity of the KLQ solution, motivated by the need to consider time horizons that are much longer than the inter-sampling times required for reliable control. The approach is illustrated on an application of distributed control of residential loads to provide grid services, similar to utility-scale battery storage. The results show that KLQ optimal control enables the aggregate power consumption of a collection of flexible loads to track a time-varying reference signal, while simultaneously ensuring that each individual load satisfies its own quality-of-service constraints.
Joint work with N. Cammardella and S. Meyn.
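For reference, one common way to write the KLQ objective described in the abstract is, with notation assumed here ($p$ the candidate pmf, $p^0$ the nominal, $\mu_t$ the time-$t$ marginal of $p$, $\mathcal{Y}$ the per-state output such as power consumption, $r_t$ the reference signal, and $\kappa > 0$ a weighting parameter),

$\min_{p} \; D(p \,\|\, p^0) + \frac{\kappa}{2} \sum_{t=1}^{T} \big( \langle \mu_t, \mathcal{Y} \rangle - r_t \big)^2,$

i.e., a KL control cost relative to nominal behavior plus a quadratic tracking cost on the sequence of marginals.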
Brief Bio: Sushil Varma is an incoming tenure-track assistant professor in the Industrial and Operations Engineering Department of the University of Michigan, Ann Arbor. He is currently a postdoctoral researcher at Inria Paris. His research interests include queueing theory and revenue management, with applications to online marketplaces and electric vehicles. Sushil won the Stephen S. Lavenberg Best Student Paper Award at IFIP Performance 2021 and was a finalist for the INFORMS Transportation Science and Logistics (TSL) Best Student Paper Award.
Webpage: https://sites.google.com/view/sushil-varma/home
Title: Electric Vehicle Fleet Sizing and Charging Infrastructure Planning
Abstract: We analyze an optimal electric vehicle (EV) fleet and charging infrastructure capacity planning problem in a spatial setting. With customer requests arriving at rate λ, the system operator must determine the minimum number of vehicles and chargers needed to meet a given service level, along with a matching and charging policy that maximizes the service level. We provide a sharp characterization of the fleet-size and charging-infrastructure requirements as the demand grows. We show that an EV system exhibits fundamentally different scaling due to non-negligible charging times. In addition, we propose the Power-of-d dispatching policy, which achieves near-optimal performance by selecting the d closest vehicles to a trip request and dispatching the one with the highest battery level. We conduct simulations on Chicago transportation data that verify our operational insights.
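A minimal sketch of the Power-of-d dispatching rule described in the abstract (illustrative only): among the d available vehicles closest to a trip request, dispatch the one with the highest battery level. The locations, battery levels, and choice of d below are toy values.

import numpy as np

# Power-of-d dispatching sketch: pick the d nearest vehicles, then the fullest battery.
def power_of_d_dispatch(vehicle_locs, battery_levels, request_loc, d):
    """vehicle_locs: (n, 2) array; battery_levels: (n,) array; returns the chosen vehicle index."""
    dists = np.linalg.norm(vehicle_locs - request_loc, axis=1)
    closest_d = np.argsort(dists)[:d]                 # d nearest available vehicles
    return closest_d[np.argmax(battery_levels[closest_d])]

# Example: four idle vehicles, a request at the origin, and d = 2.
locs = np.array([[0.1, 0.0], [0.2, 0.1], [2.0, 2.0], [0.05, 0.3]])
batt = np.array([0.3, 0.9, 1.0, 0.5])
print(power_of_d_dispatch(locs, batt, request_loc=np.array([0.0, 0.0]), d=2))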