Recent research projects

1. Regret-Optimal Control (Control Theory and Reinforcement Learning)

Regret is a common performance criterion in machine learning. It is comparative in nature: one designs a feasible system by minimizing its distance from an impractical system that serves as a reference. For instance, in Reinforcement Learning (RL), regret compares the reward of a (feasible) agent that has no access to the environment parameters with the reward obtained by an agent that knows the environment parameters.
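As a concrete illustration (a standard textbook formulation, not notation from this work): an agent interacting with an unknown environment for T steps incurs the regret

\[
\mathrm{Regret}(T) \;=\; T\,\rho^{*} \;-\; \sum_{t=1}^{T} r_t,
\]

where \rho^{*} is the optimal average reward achievable with known environment parameters and r_t is the reward the learning agent collects at step t.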

In this work, we introduce the regret criterion into a classical control problem, the full-information control problem (i.e., the LQR setting with a disturbance). The regret in this scenario compares a causal controller with a clairvoyant controller that has non-causal access to the entire disturbance sequence.
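As a sketch of the criterion (in illustrative notation, which may differ from the paper's): if J_K(w) denotes the quadratic control cost of a causal controller K under the disturbance sequence w, and J_{nc}(w) is the cost of the clairvoyant non-causal controller, one common formulation of the regret-optimal control problem is

\[
\min_{K\ \mathrm{causal}} \;\; \sup_{w \neq 0} \;\; \frac{J_K(w) - J_{nc}(w)}{\|w\|^{2}},
\]

i.e., minimize the worst-case extra cost paid for causality, normalized by the disturbance energy.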

The main result is an explicit and precise solution to the regret problem: we construct a controller that achieves the optimal regret. The result is mathematically beautiful and, due to its simplicity, has potential for practical implementation in real systems. The merits of the new controller are illustrated in the figure.

A YouTube seminar on this work by Babak: Regret-Optimal Control

The figure shows the cost, as a function of frequency, incurred by classical controllers and by our new controller (red curve). The non-causal controller (black curve) has the best performance (smallest cost) across all noise sequences, but it cannot be implemented in practice. Our new controller (red curve) stays closest to the non-causal controller, since it is designed to minimize the regret, i.e., the maximal distance from the non-causal controller. While H2 controllers (blue curve) are optimal in the white-noise regime (minimizing the area under the curve) and H-infinity controllers (green curve) are robust against the worst-case noise (minimizing the peak), our controller interpolates between these two extremes and performs well in both regimes.
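In the same illustrative notation, the three design philosophies in the figure can be summarized as follows. Writing c_K(\omega) for the cost density of a controller K at frequency \omega,

\[
\mathrm{H}_2:\ \min_K \int c_K(\omega)\, d\omega, \qquad
\mathrm{H}_\infty:\ \min_K \sup_{\omega} c_K(\omega), \qquad
\mathrm{Regret}:\ \min_K \sup_{\omega} \big( c_K(\omega) - c_{nc}(\omega) \big),
\]

so the regret-optimal controller keeps its curve uniformly close to the non-causal benchmark c_{nc}, rather than minimizing total area or peak height.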

2. Regret-Optimal Filtering (estimation):

The Kalman filter is one of the most beautiful and practical engineering inventions. At the heart of its optimality is the assumption that the driving disturbance and the measurement noise are stochastic with a Gaussian distribution. In this work, we revisit this assumption using a novel regret criterion for linear dynamical systems. The main idea is to design a causal filter that minimizes the gap between its estimation error and that of a non-causal estimator. The main result is an explicit, simple filter that minimizes the regret. The performance of the resulting filter interpolates between the stochastic approach (Kalman filter) and the robust approach (H-infinity).
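In the same spirit as the control setting (again sketch notation, not necessarily the paper's): if E_F(v) denotes the squared estimation error of a causal filter F under the combined disturbance-and-noise sequence v, and E_{nc}(v) is the error of the optimal non-causal estimator, the regret-optimal filter solves

\[
\min_{F\ \mathrm{causal}} \;\; \sup_{v \neq 0} \;\; \frac{E_F(v) - E_{nc}(v)}{\|v\|^{2}}.
\]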

Accepted to AISTATS 2021. Available on arXiv.

3. Reinforcement Learning Evaluation for Finite-State Channels

Computing the fundamental limit of communication, i.e., the channel capacity, for channels with memory is one of the long-standing open problems in Information Theory. For channels with feedback, the capacity computation can be converted into the optimization of a Markov decision process (MDP). However, the resulting MDP has a large state space, and therefore most channels that have been solved in the literature have a small state cardinality. The recent surge in machine learning algorithms allows one to evaluate (numerically) MDPs with large state spaces using algorithms such as DDPG and PPO. This work provides an elegant bridge between the analytical study of communication settings and the efficient numerical machinery exploited in machine learning. Perhaps the most exciting novelty in this work is a mechanism for converting numerical RL simulations into simple, closed-form analytical formulas, which we demonstrate for the Ising channel with a large alphabet.
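To give a feel for the underlying computational object (an illustrative sketch only: the code below uses a toy, randomly generated MDP and classical relative value iteration, not the paper's deep-RL algorithm or the actual Ising-channel formulation), the capacity problem becomes an average-reward MDP whose optimal gain plays the role of the capacity:

import numpy as np

# Toy average-reward MDP: random transitions P[s, a, s'] and rewards r[s, a].
# This is a hypothetical stand-in for the belief-state MDP of a feedback
# channel, not the Ising-channel MDP studied in the paper.
n_states, n_actions = 10, 3
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
r = rng.random((n_states, n_actions))

# Relative value iteration: iterate the Bellman operator and renormalize so
# the iterates stay bounded; the normalization constant converges to the
# optimal average reward (the "gain").
h = np.zeros(n_states)      # relative value function
for _ in range(10_000):
    Q = r + P @ h           # Q[s, a] = r[s, a] + sum_s' P[s, a, s'] h[s']
    h_new = Q.max(axis=1)
    g = h_new[0]            # gain estimate, anchored at state 0
    h_new = h_new - g
    if np.max(np.abs(h_new - h)) < 1e-10:
        break
    h = h_new

print(f"optimal average reward (the capacity analogue): {g:.6f}")
print("greedy policy:", Q.argmax(axis=1))

For the channels treated in the paper, the MDP state is a belief (a probability vector), so the state space is continuous and large; this is precisely why function-approximation methods such as DDPG become necessary in place of the tabular iteration above.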

More details can be found at:

Ziv Aharoni, Oron Sabag and Haim Permuter, "Reinforcement Learning Evaluation and Solution for the Feedback Capacity of the Ising Channel with Large Alphabet," submitted to IEEE Transactions on Information Theory. Available online at arXiv.

A short version of this paper was shortlisted for the student best paper award at ISIT 2019.