Lukas Kesper - Sebastian Trimpe - Dominik Baumann
Event-triggered communication and control provide high control performance in networked control systems without overloading the communication network. However, most approaches require precise mathematical models of the system dynamics, which may not always be available. Model-free learning of communication and control policies provides an alternative. Nevertheless, existing methods typically consider single-agent settings. In this paper, we propose a model-free reinforcement learning algorithm that jointly learns resource-aware communication and control policies for distributed multi-agent systems from data. We evaluate the algorithm in a high-dimensional and nonlinear simulation example and discuss promising avenues for further research.
The agents learn a stable policy in which they move the package forward without dropping it. We extract the DNN weights from the epoch in which the highest communication savings are achieved, resulting in savings of around 55%. The resulting policy is intuitive: most of the time, only two agents actively work on transporting the package, which reduces the need to coordinate. This way, they are able to move the package forward while remaining stable.
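As a concrete illustration of this checkpoint selection, a minimal sketch is given below. It assumes that the communication savings are logged per training epoch and that the DNN weights are stored as per-epoch checkpoints; the function name and path template are hypothetical and not part of our setup.

```python
import numpy as np
import torch

def load_most_communication_efficient_policy(policy_net, comm_savings, ckpt_pattern):
    """Restore the DNN weights from the epoch with the highest communication savings.

    `comm_savings` is assumed to be a per-epoch list of the fraction of
    suppressed transmissions, and `ckpt_pattern` a path template such as
    "ckpt_epoch_{:04d}.pt". Both names are illustrative only.
    """
    best_epoch = int(np.argmax(comm_savings))  # epoch with roughly 55% savings in our run
    policy_net.load_state_dict(torch.load(ckpt_pattern.format(best_epoch)))
    return best_epoch
```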
For PPO, we randomize the communication updates such that the agents drop around 55% of them, matching the communication savings of our algorithm. Thus, PPO does not make its own communication decisions; the remaining design is left unchanged. The return PPO obtains is significantly lower than that of our algorithm. We extract the DNN weights from the epoch with the highest reward and sample trajectories. The video shows that the agents cannot successfully transport the package forward, demonstrating that the task is non-trivial.
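For illustration, the randomized communication of the PPO baseline could be sketched as follows. The drop probability is matched to the roughly 55% savings reported above; the function name, the per-step message vector, and the zero-order hold on dropped messages are assumptions for this sketch, not details of our design.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
DROP_PROB = 0.55  # matched to the ~55% communication savings of the learned policy

def randomize_communication(messages, last_received):
    """Drop each agent's outgoing message with probability DROP_PROB.

    When a message is dropped, receivers reuse the last received value
    (a zero-order hold, assumed here for illustration).
    """
    messages = np.asarray(messages, dtype=float)
    dropped = rng.random(messages.shape[0]) < DROP_PROB
    return np.where(dropped, last_received, messages)

# Example: three agents broadcasting scalar states at every step
hold = np.zeros(3)
for step in range(5):
    states = rng.normal(size=3)              # placeholder for the agents' true states
    hold = randomize_communication(states, hold)  # received values, also the next hold buffer
```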