Learning Event-triggered Control from Data through Joint Optimization

This website contains the videos and code concerning the paper "Learning Event-triggered Control from Data through Joint Optimization" by N. Funk, D. Baumann, V. Berenz and S. Trimpe which has been published in the IFAC Journal of Systems and Control.

In this paper, we present a framework for model-free learning of event-triggered control strategies. Event-triggered methods aim to achieve high control performance while only closing the feedback loop when needed. This enables resource savings, e.g., network bandwidth if control commands are sent via communication networks, as in networked control systems. Event-triggered controllers consist of a communication policy, determining when to communicate, and a control policy, deciding what to communicate. It is essential to jointly optimize the two policies since individual optimization does not necessarily yield the overall optimal solution. To address this need for joint optimization, we propose a novel algorithm based on hierarchical reinforcement learning. The resulting algorithm is shown to accomplish high-performance control in line with resource savings and scales seamlessly to nonlinear and high-dimensional systems. The method’s applicability to real-world scenarios is demonstrated through experiments on a six degrees of freedom real-time controlled manipulator. Further, we propose an approach towards evaluating the stability of the learned neural network policies.
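The event-triggered structure described above can be sketched as a rollout loop in which a communication policy gates access to the control policy; between events, the last command is held (zero-order hold). This is a minimal illustration of the idea only, not the paper's implementation: the plant (`ToyIntegrator`), the threshold trigger, and all names are assumptions for the sketch.

```python
import numpy as np

class ToyIntegrator:
    """Hypothetical single-integrator plant, for illustration only."""
    action_dim = 1

    def reset(self):
        self.x = np.array([1.0])
        return self.x

    def step(self, u):
        self.x = self.x + 0.1 * u              # x_{k+1} = x_k + dt * u_k
        done = abs(self.x[0]) < 1e-3           # stop near the origin
        return self.x, -abs(self.x[0]), done, {}

def event_triggered_rollout(env, control_policy, comm_policy, horizon=100):
    """Close the loop only when comm_policy fires; otherwise hold the
    previous command. Returns the fraction of steps with communication."""
    obs = env.reset()
    last_action = np.zeros(env.action_dim)     # command held between events
    transmissions = 0
    for _ in range(horizon):
        if comm_policy(obs):                   # event: transmit a new command
            last_action = control_policy(obs)
            transmissions += 1
        obs, _, done, _ = env.step(last_action)
        if done:
            break
    return transmissions / horizon

# Proportional controller with a simple threshold trigger:
rate = event_triggered_rollout(
    ToyIntegrator(),
    control_policy=lambda x: -x,
    comm_policy=lambda x: abs(x[0]) > 0.2,
)
```

In the paper, both policies are learned jointly via hierarchical reinforcement learning rather than hand-designed as in this threshold example.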

If you use code or ideas from this work for your projects or research, please cite the paper:

@article{funk_learn_etc,
  title   = {Learning event-triggered control from data through joint optimization},
  journal = {IFAC Journal of Systems and Control},
  volume  = {16},
  pages   = {100144},
  year    = {2021},
  issn    = {2468-6018},
  doi     = {10.1016/j.ifacsc.2021.100144},
  url     = {https://www.sciencedirect.com/science/article/pii/S2468601821000055},
  author  = {Niklas Funk and Dominik Baumann and Vincent Berenz and Sebastian Trimpe}
}


Half-Cheetah Environment (Section 6.2.1)

Performance of the models trained using our algorithm: impact of saving communication on the gait.

This video depicts the gait behavior of the Cheetah for different amounts of communication savings, using policies trained with our proposed algorithm.

Performance of the baseline PPO policy and the baseline under randomly skipping communication.

This video depicts the gait behavior of the Cheetah for different amounts of communication savings, using a baseline policy trained with the PPO algorithm. The rollout with 0% communication savings corresponds to the unmodified baseline policy. The rollouts with 10%, 20%, and 40% savings result from using the baseline and randomly skipping communication with probability 10%, 20%, and 40%, respectively.
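The random-skip baseline can be illustrated with a small helper that applies a zero-order hold to a sequence of commands: each new command is dropped with the given probability, in which case the previously applied one is repeated. The function name and setup are hypothetical, not taken from the paper's code.

```python
import numpy as np

def randomly_skip(actions, skip_prob, rng):
    """Drop each new command with probability skip_prob; when a command
    is dropped, the previously applied one is repeated (zero-order hold)."""
    applied = []
    last = np.zeros_like(actions[0])           # nothing applied yet
    for a in actions:
        if rng.random() >= skip_prob:          # communicate this step
            last = a
        applied.append(last)
    return applied

# Example: five scalar commands, ~40% of transmissions skipped at random.
actions = [np.array([float(k)]) for k in range(1, 6)]
held = randomly_skip(actions, skip_prob=0.4, rng=np.random.default_rng(0))
```

With `skip_prob=0.0` every command is applied unchanged, and with `skip_prob=1.0` the initial (zero) command is held throughout, matching the two extremes of the baseline comparison.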

Comparing the performance of the models trained using our algorithm with the original PPO baseline and the baseline under randomly skipping communication.

This video depicts the performance and gait differences when comparing policies trained by our algorithm with policies from the PPO baseline. Again, the communication savings for the PPO baseline are achieved by randomly skipping communication with a probability of 20%.


Ant Environment (Section 6.2.2)

Rollouts of models trained using the proposed algorithm.

This video depicts the gait behavior of the Ant for different amounts of communication savings. The policies have been trained using our algorithm.


Results on Hardware - the Apollo robot (Section 7)

Video showing the resulting performance in the dynamic reference tracking scenario, where the goal is to bring the end-effector close to the cup. (Section 7.2)

The video shows the result of performing event-triggered setpoint tracking on the Apollo robot. The reference position is obtained from the Vicon motion-capture system and is therefore dynamically changing. During the rollout, 85% of the communication is saved while the reference is still tracked reliably.

Performing obstacle avoidance on the Apollo robot. (Section 7.3)

Performing event-triggered obstacle avoidance on Apollo. The reference position is now fixed and placed on the other side of the obstacle. The robot reliably reaches the reference without the end-effector hitting the obstacle, while still saving around 92% of communication.


Results on Hardware - the Apollo robot - Using Policies with Different Numbers of Options (Appendix)

Video illustrating the differences between policies that have been trained using different numbers of options. (Appendix A)

This video illustrates the process of reaching the same reference position from the same starting position on the Apollo robot, using policies with different numbers of options available. Again, the reference position is fixed.