Luigi Berducci*, Edgar A. Aguilar*, Dejan Ničković, Radu Grosu
TU Wien, Cyber-Physical Systems Group
AIT Austrian Institute of Technology GmbH
* Indicates authors with equal contributions
The automatic synthesis of policies for robotic-control tasks through reinforcement learning relies on a reward signal that simultaneously captures many possibly conflicting requirements. In this paper, we introduce a novel, hierarchical, potential-based reward-shaping approach (HPRS) for defining effective, multivariate rewards for a large family of such control tasks.
We formalize a task as a partially-ordered set of safety, target, and comfort requirements, and define an automated methodology to enforce a natural order among requirements and shape the associated reward. Building upon potential-based reward shaping, we show that HPRS preserves policy optimality.
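The optimality claim rests on a standard property of potential-based shaping: adding the term γΦ(s') − Φ(s) to the reward changes the discounted return of a trajectory by γ^T Φ(s_T) − Φ(s_0), which does not depend on the actions taken in between. The following minimal sketch checks this numerically on a toy trajectory. The one-dimensional state, the potential function, and all numeric values are illustrative assumptions, not the potential defined in the paper.

```python
from typing import Callable, Sequence


def shaped_reward(r: float, s: float, s_next: float,
                  potential: Callable[[float], float], gamma: float) -> float:
    """Potential-based shaping: r' = r + gamma * Phi(s') - Phi(s)."""
    return r + gamma * potential(s_next) - potential(s)


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Sum of gamma^t * r_t over a finite trajectory."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))


def potential(s: float) -> float:
    # Hypothetical potential: higher when closer to an assumed target at s = 1.0.
    return -abs(s - 1.0)


# Toy trajectory (assumed values): states s_0..s_3 and the base reward of each step.
states = [0.0, 0.25, 0.5, 1.0]
base_rewards = [0.0, 0.0, 1.0]
gamma = 0.9

shaped = [shaped_reward(r, s, s_next, potential, gamma)
          for r, s, s_next in zip(base_rewards, states, states[1:])]

# The shaping terms telescope, so the gap depends only on the end states.
gap = discounted_return(shaped, gamma) - discounted_return(base_rewards, gamma)
expected_gap = gamma ** len(base_rewards) * potential(states[-1]) - potential(states[0])
```

Because the gap is fixed by the start and end states of the trajectory, ranking policies by shaped return gives the same ordering as ranking them by the original return.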
Our experimental evaluation demonstrates HPRS's superior ability in capturing the intended behavior, resulting in task-satisfying policies with improved comfort, and converging to optimal behavior faster than other state-of-the-art approaches. We demonstrate the practical usability of HPRS on several robotics applications and the smooth sim2real transition on two autonomous-driving scenarios for F1TENTH race cars.
Contributions
Formal specification language to express classes of requirements that often occur in control tasks (safety, target, comfort).
Automatic inference of requirement priorities from the class each requirement belongs to (safety > target > comfort).
Multivariate reward signal that embeds the requirement priorities and adapts dynamically over time according to the satisfaction of higher-priority requirements.
Robust policy training in simulation, with domain randomization of environment parameters, and transfer of the HPRS policy to real hardware on F1TENTH racing cars.
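One way to picture how a reward can embed the priority order safety > target > comfort is to gate each class by the satisfaction of the classes above it. The sketch below is an illustration under that assumption and is not taken from the paper: the `Requirement` class, the `hierarchical_potential` function, the state keys, and the scoring functions are all hypothetical names and choices, with each score assumed to lie in [0, 1].

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, float]

# Priority order stated in the contributions: safety > target > comfort.
PRIORITY = ["safety", "target", "comfort"]


@dataclass
class Requirement:
    name: str
    kind: str                        # one of PRIORITY
    score: Callable[[State], float]  # assumed satisfaction degree in [0, 1]


def hierarchical_potential(state: State, reqs: List[Requirement]) -> float:
    """Sum class scores, each weighted by the product of all higher-priority scores."""
    total = 0.0
    gate = 1.0  # product of the scores of every higher-priority requirement
    for kind in PRIORITY:
        scores = [min(1.0, max(0.0, r.score(state))) for r in reqs if r.kind == kind]
        total += gate * sum(scores)
        for s in scores:
            gate *= s
    return total


# Hypothetical driving requirements (state keys and bounds are assumptions).
reqs = [
    Requirement("no_collision", "safety", lambda s: 1.0 - s["collision"]),
    Requirement("reach_goal", "target", lambda s: 1.0 - s["goal_dist"] / 10.0),
    Requirement("smooth_steering", "comfort", lambda s: 1.0 - abs(s["steering"])),
]

safe_state = {"collision": 0.0, "goal_dist": 2.0, "steering": 0.1}
crashed_state = {"collision": 1.0, "goal_dist": 2.0, "steering": 0.1}

phi_safe = hierarchical_potential(safe_state, reqs)        # 1 + 0.8 + 0.8 * 0.9
phi_crashed = hierarchical_potential(crashed_state, reqs)  # safety gate closes everything
```

With this gating, comfort contributes only in proportion to how well safety and target are already met, so a violated safety requirement removes any incentive to optimize the lower-priority classes. Such a potential could then be plugged into the potential-based shaping term in place of a hand-tuned weighted sum.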
@article{berducci2021hierarchical,
title={Hierarchical potential-based reward shaping from task specifications},
author={Berducci, Luigi and Aguilar, Edgar A and Ni{\v{c}}kovi{\'c}, Dejan and Grosu, Radu},
journal={arXiv preprint arXiv:2110.02792},
year={2021}
}
Luigi Berducci is supported by the Doctoral College Resilient Embedded Systems. This work has received funding from the EU’s Horizon 2020 research and innovation programme under grant No 956123 and from the Austrian FFG ICT of the Future program under grant No 880811.
We thank Axel Brunnbauer for his contributions in the early stages of this work.