Learning Adaptive Control in Dynamic Environments Using Reproducing Kernel Priors with Bayesian Policy Gradients
Apan Dastider, Sayyed Jaffar Ali Raza and Mingjie Lin
University of Central Florida
This paper has been accepted as a conference paper at The 37th ACM/SIGAPP Symposium On Applied Computing (ACM-SAC) 2022 (Session: Intelligent Robotics and Multi-Agent Systems, IRMAS) for publication in the proceedings and for oral presentation.
Abstract --- One of the most distinctive characteristics of biological evolution is the ability not only to learn and reinforce knowledge from prior experience, but also to develop solutions in (pseudo) real-time for future events by studying past choices. Inspired by this observation, we aim to develop a systematic methodology for dynamically learning effective control policies for robotic manipulators that deftly avoid dynamic obstacles in fast-changing environments. Unfortunately, dynamic obstacles present time-varying statistical sensory irregularities, making learning based on prior experience much less productive. Furthermore, off-the-shelf policy gradient methods often become computationally expensive, and sometimes intractable, when adapting an existing policy to dynamically changing environments. In this paper, to mitigate both of these challenges, we propose to use vector-valued kernel embeddings (instead of parameter vectors) to represent the policy distribution as features in a non-decreasing Euclidean space. Furthermore, we develop a policy search algorithm over a Bayesian posterior estimate derived from inner products of a priori Gaussian kernels, allowing the search space to be defined as a high- (possibly infinite-) dimensional Reproducing Kernel Hilbert Space (RKHS). Our empirical results show that the proposed method performs optimally in a collaborative multi-robot setting, where two robot arms manipulate in a dynamic real-world environment and incrementally modify their motion plans to maintain smooth, collision-free manipulation. In particular, compared against a state-of-the-art DDPG (Deep Deterministic Policy Gradient)-based obstacle avoidance scheme as the baseline, our DRL (Developmental Reinforcement Learning) agent not only effectively avoids dynamically generated obstacles while achieving its control objective, but does so with approximately 25× faster learning. A video demo of our simulated and real-world experiments is available at https://youtu.be/GMM5V0eBQCs.
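To make the kernel machinery referenced in the abstract concrete, the following minimal Python sketch (not the paper's implementation; the kernel bandwidth `sigma`, noise level `noise`, and the toy state/return data are illustrative assumptions) shows how inner products of a priori Gaussian kernels form a Gram matrix and yield a Bayesian (GP-style) posterior over returns at query states, the basic building block behind posterior-based policy search in an RKHS.

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    """A priori Gaussian (RBF) kernel: k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    sq_dists = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-sq_dists / (2.0 * sigma**2))

def bayes_posterior(X_train, y_train, X_query, sigma=1.0, noise=1e-2):
    """GP-style Bayesian posterior mean/variance at query states,
    computed from the Gram matrix of Gaussian-kernel inner products."""
    K = gaussian_kernel(X_train, X_train, sigma) + noise * np.eye(len(X_train))
    K_s = gaussian_kernel(X_query, X_train, sigma)
    K_ss = gaussian_kernel(X_query, X_query, sigma)
    K_inv = np.linalg.inv(K)
    mean = K_s @ K_inv @ y_train          # posterior mean of observed returns
    cov = K_ss - K_s @ K_inv @ K_s.T      # posterior covariance
    return mean, np.diag(cov)

# Toy usage (illustrative only): 2-D states with noisy scalar returns.
X_train = np.random.randn(20, 2)
y_train = np.sin(X_train[:, 0]) + 0.1 * np.random.randn(20)
X_query = np.random.randn(5, 2)
mu, var = bayes_posterior(X_train, y_train, X_query)
print(mu.shape, var.shape)  # (5,) (5,)
```

In a policy-search setting, such a posterior can be queried to score candidate policy perturbations without committing to a fixed parameter vector; the details of how the paper embeds vector-valued policies and computes gradients in the RKHS are given in the later sections.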