Robotic legged locomotion has been one of the focuses of the lab's research. The topics of interest include: 1) model-based and model-free locomotion, 2) learning locomotion policies using deep reinforcement learning, 3) energetically efficient locomotion as measured by cost of transport, 4) robust locomotion in the face of strong perturbations, 5) control theory for stabilization of RL-learned policies, 6) locomotion on uneven terrain, 7) low-bandwidth algorithms for robust locomotion on uneven terrain.
Reinforcement learning methods often produce brittle policies: policies that perform well during training but generalize poorly beyond their direct training experience, becoming unstable under small disturbances. To address this issue, we propose a method for stabilizing a control policy in the space of configuration paths. It is applied post-training and relies purely on the data produced during training, together with an instantaneous estimate of the control matrix. The approach has been evaluated empirically on a planar bipedal walker subjected to a variety of perturbations. The control policies obtained via reinforcement learning were compared, under identical conditions, against their stabilized counterparts. Across different experiments, we found a two- to four-fold increase in stability, measured in terms of the perturbation amplitudes. We have also established a zero-dynamics interpretation of our approach. In addition, the method can be used to construct a controller for an underactuated mechanism purely from uncontrolled failure examples, without any additional learning. This application has been demonstrated on the acrobot balancing task.
To appear in: International Journal of Control, Automation and Systems.
Preprint: arXiv:2204.02471.
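To convey the flavor of such a post-training stabilization step, here is a minimal, hypothetical sketch: the current state is pulled toward the nearest configuration recorded during training by a PD-style correction, mapped into actuator space through a pseudo-inverse of the estimated control matrix. The function names, gains, and the PD form are illustrative assumptions, not the formulation used in the paper.

```python
import numpy as np

def stabilizing_correction(q, dq, path_q, path_dq, B_hat, k_p=50.0, k_d=5.0):
    """Illustrative post-training stabilization step (hypothetical names).

    q, dq           : current configuration and velocity
    path_q, path_dq : configurations/velocities recorded during training (N x d arrays)
    B_hat           : instantaneous estimate of the control matrix (maps controls to accelerations)
    Returns a corrective control increment pulling the state back toward
    the nearest recorded configuration path.
    """
    # Nearest point on the recorded configuration paths.
    i = np.argmin(np.linalg.norm(path_q - q, axis=1))
    e_q = path_q[i] - q
    e_dq = path_dq[i] - dq

    # Desired corrective acceleration (PD in configuration space),
    # mapped to actuator space through the pseudo-inverse of B_hat.
    a_des = k_p * e_q + k_d * e_dq
    return np.linalg.pinv(B_hat) @ a_des
```

In this sketch, the stabilized action would be the policy output plus the correction, e.g. `u = policy(obs) + stabilizing_correction(q, dq, path_q, path_dq, B_hat)`.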
We have studied a three-dimensional articulated rigid-body biped model that possesses zero cost of transport walking gaits. Energy losses are avoided by completely eliminating foot-ground collisions through the concerted oscillatory motion of the model's parts. The model consists of two parts connected via a universal joint. It does not rely on any geometry-altering mechanisms, massless parts or springs. Despite the model's simplicity, its collisionless gaits feature walking with finite speed, foot clearance and ground friction. The collisionless spectrum can be studied analytically in the small-movement limit, revealing infinitely many periodic modes. The modes differ in the number of sagittal- and coronal-plane oscillations at different stages of the walking cycle. We have focused on the mode with the minimal number of such oscillations, presenting its complete analytical solution. We then numerically evolved it toward a general solution beyond the small-movement limit. A general collisionless mode can be tuned by adjusting a single model parameter. Some of our results display a surprising degree of generality and universality.
Phys. Rev. E 103, 043003 (2021).
Preprint: arXiv:2106.11765.
Mapped configuration control (MCC) is a novel and principled control technique for legged locomotion on uneven terrain. The approach adapts a level-ground controller to uneven ground without mapping the surrounding terrain or planning over it. Control modes stemming from this technique are similar to some of those proposed in the literature; however, those proposals are typically more heuristic and stay within the static-stability paradigm. MCC is more general, as it is not limited to static-stability considerations. The viability of the method has been confirmed in simulation experiments on hexapedal and quadrupedal robots, by comparing it with the position control method across a number of uneven-terrain locomotion tasks. A five- to six-fold improvement in stability was observed, as measured by the scale of the uneven-terrain features. MCC is therefore a promising candidate among computationally cheap control methods that require neither mapping of the surrounding terrain nor extensive planning or optimization.
Code.
The code has also been adapted to cover hexapedal robots with redundant degrees of freedom, such as the Weaver robot with 5-DoF legs. See the examples below.
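As a purely illustrative sketch (not the MCC algorithm itself, and unrelated to the linked code), the general idea of adapting a level-ground reference to uneven ground without a terrain map can be conveyed by shifting each leg's level-ground foot target by a ground-height offset inferred from proprioception at touchdown. All names below are hypothetical.

```python
import numpy as np

def adapted_joint_targets(level_targets, stance_heights, ik_solver):
    """Hypothetical illustration: adapt level-ground references to uneven ground.

    level_targets  : per-leg 3-D foot positions prescribed by the level-ground gait
    stance_heights : per-leg ground-height offsets inferred from proprioception
                     (e.g. joint angles at touchdown), no terrain map required
    ik_solver      : function mapping a foot position to leg joint angles
    Returns per-leg joint targets for the low-level tracking controller.
    """
    joint_targets = []
    for target, dz in zip(level_targets, stance_heights):
        shifted = np.array(target, dtype=float)
        shifted[2] += dz  # shift the reference vertically by the sensed ground offset
        joint_targets.append(ik_solver(shifted))
    return joint_targets
```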
RL benchmark tasks, such as those in OpenAI Gym, are often set up with very lenient motor-work penalties. This lets the learned policies exploit unreasonably large actuation values, leading to unnatural-looking and energetically inefficient locomotion gaits. Presumably, higher penalties are avoided because they restrict the policy search space, making policy search a much harder problem. But the low-penalty search comes at a high energetic cost: the learned policies rely on large motor torques, to the point of becoming unsuitable for real-world robots. Conversely, higher motor-work penalties, while requiring more training effort and often larger neural networks for good performance, lead to more energetically economical policies with more fluid and natural-looking motion. These points are illustrated in the following video, showing planar model policies trained under high motor penalties, resulting in low-COT gaits.
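For reference, here is a minimal sketch of the two quantities discussed above: a motor-work penalty term in a Gym-style step reward, and the cost of transport, COT = E / (m g d). The reward shaping and the weight w_work are illustrative assumptions, not the values used in our experiments.

```python
import numpy as np

def step_reward(forward_velocity, torques, joint_velocities, w_work=0.5, dt=0.01):
    """Illustrative per-step reward: forward progress minus a motor-work penalty."""
    motor_work = np.sum(np.abs(torques * joint_velocities)) * dt  # mechanical work proxy
    return forward_velocity * dt - w_work * motor_work

def cost_of_transport(total_motor_work, mass, gravity, distance):
    """COT = energy spent / (weight * distance traveled); dimensionless."""
    return total_motor_work / (mass * gravity * distance)
```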