Learning legged mobile manipulation using reinforcement learning

The objective of this study is to determine whether whole-body, learning-based control is feasible for legged mobile manipulation.


This study was conducted at the KAIST Robotics and Artificial Intelligence Lab (Hubo Lab). The paper was accepted to the 10th International Conference on Robot Intelligence Technology and Applications (RiTA 2022).


Overview of the study


Locomotion while Controlling the Manipulator

Reward Function

The commanded x, y, and yaw velocities are tracked with a saturating reward: the measured velocity is clipped against the target using min and max functions, so the policy that maximizes the reward is one that moves at, but not beyond, the target speed. For the arm joint angles, an exponential kernel maps the tracking error to a reward, making error minimization equivalent to reward maximization. A torque penalty keeps the motion stable; however, since penalizing the torque of all joints would conflict with commanding the arm, the reward function penalizes only the leg-joint torques.
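A minimal sketch of such a reward is shown below, assuming hypothetical variable names, weights, and kernel widths (v, v_target, q_arm, q_arm_target, tau_legs, w_*, sigma); the exact formulation and coefficients in the paper may differ:

```python
import numpy as np

def locomotion_reward(v, v_target, q_arm, q_arm_target, tau_legs,
                      w_vel=1.0, w_arm=0.5, w_torque=1e-4, sigma=0.25):
    """Hypothetical sketch: saturated velocity tracking, exponential
    arm-angle tracking, and a leg-only torque penalty."""
    # Velocity tracking saturates at the target: moving at the target
    # speed earns the full reward, moving faster earns nothing extra.
    r_vel = 0.0
    for vi, ti in zip(v, v_target):   # x, y, yaw components
        if ti >= 0.0:
            r_vel += min(vi, ti)      # capped from above by the target
        else:
            r_vel += -max(vi, ti)     # symmetric case for negative targets
    # Exponential kernel: minimizing the arm-angle error is the same
    # as maximizing this reward term.
    arm_err = np.sum((q_arm - q_arm_target) ** 2)
    r_arm = np.exp(-arm_err / sigma)
    # Penalize only the leg torques, so commanding the arm is not punished.
    r_torque = -np.sum(tau_legs ** 2)
    return w_vel * r_vel + w_arm * r_arm + w_torque * r_torque
```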

Error for the walking test


Tracking of the End-Effector

Reward Function

The reward function for this task is as follows. Here the torque of all joints, both legs and arm, is penalized so that the policy learns the most efficient motion. The end-effector position error is mapped through an exponential kernel, so that minimizing the error is equivalent to maximizing the reward, in the same way as the arm control in the first experiment. Finally, since the goal is to control the end-effector without the robot walking, an exponential kernel rewards driving the base velocity to zero.
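A corresponding sketch under the same assumptions (hypothetical names p_ee, p_target, v_base, tau_all and illustrative weights; not the paper's exact terms):

```python
import numpy as np

def tracking_reward(p_ee, p_target, v_base, tau_all,
                    w_ee=1.0, w_base=0.5, w_torque=1e-4,
                    sigma_ee=0.1, sigma_base=0.25):
    """Hypothetical sketch of the end-effector tracking reward."""
    # Exponential kernel on the end-effector position error:
    # minimizing the error is equivalent to maximizing the reward.
    ee_err = np.sum((p_ee - p_target) ** 2)
    r_ee = np.exp(-ee_err / sigma_ee)
    # Exponential kernel driving the base velocity toward zero,
    # so the robot tracks the target without walking away.
    r_base = np.exp(-np.sum(v_base ** 2) / sigma_base)
    # Penalize the torque of all joints (legs and arm) for efficiency.
    r_torque = -np.sum(tau_all ** 2)
    return w_ee * r_ee + w_base * r_base + w_torque * r_torque
```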

Error for the end-effector tracking test
