Multi-arm space robots can efficiently complete target capture and base reorientation tasks thanks to their flexibility and the collaborative capabilities of their arms. However, the complex coupling among the multiple arms and the free-floating base makes motion planning for the arms challenging. We observe that the octopus elegantly achieves similar goals when grabbing prey and escaping from danger. Inspired by the distributed control strategy of octopus limbs, we develop a multi-level decentralized motion planning framework to manage the arm movements of space robots. This framework integrates naturally with the multi-agent reinforcement learning (MARL) paradigm. Furthermore, we apply transformer blocks in the critic network, which enables agents to better learn collaborative strategies. Experimental results indicate that our method, Octopus-inspired Policy Learning (Octo-PL), outperforms methods based on centralized training by a large margin. Leveraging the flexibility of the decentralized framework, we reassemble policies trained for different tasks, enabling the space robot to complete trajectory planning while adjusting its base attitude without further training. Our experiments also confirm the superior robustness of our method in the face of external disturbances, varying base masses, and even the failure of one arm.
Our contributions can be summarized as follows:
Inspired by octopuses, we devise a hierarchical, distributed framework for the motion planning of multi-arm space robots, which eases optimization by decomposing the problem into multiple sub-problems.
We propose the Octopus-inspired Policy Learning (Octo-PL) algorithm, built on a multi-agent paradigm, which surpasses baseline methods in the precision and robustness of trajectory planning and base reorientation. To enhance value prediction accuracy, we incorporate a self-attention mechanism into the critic network to effectively integrate the current states of all agents.
Leveraging the flexibility of decentralized control, we reassemble policies trained for different tasks onto the same space robot. Through coordination among agents, the space robot can complete the trajectory planning task while adjusting its base attitude, without the need for further training.
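As a concrete illustration of the self-attention critic mentioned above, the following is a minimal NumPy sketch of scaled dot-product self-attention over the states of all agents, followed by a linear value head. The dimensions, weight matrices (`Wq`, `Wk`, `Wv`, `w_out`), and the single-head structure are illustrative assumptions, not the exact architecture of Octo-PL.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention_critic(agent_states, Wq, Wk, Wv, w_out):
    """Aggregate all agents' state embeddings with scaled dot-product
    self-attention, then predict one value estimate per agent.
    All weights here are hypothetical placeholders."""
    Q = agent_states @ Wq                      # (n_agents, d_k)
    K = agent_states @ Wk                      # (n_agents, d_k)
    V = agent_states @ Wv                      # (n_agents, d_k)
    scores = Q @ K.T / np.sqrt(K.shape[-1])    # (n_agents, n_agents)
    attn = softmax(scores, axis=-1)            # each row sums to 1
    context = attn @ V                         # each agent attends to all agents
    return context @ w_out                     # (n_agents,) value estimates

rng = np.random.default_rng(0)
n_agents, d, d_k = 8, 16, 32                   # eight agents, as in the paper
states = rng.normal(size=(n_agents, d))
values = self_attention_critic(states,
                               rng.normal(size=(d, d_k)),
                               rng.normal(size=(d, d_k)),
                               rng.normal(size=(d, d_k)),
                               rng.normal(size=(d_k,)))
```

Because every agent's query attends to every agent's key, each value estimate can condition on the joint state, which is the intuition behind using attention in the centralized critic.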
The similarities between space robots and octopuses lie in their environments (zero-gravity outer space / the underwater world), configurations (multiple robotic arms / multiple tentacles), and tasks (target capture / hunting). Inspired by the distributed nervous system of octopuses, we adopt a distributed control framework for space robots, in which each robotic arm learns its own strategy hierarchically for different tasks.
The agent division methodology and the network structure of our proposed Octo-PL algorithm. The joints are divided into eight agents in accordance with three levels: the single-arm level, the multi-arm level, and the task level. To enhance the capabilities of the space robot, at the task level we can assign various tasks, such as trajectory planning and base reorientation, to the multiple arms of the space robot.
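The three-level division described in the caption can be sketched as a simple data structure. The exact joint-to-agent assignment below is an assumption (a four-arm robot with two agents per arm, giving eight agents), chosen only to make the hierarchy concrete; the agent names and joint counts are illustrative.

```python
# Hypothetical division of a four-arm space robot's joints into eight
# agents across three levels. Joint indices and names are assumptions.
N_ARMS = 4
JOINTS_PER_ARM = 6

agents = {}
for arm in range(N_ARMS):
    joints = list(range(arm * JOINTS_PER_ARM, (arm + 1) * JOINTS_PER_ARM))
    # Single-arm level: each arm is split into two cooperating agents.
    agents[f"arm{arm}_proximal"] = joints[:3]   # shoulder-side joints
    agents[f"arm{arm}_distal"] = joints[3:]     # wrist-side joints

# Task level: arms are grouped per task (multi-arm level coordinates
# the arms within each group).
task_assignment = {
    "trajectory_planning": ["arm0"],
    "base_reorientation": ["arm1", "arm2", "arm3"],
}
```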
Average performance of Octo-PL, Octo-PL w/o TC, MAPPO, PPO, and MADDPG over three seeds; the x-axis is the training iteration. In the trajectory planning task, the rewards of all agents in the MARL algorithms are summed to allow comparison with PPO.
The steady-state errors of the end-effector position and orientation and of the base attitude for different algorithms. The results are obtained under 30 random seeds for each task.
The position and orientation errors in the presence of a disturbance force in the trajectory planning task. The external force is exerted at 7.5 s.
The error curves along the x, y, and z axes in the presence of a disturbance force in the trajectory planning task. The external force is exerted at 7.5 s.
The error curves of the base in the base reorientation task under different scenarios. In the anti-disturbance experiment, the external force is exerted at 7.5 s.
The adjustment process of the space robot with reassembled policies. The left arm aims to reach the desired position and orientation (shown in green), while the other three arms adjust the base attitude.
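The policy reassembly enabled by decentralized control can be sketched as dispatching each arm's observation to its own independently trained policy at execution time. The arm names, observation format, and placeholder policies below are assumptions for illustration, not the paper's implementation.

```python
# Hedged sketch: combining independently trained per-arm policies at
# execution time, with no further training.
def reassemble(policies_by_arm):
    """Return a joint controller that queries each arm's own policy
    on that arm's local observation."""
    def controller(obs_by_arm):
        return {arm: policy(obs_by_arm[arm])
                for arm, policy in policies_by_arm.items()}
    return controller

# Placeholder "trained" policies: one arm runs a trajectory-planning
# policy while the remaining arms run a base-reorientation policy.
traj_policy = lambda obs: [0.1] * 6
reorient_policy = lambda obs: [0.0] * 6

controller = reassemble({"left": traj_policy,
                         "right": reorient_policy,
                         "upper": reorient_policy,
                         "lower": reorient_policy})
actions = controller({arm: [0.0] * 12
                      for arm in ["left", "right", "upper", "lower"]})
```

Because each policy only consumes its own arm's observation, policies trained on different tasks can be mixed and matched on the same robot, which is what makes the combined trajectory-planning plus base-reorientation behavior possible without retraining.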
The errors of the end-effector and base attitude during adjustment.
Pre-grasping task for a rotating satellite under the combined strategy. One arm grasps the handle of the rotating satellite, while the other arms adjust the base orientation to ensure the communication equipment faces the earth station.