Heterogeneous system manipulation, i.e., manipulating rigid objects via deformable objects, is an emerging field that remains in its early stages of research. Existing works in this field suffer from limited action and operational spaces, poor generalization ability, and expensive development. To address these challenges, we propose a universally applicable and effective moving primitive, Iterative Grasp-Pull (IGP), and a sample-based framework, DeRi-IGP, to solve the heterogeneous system manipulation task. The DeRi-IGP framework uses the robots' local onboard RGBD sensors to observe the environment, which comprises a soft-rigid body system. It then uses this information to iteratively grasp and pull a deformable linear object (e.g., a rope) to move the attached rigid body to a desired location. We evaluate the effectiveness of our framework on various heterogeneous manipulation tasks and compare its performance with several state-of-the-art baselines. The results show that DeRi-IGP outperforms the other methods by a significant margin. We also evaluate the sim-to-real generalization of our framework through real-world human-robot collaborative goal-reaching and distant object acquisition tasks. Our framework successfully transfers to the real world and demonstrates the advantage of the large operational space of the IGP primitive.
At each step, the subgoal planner generates local target positions for each agent. Then, GPN takes the state observation as input to predict a grasping point. Next, PPN predicts a pulling point conditioned on the grasping point and the environment state. The sampler generates additional pulling points around the output of PPN. Given the grasping point and the pulling points, the DPN model predicts the corresponding future position of the rigid object for each candidate action. The framework then picks the best action, i.e., the one that yields the minimum distance between the object and the target position. This figure shows this process for the bottom robot. By design, however, the proposed actions of all robots are compared together to select both the acting robot and its best action. Note that we applied the segmentation map as a mask to the GPN output for visualization. In addition, the subgoal planner's map presented in the left-most sub-figure is also applied as a mask to the spatial relative position map and the output of DPN.
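The final selection step above can be sketched as a simple loop over sampled candidate actions. This is a minimal illustrative sketch, not the paper's implementation: the function names (`predict_dynamics`, `select_best_action`) and the toy linear dynamics model standing in for DPN are assumptions introduced here.

```python
import numpy as np

def predict_dynamics(grasp_point, pull_point, object_pos):
    """Toy stand-in for DPN: assume the rigid object moves by a
    fraction of the grasp-to-pull displacement. The real DPN is a
    learned model; this linear rule is only for illustration."""
    return object_pos + 0.5 * (pull_point - grasp_point)

def select_best_action(grasp_point, pull_candidates, object_pos, target_pos):
    """Evaluate every sampled pulling point and keep the one whose
    predicted object position lands closest to the target."""
    best_pull, best_dist = None, np.inf
    for pull in pull_candidates:
        predicted = predict_dynamics(grasp_point, pull, object_pos)
        dist = np.linalg.norm(predicted - target_pos)
        if dist < best_dist:
            best_pull, best_dist = pull, dist
    return best_pull, best_dist

# Example: three sampled pulling points around a grasping point.
grasp = np.array([0.0, 0.0])
obj = np.array([1.0, 0.0])
target = np.array([1.0, 1.0])
candidates = [np.array([0.0, 2.0]),
              np.array([2.0, 0.0]),
              np.array([0.0, -2.0])]
best, dist = select_best_action(grasp, candidates, obj, target)
```

Comparing the winning (robot, action) pairs across all robots, as described above, amounts to running this loop per robot and taking the global minimum distance.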