Tool as Embodiment for Recursive Manipulation

Summary

Hand-object interaction and tool-object interaction can have similar dynamics, but prior work often handles them with separate approaches.
We propose one way to unify them and train a single neural network policy to do both (e.g., picking up a tool and using it to pick up an object).
Results show that models trained on both types of interaction may generalize better.
Tool use may be one path to training robot policies that are robust to changes in embodiment.

Approach

The architecture we use is similar to that of Transporter or AdaGrasp, but the output is conditioned on the scene and an end effector representation that can capture both grippers and tools.

end effector representation
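To make the conditioning idea concrete, here is a minimal sketch of one way a policy output can depend on both the scene and the end effector: features of the end effector are treated as a kernel and cross-correlated with scene features to give a per-pixel score map, in the spirit of Transporter and AdaGrasp. This is an illustration under assumptions, not the architecture used here: the function names `correlate_scores` and `best_action`, the nested-list feature layout (channels x height x width), and the zero padding are all hypothetical, and a real model would compute both feature maps with learned networks.

```python
def correlate_scores(scene_feat, effector_kernel):
    """Cross-correlate scene features with an end-effector kernel.

    scene_feat:      C x H x W nested lists (hypothetical scene features)
    effector_kernel: C x k x k nested lists, k odd (hypothetical end-effector features)
    Returns an H x W score map; positions outside the scene count as zero.
    """
    channels = len(scene_feat)
    height, width = len(scene_feat[0]), len(scene_feat[0][0])
    k = len(effector_kernel[0])
    pad = k // 2
    scores = [[0.0] * width for _ in range(height)]
    for row in range(height):
        for col in range(width):
            total = 0.0
            for c in range(channels):
                for i in range(k):
                    for j in range(k):
                        r, s = row + i - pad, col + j - pad
                        if 0 <= r < height and 0 <= s < width:
                            total += scene_feat[c][r][s] * effector_kernel[c][i][j]
            scores[row][col] = total
    return scores


def best_action(scores):
    """Return the (row, col) of the highest score (first one on ties)."""
    cells = [(value, row, col)
             for row, line in enumerate(scores)
             for col, value in enumerate(line)]
    _, row, col = max(cells, key=lambda cell: cell[0])
    return row, col
```

Because the end effector enters only through the kernel, the same scoring code applies whether the kernel was computed from a gripper or from a tool, which is the property the shared representation is meant to provide.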

Experiments

We created 4 tasks spanning both hand-object interaction and tool-object interaction:

Grasp

Push

Grasp → Grasp

Grasp → Push
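One way to read the four tasks is as short sequences of primitives in which a grasped tool becomes the end effector for the next step. The sketch below encodes that reading; the `Step` class, the `TASKS` table, the `"tool"`/`"object"` target labels, and `effector_sequence` are hypothetical names introduced for illustration, and the assumption that the first step of each two-step task grasps a tool comes from the summary's example rather than from a task specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    primitive: str  # "grasp" or "push"
    target: str     # "tool" or "object" (assumed labels)


# Assumed decomposition of the four tasks into primitives.
TASKS = {
    "Grasp": (Step("grasp", "object"),),
    "Push": (Step("push", "object"),),
    "Grasp → Grasp": (Step("grasp", "tool"), Step("grasp", "object")),
    "Grasp → Push": (Step("grasp", "tool"), Step("push", "object")),
}


def effector_sequence(task_name, gripper="gripper"):
    """Return which end effector acts at each step of a task."""
    effector = gripper
    used = []
    for step in TASKS[task_name]:
        used.append(effector)
        # Once a tool is grasped, it becomes the embodiment for later steps.
        if step.primitive == "grasp" and step.target == "tool":
            effector = "tool"
    return used
```

Under this reading, `effector_sequence("Grasp → Push")` gives `["gripper", "tool"]`: the gripper acts first, and the tool it picked up acts second.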

The final dataset used to compare models was accumulated over several iterations of data collection and model training; the same dataset can be used to compare other methods in future work. The simulation environment can also be used to collect new data.

Results

Results indicate that the learned model can outperform the baselines, and that a model trained on both hand-object interaction and tool-object interaction can perform better than models trained on each type separately. However, there is still room for improvement, and the approach remains to be tested in more complex settings.

Evaluation videos

grasp.mp4

Grasp

push.mp4

Push

grasp2.mp4

Grasp → Grasp

push2.mp4

Grasp → Push