Tool as Embodiment for Recursive Manipulation

Summary

Approach

Our architecture is similar to those of Transporter and AdaGrasp, but its output is conditioned on both the scene and an end-effector representation that can describe either a gripper or a tool.
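Transporter-style architectures score candidate actions by cross-correlating a learned kernel with a dense feature map of the scene. The sketch below assumes that conditioning on the end effector works in the same spirit: the end effector (gripper or tool) is encoded as a feature kernel and slid over the scene features at several rotations. It is a minimal illustration, not this project's implementation. The names `action_map` and `best_action` are hypothetical, the encoders are assumed to exist already, and only four 90° rotations are used for brevity.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def rotate_effector(effector: np.ndarray, k: int) -> np.ndarray:
    """Rotate a (C, s, s) end-effector feature map by k * 90 degrees."""
    return np.rot90(effector, k=k, axes=(1, 2))


def action_map(scene: np.ndarray, effector: np.ndarray, num_rotations: int = 4) -> np.ndarray:
    """Cross-correlate end-effector features with scene features (hypothetical helper).

    scene:    (C, H, W) scene feature map (assumed to come from some encoder).
    effector: (C, s, s) square end-effector feature map (gripper or tool).
    Returns:  (num_rotations, H - s + 1, W - s + 1) array; entry [r, i, j]
              scores placing the effector, rotated by r * 90 degrees, with
              its top-left corner at scene pixel (i, j).
    """
    if effector.shape[1] != effector.shape[2]:
        raise ValueError("this sketch assumes a square end-effector map")
    s = effector.shape[1]
    # windows has shape (C, H - s + 1, W - s + 1, s, s).
    windows = sliding_window_view(scene, (s, s), axis=(1, 2))
    maps = []
    for r in range(num_rotations):
        kernel = rotate_effector(effector, r)
        # Sum over channels and kernel pixels: one score per placement.
        maps.append(np.einsum("cijhw,chw->ij", windows, kernel))
    return np.stack(maps)


def best_action(scores: np.ndarray) -> tuple:
    """Return (rotation index, row, col) of the highest-scoring placement."""
    return tuple(int(v) for v in np.unravel_index(np.argmax(scores), scores.shape))


# Toy example: an L-shaped "effector" and a scene containing that shape
# rotated by 90 degrees, with its 3x3 bounding box at row 2, column 3.
effector = np.array([[[1.0, 1.0, 0.0],
                      [1.0, 0.0, 0.0],
                      [0.0, 0.0, 0.0]]])
scene = np.zeros((1, 8, 8))
scene[0, 2:5, 3:6] = np.rot90(effector[0])
scores = action_map(scene, effector)
print(best_action(scores))  # (1, 2, 3)
```

In a trained model the scene and end-effector maps would come from learned encoders and the rotation grid would usually be finer; the toy example only checks that the correlation peak lands on the matching pose.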

Figure: End-effector representation

Experiments

We created four tasks spanning hand-object and tool-object interaction:

Grasp

Push

Grasp → Grasp

Grasp → Push
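The two-step tasks are where the recursion in the title comes in. The sketch below models one way to roll them out, assuming that a tool grasped in the first step becomes the end effector for the second step. `EndEffector`, `Step`, and `run_task`, as well as the object names `tongs`, `cube`, `stick`, and `block`, are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class EndEffector:
    """A robot gripper, or a tool held by another end effector (hypothetical type)."""
    name: str
    holder: Optional["EndEffector"] = None  # None: this is the robot's own gripper

    def chain(self) -> List[str]:
        """Names from the robot's gripper out to the currently active effector."""
        prefix = [] if self.holder is None else self.holder.chain()
        return prefix + [self.name]


@dataclass(frozen=True)
class Step:
    primitive: str         # "grasp" or "push"
    target: str            # object or tool the primitive acts on
    is_tool: bool = False  # if True, the grasped target becomes the new end effector


def run_task(steps: List[Step], gripper: EndEffector) -> List[str]:
    """Roll out a multi-step task, recursively swapping in grasped tools."""
    active = gripper
    log = []
    for step in steps:
        log.append(f"{step.primitive} {step.target} with {' -> '.join(active.chain())}")
        if step.primitive == "grasp" and step.is_tool:
            # The held tool, not the bare gripper, is the embodiment for the next step.
            active = EndEffector(step.target, holder=active)
    return log


gripper = EndEffector("gripper")
grasp_then_grasp = [Step("grasp", "tongs", is_tool=True), Step("grasp", "cube")]
grasp_then_push = [Step("grasp", "stick", is_tool=True), Step("push", "block")]
for line in run_task(grasp_then_push, gripper):
    print(line)
```

Because `EndEffector` refers to its holder, the same structure would also cover deeper chains (a tool holding another tool) without special cases.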

The final dataset used to compare models was accumulated over several iterations of data collection and model training; it can serve as a benchmark for comparing other methods in future work. The simulation environment can also be used to collect new data.
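The accumulation described above can be sketched as an alternating loop in which each round collects new episodes with the current model and then retrains on everything gathered so far. This is a minimal sketch under that assumption; `aggregate_rounds`, `collect`, and `train` are hypothetical placeholders for the simulator rollout and training code, not this project's API.

```python
from typing import Callable, List, Tuple, TypeVar

E = TypeVar("E")  # one collected episode (e.g. scene, action, and outcome)
M = TypeVar("M")  # a trained model


def aggregate_rounds(
    collect: Callable[[M], List[E]],
    train: Callable[[List[E]], M],
    initial_model: M,
    rounds: int,
) -> Tuple[List[E], M]:
    """Alternate data collection and training, keeping every episode collected."""
    dataset: List[E] = []
    model = initial_model
    for _ in range(rounds):
        # Collect new episodes in simulation using the current model.
        dataset.extend(collect(model))
        # Retrain on the accumulated dataset, not just the newest round.
        model = train(dataset)
    return dataset, model
```

Keeping every round's episodes, rather than only the latest, is what makes the final dataset reusable for comparing other methods.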

Results

The results indicate that the learned model can outperform the baselines, and that a model trained jointly on hand-object and tool-object interaction can outperform models trained on each separately. However, there is still room for improvement, and the approach has yet to be tested in more complex settings.

Evaluation videos

Grasp

Push

Grasp → Grasp

Grasp → Push