Summary
Hand-object interaction and tool-object interaction can have similar dynamics, but prior work often handles them with separate approaches.
We propose one way to unify them and train a single neural network policy to do both (e.g., pick up a tool and use it to pick up an object).
Results suggest that training a model on both types of interaction may benefit generalization.
Tool use may be one path toward robot policies that are robust to changes in embodiment.
Approach
The architecture we use is similar to that of Transporter or AdaGrasp, but its output is conditioned on the scene and on an end-effector representation that can capture both grippers and tools.
Figure: End-effector representation
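The text only says the architecture resembles Transporter or AdaGrasp. As a minimal sketch, assume a Transporter-style design in which candidate actions are scored by cross-correlating an end-effector feature kernel with dense scene features, so that changing the end effector changes the kernel but not the network. In the sketch, hand-made arrays stand in for learned encoders, only four 90-degree rotations are scored for simplicity, and `correlate`, `score_actions`, and `best_action` are hypothetical helpers, not code from this project.

```python
import numpy as np


def correlate(scene: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Zero-padded cross-correlation of a (C, H, W) scene map with a (C, k, k) kernel, k odd.

    Returns an (H, W) map; entry (i, j) scores centering the kernel on scene pixel (i, j).
    """
    _, h, w = scene.shape
    k = kernel.shape[-1]
    p = k // 2
    padded = np.pad(scene, ((0, 0), (p, p), (p, p)))  # zero-pad the spatial dims only
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return out


def score_actions(scene: np.ndarray, ee_kernel: np.ndarray, n_rot: int = 4) -> np.ndarray:
    """Score every (rotation, row, col) by correlating the scene with rotated end-effector kernels."""
    # np.rot90 rotates in the spatial plane (axes 1, 2) by multiples of 90 degrees.
    return np.stack([correlate(scene, np.rot90(ee_kernel, r, axes=(1, 2))) for r in range(n_rot)])


def best_action(scores: np.ndarray) -> tuple[int, int, int]:
    """Return (rotation index, row, col) of the highest score; ties go to the first in C order."""
    r, i, j = np.unravel_index(np.argmax(scores), scores.shape)
    return int(r), int(i), int(j)


# Toy example: a vertical 3-pixel object centered at (5, 7) and a horizontal
# 3-pixel "end effector" kernel that matches it best after a 90-degree rotation.
scene = np.zeros((1, 10, 12))
scene[0, 4:7, 7] = 1.0
ee_kernel = np.zeros((1, 3, 3))
ee_kernel[0, 1, :] = 1.0

scores = score_actions(scene, ee_kernel)
print(scores.shape, best_action(scores))  # (4, 10, 12) (1, 5, 7)
```

Because the end effector enters only through the kernel, a gripper and a gripper holding a tool can in principle be scored by the same correlation machinery; a learned model would replace the hand-made arrays with encoder outputs.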
Experiments
We created four tasks spanning both hand-object and tool-object interaction:
Grasp
Push
Grasp → Grasp
Grasp → Push
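In the composite tasks, the robot first grasps a tool, so the effective end effector changes partway through the episode. One simple way to keep a single input format for both cases, offered purely as an assumption and not necessarily the representation used in this work, is a top-down binary mask in the gripper frame: the bare gripper is just its fingers, and a held tool is overlaid at its grasp offset. The mask format, the 5x5 canvas, and `compose_end_effector` are all hypothetical.

```python
import numpy as np


def compose_end_effector(gripper: np.ndarray, tool: np.ndarray,
                         offset: tuple[int, int]) -> np.ndarray:
    """Overlay a held tool's top-down mask onto the gripper's mask.

    `offset` is the (row, col) of the tool mask's top-left corner in the gripper
    frame; the tool must fit inside the gripper canvas.
    """
    h, w = gripper.shape
    th, tw = tool.shape
    r, c = offset
    if r < 0 or c < 0 or r + th > h or c + tw > w:
        raise ValueError("tool mask does not fit inside the end-effector canvas")
    mask = gripper.astype(bool)  # astype returns a copy, so the input is left untouched
    mask[r:r + th, c:c + tw] |= tool.astype(bool)
    return mask


# Hypothetical two-finger gripper on a 5x5 canvas, holding a 3x1 stick below it.
gripper = np.zeros((5, 5))
gripper[0, 1] = gripper[0, 3] = 1
stick = np.ones((3, 1))

bare = compose_end_effector(gripper, np.zeros((1, 1)), (0, 0))  # empty hand
holding = compose_end_effector(gripper, stick, (1, 2))
print(int(bare.sum()), int(holding.sum()))  # 2 5
```

Under this assumption, the same policy input can describe the bare gripper in Grasp and Push and the gripper-plus-tool in the second step of Grasp → Grasp and Grasp → Push.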
The final dataset used to compare models was accumulated over several iterations of data collection and model training; it can serve as a common benchmark for comparing other methods in future work. The simulation environment can also be used to collect new data.
Results
Results indicate that the learned model can outperform the baselines, and that a single model trained on both hand-object and tool-object interaction can improve performance over models trained on each separately. However, there is still room for improvement, and the approach has yet to be tested in more complex settings.
Evaluation videos
Grasp
Push
Grasp → Grasp
Grasp → Push