Multi-Fingered In-Hand Manipulation
With Various Object Properties Using Graph Convolutional Networks and Distributed Tactile Sensors
Satoshi Funabashi, Tomoki Isobe, Fei Hongyi, Atsumu Hiramoto,
Alexander Schmitz, Shigeki Sugano, Tetsuya Ogata
Waseda University
IEEE Robotics and Automation Letters (RA-L) & IEEE International Conference on Robotics and Automation (ICRA) 2022
Abstract
Multi-fingered hands could be used to achieve many dexterous manipulation tasks, similarly to humans, and tactile sensing could enhance the manipulation stability for a variety of objects. However, tactile sensors on multi-fingered hands have a variety of sizes and shapes. Convolutional neural networks (CNNs) can be useful for processing tactile information, but the information from multi-fingered hands needs arbitrary pre-processing, as CNNs require a rectangularly shaped input, which may lead to unstable results. Therefore, how to process such complex shaped tactile information and utilize it for achieving manipulation skills is still an open issue. This paper presents a control method based on a graph convolutional network (GCN) which extracts geodesical features from the tactile data with complicated sensor alignments. Moreover, object property labels are provided to the GCN to adjust in-hand manipulation motions. Distributed tri-axial tactile sensors are mounted on the fingertips, finger phalanges and palm of an Allegro hand, resulting in 1152 tactile measurements. Training data is collected with a data-glove to transfer human dexterous manipulation directly to the robot hand. The GCN achieved high success rates for in-hand manipulation. We also confirmed that fragile objects were deformed less when correct object labels were provided to the GCN. When visualizing the activation of the GCN with a PCA, we verified that the network acquired geodesical features. Our method achieved stable manipulation even when an experimenter pulled a grasped object and for untrained objects.
Paper
IEEE Xplore: 10.1109/LRA.2022.3142417
arXiv: http://arxiv.org/abs/2205.04169
Accepted at ICRA 2022 & IEEE Robotics and Automation Letters (RA-L)
ICRA Poster
Bibtex
@ARTICLE{9681179,
  author={Funabashi, Satoshi and Isobe, Tomoki and Hongyi, Fei and Hiramoto, Atsumu and Schmitz, Alexander and Sugano, Shigeki and Ogata, Tetsuya},
  journal={IEEE Robotics and Automation Letters},
  title={Multi-Fingered In-Hand Manipulation With Various Object Properties Using Graph Convolutional Networks and Distributed Tactile Sensors},
  year={2022},
  volume={7},
  number={2},
  pages={2102-2109},
  doi={10.1109/LRA.2022.3142417}}
Overview Video
Method
A convolutional neural network (CNN) is a popular choice for processing tactile information spatially. Since the Allegro Hand manipulates objects using multiple fingers and the palm in this study, the network needs to receive tactile information from all of those parts. However, it is difficult to combine the tactile information of all fingers and the palm into the single rectangular map that a CNN requires, which is a crucial problem, and thus a CNN is not used in this study. Instead, a GCN was introduced as an alternative network that can still process the tactile information spatially. Recurrent neural networks such as long short-term memory (LSTM) were not used either, because this study focuses on geometric or geodesical tactile information, even though such networks are useful for tasks with time-series information, including multi-fingered manipulation.
Each sensing point of the uSkin is treated as a node, and neighboring points are connected by edges, forming a graph structure. By learning from the information of all nodes and edges together, the grasping state of an object over the entire hand can be captured spatially. In addition, introducing a GCN into the control method of the hand stabilizes the results, because there is no need to transform the tactile sensor map into the rectangular shape required by CNNs. Furthermore, all sensors can be input together without losing sensor positional information, even with complex sensor arrangements.
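For illustration, a minimal sketch of how such a tactile graph might be constructed with PyTorch Geometric; the library choice, the taxel count (1152 measurements / 3 axes per taxel), and the example adjacency values are assumptions, and the real edges follow the physical neighborhoods of the uSkin taxels on the hand.

# Hypothetical tactile-graph construction (PyTorch Geometric assumed).
import torch
from torch_geometric.data import Data

NUM_TAXELS = 384                        # assumption: 1152 measurements / 3 axes
tactile = torch.randn(NUM_TAXELS, 3)    # placeholder tri-axial taxel readings

# edge_index lists pairs of physically adjacent taxels (illustrative values only);
# in practice the adjacency mirrors the uSkin layout on fingertips, phalanges, palm.
edge_index = torch.tensor([[0, 1, 1, 2],
                           [1, 0, 2, 1]], dtype=torch.long)

graph = Data(x=tactile, edge_index=edge_index)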
Finally, the acquired tactile features are input to a fully-connected layer together with the other sensor information and the object property labels corresponding to manipulation of the target object.
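A minimal sketch of such a network is shown below, again assuming PyTorch Geometric; the number of graph convolution layers, the hidden sizes, the use of joint angles as the additional sensor information, and the output definition are illustrative assumptions rather than the paper's exact design.

# Hypothetical GCN-based controller: tactile graph -> features -> FC layer
# concatenated with other sensor information and object property labels.
import torch
import torch.nn as nn
from torch_geometric.nn import GCNConv, global_mean_pool

class TactileGCNPolicy(nn.Module):
    def __init__(self, n_joints=16, n_labels=6, hidden=64):
        super().__init__()
        self.conv1 = GCNConv(3, hidden)        # tri-axial input per taxel
        self.conv2 = GCNConv(hidden, hidden)   # last graph convolution layer
        self.head = nn.Sequential(
            nn.Linear(hidden + n_joints + n_labels, 128),
            nn.ReLU(),
            nn.Linear(128, n_joints),          # e.g. next joint targets (assumed output)
        )

    def forward(self, x, edge_index, batch, joints, labels):
        h = torch.relu(self.conv1(x, edge_index))
        h = torch.relu(self.conv2(h, edge_index))   # geodesical tactile features
        pooled = global_mean_pool(h, batch)         # one feature vector per hand state
        return self.head(torch.cat([pooled, joints, labels], dim=-1))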
Results & Analysis
We conducted a comparison experiment in which the number of graph convolution layers was varied. In addition, a multi-layer perceptron (MLP) was prepared as Model IV. Manipulation is counted as a success if, at the final grasping posture, the distance between the palm of the Allegro Hand and the target object is under 2 cm and the orientation of the object relative to the palm is under 15 degrees. Model I achieved the highest success rate, 5 out of 5 trials. Importantly, the fingers cooperated with each other during the manipulation.
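A minimal sketch of this success criterion, assuming the palm-object distance is computed from 3-D positions in metres and the orientation error is already available in degrees (the exact measurement procedure may differ from the paper's):

import numpy as np

def is_success(obj_pos, palm_pos, tilt_deg, dist_thresh=0.02, angle_thresh=15.0):
    # Success: palm-object distance under 2 cm and object orientation relative
    # to the palm under 15 degrees at the final grasping posture.
    dist = np.linalg.norm(np.asarray(obj_pos) - np.asarray(palm_pos))
    return dist < dist_thresh and abs(tilt_deg) < angle_thresh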
There was a large difference in success rates of in-hand manipulation among the networks. In order to investigate the factors behind the successful manipulation, a principal component analysis (PCA) was used and the tactile features obtained from the last graph convolution layer were studied for Model I.
Clusters emerge for each finger and for each part of the hand, i.e. the fingertips, the finger phalanges, and the palm. The cluster of the palm is located below all fingers, so the spatial and functional relationship between the fingers is extracted by the features. Moreover, the tactile information appears to shape the clusters: the features of the thumb and middle finger are widely distributed. Note that these clusters did not emerge before training, unlike in the original GCN paper, which reported that an untrained GCN already produced clusters. A feature map with clear clusters was acquired only from the last graph convolution layer. Overall, learning the manipulation motions was meaningful for acquiring geodesical or robotic features with a sufficient number of convolutions.
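As an illustration of this analysis, a sketch of running PCA on the per-taxel features exported from the last graph convolution layer; the file names, array shapes, and part labels are hypothetical placeholders.

# Hypothetical PCA visualization of last-layer GCN features, colored by hand part.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

features = np.load("last_gcn_layer_features.npy")   # (num_taxels, hidden), hypothetical export
part_of_taxel = np.load("taxel_part_labels.npy")     # part index per taxel, hypothetical

pcs = PCA(n_components=2).fit_transform(features)
plt.scatter(pcs[:, 0], pcs[:, 1], c=part_of_taxel, s=5)
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.show()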
We performed manipulation while changing the property labels. In this comparison experiment, the property labels for a target object were specified and input to the GCN (Model I), with a soft plastic tube as the target object. Two sets of property labels were used as input: correct labels and incorrect labels. The correct label consists of light, soft, and slippery, which are the actual properties of the object, while the incorrect label consists of heavy, hard, and slippery, which actually corresponds to saran wrap. A GCN (Model I) trained without labels was also prepared for this study; however, it never succeeded in the manipulation and produced a different motion in every trial ((c) shows one example). (a) and (b) show the cross-section of the object at the final grasping posture when the correct and the incorrect property label were used, respectively. With the correct property label the soft plastic cylinder was not squeezed, whereas with the incorrect property label it was crushed.
This result shows that, when the correct label was used, the hand performed the manipulation with an appropriate grasping force and therefore did not crush the object. It demonstrates that a single GCN can simultaneously learn many multi-fingered manipulation motions that vary with object properties, and that the property labels can adjust those motions.
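For illustration, one plausible encoding of the two label sets as fixed input vectors; the grouping into weight/softness/friction one-hot entries is an assumption about the label format, not the paper's exact encoding.

# Hypothetical property-label vectors fed to the GCN alongside the tactile features:
# weight (light, heavy) + softness (soft, hard) + friction (slippery, non-slippery).
import torch

correct_label   = torch.tensor([1., 0., 1., 0., 1., 0.])  # light, soft, slippery (plastic tube)
incorrect_label = torch.tensor([0., 1., 0., 1., 1., 0.])  # heavy, hard, slippery (saran wrap)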
Demonstration Videos of Robustness Test on Untrained Situations
Manipulation with Untrained Objects
Manipulation under Disturbance
Acknowledgements
This work was supported in part by Japan Science and Technology Agency, ACT-I Information and Future Acceleration Phase under Grant JPMJPR18UP and in part by Moonshot R&D under Grant JPMJMS2031.