Method
We used a Graph Convolutional Neural Network (GCN) to learn the tactile states. The graph is built from the distributed tactile sensors: each taxel is a node whose features are the tactile sensor values in the X, Y and Z directions, while the edge features carry spatial information about the hand, encoding the positions of nodes (taxels) relative to their neighbourhood.
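A minimal sketch of this graph representation is shown below. The layout of the four-taxel patch, the random placeholder values, and the exact contents of the 4-dimensional edge descriptors are illustrative assumptions; only the node feature dimensionality (X, Y, Z readings) and the per-edge feature size of 4 follow from the text.

```python
import numpy as np

# Hypothetical example: a small patch of 4 taxels.
num_taxels = 4

# Node features: tactile sensor readings in X, Y and Z for each taxel
# (placeholder values, one row per taxel/node).
node_features = np.random.randn(num_taxels, 3).astype(np.float32)

# Adjacency matrix: 1 where two taxels are connected, 0 otherwise.
adjacency = np.array([
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
], dtype=np.float32)

# Edge features: one 4-dimensional descriptor per edge, carrying the spatial
# relation between the two taxels it connects (the exact encoding, e.g. a
# relative position vector plus distance, is an assumption here).
num_edges = int(adjacency.sum() // 2)
edge_features = np.random.randn(num_edges, 4).astype(np.float32)
```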
The adjacency matrix A contains a 1 where two nodes are connected (an edge is present) and a 0 where they are not (the edge is absent). Replacing the 1's with edge feature vectors would make the matrix multiplication in the graph convolution ill-defined. To address this, we propose mapping the edge features into the node feature space. The edge features are mapped using a single-layer perceptron with an input dimension of (number of edges × 4) and an output dimension of (number of nodes × 4); we refer to this layer as the 'edge feature encoder'. The output of the edge feature encoder can then be used for graph convolutions, as shown in the figure below.
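The following PyTorch sketch illustrates one way the edge feature encoder could be realised, assuming the stated input dimension of (number of edges × 4) and output dimension of (number of nodes × 4); the class name and the flatten/reshape convention are assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn

class EdgeFeatureEncoder(nn.Module):
    """Single-layer perceptron mapping edge features into node feature space.

    Input:  flattened edge features of size (num_edges * 4)
    Output: per-node encoding of size (num_nodes * 4), reshaped to (num_nodes, 4)
    """

    def __init__(self, num_edges: int, num_nodes: int, edge_dim: int = 4):
        super().__init__()
        self.num_nodes = num_nodes
        self.edge_dim = edge_dim
        self.encoder = nn.Linear(num_edges * edge_dim, num_nodes * edge_dim)

    def forward(self, edge_features: torch.Tensor) -> torch.Tensor:
        # edge_features: (num_edges, edge_dim) -> flatten -> project.
        x = self.encoder(edge_features.flatten())
        # Reshape to (num_nodes, edge_dim) so the output can be treated as a
        # node feature matrix and fed to a graph convolution.
        return x.view(self.num_nodes, self.edge_dim)
```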
Rather than concatenating the node features of different modalities and passing them through a single thread of Graph Convolutional layers, we propose performing simultaneous graph convolutions on features of different modalities, and hence different feature spaces, associated with the same graph.
In the proposed MT-GCN architecture, each node feature modality is processed in a parallel, independent thread of Graph Convolutional layers, as shown in the figure below. After the final Graph Convolutional layer of each modality's thread, the features are fused for further use.
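A minimal sketch of the two-thread case is given below, assuming a basic graph convolution H' = ReLU(Â H W) with a normalized adjacency matrix Â, two layers per thread, and fusion by concatenation; the thread depth, hidden size, and fusion operator are assumptions for illustration rather than the exact MT-GCN configuration.

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """Basic graph convolution: H' = ReLU(A_hat @ H @ W)."""

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, h: torch.Tensor, a_hat: torch.Tensor) -> torch.Tensor:
        return torch.relu(a_hat @ self.linear(h))


class MTGCN(nn.Module):
    """Sketch of the multi-thread GCN: one stack of graph convolutions per
    node feature modality, with the thread outputs fused (here by
    concatenation, an assumption) after the final convolution of each thread."""

    def __init__(self, tactile_dim: int = 3, edge_enc_dim: int = 4,
                 hidden_dim: int = 16):
        super().__init__()
        # Thread 1: tactile node features (X, Y, Z sensor values).
        self.tactile_thread = nn.ModuleList([
            GCNLayer(tactile_dim, hidden_dim),
            GCNLayer(hidden_dim, hidden_dim),
        ])
        # Thread 2: edge features mapped into node feature space by the
        # edge feature encoder.
        self.edge_thread = nn.ModuleList([
            GCNLayer(edge_enc_dim, hidden_dim),
            GCNLayer(hidden_dim, hidden_dim),
        ])

    def forward(self, tactile_feats, edge_node_feats, a_hat):
        h1, h2 = tactile_feats, edge_node_feats
        for layer in self.tactile_thread:
            h1 = layer(h1, a_hat)
        for layer in self.edge_thread:
            h2 = layer(h2, a_hat)
        # Fuse the two modality threads for the downstream layers.
        return torch.cat([h1, h2], dim=-1)
```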