Zeqing Zhang, Guanqi Chen, Wentao Chen, Ruixing Jia, Guanhua Chen, Liangjun Zhang, Jia Pan, Peng Zhou
A multimodal classifier built on force-text pairs from robot-GM interactions.
Data collected from particles under numerous probe-level parameters.
10 types of particles used in this work
Granular materials (GMs) are formed by collections of particles. Although their visual appearance is straightforward to capture, visual perception degrades severely in visually constrained environments. Based on frequency features observed in force signals, this paper proposes a non-visual classifier, GmClass, which uses the force feedback from robot-granule interactions. Specifically, we transform the force sequences into the frequency domain and integrate them with high-dimensional textual information in a two-branch architecture for multimodal supervised contrastive learning (MSCL). This method achieves 84.10% classification accuracy, surpassing traditional supervised learning by 10% and supervised contrastive learning by more than 40%, demonstrating the positive impact of the added text modality on classification; when applied to a larger dataset, it attains an even higher accuracy of 85.28%, further validating its effectiveness. We also demonstrate the performance of our approach on unseen particles and its generalization across varying data-collection parameters.
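The MSCL objective can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the function name `msc_loss`, the temperature `tau`, and the embedding shapes are assumptions; it pulls each force-frequency embedding toward the text embeddings of its own class and pushes it away from other classes, in the style of supervised contrastive learning.

```python
import numpy as np

def msc_loss(freq_emb, text_emb, labels, tau=0.1):
    """Multimodal supervised contrastive loss (sketch).
    freq_emb, text_emb: (N, D) embeddings; labels: (N,) class ids."""
    f = freq_emb / np.linalg.norm(freq_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = f @ t.T / tau                          # pairwise cosine similarities
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)               # softmax over text candidates
    pos = labels[:, None] == labels[None, :]        # same-class (positive) mask
    # negative log-probability mass assigned to the positive pairs
    return -np.mean(np.log((p * pos).sum(axis=1) / pos.sum(axis=1)))

# Orthogonal class prototypes: aligned force-text pairs give a low loss,
# class-swapped pairs a high one.
proto = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
labels = np.array([0, 0, 1, 1])
loss_aligned = msc_loss(proto, proto, labels)
loss_swapped = msc_loss(proto[::-1], proto, labels)
```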
In daily life, a trained operator can distinguish particles solely through the haptic feedback felt on contact with them. We aim to endow robots with this capability.
➡️ Larger random errors (e.g., those in the yellow circles) can be observed in the force sequence as the particle size rises.
Therefore, we further analyze the collected force feedback by converting it into frequency spectra using the Fast Fourier Transform (FFT).
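The time-to-frequency conversion can be sketched as below. This is a minimal NumPy illustration under assumed names (`force_to_spectrum`, the sampling rate), not the paper's preprocessing code.

```python
import numpy as np

def force_to_spectrum(force_seq, sample_rate):
    """Convert a 1-D time-series force signal into a one-sided
    magnitude spectrum via the FFT."""
    f = np.asarray(force_seq, dtype=float)
    f = f - f.mean()                              # drop the static force offset
    spectrum = np.abs(np.fft.rfft(f))             # magnitude of each frequency bin
    freqs = np.fft.rfftfreq(f.size, d=1.0 / sample_rate)
    return freqs, spectrum

# A 10 Hz oscillation buried in noise appears as a clear spectral peak.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 1000, endpoint=False)
signal = np.sin(2 * np.pi * 10 * t) + 0.1 * rng.standard_normal(1000)
freqs, spec = force_to_spectrum(signal, sample_rate=1000)
peak_hz = freqs[np.argmax(spec)]                  # ≈ 10 Hz
```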
GM10-ts, which consists of time-series force signals captured during robotic manipulation of 10 GMs commonly found in daily life.
It comprises 5000 data points, i.e., 500 instances for each granule.
Prescribed probe-level parameters: motion velocity v = 0.4, penetration depth d = 6 cm, probe diameter a = 1 cm.
Sampled force data in crushed peanut
Sampled force data in cat litter
We demonstrate the inference of the trained GmClass on GM10-ts.
During inference, GmClass takes a segment of the time-series force signal and the text prompts of the 10 GMs as input.
GmClass then maps them to a joint embedding space through its frequency and text encoders. This embedding space is a numerical representation in which the similarity between frequency and text features can be measured.
By comparing the similarities in this joint space, GmClass can predict the most relevant class labels for the given force signal.
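The similarity-based prediction in the joint space can be sketched as follows. This is a minimal NumPy illustration in which the two encoders are replaced by pre-computed stand-in embeddings; the function and variable names are assumptions, not the paper's API.

```python
import numpy as np

def predict_class(freq_emb, text_embs, labels):
    """Pick the GM whose text embedding has the highest cosine
    similarity to the force-signal frequency embedding."""
    f = freq_emb / np.linalg.norm(freq_emb)
    t = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    sims = t @ f                                   # cosine similarity per class
    return labels[int(np.argmax(sims))], sims

# Stand-ins for encoder outputs: the force embedding lies close to the
# "gravel" text prompt, so "gravel" is predicted.
labels = ["refined salt", "baysalt", "gravel", "sand"]
rng = np.random.default_rng(0)
text_embs = rng.normal(size=(4, 8))
freq_emb = text_embs[2] + 0.05 * rng.normal(size=8)
pred, sims = predict_class(freq_emb, text_embs, labels)   # pred == "gravel"
```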
Truth: refined salt
Prediction: refined salt ✅
Truth: baysalt
Prediction: baysalt ✅
Truth: gravel
Prediction: gravel ✅
Truth: sand
Prediction: sand ✅
Truth: cassia seed
Prediction: cassia seed ✅
Truth: in-shell peanut
Prediction: in-shell peanut ✅
The additional data constitute the large-sized dataset GM10-ts-Plus, whose structure is given on the left.
It includes 10 GMs, i.e., baysalt, broad bean, cassia seed, cat litter, crushed peanut, gravel, in-shell peanut, long-grain rice, refined salt, and sand.
In each GM, there are 27 sets of probe-level factors covering probe diameter (a), motion velocity (v), and penetration depth (d), where a = {0.7, 1, 1.5} (unit: cm), v = {0.3, 0.4, 0.5}, and d = {5, 6, 7} (unit: cm).
In each (a, v, d) set, there are 100 random raking experiments, resulting in 100 CSV files.
There are 27000 data points collected in this dataset.
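The dataset layout described above can be reproduced arithmetically; this small sketch (variable names are illustrative) enumerates the 27 probe-level sets and confirms the 27,000-point total.

```python
from itertools import product

# GM10-ts-Plus layout: 10 GMs x 27 (a, v, d) sets x 100 raking trials.
gms = ["baysalt", "broad bean", "cassia seed", "cat litter", "crushed peanut",
       "gravel", "in-shell peanut", "long-grain rice", "refined salt", "sand"]
a_vals = [0.7, 1.0, 1.5]     # probe diameter (cm)
v_vals = [0.3, 0.4, 0.5]     # motion velocity
d_vals = [5, 6, 7]           # penetration depth (cm)
trials_per_set = 100         # 100 CSV files per (a, v, d) set

settings = list(product(a_vals, v_vals, d_vals))    # 27 probe-level sets
total = len(gms) * len(settings) * trials_per_set   # 27000 data points
```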
Training a model on GM10-ts-Plus gives 85.28% classification accuracy, slightly higher than that of GmClass trained on the limited dataset GM10-ts (84.10%).
We conducted generalization experiments for a, v, and d, respectively, as displayed in Table A2.
For example, “GmClass-D-E” and “GmClass-D-I” refer to the extrapolation and interpolation tests w.r.t. the penetration depth “d”.
For “GmClass-D-E”, it is trained on the data from depths of 5 cm and 6 cm, then tested on the data from a depth of 7 cm, i.e., extrapolating from 5 and 6 cm to 7 cm.
For “GmClass-D-I”, it is trained on the data from depths of 5 cm and 7 cm, and is tested on the data from a depth of 6 cm, i.e., interpolating from 5 and 7 cm to the middle value 6 cm.
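The two depth splits above can be sketched as follows. This is an illustrative helper, not the authors' code: `records` is assumed to be a list of (depth, sample) pairs, and the function name is hypothetical.

```python
def depth_split(records, mode):
    """Build train/test splits over penetration depth d (cm), following
    the GmClass-D-E / GmClass-D-I protocol described above."""
    if mode == "extrapolation":          # GmClass-D-E: train on 5, 6 -> test on 7
        train_d, test_d = {5, 6}, {7}
    elif mode == "interpolation":        # GmClass-D-I: train on 5, 7 -> test on 6
        train_d, test_d = {5, 7}, {6}
    else:
        raise ValueError(f"unknown mode: {mode}")
    train = [s for d, s in records if d in train_d]
    test = [s for d, s in records if d in test_d]
    return train, test

# Two toy samples per depth.
records = [(d, f"sample@{d}cm") for d in (5, 6, 7) for _ in range(2)]
train_i, test_i = depth_split(records, "interpolation")
train_e, test_e = depth_split(records, "extrapolation")
```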
From Table V, we observe that:
Parameter d: Extrapolation accuracy is 77.00% and interpolation accuracy is 76.88%. These close values suggest the model performs consistently on data both within and outside the training range of d, indicating that it has captured the parameter's underlying effect.
Parameter a: Interpolation accuracy is 76.06% and extrapolation accuracy is 65.28%. The model performs better on in-range data; the lower extrapolation accuracy implies it struggles with out-of-range values, likely due to insufficient pattern learning.
Parameter v: Interpolation accuracy is 60.94% and extrapolation accuracy is 56.28%. The model has more difficulty generalizing over v, both in and out of range, indicating that motion velocity has the greatest impact on the F-T data collection and causes the largest drop in classification accuracy.
The probe fails when raking in-shell peanuts along a straight line.
This suggests, to some extent, that the spiral trajectory is more broadly applicable than the linear trajectory.
This is probably due to the similar granular properties of cassia seed and long-grain rice.
Truth: long-grain rice, Prediction: cassia seed ❌
In this work, we present a particle classifier that utilizes force feedback from GM-probe interactions.
Our model outperforms traditional methods and incorporates a dual-branch architecture with frequency signals and high-dimensional semantic information for improved differentiation.
In the future, building a larger dataset covering more probe-level parameters and GM types, along with designing an optimal raking trajectory, will be promising directions.