High-resolution multimodal tactile perception remains one of the open challenges in robotics research. In this article, we propose a novel tactile sensor that detects optical information across the visible, near-infrared, and mid-infrared bands and, building on visuotactile sensing technology, simultaneously provides high-resolution sensing of temperature, texture, deformation, force, and proximity. To achieve these functions, we design a low-cost, compact multispectral imaging system and create an elastic film whose transparency is regulated by the brightness of the light. To realize pixel-level force sensing, we propose a 3D force sensing method combined with finite element analysis, which achieves a 3D force sensing accuracy of 0.023 N. We also study a depth reconstruction algorithm for elastic surfaces, a super-resolution algorithm for temperature information, a viscosity classification algorithm, a proximity perception algorithm, and a multimodal information fusion algorithm. Finally, we conduct liquid classification, fragile and ultralight object grasping, circuit board failure detection, and underwater hot spot detection experiments, among others, which demonstrate that the sensor has a broad range of applications.
Applications of different wavelengths of light. Visuotactile sensors combining UV and visible light: UVtac; visuotactile sensors in the visible light band: Gelsight & Digit; visuotactile sensors combining visible light and near-infrared: Tac.
Light contains a wealth of physical information. As shown in the left figure, light can be divided by wavelength into X-rays, ultraviolet light, visible light, infrared light, and so on. Different wavelengths have different physical properties and are therefore applied in different fields. For instance, X-rays penetrate strongly and are used in industrial flaw detection and medical testing, ultraviolet light is used for sterilization, and infrared light is used in night-vision monitoring equipment and temperature detection. So far, however, visuotactile sensors have focused mainly on the visible light band. Some researchers have combined ultraviolet and visible light to reduce the impact of markers on detecting the object's contour, which makes up for some of the shortcomings of visible light imaging. Others have combined near-infrared with visible light for proximity and contact perception. Hence, fusing more wavelengths of light may greatly extend the sensing capability of visuotactile sensors.
A visuotactile sensor usually consists of a sensing skin, a lighting system, and a vision system. In this paper, we design a high-resolution visuotactile sensor that acquires contact force, texture, proximity, and temperature information simultaneously, as shown in the following figure (a)(b)(c). To achieve this, we optimize the design of the sensor's vision system, lighting system, and sensing skin in turn.
The internal structure is shown in the following figure (d). To detect many different wavelengths of light, we design a vision system that detects visible, near-infrared, and mid-infrared light simultaneously. The visible light camera is used for proximity sensing, the near-infrared camera is used to detect the shape and texture of the skin, and the mid-infrared vision system is used for temperature detection. To realize proximity sensing, we design a film with special optical properties, which becomes opaque on the side of strong light and transparent on the side of weak light. By adjusting the brightness of different wavelengths of light on both sides of the elastic film, we can realize selective transmission of light. To control the light inside the sensor, we design a fully enclosed sensor housing, so that visible light is brighter outside than inside and the elastomer film is transparent to visible light. Proximity sensing is achieved when the sensing skin is in its transparent state, and shape sensing of the skin is achieved when it is in its opaque state. To detect the shape of the sensing skin, we construct a near-infrared light field inside the sensor, so that near-infrared light is brighter inside than outside and the film is therefore opaque to infrared light.
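The selective-transmission rule described above can be written as a toy model. This is only an illustrative sketch, not the sensor's firmware or optical model: the names `BandLighting`, `film_state`, and `sensing_mode` and the brightness values are hypothetical, and the film is idealized as a hard switch on which side is brighter.

```python
from dataclasses import dataclass


@dataclass
class BandLighting:
    """Brightness of one wavelength band on each side of the film (arbitrary units)."""
    inside: float
    outside: float


def film_state(band: BandLighting) -> str:
    """How the film appears to a camera inside the housing.

    Idealized rule taken from the text: the camera sees through the film
    when the outside is brighter than the inside, otherwise the film looks opaque.
    """
    return "transparent" if band.outside > band.inside else "opaque"


def sensing_mode(band: BandLighting) -> str:
    """Transparent film lets the camera look outward (proximity); opaque film shows the skin (shape)."""
    return "proximity" if film_state(band) == "transparent" else "shape"


# Hypothetical brightness values: the enclosed housing keeps visible light dim inside,
# while the internal near-infrared light field is brighter than ambient near-infrared.
visible = BandLighting(inside=0.1, outside=1.0)
near_ir = BandLighting(inside=1.0, outside=0.1)

print(sensing_mode(visible))  # proximity
print(sensing_mode(near_ir))  # shape
```

Under these assumptions the same film serves both cameras at once, because each band sees its own brightness ratio across the film.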
Compared to data-level force sensing, pixel-level force estimation has a higher resolution but also requires a larger amount of data. To realize pixel-level force sensing, we propose a pixel-level automated annotation platform, which measures the force at the contact position using an industrial force sensor (ATI Gamma force sensor) and also performs pixel-level contact area segmentation with the help of multispectral imaging. In addition, to obtain the force distribution over each pixel at the contact location, we establish a finite element analysis model of the inflatable elastomer and propose a pixel-level force sensing network based on the Swin Transformer (FSwint-MAP), which achieves a force sensing accuracy of 0.023 N.
Pixel-level automated annotation platform. (a) Images captured by the near-infrared camera. (b) Images captured by the visible light camera. (c) Thresholded version of the image captured by the visible light camera.
Finite element analysis. (a) The initial state of the finite element analysis. (b) Finite element analysis when contact occurs without sealing. (c) Finite element analysis of contact when the inflatable bump is inflated. (d) Finite element analysis of a cylinder 7.5 mm in diameter and 1.5 mm in height in contact with the inflatable film. (e)(f)(g) Distribution of forces in the X, Y, and Z directions. (h) The mask of the contact position. (i)(j)(k) Distribution of forces in the X, Y, and Z directions after combination with the mask.
Force sensing algorithm.
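As a rough illustration of how a thresholded contact mask might be combined with per-axis force maps to form pixel-level labels, consider the NumPy sketch below. It is not the annotation pipeline from the paper. The function names, the threshold polarity (brighter pixels treated as contact), and in particular the final rescaling to the force-sensor reading are assumptions made for illustration.

```python
import numpy as np


def contact_mask(visible_img, threshold):
    """Binary contact mask from a grayscale image.

    Assumption: pixels brighter than `threshold` belong to the contact area;
    flip the comparison if contact darkens the image instead.
    """
    return np.asarray(visible_img, dtype=float) > threshold


def masked_force_maps(fx, fy, fz, mask):
    """Stack the X/Y/Z force maps into (3, H, W) and zero everything outside the mask."""
    stacked = np.stack([fx, fy, fz], axis=0).astype(float)
    return stacked * mask  # the (H, W) mask broadcasts over the axis dimension


def rescale_to_measured(force_maps, measured_xyz):
    """Hypothetical step: scale each axis so its pixel sum equals the force-sensor reading."""
    measured = np.asarray(measured_xyz, dtype=float)
    totals = force_maps.sum(axis=(1, 2))
    scale = np.divide(measured, totals, out=np.zeros_like(measured), where=totals != 0)
    return force_maps * scale[:, None, None]


# Tiny synthetic example (values are made up).
img = np.array([[0.1, 0.9], [0.8, 0.2]])
mask = contact_mask(img, threshold=0.5)
zeros, ones = np.zeros((2, 2)), np.ones((2, 2))
labels = rescale_to_measured(masked_force_maps(zeros, zeros, ones, mask), [0.0, 0.0, 1.0])
print(labels[2])  # Z-axis force is split over the two contact pixels
```

The rescaling keeps the finite element result as the shape of the distribution and the force sensor as the source of its magnitude, which is one plausible way to reconcile the two measurements.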
Although robots can obtain stable and reliable contact force information through contact sensing, proximity sensing is also important when grasping fragile objects. To obtain the distance between the object and the sensor, we propose a proximity perception method, as shown in the following figure.
3D reconstruction is one of the most important functions of visuotactile sensors, and it reflects the large sensing area and high resolution of visuotactile perception. Current 3D reconstruction techniques mainly use the photometric stereo method, which calculates surface normals from the brightness of differently colored light on the sensor surface, as shown in the following figure.
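For readers unfamiliar with photometric stereo, the following is a minimal sketch of the classical Lambertian formulation, not the reconstruction method used in MTac. It assumes known unit light directions and one intensity image per light (for example, one per color channel), and the function name `photometric_stereo` is illustrative.

```python
import numpy as np


def photometric_stereo(images, light_dirs):
    """Estimate per-pixel surface normals and albedo under a Lambertian model.

    images:     (K, H, W) intensities, one image per light source.
    light_dirs: (K, 3) unit vectors pointing toward each light (assumed known).
    Returns normals of shape (H, W, 3) and albedo of shape (H, W).
    """
    images = np.asarray(images, dtype=float)
    lights = np.asarray(light_dirs, dtype=float)
    k, h, w = images.shape
    intensities = images.reshape(k, -1)                 # (K, H*W)
    # Solve lights @ g = intensities in the least-squares sense; g = albedo * normal.
    g, *_ = np.linalg.lstsq(lights, intensities, rcond=None)
    albedo = np.linalg.norm(g, axis=0)                  # (H*W,)
    normals = np.divide(g, albedo, out=np.zeros_like(g), where=albedo > 0)
    return normals.T.reshape(h, w, 3), albedo.reshape(h, w)


# Synthetic single-pixel check: a surface facing the camera with albedo 0.5.
lights = np.array([[0.0, 0.0, 1.0], [0.6, 0.0, 0.8], [0.0, 0.6, 0.8]])
true_normal = np.array([0.0, 0.0, 1.0])
imgs = (0.5 * lights @ true_normal).reshape(3, 1, 1)
normals, albedo = photometric_stereo(imgs, lights)
```

A depth map would then be obtained by integrating the normal field, a step omitted here.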
High-resolution mid-infrared temperature measurement devices are costly: devices with a resolution above 100 × 100 often cost more than $200, which limits adoption of the sensor. Higher-resolution devices also tend to be larger, which makes them hard to deploy inside a tactile sensor. To achieve low cost and miniaturization, we use the MLX90640 as the temperature sensing unit; this sensor provides a resolution of 24 × 32 and a field of view of 155°. To achieve higher-resolution temperature sensing, we propose a temperature data calibration platform and a lightweight super-resolution algorithm for mid-infrared temperature images.
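To make the resolution gap concrete, the sketch below upsamples a 24 × 32 temperature frame with plain bilinear interpolation. This is a classical baseline of the kind a learned super-resolution network would be compared against, not the lightweight network proposed in the paper. The function name and the corner-aligned sampling are assumptions, and the 172 × 172 output size is taken from the figure reported in the paper's conclusion.

```python
import numpy as np


def bilinear_upsample(frame, out_h, out_w):
    """Upsample a 2D temperature frame with corner-aligned bilinear interpolation."""
    frame = np.asarray(frame, dtype=float)
    in_h, in_w = frame.shape
    # Output sample positions expressed in input pixel coordinates.
    ys = np.linspace(0, in_h - 1, out_h)
    xs = np.linspace(0, in_w - 1, out_w)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, in_h - 1)
    x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None]   # vertical blend weights, shape (out_h, 1)
    wx = (xs - x0)[None, :]   # horizontal blend weights, shape (1, out_w)
    top = frame[y0][:, x0] * (1 - wx) + frame[y0][:, x1] * wx
    bottom = frame[y1][:, x0] * (1 - wx) + frame[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy


# Synthetic 24 x 32 frame (values in degrees Celsius are made up).
low_res = np.linspace(20.0, 40.0, 24 * 32).reshape(24, 32)
high_res = bilinear_upsample(low_res, 172, 172)
print(high_res.shape)  # (172, 172)
```

Interpolation only smooths between existing samples; it cannot recover detail finer than the sensor's pixel pitch, which is the gap a learned approach aims to close.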
In a home or laboratory scenario, we often need to manipulate containers with different textures, shapes, and temperatures, and a container that is excessively hot or cold can easily injure human skin. MTac not only provides the temperature of the object at the time of contact but also allows containers to be classified during the grasping process. To realize this function, we design a classification network in which temperature and texture images are concatenated using GoogLeNet; the architecture of this network is shown in the following figure.
In addition to temperature, viscosity is also a common property of objects, and humans usually judge the viscosity of an object from the process of their fingers contacting and separating from it. Inspired by this process, we record the contact and separation of MTac with the object to classify the object's viscosity, using VGGNet as a feature encoder and a two-layer LSTM to capture temporal relationships; the network framework is shown in the following figure. Considering limited compute resources, TimeWrapper is added to balance GPU memory usage and inference time.
MTac not only provides the force sensing, texture sensing, and 3D reconstruction functions of traditional visuotactile sensors but also realizes proximity sensing and high-resolution, wide-range temperature sensing. These functions have important application value in fragile object manipulation, home service, and similar tasks.
In this paper, we combine multispectral imaging with a one-way transparent latex film to propose a multifunctional visuotactile sensor that realizes force, deformation, texture, proximity, temperature, and viscosity sensing. First, to obtain pixel-level force information, we build an automated pixel-level data annotation system and use finite element analysis and the FSwint-MAP network to estimate the contact force at each pixel, achieving a detection accuracy of 0.023 N. Second, we propose a 3D reconstruction method based on luminance information, which realizes both depth reconstruction and information extraction from the contact area. Next, to realize high-resolution temperature sensing, we build a super-resolution temperature information acquisition system and a lightweight super-resolution network, which yield 172 × 172 high-resolution temperature information with a sensing accuracy of 0.3 °C and a sensing range of 0 to 100 °C (the sensor can realize direct temperature sensing from -20 to 130 °C). The temperature response speed can reach 54 °C/s. In addition, we propose a multimodal classification algorithm and a viscosity classification method. Finally, to verify the application value of the sensor, we conduct a fragile object grasping experiment, a circuit board heating position detection experiment, and an underwater pipeline heating position detection experiment. These tests underscore the sensor's real-world applicability across various domains, including home service, industrial inspection, and underwater operations.
MTac is the first sensor that can simultaneously realize high-resolution proximity, deformation, temperature, and texture sensing. In some aspects, it can even surpass the sensing ability of human skin, which is of great significance in advancing the development of tactile sensors. MTac adopts a multispectral imaging technique, which provides new ideas for the development of visuotactile sensors. Furthermore, based on the MTac sensor, we propose a complete algorithmic framework and automated data acquisition, which not only reduce the workload of sensor calibration but also help promote the industrialization of visuotactile sensors. In the force sensing method, we explore the finite element analysis of the elastic inflatable film when contact occurs, which provides theoretical support for sensors that adopt elastic inflatable films. In addition, the elastic film used in our sensing skin not only avoids the influence of acrylic glass on infrared temperature detection but also enables viscosity classification based on the contact-separation process. The current limitation of this sensor is its large size, but we believe that as imaging technology evolves, the sensor will become more compact.
Please see the paper for more details.