A Tactile Sensor Prediction Model
This project focuses on reconstructing 3D surface geometry and contact regions from tactile sensor images. The system learns to predict contact masks and depth heightmaps from raw GelSight/DIGIT tactile images using deep neural networks.
The goal was to build an inverse sensor model that translates tactile visual input into meaningful physical interaction data — useful for robotics, grasping, and object manipulation.
Tactile sensors provide rich physical feedback about surface geometry, contact pressure, and material interaction. While many recent approaches focus on measuring tactile deformation, this project emphasizes predicting contact and depth directly from sensor images.
This work draws inspiration from:
GelSight tactile sensing research
DIGIT tactile sensor design
Multi-scale depth prediction networks
The figure below illustrates the resolution of tactile sensors when they come into contact with various objects.
Train a supervised deep learning model to:
Predict binary contact regions (where physical contact occurs)
Predict dense depth maps (local height geometry)
Learn robust representations from tactile sensor images
A two-stage network architecture was implemented:
A Coarse Contact Network
A Fine Depth Network conditioned on contact output
A custom PyTorch Dataset was built to load tactile images and paired depth data. Images were resized, normalized, and converted into tensors for training.
Key preprocessing steps:
File index sorting for alignment
RGB tactile image loading
Depth placeholder handling
Transform pipelines for train/test
Batch loading via PyTorch DataLoader
Python:
from torch.utils.data import Dataset
from PIL import Image

class TactileDataset(Dataset):
    def __init__(self, image_paths, transform):
        self.image_paths = sorted(image_paths)  # sort file indices for alignment
        self.transform = transform
    def __len__(self):
        return len(self.image_paths)
    def __getitem__(self, idx):
        tactile_sample = Image.open(self.image_paths[idx]).convert('RGB')
        tactile_sample = self.transform(tactile_sample)
        return {'tactile': tactile_sample}
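The transform pipeline and DataLoader setup might look like the following sketch; the 256x256 resolution, normalization constants, batch size, and the train_paths variable are illustrative assumptions rather than values confirmed by the report.
Python:
from torch.utils.data import DataLoader
from torchvision import transforms

# Resize, convert to tensor, and normalize tactile images
train_transform = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
])

# train_paths: a list of tactile image file paths (assumed)
train_dataset = TactileDataset(train_paths, transform=train_transform)
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)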
A dual-network design was implemented:
ContactNet (Coarse Network): Predicts a binary contact mask using stacked convolutional layers.
TactileDepthNet (Fine Network): Predicts a dense depth map, conditioned on both tactile features and contact predictions.
Design highlights:
Multi-layer CNN feature extraction
Kaiming weight initialization
Bilinear upsampling to full resolution
Feature fusion between networks
Python:
# Stage 1: coarse contact mask predicted from the raw tactile image
contact_model_output = contact_model(tactile_input)
# Stage 2: dense depth map, conditioned on tactile features and the contact mask
depth_output = tactile_depth_model(tactile_input, contact_model_output)
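A minimal sketch of how the two networks could be structured under this design; the channel counts, layer depths, and mask-concatenation fusion are illustrative assumptions, not the exact architecture from the report.
Python:
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContactNet(nn.Module):
    """Coarse network: predicts a contact mask from the tactile image."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.head = nn.Conv2d(64, 1, 1)
        for m in self.modules():  # Kaiming weight initialization
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight)

    def forward(self, x):
        mask = torch.sigmoid(self.head(self.features(x)))
        # Bilinear upsampling back to full input resolution
        return F.interpolate(mask, size=x.shape[-2:], mode='bilinear', align_corners=False)

class TactileDepthNet(nn.Module):
    """Fine network: predicts depth, conditioned on tactile features and the contact mask."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3 + 1, 32, 3, padding=1), nn.ReLU(),  # fuse image + contact mask
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 1, 1),
        )

    def forward(self, tactile, contact_mask):
        fused = torch.cat([tactile, contact_mask], dim=1)  # feature fusion by concatenation
        return self.features(fused)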
Both networks were trained using Mean Squared Error (MSE) loss and optimized with Adam.
Adam is an optimization algorithm that automatically adjusts how much each weight changes during training. It combines momentum (a running average of gradients that keeps updates moving in a consistent direction) with adaptive per-parameter learning rates (scaled by a running average of squared gradients), which makes training faster and more stable than plain gradient descent.
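To make that concrete, here is the standard Adam update written out for a single scalar parameter; this is the textbook formulation shown for illustration, not project-specific code (in practice the update is handled by torch.optim.Adam).
Python:
def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # Momentum: running average of gradients
    m = beta1 * m + (1 - beta1) * grad
    # Adaptive term: running average of squared gradients
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction so early estimates are not too small
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Per-parameter step size: steady gradients take confident steps, noisy ones take small steps
    param = param - lr * m_hat / (v_hat ** 0.5 + eps)
    return param, m, v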
Key setup:
Separate loss tracking for contact & depth
GPU acceleration when available
DataParallel for scalable training
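A sketch of how this setup could be wired together; the learning rate, the joint backward pass, and the assumption that each batch also carries ground-truth contact and depth tensors are illustrative choices (the report may train the two stages separately).
Python:
import torch
import torch.nn as nn

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# DataParallel for scalable multi-GPU training; falls back to a single device otherwise
contact_model = nn.DataParallel(ContactNet()).to(device)
tactile_depth_model = nn.DataParallel(TactileDepthNet()).to(device)

criterion = nn.MSELoss()
optimizer = torch.optim.Adam(
    list(contact_model.parameters()) + list(tactile_depth_model.parameters()), lr=1e-3)

for batch in train_loader:
    tactile = batch['tactile'].to(device)
    contact_gt = batch['contact'].to(device)   # assumed ground-truth contact mask
    depth_gt = batch['depth'].to(device)       # assumed ground-truth depth map

    contact_pred = contact_model(tactile)
    depth_pred = tactile_depth_model(tactile, contact_pred)

    # Track contact and depth losses separately
    contact_loss = criterion(contact_pred, contact_gt)
    depth_loss = criterion(depth_pred, depth_gt)

    optimizer.zero_grad()
    (contact_loss + depth_loss).backward()
    optimizer.step()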
The contact model was trained to detect contact regions from tactile inputs.
Observations:
Loss stabilized after early iterations
Model learned spatially consistent contact patterns
The depth model leveraged contact predictions to generate refined heightmaps.
Observations:
Training converged steadily
Output depth maps showed improved spatial smoothness
Conditioning on contact improved depth stability
The trained models were evaluated on validation tactile images.
Outputs included:
Raw tactile image
Predicted contact mask
Predicted depth map
Depth outputs were normalized and exported for visualization.
Python:
# Run the trained models on a validation tactile image
contact_output = contact_model(tactile_input)
depth_output = tactile_depth_model(tactile_input, contact_output)
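A sketch of the normalization and export step; the colormap and output filename are illustrative choices.
Python:
import matplotlib.pyplot as plt

# Move the prediction to the CPU and drop batch/channel dimensions
depth = depth_output.detach().cpu().numpy().squeeze()

# Normalize to [0, 1] before saving for visualization
depth_norm = (depth - depth.min()) / (depth.max() - depth.min() + 1e-8)
plt.imsave('predicted_depth.png', depth_norm, cmap='viridis')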
A reusable prediction function was implemented to load trained weights and generate contact + depth outputs for new tactile inputs.
Python:
def predict(tactile_image):
    # Assumes trained weights have already been loaded and both models are in eval mode
    with torch.no_grad():
        contact_output = contact_model(tactile_image)
        depth_output = tactile_depth_model(tactile_image, contact_output)
    return contact_output, depth_output
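Example usage, assuming the train_transform pipeline and device from the earlier sketches; the file path is a placeholder.
Python:
from PIL import Image

# Preprocess a new tactile image the same way as the training data
image = Image.open('sample_tactile.png').convert('RGB')           # placeholder path
tactile_tensor = train_transform(image).unsqueeze(0).to(device)   # add a batch dimension

contact_mask, depth_map = predict(tactile_tensor)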
This project was completed as part of a team research assignment, with a full LaTeX technical report documenting architecture, training methodology, and evaluation metrics.
This project demonstrates:
Multi-stage deep learning pipeline design
Sensor-driven perception modeling
Feature fusion between coarse & fine networks
Depth estimation from tactile imagery
Practical robotics-oriented ML applications
Key takeaways:
Contact prediction improves downstream depth accuracy
Multi-scale CNNs help capture both global structure and fine detail
Tactile sensing can serve as a powerful alternative to vision-only depth estimation
Python
PyTorch
OpenCV
NumPy
Matplotlib
Jupyter Notebook
Deep Learning
Robotics Perception
Tactile Sensing