TacRefineNet: Goal-Conditioned Tactile Grasp Refinement for Edge-Prominent Objects

Anonymous author(s)

Abstract

Despite progress in both traditional dexterous grasping pipelines and recent Vision-Language-Action (VLA) approaches, the grasp execution stage remains prone to pose inaccuracies, especially in long-horizon tasks, which undermines overall performance. To address this “last-mile” challenge, we propose TacRefineNet, a tactile-only framework that refines the in-hand pose of known objects toward arbitrary target poses using multi-finger fingertip sensing. Our method iteratively adjusts the end-effector pose based on tactile feedback until the object is aligned with the desired configuration. We design a multi-branch policy network that fuses tactile inputs from multiple fingers with proprioception to predict precise control updates. To train this policy, we combine large-scale simulated data from a physics-based tactile model in MuJoCo with real-world data collected on a physical system. Comparative experiments show that pretraining on simulated data and then fine-tuning with a small amount of real data significantly improve performance over simulation-only training. Extensive real-world experiments validate the method’s effectiveness, achieving millimeter-level grasp accuracy using only tactile input. To our knowledge, this is the first method to enable arbitrary in-hand pose refinement via multi-finger tactile sensing alone.

Tactile Sensor and Its Simulation

We utilize a piezoresistive tactile sensor array integrated into an 11-DoF dexterous hand. Each fingertip sensor consists of an 11 × 9 taxel grid, where each taxel measures normal contact force. The physical spacing between the taxels on the real tactile fingertip is approximately 1.1 mm. The raw taxel outputs are transformed into tactile images, enabling the use of vision-based encoders for feature extraction.
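
The taxel-to-image step described above could look roughly like the following. This is a minimal sketch, not the authors' preprocessing: the row-major taxel ordering, the per-frame peak normalization, and the function name `taxels_to_image` are all assumptions made for illustration; only the 11 × 9 grid size comes from this page.

```python
ROWS, COLS = 11, 9  # taxel grid size stated on this page


def taxels_to_image(readings, max_force=None):
    """Reshape a flat list of ROWS*COLS normal-force readings into an 8-bit image.

    Assumptions (not from the source): taxels arrive in row-major order, and
    the frame is normalized by its own peak unless max_force is given.
    """
    if len(readings) != ROWS * COLS:
        raise ValueError(f"expected {ROWS * COLS} readings, got {len(readings)}")
    peak = max_force if max_force is not None else max(readings)
    if peak <= 0:
        # No contact anywhere: return an all-zero image.
        return [[0] * COLS for _ in range(ROWS)]
    image = []
    for r in range(ROWS):
        row = readings[r * COLS:(r + 1) * COLS]
        # Scale to 0..255 and clamp in case a reading exceeds max_force.
        image.append([min(255, max(0, round(255 * v / peak))) for v in row])
    return image
```

Passing a fixed `max_force` instead of relying on the per-frame peak keeps intensities comparable across frames, which may matter if the encoder is expected to read absolute contact force.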

Simulation Results

tacrefinenet_sim_result-1.mp4

Performance Between Arbitrary In-Hand Object Poses

We conduct experiments under diverse initial and target in-hand poses. The poses are parameterized along four dimensions: pitch, roll, y, and z. For each trial, we randomly select one dimension and set the initial pose along that dimension, while the target pose is chosen to be its symmetric counterpart. This results in 16 distinct pose pairs.

peraxis.mp4

tacrefinenet_real_result-1.mp4

Long-horizon Tracking

To evaluate the robustness of our method in dynamic scenarios, we conduct a long-horizon object tracking experiment. A fixed tactile image is provided as the target, while the object’s position and orientation are continuously perturbed throughout the sequence. The goal is to assess whether our system can consistently adjust the grasp to maintain the desired configuration. The results demonstrate that our method can reliably perform fine-grained grasp refinement toward a specified target pose, even under continuous variations in object pose.
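
The structure of this tracking experiment, a fixed goal with a disturbance injected before every correction, can be illustrated with a toy closed loop. This is only a sketch under strong assumptions: `toy_policy` is a hypothetical proportional controller standing in for the learned network, the loop observes the pose directly rather than tactile images, and the gain, noise level, and units are arbitrary.

```python
import random

DIMS = ("pitch", "roll", "y", "z")  # pose dimensions named on this page


def toy_policy(current, target, gain=0.5):
    """Hypothetical stand-in for the learned policy: a proportional step toward the target."""
    return {d: gain * (target[d] - current[d]) for d in DIMS}


def track(target, steps=200, noise=0.2, seed=0):
    """Run a closed loop in which the pose is perturbed before every correction.

    Returns the largest per-dimension error remaining after each step.
    """
    rng = random.Random(seed)
    pose = {d: 0.0 for d in DIMS}
    errors = []
    for _ in range(steps):
        # External disturbance: the object shifts a little in every dimension.
        for d in DIMS:
            pose[d] += rng.uniform(-noise, noise)
        # Correction: apply the update predicted from the current observation.
        delta = toy_policy(pose, target)
        for d in DIMS:
            pose[d] += delta[d]
        errors.append(max(abs(target[d] - pose[d]) for d in DIMS))
    return errors
```

With a gain of 0.5, each step halves the error left after the disturbance, so in this toy model the residual error settles below the per-step noise amplitude instead of accumulating over the sequence.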

tacrefinenet_long-1.mp4

longhorizon.mp4

Generalization to Unseen Objects

tacrefinenet_unseen-1.mp4
