Placing by Touching

An empirical study on the importance of tactile sensing for precise object placing

Accepted at IROS 2023

Luca Lach, Niklas Funk, Robert Haschke, Severin Lemaignan, Helge Joachim Ritter, Jan Peters, Georgia Chalvatzaki

bold = equal contribution

Paper | Code | Datasets | Models


This work deals with a practical everyday problem: stable object placement on flat surfaces starting from unknown initial poses.

Common object-placing approaches require either complete scene specifications or extrinsic sensor measurements, e.g., cameras, that occasionally suffer from occlusions. We propose a novel approach for stable object placing that combines tactile feedback and proprioceptive sensing. We devise a neural architecture that estimates a rotation matrix, resulting in a corrective gripper movement that aligns the object with the placing surface for the subsequent object manipulation. 

We compare models with different sensing modalities, such as force-torque, an external motion capture system, and two classical baseline models in real-world object placing tasks with different objects. 

The experimental evaluation of our placing policies with a set of unseen everyday objects reveals significant generalization of our proposed pipeline, suggesting that tactile sensing plays a vital role in the intrinsic understanding of robotic dexterous object manipulation.

Stable Object Placement - Problem Definition

Frames and placing normal

We define the object's placing normal to lie along the z-axis of the local object placing frame O', which allows defining a rotation matrix from a common world frame to O'. G denotes the gripper frame in which we measure the tactile signals.

Corrective object motion

Given that we know, or estimate, the orientation of the object's placing normal w.r.t. the gripper frame, we can generate a corrective motion with the arm. Executing this motion aligns the object's placing face with the placing surface, enabling a stable object placing motion.
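As a minimal sketch of this alignment step (not the paper's exact parameterization), the corrective rotation that maps an estimated placing normal onto the table normal can be computed with Rodrigues' formula:

```python
import numpy as np

def corrective_rotation(placing_normal, surface_normal=np.array([0.0, 0.0, 1.0])):
    """Rotation matrix that rotates the estimated placing normal onto the
    surface normal, via Rodrigues' formula (the antipodal case, where the
    normals point in exactly opposite directions, is omitted here)."""
    a = placing_normal / np.linalg.norm(placing_normal)
    b = surface_normal / np.linalg.norm(surface_normal)
    v = np.cross(a, b)                 # rotation axis (unnormalized)
    c = np.dot(a, b)                   # cosine of the misalignment angle
    if np.isclose(c, 1.0):             # already aligned
        return np.eye(3)
    K = np.array([[0.0, -v[2], v[1]],
                  [v[2], 0.0, -v[0]],
                  [-v[1], v[0], 0.0]])  # skew-symmetric cross-product matrix
    return np.eye(3) + K + K @ K * (1.0 / (1.0 + c))
```

Applying the returned rotation to the gripper pose brings the object's placing face parallel to the surface before the descent phase.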

Four Phases of Stable Object Placing 

In the first phase, we estimate the object's placing normal from the sensory information, which quantifies the misalignment with the placing surface. In the second phase, we compute and execute a corrective motion based on this misalignment prediction.

In the third phase, a placing motion is planned and executed that moves the object linearly towards the placing surface while maintaining its orientation.

Once the robot detects table-object contact in the tactile responses, it transitions to the fourth phase, where it opens up the gripper and retracts.
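The four phases can be summarized as a simple control routine. The sketch below is purely illustrative: the robot interface and all method names (estimate_misalignment, tactile_contact, etc.) are hypothetical placeholders, not the paper's API, and a mock robot stands in for the real hardware.

```python
def place_object(robot):
    # Phase 1: estimate the object's misalignment from tactile (and F/T) data
    R_correct = robot.estimate_misalignment()
    # Phase 2: corrective motion aligning the placing face with the surface
    robot.rotate_gripper(R_correct)
    # Phase 3: linear descent while keeping the object's orientation fixed
    while not robot.tactile_contact():
        robot.step_down()
    # Phase 4: on detected table-object contact, release and retract
    robot.open_gripper()
    robot.retract()

class MockRobot:
    """Stand-in for the real robot; logs the executed phases."""
    def __init__(self, steps_to_contact=5):
        self.height = steps_to_contact
        self.log = []
    def estimate_misalignment(self):
        self.log.append("estimate")
        return "R"  # placeholder for a 3x3 rotation matrix
    def rotate_gripper(self, R):
        self.log.append("rotate")
    def tactile_contact(self):
        return self.height == 0
    def step_down(self):
        self.height -= 1
        self.log.append("down")
    def open_gripper(self):
        self.log.append("open")
    def retract(self):
        self.log.append("retract")
```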

Neural Network-based Approach to estimate Object Misalignment

Network Architecture

To process the tactile data, we first use two convolutional layers with a 3x3 kernel each, and 16 and 32 output channels respectively.

The output of the last convolutional layer is then fed to a Multilayer Perceptron consisting of two hidden layers with 128 neurons each and ReLU activation functions, followed by a dropout layer with a dropout probability of p=0.2. F/T data can be optionally fed into the MLP as an additional input signal, which is concatenated with the tactile features.
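The forward pass described above can be sketched as follows. Only the layer sizes stated in the text (3x3 kernels, 16 and 32 channels, two 128-unit hidden layers, dropout p=0.2, optional F/T concatenation) come from the source; the 16x16 tactile resolution, the random weight initialization, and the 9-dimensional output (a flattened 3x3 rotation estimate) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Randomly initialized weights, just to make the sketch runnable;
# all dimensions beyond the layer sizes from the text are assumptions.
w1 = 0.1 * rng.standard_normal((16, 2, 3, 3))    # conv1: 2 -> 16 channels
w2 = 0.1 * rng.standard_normal((32, 16, 3, 3))   # conv2: 16 -> 32 channels
W1 = 0.01 * rng.standard_normal((32 * 12 * 12 + 6, 128)); b1 = np.zeros(128)
W2 = 0.01 * rng.standard_normal((128, 128)); b2 = np.zeros(128)
W3 = 0.01 * rng.standard_normal((128, 9)); b3 = np.zeros(9)

def conv2d_relu(x, w):
    """'Valid' 2D convolution with ReLU. x: (C_in, H, W), w: (C_out, C_in, 3, 3)."""
    C_out, C_in, k, _ = w.shape
    H, W = x.shape[1] - k + 1, x.shape[2] - k + 1
    out = np.zeros((C_out, H, W))
    for o in range(C_out):
        for i in range(C_in):
            for di in range(k):
                for dj in range(k):
                    out[o] += w[o, i, di, dj] * x[i, di:di + H, dj:dj + W]
    return np.maximum(out, 0.0)

def forward(tactile, ft=None):
    """tactile: (2, 16, 16) fingertip arrays (assumed resolution);
    ft: optional 6-D force/torque vector, zero-padded when absent."""
    h = conv2d_relu(tactile, w1)                 # -> (16, 14, 14)
    h = conv2d_relu(h, w2)                       # -> (32, 12, 12)
    feat = h.reshape(-1)
    feat = np.concatenate([feat, np.zeros(6) if ft is None else ft])
    h = np.maximum(feat @ W1 + b1, 0.0)          # hidden layer 1, ReLU
    h = np.maximum(h @ W2 + b2, 0.0)             # hidden layer 2, ReLU
    # dropout (p = 0.2) applies only during training; skipped at inference
    return h @ W3 + b3                           # 9 values -> 3x3 rotation estimate
```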

Data Collection

We used two primitive 3D-printed objects for data collection: a cylinder (2.25 cm radius, 15 cm length) and a cuboid (5x5x19 cm), as shown on the left.

To collect the ground-truth orientations required to train our networks, we use OptiTrack, an external, infrared camera-based marker tracking system. During data collection, markers were therefore attached to both the robot and the object.

Evaluation on Everyday Household Objects


To evaluate our neural network, we test its performance on a wide variety of household objects. During this evaluation, we attach markers to the objects to validate the prediction accuracy (measured in radians) and record the success rate over several placing trials to assess reliability in real-world applications. We do not report prediction accuracy for the lipstick, as attaching a marker ensemble would substantially alter its placing dynamics; for this object, we report only the success rate.

We only evaluated the two best neural models from a prior experiment (more details in the paper), namely the tactile-only and the tactile-plus-F/T neural networks, along with the two classical approaches. We evaluated the four methods on 7 different household objects for 20 trials each, hence performing 560 placing trials in total. Two of the objects were cylindrical (Glue Bottle & Pringles), three were box-like (Mallow Pop, Tabasco & Cheez-It), and the Lipstick was a small, elongated, rectangular object with rounded edges (see above figure).

The network models perform very well on most unknown objects, indicating that our method generalizes across object primitives of unknown dimensions. The PCA baseline matched the neural networks on cylindrical objects, but its performance dropped sharply on box-like objects, which can likely be attributed to a less pronounced distribution of forces around the object's main axis. Lastly, the Hough model's performance ranged from good to mediocre across all evaluation objects.

Short Supplementary Video


Extended Supplementary Video (5min)