Extenders were added to the xArm gripper so that the DenseTact Minis could be mounted to the xArm without additional motors or electronics. The mounts were fixed at 60 degrees from horizontal so that the DenseTact Mini surfaces could contact the objects in each task. The mount orientation was also chosen so that a fingernail could be added to the DenseTact Minis, enabling the pickup of a plastic coin.
RGB and depth images were collected with an Intel RealSense D435 camera attached to the end effector of the xArm. Each task featured objects of unique colors, so applying HSV masks to the RealSense image stream was a fast and robust way to isolate manipulation targets. Object centers were identified as the mean pixel location of each HSV mask, and these points were then projected from camera space to world space. This provided the targets for manipulation and was used to locate the centers of the coin, dollar bill, cube, and goal.
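The masking and projection steps can be illustrated with a minimal sketch. It uses only the Python standard library so that it stays self-contained; the function names, the HSV thresholds, the pinhole intrinsics (fx, fy, cx, cy), and the camera-to-world rotation and translation are all assumptions for illustration, not values or code from our system.

```python
import colorsys


def hsv_mask_centroid(rgb_image, h_range, s_min, v_min):
    """Return the (u, v) pixel centroid of all pixels inside an HSV range.

    rgb_image is a list of rows; each row is a list of (r, g, b) tuples in 0..255.
    h_range is (low, high) with hue in 0..1, following colorsys conventions.
    Returns None when no pixel matches.
    """
    sum_u = sum_v = count = 0
    for v, row in enumerate(rgb_image):
        for u, (r, g, b) in enumerate(row):
            h, s, val = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
            if h_range[0] <= h <= h_range[1] and s >= s_min and val >= v_min:
                sum_u += u
                sum_v += v
                count += 1
    if count == 0:
        return None
    return sum_u / count, sum_v / count


def deproject(u, v, depth_m, fx, fy, cx, cy):
    """Pinhole back-projection of pixel (u, v) at a given depth into the camera frame."""
    return ((u - cx) * depth_m / fx, (v - cy) * depth_m / fy, depth_m)


def camera_to_world(point_cam, rotation, translation):
    """Apply a rigid transform (3x3 rotation, 3-vector translation) to a point."""
    return tuple(
        sum(rotation[i][j] * point_cam[j] for j in range(3)) + translation[i]
        for i in range(3)
    )


# Toy 4x6 image: black background with a 2x2 green block (hypothetical data).
BLACK, GREEN = (0, 0, 0), (0, 255, 0)
image = [[BLACK] * 6 for _ in range(4)]
for row in (1, 2):
    for col in (2, 3):
        image[row][col] = GREEN

center_px = hsv_mask_centroid(image, h_range=(0.25, 0.42), s_min=0.5, v_min=0.5)
point_cam = deproject(*center_px, depth_m=0.5, fx=2.0, fy=2.0, cx=2.5, cy=1.5)
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
point_world = camera_to_world(point_cam, identity, (0.3, 0.0, 0.4))
```

In a real pipeline the mask would typically be computed with a vectorized image library, and the camera-to-world transform would come from the arm's end-effector pose combined with a hand-eye calibration.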
The peg-in-hole task posed a challenge to the 2D mask approach used for the other tasks. Certain views captured only part of the tall cylinder or the goal, which produced consistent errors in our estimates of their true centers. Because HSV masking captured only one side of these objects and peg-in-hole demands high accuracy, a more robust solution was needed. The 2D object masks were instead used to isolate regions of the RealSense depth image, using the RGB-to-depth mapping provided with the RealSense camera. Once a mask was applied to a depth image, the trimmed 2D depth map was projected to a dense 3D point cloud. The density of this point cloud was unnecessarily high, so only 5% of the projected points were saved.
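A minimal sketch of this masked projection and subsampling follows. It assumes the depth image is already aligned to the color image and expressed in meters, treats zero depth as an invalid reading, and uses uniform random sampling to keep 5% of the points; the sampling scheme, the function name, and the intrinsics are assumptions, since the text does not specify them.

```python
import random


def masked_depth_to_points(depth, mask, fx, fy, cx, cy, keep_fraction=0.05, seed=0):
    """Back-project masked depth pixels to 3D and keep a random subset.

    depth and mask are lists of rows with the same shape; depth is in meters
    and mask holds booleans. Pixels with zero depth are skipped as invalid.
    """
    points = []
    for v, (depth_row, mask_row) in enumerate(zip(depth, mask)):
        for u, (z, inside) in enumerate(zip(depth_row, mask_row)):
            if inside and z > 0:
                points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    if not points:
        return []
    # Keep at least one point so a small mask still yields an estimate.
    n_keep = max(1, round(keep_fraction * len(points)))
    return random.Random(seed).sample(points, n_keep)


# Toy 10x10 depth image, 1 m everywhere, fully inside the mask (hypothetical data).
depth = [[1.0] * 10 for _ in range(10)]
mask = [[True] * 10 for _ in range(10)]
sparse_cloud = masked_depth_to_points(depth, mask, fx=600.0, fy=600.0, cx=5.0, cy=5.0)
```

Fixing the random seed keeps the subsample reproducible between runs, which is convenient when comparing center estimates across views.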
The point cloud computation was repeated for nine top-down views of the scene to capture all faces of each object and produce a more accurate estimate of its center. For each object, the nine sparse point clouds were averaged to estimate the true center. In testing, these estimates were accurate enough to complete the peg-in-hole task without the need for force-surface estimation. Different peg and goal positions were used to verify the robustness of this approach, which, although more time-consuming than using a single mask, was highly consistent.
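One possible reading of the averaging step is sketched below: each view's cloud is reduced to its centroid, and the per-view centroids are then averaged. This is an assumption, as is the requirement that all clouds have already been transformed into a common world frame; pooling all points before averaging would be an equally plausible variant.

```python
def centroid(points):
    """Mean of a non-empty list of (x, y, z) points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def estimate_center(clouds_world):
    """Average the per-view centroids of point clouds given in a shared world frame."""
    view_centroids = [centroid(cloud) for cloud in clouds_world if cloud]
    if not view_centroids:
        raise ValueError("no non-empty point clouds")
    return centroid(view_centroids)


# Two hypothetical views that each see one side of the same object.
left_view = [(-1.0, 0.0, 0.5)]
right_view = [(1.0, 0.0, 0.5)]
center = estimate_center([left_view, right_view])
```

Averaging per-view centroids weights every view equally, so a view that happens to return many more points does not pull the estimate toward the side it observed.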
The dollar bill and coin picking tasks were challenging because such thin objects demand accurate grasp planning. To accomplish this, our approach used the xArm force-torque sensor to find the surface of the table. We found that this method was more accurate than using the depth image from a high view of the table surface. In our algorithm, the xArm is first commanded to a known safe height above the table surface. Then, over a series of up to 50 steps, it moves down by ~1 mm and checks the force it is exerting on the table surface. If this force is less than 3 N, it takes another step. Once the force reaches this threshold, it saves the current height and moves up slightly to prepare for grasping.
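The descent loop can be sketched as follows. The arm interface (move_to_height, read_vertical_force) and the SimulatedArm class are hypothetical stand-ins rather than the xArm SDK, the 2 mm retract distance is an assumed value for "moves up slightly", and raising an error when no contact is found within 50 steps is our assumption about the failure case.

```python
class SimulatedArm:
    """Hypothetical stand-in for the robot: the table is modeled as a stiff spring."""

    def __init__(self, table_height_mm, stiffness_n_per_mm=5.0):
        self.table_height_mm = table_height_mm
        self.stiffness_n_per_mm = stiffness_n_per_mm
        self.height_mm = None

    def move_to_height(self, height_mm):
        self.height_mm = height_mm

    def read_vertical_force(self):
        # Force grows linearly with how far the tool is pressed below the table.
        penetration = max(0.0, self.table_height_mm - self.height_mm)
        return self.stiffness_n_per_mm * penetration


def find_table_height(arm, safe_height_mm, step_mm=1.0, max_steps=50,
                      force_threshold_n=3.0, retract_mm=2.0):
    """Step down until the contact force reaches the threshold, then back off."""
    arm.move_to_height(safe_height_mm)
    height = safe_height_mm
    for _ in range(max_steps):
        height -= step_mm
        arm.move_to_height(height)
        if arm.read_vertical_force() >= force_threshold_n:
            break  # contact detected at this height
    else:
        # Assumed behavior: give up rather than keep descending.
        raise RuntimeError("no contact detected within max_steps")
    arm.move_to_height(height + retract_mm)  # move up slightly before grasping
    return height


arm = SimulatedArm(table_height_mm=100.0)
table_height = find_table_height(arm, safe_height_mm=110.0)
```

Because the arm moves in discrete steps, the saved height can sit below the true surface by up to one step plus any compliance in the fingers, which is one reason to keep the step size small.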
For the dollar bill and coin tasks, the objects must be grasped once table surface sensing is complete. We accomplish this by breaking the planned motion (move up and close the gripper) into a series of small steps that are executed in sequence, so that the arm slowly lifts while the gripper closes. This is required because, as the parallel-jaw gripper closes, its jaws also translate toward the table. For the dollar bill in particular, it is useful to drag the fingers along the surface of the bill to pull it into a grasp, so it is important that the lifting and closing motions happen together.
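A minimal sketch of such an interleaved plan is given below. It assumes a simple linear schedule in which each step raises the arm and closes the gripper by equal increments; the actual step count, lift distance, and gripper units are not specified here, so every value in the example is hypothetical.

```python
def interleaved_grasp_plan(start_height_mm, lift_mm, start_opening, end_opening, n_steps):
    """Return a list of (height, gripper opening) waypoints to execute in order.

    Each waypoint lifts the arm and closes the gripper by one equal increment,
    so the lift offsets the downward travel of the jaws as they close.
    """
    if n_steps < 1:
        raise ValueError("n_steps must be at least 1")
    plan = []
    for k in range(1, n_steps + 1):
        t = k / n_steps  # fraction of the motion completed after this step
        height = start_height_mm + t * lift_mm
        opening = start_opening + t * (end_opening - start_opening)
        plan.append((height, opening))
    return plan


# Hypothetical values: lift 4 mm while closing from 40 units to fully closed.
plan = interleaved_grasp_plan(100.0, 4.0, 40.0, 0.0, n_steps=4)
```

Each waypoint would be sent to the arm and gripper in turn; a finer step count gives a closer approximation to a truly simultaneous motion at the cost of a slower grasp.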