Grasping – Vision 

The grasping subsystem is split into two modules: vision and arm. The vision module is responsible for recognizing and locating the bottle in the scene, and the arm module is responsible for moving the arm to the bottle, grasping it, and placing it in the basket. The vision module also provides the scissor lift with the height it must rise to so the arm can reach bottles on higher platforms.

(Left) Grasping software architecture.  (Right) Grasping hardware architecture.

The vision subsystem begins once navigation has reached its goal in the kitchen. There, the Kinect gathers 3D point cloud data of the scene and, using the Point Cloud Library (PCL), filters and segments the point cloud until a cylindrical shape is found (see figure below). The bottle's center coordinate is then calculated, translated into the arm's reference frame, and published to a "center_coord" ROS topic.
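The frame translation in the last step can be sketched as a standard homogeneous-transform multiplication. This is a minimal illustration, not the project's actual code: the transform `T_arm_kinect` and the example offset are hypothetical stand-ins for whatever Kinect-to-arm calibration the real system uses, and the ROS publishing step is omitted.

```python
import numpy as np

def kinect_to_arm(center_kinect, T_arm_kinect):
    """Transform a 3-D point from the Kinect frame into the arm frame
    using a 4x4 homogeneous transform (rotation + translation)."""
    p = np.append(np.asarray(center_kinect, dtype=float), 1.0)  # homogeneous coords
    return (T_arm_kinect @ p)[:3]

# Hypothetical calibration: arm origin 10 cm along x from the Kinect origin
T = np.eye(4)
T[0, 3] = 0.10
print(kinect_to_arm([0.0, 0.0, 0.5], T))  # bottle 50 cm in front of the Kinect
```

The result of `kinect_to_arm` is what would be published on the "center_coord" topic.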

Flow of vision process, from Kinect to point clouds, to bottle's point cloud cluster, to bottle's center coordinate.

Since the arm and scissor lift do not function well with continuously updating bottle coordinates, another node was added to filter the streaming data. This filter node averages 20 bottle coordinates before publishing the best estimate to the arm and scissor lift. Afterwards, the filter node suspends all vision processes and restarts them only when the arm signals that it needs to retry.
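The filter node's accumulate-average-suspend behavior can be sketched as follows. This is an illustrative model under assumed names (`CoordinateFilter`, `add_sample`, `retry` are not from the source); the real node would wrap this logic in ROS subscriber and publisher callbacks.

```python
import numpy as np

class CoordinateFilter:
    """Averages a fixed window of bottle-coordinate samples before
    releasing one best estimate, then suspends until a retry request."""

    def __init__(self, window=20):
        self.window = window
        self.samples = []
        self.suspended = False

    def add_sample(self, coord):
        if self.suspended:
            return None                       # vision is paused; ignore input
        self.samples.append(np.asarray(coord, dtype=float))
        if len(self.samples) < self.window:
            return None                       # still accumulating
        estimate = np.mean(self.samples, axis=0)
        self.samples.clear()
        self.suspended = True                 # stop until the arm asks for a retry
        return estimate                       # publish this to arm and scissor lift

    def retry(self):
        """Called when the arm signals a failed grasp; resume sampling."""
        self.suspended = False
```

Averaging over 20 samples smooths out frame-to-frame jitter in the RANSAC fit, at the cost of a short delay before the arm receives a coordinate.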

Modeling, Analysis, and Testing

In anticipation of instructors' attempts at baffling our vision system, we tested it against non-cylindrical objects. Surprisingly, it would occasionally categorize boxes (the yellow and red ones in the figure above) as cylinders. It turned out that RANSAC would sometimes fit a cylinder to a small subsection of the box's point cloud. For example, when the box's narrow side faces the camera (see figure below), the two visible sides and the corner between them can be fit as a cylinder.

(Left) The narrow or "short" side of the box.  (Right) The "long" side of the box.

To alleviate this problem, we introduced a point threshold check, which compares the number of points RANSAC returns against the size of the whole cluster. (A "cluster" in this case is the segmented point cloud of the object with all background points removed.) We ran this check against the box's short side, the box's long side, and a bottle. As seen in Table 15, the lower the point threshold, the more certain we are that a box is not a cylinder. For a bottle the trend is reversed (Table 16): we are more certain that it is a cylinder when the threshold is high. From this data, we set the threshold at 50% because it provides balanced detection, rejecting boxes at 92.74% and accepting cylinders at 96.12%.
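One plausible form of this check, as a sketch: accept the RANSAC cylinder fit only when its inlier points cover at least the threshold fraction of the cluster. The function name and exact acceptance rule are assumptions for illustration; the source describes only the comparison of RANSAC's returned points against the whole cluster.

```python
def is_cylinder(n_inliers, n_cluster, threshold=0.50):
    """Point threshold check (sketch): treat the object as a cylinder
    only if the RANSAC cylinder inliers make up at least `threshold`
    of the segmented object cluster."""
    if n_cluster == 0:
        return False          # empty cluster: nothing to classify
    return n_inliers / n_cluster >= threshold

# A cylinder fit to a box corner covers only part of the cluster -> rejected
print(is_cylinder(300, 1000))  # False
# A fit to a real bottle covers most of the cluster -> accepted
print(is_cylinder(800, 1000))  # True
```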

Ability of our vision algorithm to correctly identify boxes as a function of point threshold.
Object      | Point Threshold | Time | % Not Cylinder | % Cylinder
box (long)  | 70%             | 60   | 94.49%         | 5.51%
box (long)  | 60%             | 60   | 99.14%         | 0.86%
box (long)  | 50%             | 60   | 100%           | 0%
box (long)  | 40%             | 60   | 100%           | 0%
box (long)  | 30%             | 60   | 100%           | 0%
box (short) | 70%             | 60   | 29.06%         | 70.94%
box (short) | 60%             | 60   | 57.14%         | 42.86%
box (short) | 50%             | 60   | 92.74%         | 7.26%
box (short) | 40%             | 60   | 99.19%         | 0.81%
box (short) | 30%             | 60   | 100.00%        | 0%

Ability of our vision algorithm to correctly identify cylinders as a function of point threshold.
Object      | Point Threshold | Time | % Not Cylinder | % Cylinder

Furthermore, we conducted tests to verify the accuracy of the bottle coordinate in 3D space. Sheets of paper with 1 cm x 1 cm grids were laid out from the Kinect to the bottle (see Figure 24), and a measuring tape was used to verify height. We took more than 30 samples and found the error to be less than 2 cm along each of the x, y, and z axes (see Table 17). As a side effect of this testing, we also found that the minimum sensing distance of our particular Kinect is 48 cm.

Paper with 1 cm x 1 cm grid cells and a measuring tape, used to measure the accuracy of the bottle coordinate reported by the Kinect and PCL.

Error in bottle coordinate as reported by Kinect and PCL.
Axis             | Error
X (side-to-side) | 1 cm
Y (height)       | 1.5 cm
Z (depth)        | 2 cm

Grasping – Arm

The arm starts in an upright, forward-facing position, which helps keep the scissor lift from listing to one side during navigation. Once the grasping subsystem has started, the arm module takes the bottle coordinate from the filter node and attempts to grasp the bottle. To do so, the module first applies inverse kinematics to solve for the bottom three joints (shoulder pan, shoulder pitch, and elbow; see figure below), sends joint angle commands to each servo, and then commands the gripper joint to close. In our configuration, the wrist joint never moves because the gripper is always parallel to the tabletop.
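Because the wrist is fixed, the inverse kinematics reduces to a pan angle plus a planar two-link problem, which can be sketched as below. This is a generic textbook solution, not the project's actual solver, and the link lengths are hypothetical parameters; the real Crustcrawler geometry may add offsets this sketch ignores.

```python
from math import atan2, acos, cos, sin, hypot

def ik_three_joint(x, y, z, l1, l2):
    """Solve shoulder pan, shoulder pitch, and elbow angles (radians)
    for a target point (x, y, z) in the arm's base frame. The pan joint
    rotates the arm toward the target; pitch and elbow form a planar
    two-link problem in the vertical plane through the target.
    l1, l2 are the upper-arm and forearm link lengths (hypothetical)."""
    pan = atan2(y, x)
    r = hypot(x, y)                           # horizontal reach in that plane
    d2 = r * r + z * z                        # squared distance to target
    cos_elbow = (d2 - l1 * l1 - l2 * l2) / (2 * l1 * l2)
    if not -1.0 <= cos_elbow <= 1.0:
        raise ValueError("target out of reach")
    elbow = acos(cos_elbow)                   # one of the two elbow solutions
    pitch = atan2(z, r) - atan2(l2 * sin(elbow), l1 + l2 * cos(elbow))
    return pan, pitch, elbow
```

The returned angles would then be converted to servo commands; the second (mirrored) elbow solution is obtained by negating `elbow` and recomputing `pitch`.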

Once the gripper closes to a predetermined gap, we poll the current load on the gripper servo to detect whether the bottle has been grasped: if an object is held, the current load reports a value less than 0. If no object is grasped, the arm goes through a retry sequence. It first moves to the side so the Kinect has a clear line of sight, then signals the vision process to start again; once it has a new bottle coordinate, it attempts another grasp. After three failed attempts, the arm returns to its upright starting position and signals the navigation subsystem to traverse back to the home position.
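The grasp-and-retry loop above can be sketched as follows. The function and callback names are assumptions for illustration; in the real system the `grasp_once` step would close the gripper, read the servo's current load, and the failure branch would move the arm aside and re-trigger the vision pipeline.

```python
def attempt_grasp_cycle(grasp_once, max_attempts=3):
    """Try to grasp up to max_attempts times. grasp_once() closes the
    gripper and returns the gripper servo's current load; a load below
    zero indicates an object is held. Returns True on success, False
    if the arm should return to its starting position."""
    for _ in range(max_attempts):
        load = grasp_once()
        if load < 0:          # negative load -> bottle is in the gripper
            return True
        # Failed attempt: here the real arm moves aside, signals vision
        # to restart, and waits for a fresh bottle coordinate.
    return False

# Simulated servo loads: two empty closes, then a successful grasp
loads = iter([0.0, 0.1, -0.3])
print(attempt_grasp_cycle(lambda: next(loads)))  # True
```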

Each of the servo-controlled joints of the Crustcrawler arm.