For each task, we define a sequence of stages, each responsible for a specific portion of the task. All tasks follow roughly the same cascade of stages:
Stage 0: Analyze the current camera frame for all relevant objects, including the object to pick up and the landing area (per the perception pipeline), and note their positions. If not all relevant objects are detected within a set time period, raise the end effector (expanding the camera’s view of the table) and repeat.
Stage 1: Move down to the position of the object to pick up, as noted in Stage 0. Wait briefly for stability.
Stage 2: Close the gripper to grasp the object. Wait briefly for stability.
Stage 3: Move the gripper and the grasped object over the landing zone (or over the hole, for the peg-in-hole task).
Stage 4: Release the grip, allowing the object to drop into the landing zone.
All end-effector pose commands are published to the ‘me314_xarm_current_pose’ topic, while the gripper position (open/close) is published to the ‘me314_xarm_gripper_position’ topic; a sketch of the corresponding publisher helpers follows the stage logic below.
if self.stage == 0:
    # Stage 0: Perception – locate pick and place points; raise camera if unseen
    if not (self.pick_pt and self.place_pt):
        # Points not yet detected: raise the end effector (up to ~0.5 m) to widen
        # the camera's view of the table, then wait for the next callback.
        if self.current_pose and self.current_pose.position.z < 0.5:
            target = Point(
                x=self.current_pose.position.x,
                y=self.current_pose.position.y,
                z=self.current_pose.position.z + lift_delta,
            )
            self._publish_pose(target)
        return
    self.stage = 1
elif self.stage == 1:
    # Stage 1: Approach – move end-effector down over the pick point
    p = Point(x=self.pick_pt.x, y=self.pick_pt.y, z=approach_z)
    self._publish_pose(p)
    self.stage = 2
elif self.stage == 2:
    # Stage 2: Grasp – close the gripper to pick up the object
    self._publish_gripper(grasp_pos)
    self.stage = 3
elif self.stage == 3:
    # Stage 3: Transport – carry object to the place point at transport height
    p = Point(x=self.place_pt.x, y=self.place_pt.y, z=transport_z)
    self._publish_pose(p)
    self.stage = 4
elif self.stage == 4:
    # Stage 4: Release – open the gripper to drop the object
    self._publish_gripper(release_pos)
    self.done = True
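The helper calls above wrap the two topics mentioned earlier. The snippet below is a minimal sketch of what that publisher side could look like, assuming rclpy with geometry_msgs/Pose for the arm pose and std_msgs/Float64 for the gripper position; the class name, message types, and queue depth are assumptions for illustration, not the controller's confirmed interface.

# Assumed publisher helpers (sketch); message types are not confirmed by the text.
from geometry_msgs.msg import Point, Pose
from rclpy.node import Node
from std_msgs.msg import Float64


class PickPlaceController(Node):
    def __init__(self):
        super().__init__('pick_place_controller')
        # Topic names as given above; a queue depth of 10 is an arbitrary choice.
        self.pose_pub = self.create_publisher(Pose, 'me314_xarm_current_pose', 10)
        self.gripper_pub = self.create_publisher(Float64, 'me314_xarm_gripper_position', 10)

    def _publish_pose(self, point: Point) -> None:
        # Command a new end-effector position (orientation handling omitted here).
        msg = Pose()
        msg.position = point
        self.pose_pub.publish(msg)

    def _publish_gripper(self, position: float) -> None:
        # Command a gripper opening, e.g. grasp_pos to close or release_pos to open.
        msg = Float64()
        msg.data = float(position)
        self.gripper_pub.publish(msg)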
The main perception algorithm is OpenCV-based color segmentation in HSV space, supplemented by the Hough Circle Transform for circular features; it is implemented across the two nodes described below.
Image to Pixel Node
The ImageToPixel node is responsible for processing incoming RGB images from the camera and extracting the pixel coordinates of relevant objects. It subscribes to the camera's image topic and uses OpenCV-based color segmentation in the HSV space to identify objects such as pegs, blocks, or holes based on their color characteristics. For circular features (e.g., holes or coins), it applies the Hough Circle Transform to localize the object center in pixel space. The node then publishes the detected pixel coordinates.
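As an illustration, the core of this detection step could look like the sketch below. The HSV bounds, Hough parameters, and function names are illustrative assumptions rather than the node's actual tuned values, and the input is assumed to already be a BGR OpenCV array (e.g., converted from the ROS image message with cv_bridge).

# Illustrative sketch of HSV segmentation and Hough-circle detection (not the node's exact values).
import cv2
import numpy as np

def find_object_pixel(bgr_image):
    # Color segmentation: keep only pixels inside an assumed target HSV range (here, red).
    hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, np.array([0, 120, 70]), np.array([10, 255, 255]))

    # The centroid of the segmented blob gives the pixel coordinates of a block or peg.
    m = cv2.moments(mask)
    if m['m00'] > 0:
        return int(m['m10'] / m['m00']), int(m['m01'] / m['m00'])
    return None

def find_circle_pixel(bgr_image):
    # Hough Circle Transform for circular features such as holes or coins.
    gray = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2GRAY)
    gray = cv2.medianBlur(gray, 5)
    circles = cv2.HoughCircles(gray, cv2.HOUGH_GRADIENT, dp=1.2, minDist=30,
                               param1=100, param2=30, minRadius=5, maxRadius=60)
    if circles is not None:
        x, y, _r = circles[0][0]
        return int(x), int(y)
    return None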
Pixel to Coord Node
The PixelToCoord node receives pixel coordinates from the ImageToPixel node and transforms these 2D image points into 3D coordinates in the robot's base frame. It subscribes to both RGB and depth image topics, as well as camera intrinsic parameters. Using the depth value at the specified pixel and the camera intrinsics, the node computes the corresponding 3D point in the camera frame. It then applies the appropriate TF2 transformation to convert this point into the global coordinate system of the robot. The resulting 3D coordinates are published for use by the pick-and-place controller.
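A minimal sketch of this conversion is shown below, assuming a ROS 2 sensor_msgs/CameraInfo message (intrinsics in its k matrix), a depth image aligned to the RGB image and expressed in millimeters, and a tf2 buffer fed by a transform listener. The frame names and the helper's name are illustrative assumptions, not the node's actual identifiers.

# Hypothetical helper illustrating pinhole back-projection followed by a TF2 transform.
from geometry_msgs.msg import PointStamped
import tf2_geometry_msgs  # noqa: F401 – registers PointStamped support with tf2

def pixel_to_base_frame(u, v, depth_image, camera_info, tf_buffer,
                        camera_frame='camera_color_optical_frame',
                        base_frame='link_base'):
    # Pinhole intrinsics: k = [fx, 0, cx, 0, fy, cy, 0, 0, 1]
    fx, fy = camera_info.k[0], camera_info.k[4]
    cx, cy = camera_info.k[2], camera_info.k[5]

    # Depth at the pixel of interest (assumed to be in millimeters; depends on the driver).
    z = float(depth_image[v, u]) / 1000.0

    # Back-project the pixel into the camera frame: X = (u - cx) * Z / fx, etc.
    pt = PointStamped()
    pt.header.frame_id = camera_frame
    pt.point.x = (u - cx) * z / fx
    pt.point.y = (v - cy) * z / fy
    pt.point.z = z

    # Use TF2 to express the point in the robot's base frame.
    return tf_buffer.transform(pt, base_frame)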