Hardware and Software Pipeline
The manipulation system should be capable of recognizing the peg and the target hole, and of inserting the peg into the hole. Our pipeline accomplishes this by breaking the task into pick, align, and insert sub-components:
Pick
A point cloud of the scene is captured from the two cameras, stitched together, and transformed into the robot base frame. This reconstructed point cloud and a known point cloud of the empty gripper are fed into the vision model to produce a grasp pose.
Extra waypoints around the grasp pose are calculated to give the manipulator extra clearance.
Target waypoints are fed into the default task-space controller to execute the pick smoothly.
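The stitching and frame-change step can be illustrated with a minimal NumPy sketch. It assumes each camera's extrinsic calibration is available as a 4x4 homogeneous transform, and it transforms each cloud into the base frame before concatenating, which is one common ordering and not necessarily the one used in our pipeline. The function names and the example extrinsics are hypothetical.

```python
import numpy as np

def to_base_frame(points_cam, T_base_cam):
    """Map an (N, 3) array of camera-frame points into the robot base frame.

    T_base_cam is a 4x4 homogeneous transform (camera pose in the base frame).
    """
    points_cam = np.asarray(points_cam, dtype=float)
    R, t = T_base_cam[:3, :3], T_base_cam[:3, 3]
    return points_cam @ R.T + t

def stitch_clouds(clouds, extrinsics):
    """Transform each camera's cloud into the base frame and concatenate them."""
    return np.vstack([to_base_frame(c, T) for c, T in zip(clouds, extrinsics)])

# Hypothetical extrinsics: camera A sits at the base origin; camera B is
# shifted 1 m along x and rotated 180 degrees about z, facing camera A.
T_a = np.eye(4)
T_b = np.array([[-1.0, 0.0, 0.0, 1.0],
                [0.0, -1.0, 0.0, 0.0],
                [0.0, 0.0, 1.0, 0.0],
                [0.0, 0.0, 0.0, 1.0]])

cloud_a = np.array([[0.2, 0.0, 0.1]])
cloud_b = np.array([[0.8, 0.0, 0.1]])  # the same physical point seen by camera B
merged = stitch_clouds([cloud_a, cloud_b], [T_a, T_b])
```

With well-calibrated extrinsics, both rows of `merged` coincide in the base frame; calibration error shows up as a visible offset between the two clouds.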
Place
After the grasp, point clouds of the end effector holding the peg are collected, stitched, and cropped, then fed into the model again along with the previous point cloud to retrieve a "place" pose.
Extra waypoints above the place pose are added to avoid part collisions, and the highest waypoint is fed into the default task-space controller.
The custom geometric impedance controller is activated with low gains, and the end effector is lowered to the surface.
A spiral search is conducted to "scan" for the hole and insert the peg into the hole.
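The clearance waypoints above the place pose can be sketched as a vertical stack of positions, highest first. This is a simplified illustration that handles position only (orientation is assumed to be held fixed), and the function name, clearance, and step count are hypothetical values rather than the ones used on the robot.

```python
def approach_waypoints(target_xyz, clearance=0.10, steps=3):
    """Return waypoints stacked vertically above target_xyz, highest first.

    clearance (m) is the height of the top waypoint above the target;
    steps is the number of waypoints, the last of which is the target itself.
    """
    if steps < 2:
        raise ValueError("need at least the top waypoint and the target")
    x, y, z = target_xyz
    return [(x, y, z + clearance * (steps - 1 - i) / (steps - 1))
            for i in range(steps)]

# Hypothetical place position in the robot base frame (metres).
waypoints = approach_waypoints((0.40, 0.10, 0.02), clearance=0.10, steps=3)
top_waypoint = waypoints[0]  # the highest waypoint, sent to the task-space controller
```

Only `top_waypoint` would go to the default task-space controller in the sequence described above; the descent from there is handled by the impedance controller.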
Hardware Design Choices
Camera Placement: We placed the cameras diagonally across from each other so that the arm itself would not obstruct their view and their merged point clouds would span the full occupied workspace. The resulting point clouds have moderate fidelity but contain artifacts that reduce the precision of the Diffusion EDF output poses. An alternative is to use RGB-D SLAM with a camera on the end effector to reconstruct the scene in better detail. However, this process is cumbersome: it must be repeated whenever anything in the scene changes, and in a task with frequent contact interactions the target objects are prone to motion.
Scene Arrangement: We used a uniform, non-reflective green mat on our workspace to prevent reflected light from distorting our ToF sensor readings. Real-world conditions may not be this ideal. However, our project is meant to establish a baseline for further work on similar precision tasks.
Software Design Choices
Diffusion Equivariant Descriptor Fields (Diffusion EDF): We selected this model because its equivariance properties make it data-efficient and easy to deploy and generalize from a few collected training examples. However, it is an open-loop planning method, so the quality of the point cloud inputs directly influences the precision of the final output. Artifacts in our point clouds resulted in target poses with an error of 3-5 mm, which propagated to our impedance controller.
Gain Scheduling Policy: A fully connected neural network adjusts the controller gains based on the current error and force-torque sensor data. In theory, gain scheduling allows the robot arm to increase compliance near the hole or during insertion while moving more aggressively when farther away. However, the network relies heavily on accurate error and force-torque data, because the controller is not robust to errors in the reference or input data.
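To make the idea concrete, here is a minimal sketch of a fully connected gain scheduler in NumPy. Everything specific in it is an assumption for illustration: the 6-D pose error and 6-D wrench inputs, the single hidden layer, the stiffness bounds `k_min` and `k_max`, and the untrained random weights. It is not the network we trained.

```python
import numpy as np

class GainScheduler:
    """Tiny fully connected network: [pose error (6), wrench (6)] -> 6 stiffness gains."""

    def __init__(self, hidden=32, k_min=50.0, k_max=500.0, seed=0):
        rng = np.random.default_rng(seed)
        # Untrained random weights, for illustration only.
        self.W1 = rng.normal(0.0, 0.1, (hidden, 12))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0.0, 0.1, (6, hidden))
        self.b2 = np.zeros(6)
        self.k_min, self.k_max = k_min, k_max

    def __call__(self, pose_error, wrench):
        x = np.concatenate([pose_error, wrench])
        h = np.tanh(self.W1 @ x + self.b1)
        s = 1.0 / (1.0 + np.exp(-(self.W2 @ h + self.b2)))  # sigmoid, in (0, 1)
        return self.k_min + (self.k_max - self.k_min) * s   # gains in (k_min, k_max)

scheduler = GainScheduler()
gains = scheduler(np.zeros(6), np.zeros(6))
```

Squashing the output through a sigmoid keeps every gain strictly inside the chosen bounds regardless of the input, which limits how far a bad pose estimate or noisy wrench reading can push the controller, although it does not remove the sensitivity described above.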
Spiral search algorithm: Our initial attempt with a gain scheduling policy was unsuccessful because the inaccurate target from Diffusion EDF produced large errors in the system output. Spiral search instead traces an outward spiral starting at the Diffusion EDF output pose and, once a force-torque threshold is met, directs the arm to insert the peg into the hole.
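A minimal sketch of such a search is given below, using an Archimedean spiral in the plane of the surface. The detection rule is an assumption: we treat a drop in the measured normal contact force below a threshold as the sign that the peg has found the hole. The function names, pitch, step, and threshold values are hypothetical, and the default 6 mm search radius is chosen only so that it covers the 3-5 mm pose error reported above.

```python
import math

def spiral_offsets(pitch=0.001, step=0.0005, max_radius=0.006):
    """Yield (dx, dy) offsets along an Archimedean spiral r = pitch * theta / (2*pi).

    pitch is the radial distance between turns and step is the approximate
    arc length between samples, both in metres.
    """
    theta = 0.0
    while True:
        r = pitch * theta / (2.0 * math.pi)
        if r > max_radius:
            return
        yield (r * math.cos(theta), r * math.sin(theta))
        # Advance theta so consecutive samples are roughly `step` apart.
        theta += step / max(r, step)

def spiral_search(start_xy, read_normal_force, force_threshold=2.0, **spiral_kwargs):
    """Scan outward from start_xy and return the first (x, y) at which the
    contact force drops below force_threshold, or None if the spiral ends."""
    for dx, dy in spiral_offsets(**spiral_kwargs):
        xy = (start_xy[0] + dx, start_xy[1] + dy)
        if read_normal_force(xy) < force_threshold:
            return xy
    return None

# Simulated usage: a hypothetical hole about 3.2 mm from the initial estimate.
hole = (0.403, 0.101)

def fake_force(xy, tol=0.001):
    """Stand-in for the force-torque sensor: low force when over the hole."""
    return 0.5 if math.dist(xy, hole) < tol else 5.0

found = spiral_search((0.400, 0.100), fake_force)
```

On hardware, `read_normal_force` would move the end effector to the commanded offset and return the sensor reading; the pitch should be smaller than the peg-hole clearance region so that no turn of the spiral can skip over the hole.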