At the end of this project, our manipulator was able to pick up a peg and place it in a hole, and it can accomplish this objective across various peg and hole orientations. With respect to our envisioned design, the manipulator meets the basic expectations set out at the beginning of the task.
However, several challenges remain in fully optimizing the system. As mentioned before, these include issues with point cloud fidelity due to noise, shadowing, and unwrapping, as well as extrinsic calibration errors. The training process for Diffusion EDF, which typically relies on SLAM-based scene reconstruction, also faces limitations, particularly with sensitivity to input point cloud quality. Additionally, open-loop control methods cannot correct errors in target placement or picking poses, and slow gain updates with stoppages can hinder overall performance.
There are several other factors to consider when designing a policy for such a manipulator in the real world. For example, in a real-world environment, spiral search yields long task completion times, and as the task is extended to more complex assemblies, this simple search heuristic does not generalize. A more applicable approach requires a vision model that is more robust to point cloud artifacts and that can react to changes in grasp posture or in the scene without a lengthy point cloud capture process.
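To make the cost of the heuristic concrete, the spiral search mentioned above can be sketched as an Archimedean spiral of probe waypoints around the estimated hole position. This is a minimal illustrative sketch, not our actual implementation; the function name and the pitch, step, and radius parameters are all assumed values.

```python
import numpy as np

def spiral_search_waypoints(center, pitch=1e-3, step_angle=0.3, max_radius=1e-2):
    """Generate Archimedean-spiral probe waypoints (meters) around an
    estimated hole position. All parameters here are illustrative."""
    waypoints = []
    theta = 0.0
    while True:
        # Radius grows linearly with angle: one full turn adds one pitch.
        radius = pitch * theta / (2.0 * np.pi)
        if radius > max_radius:
            break
        waypoints.append((center[0] + radius * np.cos(theta),
                          center[1] + radius * np.sin(theta)))
        theta += step_angle
    return waypoints
```

Even with a modest 1 mm pitch over a 1 cm search radius, the spiral produces on the order of hundreds of probe points, which is why the task completion time grows quickly when the initial pose estimate is poor.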
Vision: In the near future, we will recalibrate the cameras and use Dense RGB-D SLAM to reconstruct the scene to determine whether it significantly improves the margins of error, allowing us to remove the intermediary spiral search step. In addition, we aim to shift to a more closed-loop visual control approach, such as keypoint tracking or optical-flow-inspired deep learning models. These models rely on improved keypoint detection models to detect salient features of objects or the end effector and then predict future displacements from the current position. Formulating closed-loop vision-based control in this manner allows training on action-less video demonstrations and could ease model training. It could also generalize to different manipulators by training the model on the more complex task of predicting manipulator trajectories over a varied video dataset, while the lower-level impedance policy is trained separately for each robot.
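The inner loop of such a keypoint-based scheme reduces to repeatedly commanding a small end-effector correction proportional to the predicted keypoint displacement. The sketch below assumes a detector/predictor already supplies current and target keypoint positions; the function, gain, and step clamp are all hypothetical placeholders, not part of our system.

```python
import numpy as np

def servo_step(current_kp, target_kp, gain=0.5, max_step=0.005):
    """One closed-loop update: return a small end-effector translation
    toward the predicted keypoint targets. Gain and clamp are illustrative."""
    # Per-keypoint displacement, averaged into a single correction vector.
    error = np.asarray(target_kp) - np.asarray(current_kp)
    delta = gain * error.mean(axis=0)
    # Clamp the commanded step so one noisy prediction cannot jerk the arm.
    norm = np.linalg.norm(delta)
    if norm > max_step:
        delta = delta * (max_step / norm)
    return delta
```

Because each step depends only on the latest observation, the loop can correct placement errors online, unlike the open-loop pipeline described earlier.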
Control: Currently, the gain scheduling policy cannot operate on force-torque data alone, as that data does not contain enough information to converge to intelligent search behavior. This could be remedied with more detailed tactile sensors mounted on the gripper fingers, which would yield additional data to aid tactile-only exploration for the hole. In addition, the software backend does not implement high-frequency gain updates, so the full efficacy of active gain scheduling cannot yet be assessed.