We started with a baseline Diffusion EDF policy (which we tuned) and used Neuromeka's built-in robot controller. The sections below describe the portions of the implementation that we added to the project.
Intrinsic parameters were calibrated with the classic checkerboard method.
Extrinsic parameters were calibrated by placing an ArUco tag at various points in the workspace, entering its measured positions relative to the robot base, and using camera observations of the tag to compute (and average) the homogeneous transform between each camera and the robot base.
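The averaging step can be sketched as follows. This is a minimal illustration (the function name and exact method are our own, not from the calibration pipeline itself): given several per-tag estimates of the camera-to-base transform, translations are averaged directly and rotations are averaged by summing the rotation matrices and projecting back onto SO(3) with an SVD, which is valid when the estimates are close together.

```python
import numpy as np

def average_transforms(transforms):
    """Average a list of 4x4 homogeneous transform estimates.

    Translations are averaged directly; rotations are averaged by summing
    the rotation matrices and projecting the sum back onto SO(3) via SVD.
    Only valid when the rotation estimates are close to one another.
    """
    Ts = np.asarray(transforms)
    t_mean = Ts[:, :3, 3].mean(axis=0)
    R_sum = Ts[:, :3, :3].sum(axis=0)
    U, _, Vt = np.linalg.svd(R_sum)
    # The det() factor guards against a reflection (det = -1) solution.
    R_mean = U @ np.diag([1.0, 1.0, np.linalg.det(U @ Vt)]) @ Vt
    T = np.eye(4)
    T[:3, :3] = R_mean
    T[:3, 3] = t_mean
    return T
```

Each tag placement yields one transform estimate; feeding all of them through this function gives a single calibrated extrinsic.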
X depicts the placements of the detected tag corners. Several coordinates relative to the robot base are also listed. Also shown is a border beyond which the camera exhibits barrel distortion.
Cropped and merged point cloud example.
Once the cameras were calibrated, we tried several approaches to merging the point clouds:
Naïve approach
ICP
Colored ICP
We ultimately used Iterative Closest Point (ICP) to merge the point clouds from each camera; it worked best after cropping away the environment beyond our workspace.
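For illustration, here is a minimal point-to-point ICP in NumPy. In practice a library implementation (e.g. Open3D's registration routines) would be used; this simplified stand-in just shows the two steps each iteration performs: nearest-neighbor matching and a Kabsch rigid-fit.

```python
import numpy as np

def icp(source, target, iters=20):
    """Rigidly align `source` (Nx3) to `target` (Mx3) with point-to-point ICP.

    Each iteration matches every source point to its nearest target point,
    then solves for the best-fit rotation/translation (Kabsch algorithm).
    Returns (R, t) such that source @ R.T + t approximates target.
    """
    src = source.copy()
    R_total, t_total = np.eye(3), np.zeros(3)
    for _ in range(iters):
        # Nearest-neighbour correspondences (brute force, for clarity only).
        d = np.linalg.norm(src[:, None, :] - target[None, :, :], axis=2)
        matched = target[d.argmin(axis=1)]
        # Kabsch: optimal rotation between the centred point sets.
        mu_s, mu_t = src.mean(axis=0), matched.mean(axis=0)
        H = (src - mu_s).T @ (matched - mu_t)
        U, _, Vt = np.linalg.svd(H)
        R = Vt.T @ np.diag([1.0, 1.0, np.linalg.det(Vt.T @ U.T)]) @ U.T
        t = mu_t - R @ mu_s
        src = src @ R.T + t
        R_total, t_total = R @ R_total, R @ t_total + t
    return R_total, t_total
```

Cropping the clouds to the workspace before running ICP removes background geometry that would otherwise create spurious correspondences.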
Group Equivariance – a group operation/transformation on the input produces a corresponding transformation on output
Left Equivariance SE(3) – equivariant to rigid transformations of the target place or object pose
Right Equivariance SE(3) – equivariant to changes in grasp posture
Bi-Equivariant Descriptor Fields – represent shapes/scenes as a sum of high dimensional vector fields predicted by neural networks
Extension of Neural Implicit Shapes: train neural networks to predict “descriptors” for a depth input and query points
Simple descriptors examples: “objectness”, modified distance to salient points
Descriptors here are type-0 scalars and type-1 & type-2 vectors, where the type refers to the order of the spherical harmonics
Simply put, Diffusion EDF improves the generalizability of our policy to varying orientations of the target pose and grasp posture.
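To make the equivariance property concrete, here is a tiny numeric illustration (our own example, not part of Diffusion EDF) using the centroid of a point cloud, which is trivially SE(3)-equivariant: transforming the input transforms the output in the same way, f(g·X) = g·f(X).

```python
import numpy as np

def centroid(points):
    """A trivially SE(3)-equivariant map: rotating/translating the input
    point cloud rotates/translates the output by the same transform."""
    return points.mean(axis=0)

# A rigid transform g = (R, t): 90-degree rotation about z plus a translation.
R = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 1.0]])
t = np.array([0.5, -0.2, 1.0])

X = np.random.default_rng(0).normal(size=(10, 3))
lhs = centroid(X @ R.T + t)   # f(g . X)
rhs = R @ centroid(X) + t     # g . f(X)
assert np.allclose(lhs, rhs)  # equivariance holds
```

Diffusion EDF's equivariant layers give its pose predictions this same property, which is why the policy generalizes to new orientations of the target and grasp.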
Example of Group Equivariance
Example EDF visualization
Picking Pose Produced by Diffusion EDF
Placing Pose Produced by Diffusion EDF
Diffusion EDF is a trained neural network with equivariant layers. The network was developed by our collaborators, but we tuned it ourselves.
Demonstration Process:
Capture a point cloud of the scene, including the peg and hole.
Manually move the robot arm to “teach” it how to grasp the peg, recording the pose.
The robot grasps the peg and lifts it to a position where one of the cameras has a clear view of the gripper and peg.
At this position, the robot rotates its wrist by 90° four times to capture point clouds from different perspectives.
The four point clouds are reconstructed into a complete point cloud of the grasped peg.
Manually move the robot arm to “teach” it how to place the peg in the hole, recording the placement pose.
In total, we collected 11 full demonstrations to train our model.
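The multi-view reconstruction in the steps above can be sketched as follows. This is a simplified stand-in (function names and the assumption that each scan is already expressed in the gripper frame via calibration and forward kinematics are ours): undoing the k-th 90° wrist rotation maps each scan back into the initial gripper frame so the four clouds overlap.

```python
import numpy as np

def rot_z(angle):
    """Rotation matrix about the z (wrist) axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def fuse_wrist_scans(clouds_in_gripper_frame):
    """Merge four scans taken after successive 90-degree wrist rotations.

    Each cloud is assumed to already be expressed in the gripper frame at
    capture time.  Applying the inverse of the k-th wrist rotation maps every
    scan back into the initial gripper frame, so the clouds can be stacked.
    """
    merged = []
    for k, cloud in enumerate(clouds_in_gripper_frame):
        undo = rot_z(-k * np.pi / 2)  # inverse of the k-th 90-degree rotation
        merged.append(cloud @ undo.T)
    return np.vstack(merged)
```

In practice a final ICP refinement can clean up small calibration errors between the four views.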
A short montage of training Diffusion EDF
Collected appropriate training datasets and modified the sampling procedures to work with our robot setup
Created a pipeline to capture, stitch, and feed individual point clouds to the trained models and to extract practical grasps
Extracted query field values to optimize for end effector to tool frame transformation (unsuccessful)
Devised heuristics to improve picking and placement poses
For the placing manipulation task, we used impedance control, as it allows the end effector to behave like a mass-spring-damper system
The impedance controller ensures compliance and adaptability to the environment the robot interacts with
Behavioral Cloning was used to learn a gain scheduling policy for insertion
We demonstrated the manipulation task to the robot, and the robot replicated our behavior by cloning the demonstrated gains
To demonstrate the manipulation task, we adjusted the x, y, z, and rotational gains of the robot through a GUI we built while the manipulator brought the grasped peg into the target hole.
This is the process we used to collect data:
Randomize the initial pose above the hole.
Move to the target place position.
While moving, adjust the gains to demonstrate the desired behavior and "teach" the robot how to manipulate when the peg is in contact with the platform.
When misaligned, lower the z-gain and increase the x, y, and rotation gains to orient the peg more accurately.
When seemingly aligned, relax the x, y, and rotation gains, and increase the z-gain to insert the peg into the hole.
Over the course of this process, the error vector, end-effector velocity, force/torque sensor data, and force/torque sensor bias, together with the "ground-truth" gain data, were collected.
Force/torque sensor data has an inevitable bias; we estimated it by averaging the readings over 5 seconds and subtracted it from the collected data.
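The bias removal step can be sketched as below (the function name and sample rate are illustrative assumptions, not values from our setup): average the wrench readings over an idle window and subtract that estimate from every sample.

```python
import numpy as np

def remove_ft_bias(wrenches, sample_rate_hz=100, bias_window_s=5.0):
    """Estimate the force/torque bias from the first few seconds of idle
    data and subtract it from every sample.

    `wrenches` is an (N, 6) array of [Fx, Fy, Fz, Tx, Ty, Tz] readings whose
    first `bias_window_s` seconds were recorded with the sensor unloaded.
    """
    n = int(sample_rate_hz * bias_window_s)
    bias = wrenches[:n].mean(axis=0)   # average over the idle window
    return wrenches - bias, bias
```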
Gain Scheduling Policy Data Collection Process
Built and trained a 6-layer perceptron network with 128 neurons/layer
Input: geometrically-consistent error vectors (6 inputs), end effector velocities (6 inputs), force/torque sensor data written as a wrench vector (6 inputs)
Output: 6 diagonal entries of compliance matrix: x gain, y gain, z gain, and 3 rotational gains
We tried different combinations:
32 or 128 neurons
2–8 hidden layers
with or without residual connection
residual connection every 2 or 4 layers
with or without noise to the dataset
with or without force/torque sensor bias
Out of these combinations, the network with 6 layers and 128 neurons, residual connections every 2 layers, no added noise to the dataset, and the elimination of force/torque sensor bias performed the best
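The best-performing configuration can be sketched in PyTorch as follows. This is a hedged reconstruction from the description above (class and layer names are ours; activation choice and other details are assumptions): 6 hidden layers of 128 units with a residual connection every 2 layers, mapping the 18-dimensional input to the 6 diagonal gains.

```python
import torch
import torch.nn as nn

class GainMLP(nn.Module):
    """6-layer, 128-unit MLP with a residual connection every 2 layers.

    Input (18): error vector (6) + end-effector velocity (6) + wrench (6).
    Output (6): diagonal entries of the compliance (gain) matrix.
    """
    def __init__(self, in_dim=18, hidden=128, out_dim=6, n_layers=6):
        super().__init__()
        self.inp = nn.Linear(in_dim, hidden)
        self.blocks = nn.ModuleList([
            nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                          nn.Linear(hidden, hidden))
            for _ in range(n_layers // 2)   # each block = 2 hidden layers
        ])
        self.out = nn.Linear(hidden, out_dim)

    def forward(self, x):
        h = torch.relu(self.inp(x))
        for block in self.blocks:
            h = torch.relu(h + block(h))    # residual skip every 2 layers
        return self.out(h)
```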
Above are the losses of our compliance policy MLP network with a varying number of layers
With the learned gain scheduling policy, the robot arm successfully places the peg in the hole when the arm is randomly initialized above the hole
However, we realized a fundamental issue with our pipeline: we had expected the placement pose from Diffusion EDF to be accurate enough for the gain scheduling policy to handle the remaining error
The error in the target pose acquired from Diffusion EDF exceeded the Geometric Impedance Controller's (GIC) 5 mm precision threshold, making this pipeline infeasible in an open-loop configuration
We came up with two solutions that temporarily address the issue in an open-loop configuration (we intend to make this pipeline closed-loop in the future)
Training the network without the geometric error vector, though this resulted in "wandering" along the surface
Implementing Spiral Search using GIC around inferred place pose
Successful Peg-in-Hole Placement with the learned Gain Scheduling Policy
Successful Spiral Search
Initialization and Centering: The manipulator starts at the initial placement position and establishes a spiral search plane. The search area and resolution are defined, considering the expected size and location of the peg-in-hole (PiH) feature.
Spiral Path Execution: The manipulator follows a predefined spiral trajectory, systematically moving outward from the center. At each step, it performs contact sensing or feedback evaluation to detect alignment or proximity to the target.
Feedback Integration and Task Completion: Real-time feedback from force/torque sensors is used to refine the spiral path dynamically. Once the PiH alignment is detected, the manipulator transitions to insertion or assembly operations, ensuring precise task execution.
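The spiral path in step 2 can be sketched as an Archimedean spiral with roughly constant waypoint spacing (a simplified stand-in; the function name, pitch, step, and radius values are illustrative, not our tuned parameters, and the contact-sensing logic is omitted):

```python
import numpy as np

def spiral_waypoints(center, pitch=0.001, step=0.0005, max_radius=0.01):
    """Generate XY waypoints of an Archimedean spiral, r = b*theta with
    b = pitch/(2*pi), around `center`, spaced ~`step` apart in arc length.

    Distances are in meters; the search stops once r exceeds `max_radius`.
    """
    pts, theta = [], 0.0
    b = pitch / (2 * np.pi)            # radial growth per radian
    while b * theta <= max_radius:
        r = b * theta
        pts.append(center + r * np.array([np.cos(theta), np.sin(theta)]))
        # Advance theta so the arc length between waypoints is ~`step`.
        theta += step / max(r, step)
    return np.array(pts)
```

At each waypoint the controller would check the force/torque feedback for the drop in contact force that indicates the peg has found the hole, then switch to insertion.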