We started with a baseline Diffusion EDF policy (which we tuned) and used Neuromeka's built-in robot controller. The sections below describe the portions of the implementation that we added to the project.
Intrinsic parameters were calibrated with the classic checkerboard method.
Extrinsic parameters were calibrated by placing an ArUco tag at various points in the workspace, entering its measured positions relative to the robot base, and using camera observations of the tag to compute (and average) the homogeneous transform between each camera and the robot base.
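The averaging step can be sketched as follows. This is a minimal illustration (the function name and exact method are our own, not from the calibration pipeline itself): given several per-tag estimates of the camera-to-base transform, translations are averaged directly and rotations are averaged by summing the rotation matrices and projecting back onto SO(3) with an SVD, which is valid when the estimates are close together.

```python
import numpy as np

def average_transforms(transforms):
    """Average a list of 4x4 homogeneous transform estimates.

    Translations are averaged directly; rotations are averaged by summing
    the rotation matrices and projecting the sum back onto SO(3) via SVD.
    Only valid when the rotation estimates are close to one another.
    """
    Ts = np.asarray(transforms)
    t_mean = Ts[:, :3, 3].mean(axis=0)
    R_sum = Ts[:, :3, :3].sum(axis=0)
    U, _, Vt = np.linalg.svd(R_sum)
    # The det() factor guards against a reflection (det = -1) solution.
    R_mean = U @ np.diag([1.0, 1.0, np.linalg.det(U @ Vt)]) @ Vt
    T = np.eye(4)
    T[:3, :3] = R_mean
    T[:3, 3] = t_mean
    return T
```

Each tag placement yields one transform estimate; feeding all of them through this function gives a single calibrated extrinsic.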
X depicts the placements of the detected tag corners. Several coordinates relative to the robot base are also listed. Also shown is a border beyond which the camera exhibits barrel distortion.
Cropped and merged point cloud example.
Once the cameras were calibrated, we tried several approaches to merging the point clouds:
Naïve approach
ICP
Colored ICP
We ultimately used Iterative Closest Point (ICP) to merge the point clouds from each camera; it worked best after cropping away the environment beyond our workspace.
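For illustration, here is a minimal point-to-point ICP in NumPy. In practice a library implementation (e.g. Open3D's registration routines) would be used; this simplified stand-in just shows the two steps each iteration performs: nearest-neighbor matching and a Kabsch rigid-fit.

```python
import numpy as np

def icp(source, target, iters=20):
    """Rigidly align `source` (Nx3) to `target` (Mx3) with point-to-point ICP.

    Each iteration matches every source point to its nearest target point,
    then solves for the best-fit rotation/translation (Kabsch algorithm).
    Returns (R, t) such that source @ R.T + t approximates target.
    """
    src = source.copy()
    R_total, t_total = np.eye(3), np.zeros(3)
    for _ in range(iters):
        # Nearest-neighbour correspondences (brute force, for clarity only).
        d = np.linalg.norm(src[:, None, :] - target[None, :, :], axis=2)
        matched = target[d.argmin(axis=1)]
        # Kabsch: optimal rotation between the centred point sets.
        mu_s, mu_t = src.mean(axis=0), matched.mean(axis=0)
        H = (src - mu_s).T @ (matched - mu_t)
        U, _, Vt = np.linalg.svd(H)
        R = Vt.T @ np.diag([1.0, 1.0, np.linalg.det(Vt.T @ U.T)]) @ U.T
        t = mu_t - R @ mu_s
        src = src @ R.T + t
        R_total, t_total = R @ R_total, R @ t_total + t
    return R_total, t_total
```

Cropping the clouds to the workspace before running ICP removes background geometry that would otherwise create spurious correspondences.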
Group Equivariance – a group operation/transformation on the input produces a corresponding transformation on output
Left Equivariance SE(3) – equivariant to rigid transformations of the target place or object pose
Right Equivariance SE(3) – equivariant to changes in grasp posture
Bi-Equivariant Descriptor Fields – represent shapes/scenes as a sum of high dimensional vector fields predicted by neural networks
Extension of Neural Implicit Shapes: train neural networks to predict “descriptors” for a depth input and query points
Simple descriptors examples: “objectness”, modified distance to salient points
Descriptors here are type-0 scalars and type-1 & type-2 vectors, where the type refers to the order of the spherical harmonics
Simply put, Diffusion EDF improves the generalizability of our policy to varying orientations of the target pose and grasp posture.
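To make the equivariance property concrete, here is a tiny numeric illustration (our own example, not part of Diffusion EDF) using the centroid of a point cloud, which is trivially SE(3)-equivariant: transforming the input transforms the output in the same way, f(g·X) = g·f(X).

```python
import numpy as np

def centroid(points):
    """A trivially SE(3)-equivariant map: rotating/translating the input
    point cloud rotates/translates the output by the same transform."""
    return points.mean(axis=0)

# A rigid transform g = (R, t): 90-degree rotation about z plus a translation.
R = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 1.0]])
t = np.array([0.5, -0.2, 1.0])

X = np.random.default_rng(0).normal(size=(10, 3))
lhs = centroid(X @ R.T + t)   # f(g . X)
rhs = R @ centroid(X) + t     # g . f(X)
assert np.allclose(lhs, rhs)  # equivariance holds
```

Diffusion EDF's equivariant layers give its pose predictions this same property, which is why the policy generalizes to new orientations of the target and grasp.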
Example of Group Equivariance
Example EDF visualization
Picking Pose Produced by Diffusion EDF
Placing Pose Produced by Diffusion EDF
Diffusion EDF is a trained neural network with equivariant layers. The network was developed by our collaborators, but we tuned it ourselves.
Demonstration Process:
Capture a point cloud of the scene, including the peg and hole.
Manually move the robot arm to “teach” it how to grasp the peg, recording the pose.
The robot grasps the peg and lifts it to a position where one of the cameras has a clear view of the gripper and peg.
At this position, the robot rotates its wrist by 90° four times to capture point clouds from different perspectives.
The four point clouds are reconstructed into a complete point cloud of the grasped peg.
Manually move the robot arm to “teach” it how to place the peg in the hole, recording the placement pose.
In total, we collected 11 full demonstrations to train our model.
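The multi-view reconstruction in the steps above can be sketched as follows. This is a simplified stand-in (function names and the assumption that each scan is already expressed in the gripper frame via calibration and forward kinematics are ours): undoing the k-th 90° wrist rotation maps each scan back into the initial gripper frame so the four clouds overlap.

```python
import numpy as np

def rot_z(angle):
    """Rotation matrix about the z (wrist) axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def fuse_wrist_scans(clouds_in_gripper_frame):
    """Merge four scans taken after successive 90-degree wrist rotations.

    Each cloud is assumed to already be expressed in the gripper frame at
    capture time.  Applying the inverse of the k-th wrist rotation maps every
    scan back into the initial gripper frame, so the clouds can be stacked.
    """
    merged = []
    for k, cloud in enumerate(clouds_in_gripper_frame):
        undo = rot_z(-k * np.pi / 2)  # inverse of the k-th 90-degree rotation
        merged.append(cloud @ undo.T)
    return np.vstack(merged)
```

In practice a final ICP refinement can clean up small calibration errors between the four views.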
A short montage of training Diffusion EDF
Collected appropriate training datasets and modified the sampling procedures to work with our robot setup
Created a pipeline to capture, stitch, and feed individual point clouds to the trained models and to extract practical grasps
Extracted query field values to optimize for end effector to tool frame transformation (unsuccessful)
Devised heuristics to improve picking and placement poses
For the placing manipulation task, we used impedance control, as it allows the end effector to behave like a mass-spring-damper system
The impedance controller ensures compliance and adaptability to the environment the robot interacts with
Behavioral Cloning was used to learn a gain scheduling policy for insertion
We demonstrated the manipulation task to the robot, and the robot replicated our behavior by cloning the demonstrated gains
To demonstrate the manipulation task, we adjusted the x, y, z, and rotational gains of the robot through a GUI we built while the manipulator brought the grasped peg into the target hole.
This is the process we used to collect data:
Randomize the initial pose above the hole.
Move to the target place position.
While moving, adjust the gains to demonstrate the desired behavior and "teach" the robot how to manipulate when the peg is in contact with the platform.
When misaligned, lower the z-gain and increase the x, y, and rotation gains to orient the peg more accurately.
When seemingly aligned, relax the x, y, and rotation gains, and increase the z-gain to insert the peg into the hole.
Over the course of this process, the error vector, end-effector velocity, force/torque sensor data, and force/torque sensor bias, together with the "ground-truth" gain data, were collected.
Force/torque sensor data has an inevitable bias; we estimated it by averaging the readings over 5 seconds and subtracted it from the collected data.
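The bias removal step can be sketched as below (the function name and sample rate are illustrative assumptions, not values from our setup): average the wrench readings over an idle window and subtract that estimate from every sample.

```python
import numpy as np

def remove_ft_bias(wrenches, sample_rate_hz=100, bias_window_s=5.0):
    """Estimate the force/torque bias from the first few seconds of idle
    data and subtract it from every sample.

    `wrenches` is an (N, 6) array of [Fx, Fy, Fz, Tx, Ty, Tz] readings whose
    first `bias_window_s` seconds were recorded with the sensor unloaded.
    """
    n = int(sample_rate_hz * bias_window_s)
    bias = wrenches[:n].mean(axis=0)   # average over the idle window
    return wrenches - bias, bias
```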
Gain Scheduling Policy Data Collection Process
Built and trained a 6-layer perceptron network with 128 neurons/layer
Input: geometrically-consistent error vectors (6 inputs), end effector velocities (6 inputs), force/torque sensor data written as a wrench vector (6 inputs)
Output: 6 diagonal entries of compliance matrix: x gain, y gain, z gain, and 3 rotational gains
We tried different combinations:
32 or 128 neurons
2–8 hidden layers
with or without residual connection
residual connection every 2 or 4 layers
with or without noise to the dataset
with or without force/torque sensor bias
Out of these combinations, the network with 6 layers and 128 neurons, residual connections every 2 layers, no added noise to the dataset, and the elimination of force/torque sensor bias performed the best
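The best-performing configuration can be sketched in PyTorch as follows. This is a hedged reconstruction from the description above (class and layer names are ours; activation choice and other details are assumptions): 6 hidden layers of 128 units with a residual connection every 2 layers, mapping the 18-dimensional input to the 6 diagonal gains.

```python
import torch
import torch.nn as nn

class GainMLP(nn.Module):
    """6-layer, 128-unit MLP with a residual connection every 2 layers.

    Input (18): error vector (6) + end-effector velocity (6) + wrench (6).
    Output (6): diagonal entries of the compliance (gain) matrix.
    """
    def __init__(self, in_dim=18, hidden=128, out_dim=6, n_layers=6):
        super().__init__()
        self.inp = nn.Linear(in_dim, hidden)
        self.blocks = nn.ModuleList([
            nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                          nn.Linear(hidden, hidden))
            for _ in range(n_layers // 2)   # each block = 2 hidden layers
        ])
        self.out = nn.Linear(hidden, out_dim)

    def forward(self, x):
        h = torch.relu(self.inp(x))
        for block in self.blocks:
            h = torch.relu(h + block(h))    # residual skip every 2 layers
        return self.out(h)
```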
Above are the losses of our compliance policy MLP network with a varying number of layers
With the learned gain scheduling policy, the robot arm successfully places the peg in the hole when the arm is randomly initialized above the hole
However, we realized a fundamental issue with our pipeline: we had expected the placement pose from Diffusion EDF to be accurate enough for the gain scheduling policy to handle the remaining error
The error in the target pose acquired from Diffusion EDF exceeded the Geometric Impedance Controller's (GIC) 5 mm precision threshold, making this pipeline infeasible in an open-loop configuration
We came up with two solutions that temporarily address the issue in an open-loop configuration (we intend to make this pipeline closed-loop in the future)
Training the network without the geometric error vector, though this resulted in "wandering" along the surface
Implementing Spiral Search using GIC around inferred place pose
Successful Peg-in-Hole Placement with the learned Gain Scheduling Policy
Successful Spiral Search
Initialization and Centering: The manipulator starts at the initial placement position and establishes a spiral search plane. The search area and resolution are defined, considering the expected size and location of the peg-in-hole (PiH) feature.
Spiral Path Execution: The manipulator follows a predefined spiral trajectory, systematically moving outward from the center. At each step, it performs contact sensing or feedback evaluation to detect alignment or proximity to the target.
Feedback Integration and Task Completion: Real-time feedback from force/torque sensors is used to refine the spiral path dynamically. Once the PiH alignment is detected, the manipulator transitions to insertion or assembly operations, ensuring precise task execution.
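The spiral path in step 2 can be sketched as an Archimedean spiral with roughly constant waypoint spacing (a simplified stand-in; the function name, pitch, step, and radius values are illustrative, not our tuned parameters, and the contact-sensing logic is omitted):

```python
import numpy as np

def spiral_waypoints(center, pitch=0.001, step=0.0005, max_radius=0.01):
    """Generate XY waypoints of an Archimedean spiral, r = b*theta with
    b = pitch/(2*pi), around `center`, spaced ~`step` apart in arc length.

    Distances are in meters; the search stops once r exceeds `max_radius`.
    """
    pts, theta = [], 0.0
    b = pitch / (2 * np.pi)            # radial growth per radian
    while b * theta <= max_radius:
        r = b * theta
        pts.append(center + r * np.array([np.cos(theta), np.sin(theta)]))
        # Advance theta so the arc length between waypoints is ~`step`.
        theta += step / max(r, step)
    return np.array(pts)
```

At each waypoint the controller would check the force/torque feedback for the drop in contact force that indicates the peg has found the hole, then switch to insertion.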