The emerging utility of grasping robots in science and industry motivates the need for effective, seamless human-to-robot interaction. Recent research in deep learning has made safe, adaptive grasping of an object handed over by a human operator increasingly practical.
For this project, we aimed to recreate this kind of adaptive grasp with a single RGB camera mounted on the xArm's gripper tool. Using Deep Reinforcement Learning, we train the arm to identify an object placed in front of it and to repeatedly choose small joint movements that bring the gripper closer to a grasping position, until it is close enough to grasp the object from a user's hand.
State space:
64x64 RGB Image
Action space:
4 discrete axes, each taking a delta in {-1, 0, 1}:
Base Rotation (Yaw)
Vertical movement
Forward movement
Gripper Rotation
So, 3^4=81 possible actions
Reward:
Dense: negative distance to the object
Sparse: reward given only within an error margin of a graspable position (see the sketch after this list)
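A minimal sketch of how this MDP could be encoded, assuming a flat action index in [0, 81) and an illustrative 5 cm success margin (the helper names and the margin are assumptions, not values from our implementation):

    import numpy as np

    # 4 axes: base yaw, vertical, forward, gripper rotation; each delta is in {-1, 0, 1}.
    AXIS_DELTAS = (-1, 0, 1)
    NUM_AXES = 4

    def index_to_action(index):
        """Decode a flat action index in [0, 81) into 4 per-axis deltas."""
        deltas = []
        for _ in range(NUM_AXES):
            deltas.append(AXIS_DELTAS[index % 3])
            index //= 3
        return deltas  # e.g. [d_yaw, d_vertical, d_forward, d_grip]

    def reward(gripper_pos, object_pos, margin=0.05, sparse=True):
        """Sparse: 1 inside the graspable margin; dense: negative distance to the object."""
        dist = np.linalg.norm(np.asarray(gripper_pos) - np.asarray(object_pos))
        if sparse:
            return 1.0 if dist < margin else 0.0
        return -dist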
We trained the xArm in simulation first, which significantly sped up training because we could teleport the gripper between poses while learning the policy.
The algorithm used was DQN: we learned Q-values for our factored action space. The network outputs a separate set of three Q-values for each of the four action components; as described in the MDP, each component takes a value in {-1, 0, 1}, representing a move backwards, no movement, or a move forwards, respectively. The Q-value of an action is the sum of its components' per-axis Q-values, so the greedy action takes the argmax along each axis independently and its value is the sum of the per-axis maxima.
For example, if the per-axis maxima are 0.3, 0.4, 0.1, and 0.1, the greedy action's Q-value is 0.3 + 0.4 + 0.1 + 0.1 = 0.9.
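As a small sketch of that decomposition, using the maxima from the example above (the remaining per-axis values are made up for illustration):

    import numpy as np

    # One row of 3 Q-values per axis, ordered as Q(-1), Q(0), Q(+1).
    q_per_axis = np.array([
        [0.10, 0.20, 0.30],  # base rotation (yaw)
        [0.40, 0.00, 0.10],  # vertical movement
        [0.00, 0.10, 0.05],  # forward movement
        [0.10, 0.00, 0.00],  # gripper rotation
    ])

    greedy_deltas = q_per_axis.argmax(axis=1) - 1  # maps column {0, 1, 2} -> delta {-1, 0, +1}
    greedy_value = q_per_axis.max(axis=1).sum()    # 0.3 + 0.4 + 0.1 + 0.1 = 0.9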
Hyperparameters: γ (discount factor) = 0.98, ε (exploration rate) annealed from 1.0 to 0.02, α (learning rate) = 1e-3, batch size = 64, replay buffer size = 20000, target network update frequency = 1000 steps
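In a training config these settings might look like the sketch below; the linear anneal length for ε is an assumption, since only its endpoints are listed above:

    # DQN hyperparameters as listed above; EPS_ANNEAL_STEPS is an assumed value.
    GAMMA = 0.98                 # discount factor
    LEARNING_RATE = 1e-3
    BATCH_SIZE = 64
    BUFFER_SIZE = 20_000
    TARGET_UPDATE_EVERY = 1_000  # steps between target-network syncs
    EPS_START, EPS_END = 1.0, 0.02
    EPS_ANNEAL_STEPS = 100_000   # assumption: anneal length not specified above

    def epsilon(step):
        """Linearly anneal the exploration rate from EPS_START to EPS_END."""
        frac = min(step / EPS_ANNEAL_STEPS, 1.0)
        return EPS_START + frac * (EPS_END - EPS_START)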
Input Layer: Batch of 64x64 RGB images
2D Convolutional layer x4, 16 channels
ReLU Nonlinearity
Max-Pooling
Avg-Pooling
MLP
3 Dense Linear Layers
ReLU Non-Linearity
Output Layer: 12-vector of Q-values (4 axes × 3 deltas each)
max_a Q(s, a) = sum(max(q_axis) for q_axis in per_axis_q_values)
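A minimal PyTorch sketch of this architecture; kernel sizes, strides, pooling placement, and hidden widths are assumptions, since the list above only fixes four 16-channel conv layers, ReLU, max- and average-pooling, three dense layers, and a 12-dimensional output:

    import torch
    import torch.nn as nn

    class FactoredQNet(nn.Module):
        """CNN encoder + MLP head producing 4 groups of 3 Q-values (one group per action axis)."""

        def __init__(self, num_axes=4, deltas_per_axis=3, hidden=128):
            super().__init__()
            blocks, in_ch = [], 3
            for _ in range(4):  # 4 conv blocks, 16 channels each
                blocks += [nn.Conv2d(in_ch, 16, kernel_size=3, padding=1),
                           nn.ReLU(), nn.MaxPool2d(2)]
                in_ch = 16
            self.encoder = nn.Sequential(*blocks, nn.AdaptiveAvgPool2d(1), nn.Flatten())
            self.head = nn.Sequential(  # 3 dense layers
                nn.Linear(16, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, num_axes * deltas_per_axis),
            )
            self.num_axes, self.deltas = num_axes, deltas_per_axis

        def forward(self, obs):  # obs: (B, 3, 64, 64) batch of RGB images
            q = self.head(self.encoder(obs))               # (B, 12)
            return q.view(-1, self.num_axes, self.deltas)  # (B, 4, 3)

    net = FactoredQNet()
    q = net(torch.zeros(64, 3, 64, 64))            # batch of 64 images
    greedy_value = q.max(dim=2).values.sum(dim=1)  # max_a Q(s, a) per sample
    greedy_deltas = q.argmax(dim=2) - 1            # per-axis delta in {-1, 0, +1}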
We added the following training augmentations to assist in the Sim2Real transition (the background replacement is sketched after this list).
Background Replace
Gripper rotation correction
Random Arm start + Object start
Random wobbling movement on Object
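As a sketch of the background replacement, assuming the simulator can render a segmentation mask of the arm and object (the function and variable names here are illustrative):

    import numpy as np

    def replace_background(rgb, mask, background_images, rng=np.random):
        """Paste the rendered arm/object pixels onto a randomly chosen real-world background.

        rgb:  (H, W, 3) uint8 render from the simulator
        mask: (H, W) boolean array, True where the arm or object is visible
        background_images: list of (H, W, 3) uint8 photos to sample from
        """
        background = background_images[rng.randint(len(background_images))]
        out = background.copy()
        out[mask] = rgb[mask]
        return out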
[Figure: Q-network output]
We had trouble getting the arm in simulation to compute inverse kinematics correctly for the positions our Q-network was requesting. Because we were also using teleportation, we had to detect and resolve collisions with the robot itself and with the ground. It was also difficult to make the red object move exactly the way we wanted during training, which required continual adjustment throughout the project.
After resolving these issues and switching from the dense to the sparse reward, we reached roughly 90% grasping success in simulation. We saved the weights from the point of highest average reward during training and show the learned policy below.
[Figure: Learning curves]
This is a tricky learning problem because of the high dimensionality of the search space and the noisy information conveyed by the observations. That said, with the right setup the problem can be learned fairly easily in simulation, as shown in the video above. We ran into some problems with the Sim2Real transfer, but we hope that additional augmentations and possibly longer training times will enable grasping in the real world.