We conduct experiments in different task settings to verify that our method is not restricted to purely symmetrical scenes or Cartesian positional control. In this section, all training results are evaluated on both fully cooperative and local tasks with at most 2 objects.
Randomized robot bases
During training, we randomly initialize the positions and orientations of the two robot bases, so they are not necessarily centered around the origin of the world frame. The distance from each robot base to the gap center is sampled between 0.5 m and 0.7 m. The angle between the x-axis of the world frame and the vector pointing from the world origin to the base position is drawn from [-π/3, π/3]. The base orientation is also initialized randomly from [-π, π]. As shown in the following figure, training with randomized robot bases (blue) performs comparably to the original setting with fixed bases (red). The slight performance drop comes from a larger number of corner cases, in which objects lie near the boundary of the robots' operational spaces.
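The base randomization above can be sketched as a small sampling routine. This is a minimal illustration, not the authors' code. It assumes three things the text does not state: the gap center coincides with the world origin, all quantities are drawn uniformly, and the second robot is placed on the opposite side by adding π to its polar angle. The names `sample_base_pose` and `sample_two_bases` are hypothetical.

```python
import math
import random


def sample_base_pose(rng, mirror=False,
                     dist_range=(0.5, 0.7),
                     angle_range=(-math.pi / 3, math.pi / 3),
                     yaw_range=(-math.pi, math.pi)):
    """Sample (x, y, yaw) for one robot base in the world frame.

    Assumptions (not from the text): gap center == world origin,
    uniform sampling, and mirror=True places the base on the opposite
    side by shifting the polar angle by pi.
    """
    dist = rng.uniform(*dist_range)      # distance to the gap center [m]
    angle = rng.uniform(*angle_range)    # angle w.r.t. world x-axis [rad]
    if mirror:
        angle += math.pi                 # hypothetical: opposite side of the gap
    x = dist * math.cos(angle)
    y = dist * math.sin(angle)
    yaw = rng.uniform(*yaw_range)        # base orientation [rad]
    return x, y, yaw


def sample_two_bases(seed=None):
    """Sample poses for both robot bases with a seeded RNG."""
    rng = random.Random(seed)
    return sample_base_pose(rng, mirror=False), sample_base_pose(rng, mirror=True)


left, right = sample_two_bases(seed=0)
```

Sampling in polar coordinates around the gap center keeps both bases within a bounded reach of the shared workspace, while still varying their placement and heading across episodes.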
Results of randomizing positions and orientations of robot bases (blue). The original results with fixed robot bases (red) are included as reference.
Different robots
We then replace one Franka Panda robot with a Kuka robot, so the two robots have different kinematics. The figure below shows that the training results are largely unaffected by using different types of robots.
Results of learning rearrangement and handover with different robots. The red curve is trained with two Franka Panda robots, and the blue curve is with one Franka and one Kuka robot.
Rotation control for end effectors
We add one controllable action dimension to each end effector so that it can rotate around the z-axis of the robot base frame. The training curves are shown in the following figure. With rotational control enabled, the agent requires more samples than when the end-effector orientations are kept fixed, but it converges to a similar success rate.
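The sketch below illustrates how one extra action dimension could extend Cartesian positional control with a yaw rotation about the base z-axis. It is a minimal sketch under stated assumptions, not the authors' controller. It assumes delta-position control with actions in [-1, 1], orientations stored as (w, x, y, z) quaternions in the robot base frame, and hypothetical scaling constants `POS_SCALE` and `YAW_SCALE`.

```python
import math

# Hypothetical per-step scaling constants; the text does not specify them.
POS_SCALE = 0.05   # metres per unit action
YAW_SCALE = 0.2    # radians per unit action


def quat_multiply(q1, q2):
    """Hamilton product q1 * q2 for quaternions stored as (w, x, y, z)."""
    w1, x1, y1, z1 = q1
    w2, x2, y2, z2 = q2
    return (
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
        w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,
    )


def apply_action(pos, quat, action):
    """Map a 4-D action (dx, dy, dz, dyaw) to a new end-effector target.

    pos and quat are in the robot base frame (assumption). With fixed
    orientation only (dx, dy, dz) would be used; dyaw is the added DoF.
    """
    dx, dy, dz, dyaw = (max(-1.0, min(1.0, a)) for a in action)  # clip to [-1, 1]
    new_pos = (pos[0] + POS_SCALE * dx,
               pos[1] + POS_SCALE * dy,
               pos[2] + POS_SCALE * dz)
    half = 0.5 * YAW_SCALE * dyaw
    q_yaw = (math.cos(half), 0.0, 0.0, math.sin(half))  # rotation about base z-axis
    # Left-multiplying applies the rotation in the base frame, not the EE frame.
    new_quat = quat_multiply(q_yaw, quat)
    return new_pos, new_quat


pos, quat = apply_action((0.4, 0.0, 0.2), (1.0, 0.0, 0.0, 0.0), (1.0, 0.0, -1.0, 1.0))
```

Because the extra dimension enlarges the action space from 3 to 4 entries per end effector, a policy has more to explore, which is consistent with the higher sample requirement reported above.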
Comparison between Cartesian positional control with fixed orientation (red) and with one dimension of rotational control around the z-axis (blue).