Berkeley UR5 Demonstration Dataset
Tasks:
tiger pick and place: The task string is "Take the tiger out of the red bowl and put it in the grey bowl." The stuffed animal (tiger) always starts in the red bowl. The positions of the two bowls are randomized on the table, while the gripper is initialized to a fixed pose. Technically, the pick-and-place task only requires translation actions of the gripper.
cloth sweeping: The task string is "Sweep the green cloth to the left side of the table." The cloth is randomly placed on the right side of the table, and the gripper needs to push it horizontally to the left side. The gripper's starting pose is randomly initialized by adding noise to a fixed position. Technically, the sweeping task only requires translation actions of the gripper.
cup stacking: The task string is "Pick up the blue cup and put it into the brown cup." The positions of the two cups are randomized on the table, and the gripper's starting pose is random. Technically, the stacking task only requires translation actions of the gripper.
bottle pick and place: The task string is "Put the ranch bottle into the pot." The position of the pot is fixed, while the position of the ranch bottle is randomized. The gripper's starting pose is fixed. This task involves both translation and rotation actions.
Data format:
The data is stored as a NumPy array of trajectories. Each array element is a separate trajectory of length L=250, formatted as a dictionary with the following key-value pairs (a minimal loading sketch appears at the end of this section):
robot_state: np.ndarray((L, 15))
This stores the robot state at each timestep; a field-unpacking sketch follows the field descriptions below.
[joint0, joint1, joint2, joint3, joint4, joint5, x,y,z, qx,qy,qz,qw, gripper_is_closed, action_blocked]
x, y, z, qx, qy, qz, qw is the end-effector pose (position followed by an orientation quaternion) expressed in the robot base frame
gripper_is_closed is binary: 0 = fully open; 1 = fully closed
action_blocked is binary: 1 if the gripper opening/closing action is being executed and no other actions can be performed; 0 otherwise.
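For convenience, the 15-dimensional state vector can be split into named fields. A minimal sketch, assuming traj is one trajectory dictionary loaded from the dataset (the function name and return layout are illustrative, not part of the dataset API):

    import numpy as np

    def split_robot_state(robot_state: np.ndarray) -> dict:
        # robot_state has shape (L, 15); slice it into named components
        # following the field order documented above.
        return {
            "joints": robot_state[:, 0:6],            # joint0 .. joint5
            "ee_position": robot_state[:, 6:9],       # x, y, z in the robot base frame
            "ee_quaternion": robot_state[:, 9:13],    # qx, qy, qz, qw
            "gripper_is_closed": robot_state[:, 13],  # binary: 0 = fully open, 1 = fully closed
            "action_blocked": robot_state[:, 14],     # binary: 1 while a gripper action blocks other actions
        }

    state = split_robot_state(traj["robot_state"])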
action: np.ndarray((L, 8))
This stores the expert action input at each timestep.
[x, y, z, roll, pitch, yaw, delta_gripper_closed, terminate]. The six pose variables (x, y, z, roll, pitch, yaw) each represent the delta change to the corresponding end-effector dimension, expressed with respect to the robot base frame. The range of x, y, z is [-0.02, 0.02], and the range of roll, pitch, yaw is [-1/15, 1/15].
delta_gripper_closed is ternary: 1 if gripper closing needs to be triggered from an open state, -1 if gripper opening needs to be triggered from a closed state, and 0 if no change. This representation (0 everywhere except for a 1 or -1 at one or two timesteps) can be hard to learn. Preprocessing it into a gripper-state signal (0 at the beginning, switching to 1 when the closing action is triggered and staying at 1 until the opening action is triggered, when it returns to 0) can make the policy easier to learn; see the sketch below. Note that the result of such preprocessing will not equal the gripper_is_closed key in robot_state, because the state lags behind the command: it takes a few timesteps after the action command is sent before the gripper state actually changes.
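The preprocessing described above amounts to a cumulative sum of the ternary trigger signal, clipped to {0, 1}. A minimal sketch, assuming actions is one trajectory's (L, 8) action array (index 6 holds delta_gripper_closed; the function name is illustrative):

    import numpy as np

    def gripper_state_from_deltas(actions: np.ndarray) -> np.ndarray:
        # Cumulatively sum the +1/-1 triggers so the signal holds its
        # value between trigger events; clipping keeps the result in
        # {0, 1} even if a trigger were ever repeated.
        delta_gripper_closed = actions[:, 6]
        return np.clip(np.cumsum(delta_gripper_closed), 0.0, 1.0)

For example, the trigger sequence [0, 0, 1, 0, 0, -1, 0] maps to the state sequence [0, 0, 1, 1, 1, 0, 0].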
image: np.ndarray((L, 480, 640, 3)): the RGB image of the robot workspace captured at each timestep.
task: np.ndarray((L, 1)) storing the name of the task in natural language.
other:
"hand_image": np.ndarray((L, 480, 640, 3)).
"third_person_image": np.ndarray((L, 480, 640, 4)). The first 3 channels are the same as "image," and the last dimension is depth.