Code and Using the Dataset

GitHub Code

The code for this dataset and the neural network models accompanies our paper The CoSTAR Block Stacking Dataset: Learning with Workspace Constraints; examples and instructions are further down on this page:

  1. CoSTAR Dataset Loading Repository — PyTorch and TensorFlow dataset loader
  2. HyperTree Architecture Search Code — aka CoSTAR Hyper, requires TensorFlow
  3. rENAS: regression Efficient Neural Architecture Search Code — requires TensorFlow
  4. CoSTAR Objects - 3D Models of blocks, bin, and other objects

Dataset Summary

All stack attempts are saved with h5py in the HDF5 file format. The files are divided into several categories and named based on what happened on the actual robot:

Stack Attempt Types

  • Success - The robot completed a successful stack attempt, and filenames will contain the string "success".
  • Failure - The robot failed to complete a stack in this attempt, which can happen in two ways:
    • Task Failure - The robot failed at its task of creating a stack, and was otherwise running without issues at the end of the attempt. Files which represent task failures will be named with the string "failure".
    • Error Failure - The robot encountered an error causing the attempt to end early. For example, when the robot collides with an object with too much force it will automatically trigger a security stop, other sources of errors are planning errors and crashes due to bugs. Files which represent error failures will be named with the string "error.failure".
  • Other H5F File Categories - Some files became unreadable and others contain no images. These are included to accurately represent the collection process and might include logs indicating what went wrong. However, these files are not counted in the summary and are not present in the various train/val/test lists.
[Summary table of stack attempt counts by subset and attempt type]

Viewing Data

You can use scripts in costar_dataset to view the dataset. Detailed information and code can be found in costar_plan/ctp_integration/README.md.

    • Preview an h5f file as video.
    • Convert video from an h5f file into an mp4 file or a gif.
    • Relabel data in a dataset.
    • Save labeled key frames individually as a named jpeg or as a tiled sequence.

    • View the dataset using bokeh and holoviews.
    • Scroll through individual time steps and image frames.
    • Plot state, such as action labels, over time.

The image above is a screenshot of the user interface when using stack_player.py.

There is a slider at the bottom that can be dragged to a specific frame, followed by three buttons: Play, Prev, and Next.

Press Play to play the frames for this data point; press again to pause. Press Prev and Next to switch between consecutive attempts.

At the top, the image for the selected frame is shown. Three more rows of plots show different data channels of the robot at that time step.

Gripper shows the open/close state of the gripper on the robot.

Action shows the index of the specific action that the robot is executing at the selected frame, for use in visual prediction where the object is more visible (see Visual Robot Task Planning). It corresponds to the label feature in the feature section; the action name for each index can be found in the labels_to_name feature.

Gripper Action is a time-shifted Action graph that shows when the gripper actually triggers.

Folder Structure

The dataset contains two major subsets: one with blocks only, and one with blocks and plush toy distractors. The subsets are organized in different directories under the main dataset folder. The filename lists for training models on the different subsets are also included. The folder structure is as follows.

costar_block_stacking_dataset_v0.4/
- costar_block_stacking_dataset_v0.4_combined_*_files.txt
- costar_block_stacking_dataset_v0.4_combined_summary.csv
- costar_upload_files_hash.csv
- rgb_PS1080_PrimeSense.yaml
- README.md
- blocks_only/
  - costar_block_stacking_dataset_v0.4_blocks_only_success_only_train_files.txt
  - costar_block_stacking_v0.4_blocks_only_success_only_test_files.txt
  - costar_block_stacking_v0.4_blocks_only_success_only_val_files.txt
  - costar_block_stacking_v0.4_blocks_only_*_files.txt
  - rename_dataset_labels.csv
  - *.h5f

- blocks_with_plush_toy/
  - costar_block_stacking_dataset_v0.4_blocks_with_plush_toy_success_only_train_files.txt
  - costar_plush_block_stacking_v0.4_blocks_with_plush_toy_success_only_test_files.txt
  - costar_plush_block_stacking_v0.4_blocks_with_plush_toy_success_only_val_files.txt
  - costar_plush_block_stacking_v0.4_blocks_with_plush_toy_success_only_corrupted_files.txt
  - costar_block_stacking_v0.4_blocks_with_plush_toy_*_files.txt
  - rename_dataset_labels.csv
  - *.h5f

Filenames

Stack attempt data is stored in the HDF5 format, and filenames take the following form:

YYYY-MM-DD-HH-MM-SS_example######.[success,failure,error.failure].h5f

Here are several specific filenames:

2018-05-29-15-00-28_example000001.success.h5f
2018-05-30-15-33-16_example000031.success.h5f
2018-06-04-10-01-33_example000001.error.failure.h5f
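For illustration, these filenames can be split into their timestamp, example number, and outcome. A minimal Python sketch (the regular expression and helper name below are ours, not part of the dataset tooling):

import re

# Hypothetical helper: split a stack attempt filename into its components.
ATTEMPT_RE = re.compile(
    r"^(?P<timestamp>\d{4}(?:-\d{2}){5})"
    r"_example(?P<example>\d{6})"
    r"\.(?P<outcome>success|failure|error\.failure)\.h5f$")

def parse_attempt_filename(name):
    match = ATTEMPT_RE.match(name)
    if match is None:
        raise ValueError("not a stack attempt filename: " + name)
    return match.group("timestamp"), int(match.group("example")), match.group("outcome")

print(parse_attempt_filename("2018-05-29-15-00-28_example000001.success.h5f"))
# ('2018-05-29-15-00-28', 1, 'success')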

The yaml file contains the camera calibration:

rgb_PS1080_PrimeSense.yaml

Human labeling data, image presence, and notes on interesting examples:

rename_dataset_labels.csv

Data Channels and Time Steps

Every example can be loaded easily with h5py, and all features have an equal number of frames collected at 10 Hz (0.1 s per frame). The number of frames varies with each example; examples with zero frames exist and are typically *.error.failure.h5f files. A short loading sketch follows the feature list below.

Here is a complete list of features:

  • nsecs - Nanosecond component of the timestamp for this entry.
  • secs - Second component of the timestamp for this entry.
  • q - The 6 joint angles of the UR5 arm from base to tip in radians.
  • dq - Change in joint angle q from the previous time step.
  • pose - Pose of the gripper end effector.
  • camera - Pose of the camera.
  • image - RGB image encoded as binary data in the JPEG format. It has already been rectified and calibrated.
  • depth_image - Encoded depth images in PNG format with measurements in millimeters. It has already been rectified and calibrated.
  • goal_idx - The time step index in the data list at which the goal is reached. For grasps this changes after the gripper closes and backs off the target; for placements, after the gripper opens and backs off.
  • gripper - A float indicating the open/close state of the gripper. 0 (~0.055 in practice) is completely open and 1 (in practice TBD) is completely closed.
  • label - Action label at the current time step, as defined by labels_to_name.
  • info - String description of the current step.
  • depth_info - Currently empty.
  • rgb_info - Currently empty.
  • object - Identifier of the object the robot will be interacting with.
  • object_pose - Pose of the entry in the object feature, detected via ObjRecRANSAC.
  • labels_to_name - List of action description strings. The string index corresponds to the integer label for that action in label. For example, if data["labels_to_name"][0] is "grab_blue", then its corresponding integer index is 0.
  • rgb_info_D - Camera calibration param, may be empty. See the yaml file in the dataset for the values.
  • rgb_info_K - Camera calibration param, may be empty. See the yaml file in the dataset for the values.
  • rgb_info_R - Camera calibration param, may be empty. See the yaml file in the dataset for the values.
  • rgb_info_P - Camera calibration param, may be empty. See the yaml file in the dataset for the values.
  • rgb_info_distortion_model - Camera calibration param, may be empty. See the yaml file in the dataset for the values.
  • depth_info_D - Camera calibration param, may be empty. See the yaml file in the dataset for the values.
  • depth_info_K - Camera calibration param, may be empty. See the yaml file in the dataset for the values.
  • depth_info_R - Camera calibration param, may be empty. See the yaml file in the dataset for the values.
  • depth_info_P - Camera calibration param, may be empty. See the yaml file in the dataset for the values.
  • depth_distortion_model - Camera calibration param, may be empty. See the yaml file in the dataset for the values.
  • all_tf2_frames_as_yaml - List of YAML strings that, when loaded, each define a dictionary mapping coordinate frame names to frame poses. All transforms are specified relative to the robot base.
  • all_tf2_frames_from_base_link_vec_quat_xyzxyzw_json - List of JSON strings that, when loaded, define a dictionary mapping from coordinate frame names to a list of doubles in [x, y, z, qx, qy, qz, qw] translation and quaternion order. All transforms are specified relative to the robot base.
  • visualization_marker - Transform from the robot base to the AR tag marker.
  • camera_rgb_optical_frame_pose - The pose of the camera RGB image optical frame relative to the robot base.
  • camera_depth_optical_frame_pose - The pose of the camera depth image optical frame relative to the robot base.
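A minimal sketch of reading a few of these features with h5py is shown below; the feature names come from the list above, while the file path and variable names are placeholders:

import json
import h5py
import numpy as np

# Placeholder path: substitute any *.h5f attempt file from the dataset.
path = "2018-05-29-15-00-28_example000001.success.h5f"

with h5py.File(path, "r") as data:
    # Timestamps are split into second and nanosecond components.
    t = np.array(data["secs"]) + 1e-9 * np.array(data["nsecs"])
    print("frames:", len(t))

    # Integer action label per frame, and the list of action names it indexes.
    labels = np.array(data["label"])
    names = [n.decode() if isinstance(n, bytes) else str(n) for n in data["labels_to_name"]]
    if len(labels):
        print("first action:", names[int(labels[0])])

    # Per-frame coordinate frames relative to the robot base, stored as JSON.
    frames = json.loads(data["all_tf2_frames_from_base_link_vec_quat_xyzxyzw_json"][0])
    print("example tf2 frame names:", sorted(frames)[:5])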

Action Encoding

Actions are one-hot encoded: each action is represented by a list of length 41 (the total number of actions) in which every value is 0 except the entry at the index of the active action, which is 1 (a short sketch follows the list below). Here is the list of all action index values and what each means:

00 'place_green_on_yellow'
01 'move_to_home'
02 'place_blue_on_yellowred'
03 'place_yellow_on_red'
04 'place_blue_on_red'
05 'grab_blue'
06 'place_red_on_blueyellow'
07 'place_green_on_redyellow'
08 'place_red_on_yellow'
09 'place_green_on_blueyellow'
10 'place_red_on_greenblue'
11 'place_blue_on_green'
12 'place_blue_on_redgreen'
13 'place_yellow_on_greenblue'
14 'place_yellow_on_blue'
15 'place_blue_on_greenyellow'
16 'place_blue_on_yellowgreen'
17 'place_blue_on_greenred'
18 'place_yellow_on_redgreen'
19 'grab_yellow'
20 'place_red_on_greenyellow'
21 'grab_green'
22 'place_red_on_green'
23 'place_yellow_on_bluered'
24 'place_yellow_on_green'
25 'place_green_on_blue'
26 'place_yellow_on_bluegreen'
27 'place_blue_on_redyellow'
28 'place_red_on_blue'
29 'place_red_on_yellowgreen'
30 'place_yellow_on_greenred'
31 'place_green_on_yellowblue'
32 'place_red_on_bluegreen'
33 'place_green_on_red'
34 'place_red_on_yellowblue'
35 'place_green_on_yellowred'
36 'place_green_on_redblue'
37 'grab_red'
38 'place_yellow_on_redblue'
39 'place_green_on_bluered'
40 'place_blue_on_yellow'
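As a concrete sketch of this encoding (the vector length and the index/name pairing come from the list above; the helper names are ours):

import numpy as np

NUM_ACTIONS = 41  # total number of actions listed above

def encode_action(action_index):
    # One-hot vector: all zeros except a 1 at the index of the active action.
    one_hot = np.zeros(NUM_ACTIONS, dtype=np.float32)
    one_hot[action_index] = 1.0
    return one_hot

def decode_action(one_hot, labels_to_name):
    # Recover the action name from a one-hot vector using labels_to_name.
    return labels_to_name[int(np.argmax(one_hot))]

vec = encode_action(5)              # index 5 is 'grab_blue' in the list above
print(int(vec[5]), int(vec.sum()))  # 1 1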

Using Different Splits and Versions

We highly recommend using the splits we have provided so comparisons can be made more accurately across papers. We expect to expand the dataset progressively between versions. You can visualize the proportions of each subset in the Dataset Summary section. Our chosen splits are defined in files named as follows:

costar_block_stacking_dataset_{Version}_{ObjectSubset}_{AttemptType}_{TVTSubset}_files.txt

The names in {curly brackets} above vary depending on the relevant subset. Each subset is described below, and a short sketch after the lists shows how the names can be assembled.

Version:

0.4 at the time of writing. Version numbers may be updated due to changes in the dataset including both new data and corrections.

ObjectSubset:

  • blocks_only - red, green, yellow, and blue 5.1cm wooden cubes are present
  • blocks_with_plush_toy - blocks are present plus 24 colorful plush toy distractors
  • combined - contains both the data from the blocks_only and blocks_with_plush_toy subsets

AttemptType:

  • success_only
  • error_failure_only
  • task_failure_only
  • task_and_error_failure

TVTSubset:

  • train - for training a neural network
  • val - for verifying performance during development; this subset may also be used for optimization
  • test - for verification of final results on the final model
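As a small sketch, split list names can be assembled from the template above; note that some files on disk omit the dataset_ portion of the prefix (see the folder structure above), so results should be checked against the actual directory listing:

# Hypothetical helper that fills in the split list filename template above.
def split_list_filename(version, object_subset, attempt_type, tvt_subset):
    return "costar_block_stacking_dataset_{}_{}_{}_{}_files.txt".format(
        version, object_subset, attempt_type, tvt_subset)

print(split_list_filename("v0.4", "blocks_only", "success_only", "train"))
# costar_block_stacking_dataset_v0.4_blocks_only_success_only_train_files.txt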

Short & Simple Code Snippet

Here we load one attempt and show the images. We've tried to ensure everything is easy to work with:
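A minimal sketch along these lines, assuming a local *.h5f file (the path below is a placeholder) and using Pillow to decode the JPEG-encoded frames:

import io
import h5py
from PIL import Image

# Placeholder path: substitute any *.h5f attempt file from the dataset.
path = "2018-05-29-15-00-28_example000001.success.h5f"

with h5py.File(path, "r") as data:
    print("features:", sorted(data.keys()))
    # Each entry of 'image' is a JPEG-encoded RGB frame.
    for i, encoded in enumerate(data["image"]):
        frame = Image.open(io.BytesIO(bytes(encoded)))
        print("frame", i, frame.size)
        frame.show()  # opens the frame in the default image viewer
        if i >= 2:    # only show the first few frames in this sketch
            break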

Running a snippet like this should print the list of available features and display the first few RGB frames of the attempt.

For additional script information, please refer to the README file.

Data Loading Code for Training

Data can be loaded with block_stacking_reader.py, which provides a standard Python generator; the parent repositories include code that uses this loader with Keras and TensorFlow. Although two Keras modules are imported, these data loading files only depend on basic libraries, so they will not use a GPU or interfere with other popular deep learning libraries such as PyTorch.
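The exact interface of block_stacking_reader.py is documented in its repository; as a rough sketch of the generator pattern it follows, a loader can walk a split list file and yield one decoded attempt at a time (the function below is illustrative, not the reader's actual API):

import io
import os
import h5py
import numpy as np
from PIL import Image

def attempt_generator(file_list_txt, dataset_root="."):
    # Illustrative generator: yields (rgb_frames, action_labels) per attempt.
    # Paths in the split list may need to be joined with the dataset root.
    with open(file_list_txt) as f:
        filenames = [line.strip() for line in f if line.strip()]
    for name in filenames:
        with h5py.File(os.path.join(dataset_root, name), "r") as data:
            frames = np.stack([np.asarray(Image.open(io.BytesIO(bytes(img))))
                               for img in data["image"]])
            labels = np.array(data["label"])
        yield frames, labels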

Notes, Limitations and Frequently asked Questions

Below are notes about the dataset and answers to questions we have received.

Split Selection

All subsets are randomly ordered with the Python or NumPy shuffle function and a random seed of 0 before the val and test subsets are selected from the front, and files are only included in a split if they are readable and contain at least one image. The success_only + blocks_only dataset was split well before the others, with 128 attempts for each of the validation and test subsets. The success_only + blocks_with_plush_toy dataset was chosen to have 64 attempts for each of the validation and test sets.

A few examples were mislabeled by the automatic labeling system in v0.2, so we hand labeled all of the data for v0.3. Fortunately, the failure set had never been utilized; it contained quite a few actual successes, while mislabeled successes that were actually failures were very rare. Once all attempt type names were corrected, we moved the files to their corresponding categories. We also collected some additional plush toy data between v0.3 and v0.4.

We calculated the proportion between train and val for success_only data in the blocks_only and blocks_with_plush_toy subsets; each proportion was then used to determine the number of train and val files in error_failure_only and task_failure_only. Most importantly, we have ensured that no files have ever moved between the train, val, and test splits.

Code to split file lists
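A minimal sketch of the selection procedure described above, assuming a list of filenames that are readable and contain at least one image; the default counts are the ones used for the success_only + blocks_only subset, and taking val before test from the front of the shuffled list is an assumption here:

import random

def split_file_list(filenames, n_val=128, n_test=128, seed=0):
    # Shuffle with a fixed seed, then take the val and test subsets from the
    # front of the shuffled list; everything else is used for training.
    files = list(filenames)
    random.Random(seed).shuffle(files)
    val = files[:n_val]
    test = files[n_val:n_val + n_test]
    train = files[n_val + n_test:]
    return train, val, test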

Error Logs

Files marked with error.failure indicate a problem we cannot currently recover from, such as a security stop or a ROS planning error. About halfway through the dataset collection we started saving the final error string for error.failure cases; hopefully this will assist in diagnosing and classifying more detailed reasons for why failures occur.

2018-05-15 Broken AR Tag Mount

The AR tag mount broke around 2018-05-15. Some of the dataset was collected with the tag shifting around. Starting with:

2018-05-17-16-39-30_example000001.success.h5f

We re-glued the AR tag, so the hand-eye calibration is different from before. The hand-eye calibration was not regenerated, since objects were still being grasped correctly at this point. However, at least some successes were being reported as failures after this point.

2018-05-17-16-39 RGB/Depth Time Synchronization

The RGB and depth data (in fact, all data) are not always perfectly time synchronized, and in some cases many depth frames are missing. This is much more common in failure and error examples (*.failure.h5f and *.error.failure.h5f) than in successes (*.success.h5f).

Some bugs were fixed during the collection process which improved the synchronization dramatically, so if you need to minimize synchronization errors, use the filenames to choose more recently collected data (try attempts after 2018-05-17-16-39), for example with a filter like the sketch below.
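Since the fixed-width timestamp prefix of each filename sorts lexicographically in time order, filtering a file list by date can be a simple string comparison. A sketch, with the cutoff taken from the date mentioned above:

import os

def collected_after(filenames, cutoff="2018-05-17-16-39"):
    # Filenames begin with YYYY-MM-DD-HH-MM-SS, so comparing the basename
    # against the cutoff string compares collection times.
    return [f for f in filenames if os.path.basename(f) >= cutoff]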

AR Tag mount broken - second occasion

The AR Tag mount on the gripper broke a second time, and we simply left it detached for collection until the first data collected in September. Unfortunately, the exact date of the break has been lost. In general, we advise against making any assumptions about the precision of the AR tag position relative to the robot: it can vary by a few centimeters and vary noticeably in angle. It should only be used if very rough values are sufficient.

The hand-eye calibration itself seems to remain OK.

2018-08-31 Gripper Failure

The gripper failed on 2018-08-31 at 22:27:44, starting with the example:

~/.keras/datasets/costar_block_stacking_dataset_v0.4/blocks_with_plush_toy/2018-08-31-22-27-44_example000003.error.failure.h5f

This example and those between 2018-08-31-22-27-44 and 2018-09-01 have the gripper locked in the closed position, and data may or may not be recorded for the gripper state.

2018-09-07 Gripper Repaired; Appearance Change

We repaired the connector, added a piece to minimize flexing of the wire, and changed where the gripper wire is attached. This means there is an appearance change which may affect predictions.