Code for this dataset and the neural network models is from our paper, The CoSTAR Block Stacking Dataset: Learning with Workspace Constraints; examples and instructions are further down on this page.
All stack attempts are saved with h5py in the HDF5 file format. The files are divided into several categories and named based on what happened on the actual robot:
You can use scripts in costar_dataset to view the dataset. Detailed information and code can be found in costar_plan/ctp_integration/README.md.
The image above is a screenshot of the user interface when using stack_player.py.
There is a slider at the bottom that can be dragged to a specific frame, followed by three buttons: Play, Prev, and Next.
Press Play to play the frames for this data point; press it again to pause. Press Prev and Next to switch between consecutive attempts.
At the top, the image for the selected frame is shown. Three more rows of plots show different data channels of the robot at that time step.
Gripper shows the open/close state of the gripper on the robot.
Action shows the index of the specific action the robot is executing at the selected frame, for use in visual prediction where the object is more visible (see Visual Robot Task Planning). It corresponds to the label feature in the features section; the name corresponding to each index can be found in the labels_to_name feature.
Gripper Action is a time-shifted version of the Action graph that shows when the gripper actually triggers.
The dataset contains two major subsets: one with blocks only, and one with blocks plus plush toy distractors. The subsets are organized in separate directories under the main dataset folder, and filename lists for training models on the different subsets are also included. The folder structure is as follows:
costar_block_stacking_dataset_v0.4/
- costar_block_stacking_dataset_v0.4_combined_*_files.txt
- costar_block_stacking_dataset_v0.4_combined_summary.csv
- costar_upload_files_hash.csv
- rgb_PS1080_PrimeSense.yaml
- README.md
- blocks_only/
  - costar_block_stacking_dataset_v0.4_blocks_only_success_only_train_files.txt
  - costar_block_stacking_v0.4_blocks_only_success_only_test_files.txt
  - costar_block_stacking_v0.4_blocks_only_success_only_val_files.txt
  - costar_block_stacking_v0.4_blocks_only_*_files.txt
  - rename_dataset_labels.csv
  - *.h5f
- blocks_with_plush_toy/
  - costar_block_stacking_dataset_v0.4_blocks_with_plush_toy_success_only_train_files.txt
  - costar_plush_block_stacking_v0.4_blocks_with_plush_toy_success_only_test_files.txt
  - costar_plush_block_stacking_v0.4_blocks_with_plush_toy_success_only_val_files.txt
  - costar_plush_block_stacking_v0.4_blocks_with_plush_toy_success_only_corrupted_files.txt
  - costar_block_stacking_v0.4_blocks_with_plush_toy_*_files.txt
  - rename_dataset_labels.csv
  - *.h5f
Stack attempt data is in the HDF5 format, and filenames take the following form:
YYYY-MM-DD-HH-MM-SS_example######.[success,failure,error.failure].h5f
Here are several specific filenames:
2018-05-29-15-00-28_example000001.success.h5f
2018-05-30-15-33-16_example000031.success.h5f
2018-06-04-10-01-33_example000001.error.failure.h5f
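For scripting, these filenames can be parsed with a regular expression. Here is a minimal sketch; the group names are our own choice:

```python
import re

# A sketch for parsing attempt filenames; the group names are our own.
ATTEMPT_PATTERN = re.compile(
    r"(?P<stamp>\d{4}-\d{2}-\d{2}-\d{2}-\d{2}-\d{2})"  # YYYY-MM-DD-HH-MM-SS
    r"_example(?P<number>\d{6})"                       # attempt number
    r"\.(?P<outcome>success|failure|error\.failure)"   # attempt outcome
    r"\.h5f$"
)

match = ATTEMPT_PATTERN.match("2018-05-29-15-00-28_example000001.success.h5f")
print(match.group("stamp"), match.group("number"), match.group("outcome"))
# 2018-05-29-15-00-28 000001 success
```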
The yaml file contains the camera calibration:
rgb_PS1080_PrimeSense.yaml
Human labeling data, image presence, and notes on interesting examples:
rename_dataset_labels.csv
Every example can be loaded easily with h5py, and all features have an equal number of frames, collected at 10 Hz (0.1 s per frame). The number of frames varies with each example; examples with zero frames exist, and these are typically *.error.failure.h5f examples.
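For example, the frame count of a single attempt can be checked as follows. This is a minimal sketch; the label feature name comes from the feature descriptions on this page, and any per-frame feature should report the same length since all features have equal frame counts:

```python
import h5py

# Count the frames in one attempt; a zero count is possible for
# *.error.failure.h5f examples.
with h5py.File("2018-05-29-15-00-28_example000001.success.h5f", "r") as data:
    n_frames = len(data["label"])
    print(n_frames, "frames =", n_frames * 0.1, "seconds at 10 Hz")
```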
Here is a complete list of features:
data["labels_to_name"][0]
is "grab_blue", then its corresponding integer index is 0.Actions are one-hot encoded, so there will be a list of values all set to 0 with length 41, the total number of actions. However, for the active action, the value at action_list[value_index] is 1. Here is the list of all action_index values and what each means:
00 'place_green_on_yellow'
01 'move_to_home'
02 'place_blue_on_yellowred'
03 'place_yellow_on_red'
04 'place_blue_on_red'
05 'grab_blue'
06 'place_red_on_blueyellow'
07 'place_green_on_redyellow'
08 'place_red_on_yellow'
09 'place_green_on_blueyellow'
10 'place_red_on_greenblue'
11 'place_blue_on_green'
12 'place_blue_on_redgreen'
13 'place_yellow_on_greenblue'
14 'place_yellow_on_blue'
15 'place_blue_on_greenyellow'
16 'place_blue_on_yellowgreen'
17 'place_blue_on_greenred'
18 'place_yellow_on_redgreen'
19 'grab_yellow'
20 'place_red_on_greenyellow'
21 'grab_green'
22 'place_red_on_green'
23 'place_yellow_on_bluered'
24 'place_yellow_on_green'
25 'place_green_on_blue'
26 'place_yellow_on_bluegreen'
27 'place_blue_on_redyellow'
28 'place_red_on_blue'
29 'place_red_on_yellowgreen'
30 'place_yellow_on_greenred'
31 'place_green_on_yellowblue'
32 'place_red_on_bluegreen'
33 'place_green_on_red'
34 'place_red_on_yellowblue'
35 'place_green_on_yellowred'
36 'place_green_on_redblue'
37 'grab_red'
38 'place_yellow_on_redblue'
39 'place_green_on_bluered'
40 'place_blue_on_yellow'
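As a sketch of the encoding just described, using the indices and names from the list above:

```python
import numpy as np

NUM_ACTIONS = 41  # total number of actions in the list above

def encode_action(action_index):
    """Return the length-41 one-hot vector for an action_index."""
    action_list = np.zeros(NUM_ACTIONS, dtype=np.float32)
    action_list[action_index] = 1.0
    return action_list

def decode_action(action_list, labels_to_name):
    """Recover the action name, e.g. index 5 -> 'grab_blue' in the list above."""
    return labels_to_name[int(np.argmax(action_list))]
```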
We highly recommend using the splits we have provided so comparisons can be made more accurately across papers. We expect to expand the dataset progressively between versions. You can visualize the proportions of each subset in the Dataset Summary section. Our chosen splits are defined in files named as follows:
costar_block_stacking_dataset_{Version}_{ObjectSubset}_{AttemptType}_{TVTSubset}_files.txt
The names in {curly brackets} above vary depending on the relevant subset. Each subset is described below.
Version:
0.4 at the time of writing. Version numbers may be updated due to changes in the dataset including both new data and corrections.
ObjectSubset:
blocks_only
- red, green, yellow, and blue 5.1 cm wooden cubes are present
blocks_with_plush_toy
- the blocks are present plus 24 colorful plush toy distractors
combined
- contains the data from both the blocks_only and blocks_with_plush_toy subsets
AttemptType:
success_only
error_failure_only
task_failure_only
task_and_error_failure
TVTSubset:
train
- for training a neural network
val
- for verifying performance, though this subset may be used for optimization
test
- for verification of final results on the final model
Here we load one attempt and show the images. We've tried to ensure everything is easy to work with:
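The example code lives in the repository; below is only a minimal sketch of the same idea. It assumes the image feature stores one encoded (JPEG/PNG) frame per time step, so check the feature list above against your copy of the data:

```python
import io

import h5py
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image

# A minimal sketch of loading one attempt and showing a frame; the "image"
# feature name and the encoded-bytes storage are assumptions to verify.
path = "2018-05-29-15-00-28_example000001.success.h5f"

with h5py.File(path, "r") as data:
    print(list(data.keys()))  # every feature in this attempt
    frame = np.asarray(Image.open(io.BytesIO(data["image"][0])))
    plt.imshow(frame)
    plt.title(path)
    plt.show()
```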
Here is the output you should see after running this code:
For additional script information, please refer to the README file.
Data can be loaded with:
block_stacking_reader.py includes a standard Python generator, and the parent repositories include code which uses this loader with Keras and TensorFlow. While two Keras files are imported, these data loading files depend only on basic libraries, so they won't make use of a GPU or interfere with other popular deep learning libraries such as PyTorch.
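As an illustration only (not the actual block_stacking_reader.py interface), a bare-bones generator over one of the split files might look like this:

```python
import h5py

def attempt_generator(split_txt_path):
    """Yield (filename, open h5py.File) pairs for each attempt in a split list.

    An illustrative sketch, not the actual block_stacking_reader.py API.
    """
    with open(split_txt_path) as f:
        filenames = [line.strip() for line in f if line.strip()]
    for name in filenames:
        try:
            data = h5py.File(name, "r")
        except OSError:
            continue  # skip unreadable files
        try:
            yield name, data
        finally:
            data.close()

# Usage: iterate over the blocks_only success_only training split.
split = "costar_block_stacking_dataset_v0.4_blocks_only_success_only_train_files.txt"
for name, data in attempt_generator(split):
    print(name, len(data["label"]))
```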
Below we will write notes about the dataset & the answers to questions we have received.
All subsets are randomly ordered with the python or numpy shuffle function and a random seed of 0 before the val and test subsets are selected from the front, and files are only included in a split if they are readable and contain at least one image. The success_only + blocks_only dataset was selected well before the others, with 128 attempts for each of the validation and test subsets. The success_only + blocks_with_plush_toy dataset was chosen to have 64 attempts for each of the validation and test sets.
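A sketch of that selection procedure, assuming numpy's shuffle and that val and then test are drawn from the front:

```python
import numpy as np

def split_attempts(filenames, n_val, n_test, seed=0):
    """Shuffle with a fixed seed, then take val and test from the front.

    A sketch of the procedure described above; the exact order in which
    val and test are drawn from the front is our assumption.
    """
    files = np.array(sorted(filenames))
    np.random.seed(seed)
    np.random.shuffle(files)
    val = files[:n_val]
    test = files[n_val:n_val + n_test]
    train = files[n_val + n_test:]
    return train.tolist(), val.tolist(), test.tolist()

# e.g. n_val=128, n_test=128 for the success_only + blocks_only subset
```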
A few examples were mislabeled by the automatic labeling system in v0.2, so we hand labeled all of the data for v0.3. Fortunately, the failure set had never been utilized, and it contained quite a few successes; mislabeled successes which were actually failures were very rare. Once all attempt type names were corrected, we moved the files to their corresponding categories. We also collected some additional plush toy data between v0.3 and v0.4.
We calculated the proportion between train and val for success_only data in the blocks_only and blocks_with_plush_toy subsets; each proportion was used to determine the number of train and val files in error_failure_only and task_failure_only. Most importantly, we have ensured no files have ever mixed between train, val, and test.
Files marked with error.failure indicate a problem we cannot currently recover from, such as a security stop or a ROS planning error. About halfway through the dataset we began saving the final error string on error.failure cases; hopefully this will assist in diagnosing and classifying more detailed reasons for why failures occur.
The AR tag mount broke around 2018-05-15. Some of the dataset was collected with the tag shifting around. Starting with:
2018-05-17-16-39-30_example000001.success.h5f
We re-glued the AR tag, so the hand-eye calibration is different from this point on. The calibration was not regenerated, since objects were still being correctly grasped. However, at least some successes were being reported as failures after this point.
The RGB and depth data (in fact, all data) is not always perfectly time synchronized, and in some cases many depth frames are missing. This is much more common in failure and error examples (*.failure.h5f and *.error.failure.h5f) than in successes (*.success.h5f).
Some bugs were fixed during the collection process which improved the synchronization dramatically, so use the filenames to choose more recently collected data (try after 2018-05-17-16-39) if you need to minimize synchronization errors.
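Because the timestamps are zero-padded, a plain string comparison on the filename prefix is enough to filter by collection date. For example:

```python
# Keep only attempts collected after the synchronization fixes; filenames
# begin with a zero-padded YYYY-MM-DD-HH-MM-SS stamp, so lexicographic
# comparison matches chronological order.
CUTOFF = "2018-05-17-16-39"

def collected_after_cutoff(filename):
    return filename[:len(CUTOFF)] >= CUTOFF

print(collected_after_cutoff("2018-05-29-15-00-28_example000001.success.h5f"))  # True
print(collected_after_cutoff("2018-05-15-10-00-00_example000002.success.h5f"))  # False
```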
The AR tag mount on the gripper broke a second time, and we simply left it detached for collection until the first data in September; unfortunately, the exact date has been lost. In general, we advise against making any assumptions about the precision of the AR tag position relative to the robot: it can vary by a few centimeters and noticeably in angle, so it should only be used if very rough values are needed.
The hand-eye calibration itself seems to remain OK.
The gripper failed on 2018-08-31 at 22:27:44, in the example:
~/.keras/datasets/costar_block_stacking_dataset_v0.4/blocks_with_plush_toy/2018-08-31-22-27-44_example000003.error.failure.h5f
Examples between 2018-08-31-22-27-44 and 2018-09-01 have the gripper locked in the closed position, and data may or may not be recorded for the gripper state.
We repaired the connector and added a piece to minimize flexing of the wire, and changed where the gripper wire is attached. This means there will be an appearance change which may affect predictions.