The CoSTAR Block Stacking Dataset: Learning with Workspace Constraints

CoSTAR Block Stacking Dataset

We introduce the CoSTAR Block Stacking Dataset for benchmarking neural network models on robot data. See the dataset website for details.

Abstract— A robot can now grasp an object more effectively than ever before, but once it has the object, what happens next? We show that a mild relaxation of the task and workspace constraints implicit in existing object grasping datasets can cause neural network based grasping algorithms to fail on even a simple block stacking task when executed under more realistic circumstances.

To address this, we introduce the JHU CoSTAR Block Stacking Dataset (BSD), where a robot interacts with 5.1 cm colored blocks to complete an order-fulfillment style block stacking task. It contains dynamic scenes and real time-series data in a less constrained environment than comparable datasets. There are nearly 12,000 stacking attempts and over 2 million frames of real data. We discuss the ways in which this dataset provides a valuable resource for a broad range of other topics of investigation.
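Each stacking attempt is a time series of RGB frames with synchronized robot state. As a quick orientation, here is a minimal sketch of reading one attempt with h5py, assuming the attempts are stored as HDF5 files; the key names and pose layout below are hypothetical placeholders, so see the CoSTAR Dataset Loading Repository listed under GitHub Code for the actual format and the ready-made PyTorch/TensorFlow loaders.

import h5py
import numpy as np

def load_attempt(path):
    # Hypothetical keys for illustration: a time series of RGB frames and
    # 3D gripper poses recorded over one stacking attempt.
    with h5py.File(path, "r") as f:
        images = np.asarray(f["image"])  # e.g. (T, H, W, 3) uint8 frames
        poses = np.asarray(f["pose"])    # e.g. (T, 7) translation + quaternion
    return images, poses

images, poses = load_attempt("path/to/stacking_attempt.h5f")
print(images.shape, poses.shape)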

We find that hand-designed neural networks that work on prior datasets do not generalize to this task. Thus, to establish a baseline for this dataset, we demonstrate an automated search of neural network based models using a novel multiple-input HyperTree MetaModel, and find a final model which makes reasonable 3D pose predictions for grasping and stacking on our dataset.
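For context, the HyperTree MetaModel searches over multiple-input architectures that fuse an image encoder with vector inputs such as the current gripper pose and the commanded action, and regress a 3D pose. The sketch below is not the authors' implementation or a search result; it only illustrates the general shape of one candidate multiple-input model in Keras, with hypothetical input sizes and layer widths.

import tensorflow as tf
from tensorflow.keras import layers, Model

# Three inputs: an RGB image plus two vector inputs (sizes are hypothetical).
image_in = layers.Input(shape=(224, 224, 3), name="rgb_image")
pose_in = layers.Input(shape=(7,), name="current_gripper_pose")  # xyz + quaternion
action_in = layers.Input(shape=(41,), name="action_encoding")    # hypothetical size

# Image branch: one backbone choice a search over architectures might make.
backbone = tf.keras.applications.MobileNetV2(
    include_top=False, weights=None, input_tensor=image_in, pooling="avg")
image_feat = backbone.output

# Vector branch: concatenate pose and action, then a small dense block.
vector_feat = layers.Concatenate()([pose_in, action_in])
vector_feat = layers.Dense(64, activation="relu")(vector_feat)

# Fusion trunk followed by a pose regression head (translation + quaternion).
fused = layers.Concatenate()([image_feat, vector_feat])
fused = layers.Dense(256, activation="relu")(fused)
output = layers.Dense(7, name="predicted_pose")(fused)

model = Model([image_in, pose_in, action_in], output)
model.compile(optimizer="adam", loss="mse")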

The CoSTAR BSD, code, and instructions are available at github.com/jhu-lcsr/costar_plan.

GitHub Code

Code for loading this dataset and for the neural network models in our paper, The CoSTAR Block Stacking Dataset: Learning with Workspace Constraints:

  1. CoSTAR Dataset Loading Repository - PyTorch and TensorFlow dataset loader
  2. HyperTree Architecture Search Code - aka CoSTAR Hyper, TensorFlow
  3. rENAS: regression Efficient Neural Architecture Search Code - TensorFlow
  4. CoSTAR Objects - 3D models of blocks, bin, and other objects

https://arxiv.org/abs/1810.11714

Cite


@article{hundt2019costar,
    title   = {The CoSTAR Block Stacking Dataset: Learning with Workspace Constraints},
    author  = {Andrew Hundt and Varun Jain and Chia-Hung Lin and Chris Paxton and Gregory D. Hager},
    journal = {Intelligent Robots and Systems (IROS), 2019 IEEE International Conference on},
    year    = {2019},
    url     = {https://arxiv.org/abs/1810.11714}
}


The following is the abstract from the first version of the paper, which has since undergone a major update.

Abstract— We propose HyperTrees for the low-cost automatic design of multiple-input neural network models. Much like how Dr. Frankenstein’s creature was assembled from pieces before he came to life in the eponymous book, HyperTrees combine parts of other architectures to optimize for a new problem domain. We compare HyperTrees to rENAS, our extension of Efficient Neural Architecture Search (ENAS). To evaluate these architectures we introduce the CoSTAR Block Stacking Dataset for the benchmarking of neural network models. We utilize 5.1 cm colored blocks and introduce complexity with a stacking task, a bin providing wall obstacles, dramatic lighting variation, and object ambiguity in the depth space. We demonstrate HyperTrees and rENAS on this dataset by predicting full 3D poses semantically for the purpose of grasping and placing specific objects. Inputs to the network include RGB images, the current gripper pose, and the action to take. Predictions with our best model are accurate to within 30 degrees 90% of the time and within 4 cm 72% of the time, with an average test error of 3.3 cm and 12.6 degrees. The dataset contains more than 10,000 stacking attempts and 1 million frames of real data. Code and dataset instructions are available at github.com/cpaxton/costar_plan.
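For reference, translation and rotation errors like those quoted above can be computed from position-plus-quaternion poses as below. This is a minimal sketch assuming an [x, y, z, qx, qy, qz, qw] layout in meters with unit quaternions; that layout is an assumption for illustration, not necessarily the dataset's documented convention.

import numpy as np

def pose_errors(pred, true):
    # Assumed layout: [x, y, z, qx, qy, qz, qw] with positions in meters.
    pred, true = np.asarray(pred, float), np.asarray(true, float)
    # Translation error in centimeters.
    trans_cm = np.linalg.norm(pred[:3] - true[:3]) * 100.0
    # Rotation error: angle of the relative rotation between the quaternions.
    q1 = pred[3:] / np.linalg.norm(pred[3:])
    q2 = true[3:] / np.linalg.norm(true[3:])
    dot = np.clip(abs(np.dot(q1, q2)), 0.0, 1.0)
    angle_deg = np.degrees(2.0 * np.arccos(dot))
    return trans_cm, angle_deg

# Example: 2.2 cm translation offset and a 10 degree rotation about z.
print(pose_errors([0.50, 0.10, 0.20, 0, 0, 0, 1],
                  [0.52, 0.11, 0.20, 0, 0, 0.0872, 0.9962]))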