CoSTAR Dataset

Stack blocks like a champion! The CoSTAR Block Stacking Dataset contains more than 10,000 attempts by a real robot to stack colored children's blocks in a scene with challenging lighting and a movable bin obstacle that must be avoided. First, check out the examples below, with images ordered by time from left to right. Notice the vastly different lighting conditions, the presence of plush toy distractors, the stacks of 3 or 4 blocks, the object wear, and the various bin positions.

The first four rows show individual successful attempts to stack the blocks, while the final row shows a failure. Every row of images begins with a clear view of the scene on the left, followed by images of the robot at 5 consecutive goal poses where the gripper may open or close.

Dataset Summary

[Table: CoSTAR block stacking dataset summary]

Data Collection Process

The data is collected using CoSTAR, a system designed so an end-user can quickly create powerful and reusable robot programs. It incorporates a broad range of capabilities and a rudimentary perception system based on ObjRecRANSAC.

Between actions, the robot moves out of the camera's view so that the system can estimate block poses, and the camera stops recording during this time. After pose estimation, the robot returns to its previous state and executes the next action. As a result, objects in the scene and/or the gripper may appear to teleport between adjacent frames, for example because a block rolled over after the camera stopped recording. Between stacking attempts, the robot uses its previously saved poses to try to unstack the blocks in preparation for the next attempt. To mitigate most system-state and planning-related errors, the data collection system restarts itself if too many errors are encountered. A typical successful attempt contains 20 seconds of data logged at 10 Hz.
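Because the camera stops recording while the robot is out of view, consecutive frames in a log are not always 0.1 s apart, and a typical 20 s attempt at 10 Hz holds roughly 20 × 10 = 200 frames split into several recorded segments. The sketch below flags such gaps so that teleport-like jumps can be handled separately, for example by not treating frames on either side of a gap as temporally adjacent. It assumes you have already loaded a per-frame list of timestamps in seconds. The function names `find_recording_gaps` and `split_into_segments` and the 1.5× tolerance are hypothetical illustrations, not part of the dataset's own tools.

```python
from typing import List, Tuple

# Logging rate stated for the dataset.
NOMINAL_RATE_HZ = 10.0


def find_recording_gaps(
    timestamps: List[float],
    rate_hz: float = NOMINAL_RATE_HZ,
    tolerance: float = 1.5,  # hypothetical threshold: 1.5x the nominal frame period
) -> List[Tuple[int, float]]:
    """Return (index, gap_seconds) for each frame that follows an unusually long pause.

    `index` is the position of the first frame recorded after the gap.
    """
    period = 1.0 / rate_hz
    gaps = []
    for i in range(1, len(timestamps)):
        dt = timestamps[i] - timestamps[i - 1]
        if dt > tolerance * period:
            gaps.append((i, dt))
    return gaps


def split_into_segments(n_frames: int, gaps: List[Tuple[int, float]]) -> List[range]:
    """Split frame indices into contiguous recorded segments separated by the gaps."""
    starts = [0] + [i for i, _ in gaps]
    ends = [i for i, _ in gaps] + [n_frames]
    return [range(s, e) for s, e in zip(starts, ends)]


if __name__ == "__main__":
    # Toy timestamps: three frames, a 1.5 s pause while poses are estimated, three more frames.
    ts = [0.0, 0.1, 0.2, 1.7, 1.8, 1.9]
    gaps = find_recording_gaps(ts)
    print(gaps)
    print([list(seg) for seg in split_into_segments(len(ts), gaps)])
```

In this toy example the pause is detected before frame 3, giving the segments `[0, 1, 2]` and `[3, 4, 5]`. A relative threshold on the frame period is used instead of a fixed number of seconds so that the same check works if the logging rate differs.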

A detailed list of the features recorded in our dataset is in the "Using the Dataset" section. Since we expect other researchers to use methods substantially different from our own, we intentionally recorded sensor data that might be useful for a wide variety of approaches to the stacking task.

A video of the data collection process is below. Note the sequence of actions the robot takes during data collection: save position, move out of sight, then return to the saved position. Also, be aware that there are sometimes time gaps in the video, because the camera we used was motion-sensitive and also rolled over to a new file every 5 minutes or so. The failure rate with toy distractors is also very high. This is expected, because the algorithms used during collection were not trained with distractors present.

Using the Dataset

Please visit the "Using the Dataset" section for instructions.

Download

Please visit the "Download" section for instructions.


If you use the dataset, please cite the paper that introduces it. Attribution is one of the few requirements of our permissive dataset license, which is linked below.

    author = {Andrew Hundt and 
              Varun Jain and 
              Chris Paxton and 
              Gregory D. Hager},
    title = "{Training Frankenstein's Creature to Stack: HyperTree Architecture Search}",
    journal = {ArXiv e-prints},
    archivePrefix = {arXiv},
    eprint = {1810.11714},
    year = 2018,
    month = Oct,
    url = {}


The dataset files are covered by the Creative Commons Attribution 4.0 International (CC BY 4.0) license. Code is licensed under the Apache 2.0 license.