Datasets

As interest in applying machine learning to robotics in manufacturing grows, the lack of domain-relevant datasets for training or benchmarking becomes more evident. This page covers literature on datasets aimed at supporting machine learning. Only a limited number of datasets include manufacturing objects or directly support manufacturing operations. However, many datasets support capabilities such as perception and grasping that are relevant to manufacturing, even if the objects and intended applications come from other domains, such as assistive or service robotics. Although most datasets indexed in this document are meant to be used with robotic arms, they are not specific to manufacturing.

Mobility

Mobile robots are expanding their capabilities well beyond the traditional role of Automatic/Automated/Autonomous Guided Vehicles (A-UGV) as they attain greater onboard intelligence. No longer limited to transporting parts throughout a factory, they can also now bring robot arms to where they’re needed as mobile manipulators. Various datasets are helping these manufacturing robots become more intelligent, especially in dynamic and unstructured environments. Some of these datasets are described below.

    • The MIT Robotics Dataset Repository – 2010

      • Known as Radish, this repository of datasets covers odometry, laser, sonar, and sensor data taken from real robots. You’ll also find environmental maps generated both by robots and by hand.

    • New College Vision and Laser Dataset: This dataset was gathered while traveling through a college and its adjoining parks. It is intended for the mobile robotics and vision research communities, and for those interested in 6-DoF navigation and mapping.

Grasping

Real-world Object Datasets

  • Daily Interactive Manipulation (DIM) Dataset (Huang and Sun 2019) – 2019

    • This robot manipulation dataset aims to teach robots daily interactive manipulations in changing environments. The dataset focuses on the position, orientation, force, and torque of objects manipulated in daily tasks.

    • The dataset consists of two parts. The first part contains 1,603 trials that cover 32 types of motions (fine motions that people commonly perform in daily life that involve interaction with a variety of objects). The second part contains the pouring motion alone and was collected to help with motion generalization to different environments. The pouring data contain 1,596 trials of pouring 3 materials from 6 cups into 10 containers.

    • The dataset primarily provides position/orientation (PO) and force/torque (FT) data, but also provides RGB and depth vision data with smaller coverage (a sketch of one possible trial record follows this entry).
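
      A minimal sketch of how one such trial might be represented once loaded; the class and field names below are assumptions for illustration, not the dataset's actual schema or file format.

      ```python
      from dataclasses import dataclass

      import numpy as np


      @dataclass
      class ManipulationTrial:
          """Hypothetical container for one DIM-style trial (field names are assumptions)."""
          motion_type: str         # e.g. "pour", one of the 32 motion categories
          position: np.ndarray     # (T, 3) object position over time
          orientation: np.ndarray  # (T, 4) object orientation as quaternions
          force: np.ndarray        # (T, 3) measured force
          torque: np.ndarray       # (T, 3) measured torque

          def duration_s(self, rate_hz: float) -> float:
              """Trial length in seconds, given the sensor sampling rate."""
              return self.position.shape[0] / rate_hz


      # Synthetic stand-in data for one short trial.
      T = 500
      trial = ManipulationTrial(
          motion_type="pour",
          position=np.zeros((T, 3)),
          orientation=np.tile([0.0, 0.0, 0.0, 1.0], (T, 1)),
          force=np.zeros((T, 3)),
          torque=np.zeros((T, 3)),
      )
      print(trial.duration_s(rate_hz=100.0))  # 5.0
      ```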

  • Stanford synthetic object grasping point data – 2006

    • Labeled training set, i.e., a set of images of objects labeled with the 2D location of the grasping point in each image.

    • Synthetic images along with the correct grasp, generated using a computer graphics ray tracer.

    • From the synthetic model of an object, a large number of training examples were automatically generated by rendering the object under different (randomly chosen) lighting conditions, camera positions, and orientations.

    • Each example consists of (a) an image, (b) grasp labels (a binary 0-1 image), (c) a depth map (range image), (d) a 6-DoF grasping point, (e) the object orientation, and (f) grasping parameters such as gripper width (see the sketch below for one way to read the binary label image).
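
      A minimal sketch of one way to read the binary 0-1 grasp-label images described above; the helper below is hypothetical and not part of the dataset's released tooling.

      ```python
      import numpy as np


      def grasp_pixels_from_label(label_img: np.ndarray) -> np.ndarray:
          """Return the (row, col) pixel coordinates marked as grasp points
          in a binary 0-1 label image (hypothetical helper)."""
          return np.argwhere(label_img > 0)


      # Toy example: a 4x4 label image with two pixels marked as graspable.
      label = np.zeros((4, 4), dtype=np.uint8)
      label[1, 2] = 1
      label[3, 0] = 1
      print(grasp_pixels_from_label(label))  # [[1 2] [3 0]]
      ```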

  • Google Brain Robotics Data (Levine et al. 2016) – 2016

    • This robot manipulation repository focuses on the actions of robotic arms. The available datasets include grasping, pushing, pouring, and depth image encoding. To support these datasets, the repository provides a collection of procedurally generated random objects on which to train robot grasping and other tasks.

    • This dataset contains roughly 650,000 examples of robot grasping attempts.

    • The examples are grouped into batches that have identical feature keys.

    • The set of features present in each batch is described in a CSV file (a sketch of reading such a file follows this entry).
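
      A minimal sketch of the bookkeeping implied above: discover which feature keys each batch declares before parsing it. The CSV filename and column names are assumptions, not the repository's actual layout.

      ```python
      import csv
      from collections import defaultdict


      def features_per_batch(csv_path: str) -> dict:
          """Map each batch name to the set of feature keys it declares.
          Assumes a hypothetical CSV with 'batch' and 'feature_key' columns."""
          batches = defaultdict(set)
          with open(csv_path, newline="") as f:
              for row in csv.DictReader(f):
                  batches[row["batch"]].add(row["feature_key"])
          return dict(batches)


      # Usage, once the CSV has been downloaded from the repository:
      # for batch, keys in features_per_batch("grasping_features.csv").items():
      #     print(batch, sorted(keys))
      ```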

  • Grasping dataset from the Robot Learning Lab (Depierre, Dellandréa, and Chen 2018) – 2009

    • A popular grasp dataset that has been used by most transfer-learning approaches in robotic grasping.

    • Contains 1,035 images of 280 different object types, annotated with grasp rectangles.

    • Each image also has an associated point cloud and a background image (a single background image is shared by a number of object images).

    • The raw dataset consists of (a) images, (b) grasping rectangles, (c) point clouds, (d) background images, and (e) a file mapping each image to its corresponding background image (see the parsing sketch below).
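
      A minimal sketch of parsing the grasping-rectangle annotations, assuming the common convention of one "x y" corner per line with four corners per rectangle; releases of the dataset may differ, so treat this as illustrative rather than an official parser.

      ```python
      import numpy as np


      def load_grasp_rectangles(path: str) -> list:
          """Parse a rectangle annotation file into a list of (4, 2) corner arrays.
          Assumes four 'x y' lines per rectangle (illustrative, not an official parser)."""
          corners = []
          with open(path) as f:
              for line in f:
                  parts = line.split()
                  if len(parts) == 2:
                      corners.append([float(parts[0]), float(parts[1])])
          # Group every four consecutive corners into one rectangle.
          return [np.array(corners[i:i + 4]) for i in range(0, len(corners) - 3, 4)]
      ```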

  • Washington RGB-D Datasets (Lai et al. 2011) – 2011

    • Rich variety of RGB-D images.

      • RGB-D Object Dataset: The dataset contains 300 objects (aka "instances") in 51 categories.

      • RGB-D Scenes Dataset v.2: The RGB-D Scenes Dataset v2 consists of 14 scenes containing furniture (chair, coffee table, sofa, table) and a subset of the objects in the RGB-D Object Dataset (bowls, caps, cereal boxes, coffee mugs, and soda cans). Each scene is a point cloud created by aligning a set of video frames using Patch Volumes Mapping.

      • RGB-D Scenes Dataset: This dataset contains 8 scenes annotated with objects that belong to the RGB-D Object Dataset. Each scene is a single video sequence consisting of multiple RGB-D frames.

  • Dataset from the article Learning to Grasp Without Seeing (Murali et al. 2018) – 2018

    • Tactile-sensing based approach for grasping novel objects without prior knowledge of their location or physical properties.

    • Grasping dataset: visual and haptic sensor data (30 RGB frames and over 2.8 million tactile samples from 7800 grasp interactions of 52 objects)

    • 30 RGB frames: images captured at four specific events of grasping: the initial scene, and before, during, and after grasp execution. These images have a resolution of 1280×960.

    • Haptic Measurements: Tactile signals are measured by force sensors mounted on each of the three fingers of the gripper. Each sensor measures the force components (Fx, Fy, Fz), and hence the magnitude and direction of the applied force, at 100 Hz.

    • Grasping Actions and Labels: The pose of every 2D planar grasp is recorded, including the initial grasp (x0, y0, z0, θ0) and subsequent re-grasps (xt, yt, zt, θt). The success of each grasp is also recorded.

    • Material Labels of Objects: Each object is labeled with one of 7 material categories: metal, hard plastic, elastic plastic, stuffed fabric, wood, glass, and ceramic (a sketch of one possible grasp-attempt record follows this entry).
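
      A minimal sketch of one grasp-attempt record and of recovering the force magnitude from the per-finger (Fx, Fy, Fz) samples; the dictionary keys below are assumptions for illustration, not the dataset's actual field names.

      ```python
      import numpy as np


      def force_magnitude(samples: np.ndarray) -> np.ndarray:
          """Magnitude of the force at each 100 Hz sample, from (Fx, Fy, Fz) columns."""
          return np.linalg.norm(samples, axis=1)


      # Hypothetical record for one grasp attempt (keys are assumptions).
      grasp_attempt = {
          "initial_grasp": (0.42, -0.10, 0.05, np.pi / 6),    # (x0, y0, z0, theta0)
          "regrasps": [(0.43, -0.09, 0.05, np.pi / 5)],       # (xt, yt, zt, thetat)
          "success": True,
          "tactile": {f"finger_{i}": np.zeros((100, 3)) for i in range(3)},  # 1 s at 100 Hz
      }
      print(force_magnitude(grasp_attempt["tactile"]["finger_0"]).shape)  # (100,)
      ```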

  • Smart Grasping Sandbox from Shadow Robot – 2019

    • A dataset limited to grasping a ball.

    • The grasping dataset was obtained in simulation.

    • An experiment consists of grasping the ball and shaking it for a while, while computing a grasp robustness measure (the variation of the distance between the palm and the ball during the shake). Multiple measurements are taken during a given experiment.

    • The dataset records the grasp quality together with each joint's position, torque, and velocity (see the robustness sketch below).
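
      A minimal sketch of the robustness measure described above, reading "variation of the distance between the palm and the ball" as the standard deviation of that distance over the shake; the exact metric used by the sandbox may differ.

      ```python
      import numpy as np


      def grasp_robustness(palm_xyz: np.ndarray, ball_xyz: np.ndarray) -> float:
          """Variation of the palm-ball distance over a shake, computed here as the
          standard deviation of the distance (an assumption about the exact metric)."""
          distances = np.linalg.norm(palm_xyz - ball_xyz, axis=1)
          return float(np.std(distances))


      # Toy example: a perfectly rigid grasp shows zero variation.
      palm = np.zeros((50, 3))
      ball = np.full((50, 3), 0.1)
      print(grasp_robustness(palm, ball))  # 0.0
      ```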

  • The RoboTurk Real Robot Dataset (Mandlekar et al. 2018) – 2018

    • This dataset by Stanford Vision and Learning is currently the largest dataset for robotic manipulation through remote teleoperation. The data was collected over one week with 54 operators, and includes 111 hours of robotic manipulation data on 3 challenging manipulation tasks. In particular, the data is helpful for tasks that require dexterous control and human planning.

      • Simulation dataset: A large-scale simulation dataset for the SawyerPickPlace and SawyerNutAssembly tasks was collected from the Surreal Robotics Suite using the RoboTurk platform. Crowdsourced workers collected these task demonstrations remotely. The dataset consists of 1070 successful SawyerPickPlace demonstrations and 1147 successful SawyerNutAssembly demonstrations.

      • Real dataset: A large-scale dataset was collected on three different real-world tasks: Laundry Layout, Tower Creation, and Object Search. All three datasets were collected remotely by crowdsourced workers using the RoboTurk platform. The dataset consists of 2144 different demonstrations from 54 unique users, and is released as the complete dataset for training plus smaller sub-samples for exploration.

  • Dex-Net Datasets (Mahler et al. 2019) – 2019

    • Dexterity Network (Dex-Net) 4.0, a substantial extension to previous versions of Dex-Net that learns policies for a given set of grippers by training on synthetic datasets using domain randomization with analytic models of physics and geometry.

    • Experiments use a dataset of 75 objects chosen to reflect a diverse range of shapes, sizes, and material properties. The dataset is broken into three difficulty levels. Difficulty level 1 contains prismatic and circular solids. Difficulty level 2 contains common household objects with complex geometry including toys, tools, and "blisterpack" objects. Difficulty level 3 contains objects with adversarial geometry and material properties such as detailed 3D-printed industrial parts and deformable objects.

  • OpenLORIS Dataset (She et al. 2019) – 2020

    • The (L)ifel(O)ng (R)obotic V(IS)ion (OpenLORIS) Object Recognition Dataset (OpenLORIS-Object) is designed to accelerate lifelong/continual/incremental learning research and applications, currently focusing on improving the continual learning capability for common objects in home scenarios. The dataset contains images with varying conditions that challenge vision models trained under static conditions. The OpenLORIS dataset is built with a robot that actively records videos of targeted household objects under multiple illuminations, occlusions, camera-object distances/angles, and amounts of context information (clutter). The sensors used are Intel RealSense D435i and T265 cameras.

    • The dataset contains 69 object instances from 19 categories of daily necessities, recorded in 7 scenes. The conditions are grouped into coarse, mostly qualitative levels of difficulty for the following factors (a sketch encoding the object-size rule appears after this list):

      • Illumination: Strong, Normal, Weak

      • Occlusion: 0%, 25%, 50%

      • Clutter: Simple, Normal, Complex

      • Object size (pixels): > 200^2, 30^2 – 200^2, < 30^2
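
      A minimal sketch encoding the object-size rule above, treating the listed thresholds as pixel areas; the thresholds come from the list, while the function itself is illustrative.

      ```python
      def object_size_bin(width_px: int, height_px: int) -> str:
          """Bin an object's pixel footprint into the three OpenLORIS size levels,
          treating the listed thresholds as pixel areas (an assumption)."""
          area = width_px * height_px
          if area > 200 ** 2:
              return "> 200^2 px"
          if area >= 30 ** 2:
              return "30^2 - 200^2 px"
          return "< 30^2 px"


      print(object_size_bin(250, 250))  # > 200^2 px
      print(object_size_bin(100, 100))  # 30^2 - 200^2 px
      print(object_size_bin(20, 20))    # < 30^2 px
      ```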

Perception

Perception capabilities are broadly supported by a wide range of datasets. The datasets listed here are not exhaustive; they focus on those offering objects that could be considered relevant to manufacturing, with metallic, low-texture, highly symmetric features.

Pose Estimation

Competitions and challenges are a common tool for comparing perception algorithm capabilities. Many benchmark datasets are available online. Most do not include objects that would be considered to have features representative of manufacturing parts.

  • T-LESS Dataset – 2017

    • Hodaň et al. (Hodan et al. 2017) present a comprehensive benchmark for six-degree-of-freedom pose estimation from RGB and RGB-D images. Datasets of objects are provided in a uniform format, along with an evaluation methodology that includes online tools. Eight different datasets are provided, all of which have texture-mapped 3D object models and training and test RGB-D images annotated with ground-truth 6D object poses. The majority of the datasets contain household items, but one features industrial objects: T-LESS, whose objects have limited texture and discriminative color and exhibit the symmetries and inter-object similarities typical of industrial parts. Data collected for T-LESS includes images from 3 synchronized sensors: a Carmine 1.09, a Kinect v2, and a Canon IXUS 950 IS. For each sensor, 39,000 training images are collected from a systematically sampled view sphere for each object against a black background (a sketch of such a sampling appears below).
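
      A minimal sketch of what a systematically sampled view sphere can look like: camera positions enumerated on a regular azimuth/elevation grid around the object. The step sizes and radius are arbitrary and are not the settings used for T-LESS.

      ```python
      import numpy as np


      def view_sphere(radius_m: float, az_step_deg: float = 10.0, el_step_deg: float = 10.0) -> np.ndarray:
          """Camera positions (N, 3) on a regular azimuth/elevation grid around an
          object at the origin (illustrative steps, not the T-LESS settings)."""
          az = np.deg2rad(np.arange(0.0, 360.0, az_step_deg))
          el = np.deg2rad(np.arange(0.0, 90.0 + el_step_deg, el_step_deg))
          a, e = np.meshgrid(az, el)
          xyz = np.stack([radius_m * np.cos(e) * np.cos(a),
                          radius_m * np.cos(e) * np.sin(a),
                          radius_m * np.sin(e)], axis=-1)
          return xyz.reshape(-1, 3)


      print(view_sphere(0.65).shape)  # (360, 3) with 10-degree steps
      ```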

  • Falling Things (FAT) Dataset – 2018

    • Tremblay et al. (Tremblay, To, and Birchfield 2018) developed this synthetic dataset for 3D object detection and pose estimation. The objects are placed in three virtual environments: a kitchen, a sun temple, and a forest. The objects themselves are taken from the YCB dataset. FAT has 61,500 images of 21 different household objects.

Anomaly Detection

  • The MVTec Anomaly Detection (MVTec AD) Dataset – 2019

    • This dataset is designed to train and evaluate systems that detect anomalies representative of real-world industrial inspection (Bergmann et al. 2019). The dataset contains over 5,000 images of normal and defective parts acquired using a high-resolution industrial RGB sensor in combination with two bilateral telecentric lenses. The site includes ground truth and evaluation tools. A broad set of anomalies is included in the data, such as scratches, dents, and various structural changes. The available parts include a few metallic items, such as a zipper, a screw, and a metal nut. The dataset covers a range of other materials and objects, such as carpet, tile, capsules, pills, and toothbrushes.

Object Recognition

  • MVTec ITODD Dataset – 2017

    • A dataset for 3D object recognition in industry that includes 28 industrially relevant parts, such as engine parts, brackets, screws, and clamps. As with the MVTec Anomaly Detection site, manually annotated ground truth and evaluation tools are provided in addition to the datasets.
      Images are collected from two 3D sensors and three grayscale cameras that are static and calibrated relative to each other. The objects are placed on a calibrated turntable. Ground truth is labeled using a semi-manual approach based on the data from one of the 3D sensors.

  • The Awesome Robotics Datasets: Courtesy of GitHub user Sunglok Choi, this massive repository covers a wide range of datasets broken into the following categories: dataset collections, place-specific datasets, topic-specific datasets for robotics, and topic-specific datasets for computer vision. The sheer size of this repository makes it a great starting point for projects related to machine learning in robotics.

  • RoboNet Large-Scale Multi-Robot Learning Dataset – 2019

    • This dataset, by Berkeley Artificial Intelligence Research, contains 15 million video frames from robots interacting with different objects in a table-top setting. The stated goal of the dataset is "…to pre-train reinforcement learning models on a sufficiently diverse dataset and then transfer knowledge to a different test environment".

  • The Robot@Home Dataset – 2017

    • From the International Journal of Robotics Research, this computer vision dataset is for the semantic mapping of home environments. The dataset is a collection of raw and processed sensory data from domestic settings. It contains 87,000+ time-stamped observations.

  • The DTU Robot Image Datasets – 2010, 2014, 2017

    • These two datasets of random objects were generated with a unique experimental setup. One dataset is for evaluating point features, and one is for evaluating multiple view stereo. Because the setup is designed to avoid light pollution, the process allows for large amounts of high-quality data.

  • BigBIRD Dataset – 2014

    • This dataset contains RGB-D point clouds, segmentation masks, pose information, and reconstructed meshes for 125 common household objects.

  • 3DNet Dataset – 2012

    • 3DNet provides large-scale hierarchical CAD-model databases of increasing size and difficulty, with 10, 60, and 200 object classes, together with evaluation datasets that contain thousands of scenes captured with an RGB-D sensor.

  • ShapeNet – 2012

    • Inspired by WordNet, ShapeNet is an ongoing effort to establish a richly-annotated, large-scale dataset of 3D shapes. The core dataset contains over 50,000 3D models spread across 55 common object categories.

Human Motion Recognition

  • Berkeley Multimodal Human Action Database (MHAD) – 2013

    • A multimodal collection of 82 minutes of data, consisting of 660 action sequences taken from 12 subjects performing 11 tasks 5 times each, including one manipulation task.

  • HMDB51

  • UT Kinect

  • ActivityNet

  • SURREAL Dataset – 2017

    • The name SURREAL stands for Synthetic hUmans foR REAL tasks. It consists of 6.5 million frames with ground-truth pose, depth maps, and segmentation masks. The dataset finds application in human pose estimation, human motion recognition, and body-part segmentation. As described in (Varol et al. 2017), the images are rendered from 3D motion capture data, and with modern rendering tools the resulting images are realistic.

  • Procedural Human Action Videos – 2016

    • This dataset comprises human action videos synthetically generated from modern game engines using computer graphics. It consists of 39,982 videos, with more than 100 examples of each action across 35 categories. The authors (Souza et al. 2016) show that models trained on this dataset, combined with small real-world datasets, outperform fine-tuned state-of-the-art models.

General Purpose

  • YCB The Yale-CMU-Berkeley (YCB) Object and Model Set is designed to facilitate benchmarking in robotic manipulation. The set consists of objects of daily life with different shapes, sizes, textures, weights, and rigidity, as well as some widely used manipulation tests. The physical objects are supplied to any research group that signs up through the YCB website.

  • ImageNet ImageNet is an image database organized according to the WordNet hierarchy (currently only the nouns), in which each node of the hierarchy is depicted by hundreds to thousands of images, with an average of over five hundred images per node. ImageNet is intended as a useful resource for researchers, educators, students, and anyone who shares a passion for pictures.


  • ABC ABC-Dataset is a collection of one million Computer-Aided Design (CAD) models for research on geometric deep learning methods and applications. Each model is a collection of explicitly parameterized curves and surfaces, providing ground truth for differential quantities, patch segmentation, geometric feature detection, and shape reconstruction. Sampling the parametric descriptions of surfaces and curves allows generating data in different formats and resolutions, enabling fair comparisons across a wide range of geometric learning algorithms.

  • COCO The Common Objects in Context (COCO) dataset is designed to represent objects regularly encountered in everyday life. The COCO dataset is labeled, providing data to train supervised computer vision models that can identify the common objects in the dataset (a short loading sketch follows).
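
    A minimal sketch of browsing those labels with the pycocotools package; the annotation file path assumes a local download of the COCO 2017 annotations, and the category name is just an example.

    ```python
    # pip install pycocotools
    from pycocotools.coco import COCO

    # Path assumes a local download of the COCO 2017 annotation files.
    coco = COCO("annotations/instances_val2017.json")

    # Count validation images containing one everyday category.
    cat_ids = coco.getCatIds(catNms=["scissors"])
    img_ids = coco.getImgIds(catIds=cat_ids)
    print(f"{len(img_ids)} validation images contain scissors")

    # Inspect the bounding-box labels for the first such image.
    ann_ids = coco.getAnnIds(imgIds=img_ids[:1], catIds=cat_ids)
    for ann in coco.loadAnns(ann_ids):
        print(ann["bbox"])  # [x, y, width, height] in pixels
    ```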