COCO-Crowd Dataset

The COCO-Crowd dataset is a subset of the COCO 2014 train/val dataset, composed of images in which instances overlap. Since the COCO dataset is biased toward non-overlapping instances (see the pie chart below), we curated a subset of COCO containing scenes with overlapping people by cropping bounding boxes from the images. Every image in this dataset contains at least two overlapping instances, and the dataset is intended to focus multi-person pose estimation on socially interacting scenarios.

(a) Original COCO image with annotations

(b) Extracted COCO-Crowd images

The COCO-Crowd extraction procedure is as follows: we iterate over all pairs of person bounding boxes in each image. If a pair of boxes has IoU ≥ 0.1, we tag that pair as a crowd. After all crowd pairs are obtained, we merge all pairs that share at least one common instance into sets.

The left image (a) shows an original COCO image and its annotations, and the two images on the right (b) show the corresponding images in COCO-Crowd. Two bounding boxes are considered to “overlap” when their intersection over union (IoU) score is greater than 0.1 (pairs [1,2], [3,4], and [4,5] in the left image). All interlinked boxes ([1,2] and [3,4,5]) are merged into proposals for the crowd region, as in the sketch below.
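As a concrete illustration, here is a minimal sketch of this pairing-and-merging step (a reimplementation, not the release code). It assumes boxes are given as (x0, y0, x1, y1) corner coordinates and uses union-find to merge overlapping pairs that share an instance:

    import itertools

    def iou(a, b):
        # IoU of two boxes given as (x0, y0, x1, y1) corner coordinates.
        ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
        ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
        union = ((a[2] - a[0]) * (a[3] - a[1])
                 + (b[2] - b[0]) * (b[3] - b[1]) - inter)
        return inter / union if union > 0 else 0.0

    def crowd_sets(person_boxes, iou_thresh=0.1):
        # Union-find over box indices: every pair with IoU >= 0.1 is tagged
        # as a crowd pair, and pairs sharing an instance end up in one set.
        parent = list(range(len(person_boxes)))

        def find(i):
            while parent[i] != i:
                parent[i] = parent[parent[i]]  # path halving
                i = parent[i]
            return i

        for i, j in itertools.combinations(range(len(person_boxes)), 2):
            if iou(person_boxes[i], person_boxes[j]) >= iou_thresh:
                parent[find(i)] = find(j)

        groups = {}
        for i in range(len(person_boxes)):
            groups.setdefault(find(i), []).append(i)
        # Only sets with two or more instances form crowd-region proposals.
        return [g for g in groups.values() if len(g) >= 2]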

Dataset Configuration

(a) Data distribution of the original COCO dataset
(b) Number of overlapping people vs. number of images
(c) Number of overlaps arising from each instance vs. number of instances

After running the crowd extraction on COCO, we observe that single instances (non-overlapping instances) dominate the dataset. To overcome this bias, our dataset (COCO-Crowd) consists only of images with two or more overlapping instances; its distribution can be seen in (b). To show the complexity of our curated dataset, we count the number of overlaps arising from each instance and provide the distribution in (c).

  • Total number of images: 14,003 (Instances: 35,148)
  • Validation/Testing images: 3,336/3,336

Data Format

The original COCO dataset can be downloaded here. Our work uses the original COCO images for data augmentation (cropping the crowd region with scaling), so we provide each crowd region and its corresponding keypoint annotations (in the original COCO image's coordinates) in JSON format. The COCO-Crowd annotations can be downloaded here. Details are shown below, followed by a minimal loading sketch.

  • Training annotations:
      • 'crowd_box': crowd bounding box in the original image
          • x0, y0, x1, y1: top-left and bottom-right coordinates
      • 'filename': image file name
      • 'annotations':
          • 'keypoints': keypoint locations in the original COCO image
              • Keypoint order is the same as COCO:
              • 0: 'nose', 1: 'left_eye', 2: 'right_eye', 3: 'left_ear', 4: 'right_ear', 5: 'left_shoulder', 6: 'right_shoulder', 7: 'left_elbow', 8: 'right_elbow', 9: 'left_wrist', 10: 'right_wrist', 11: 'left_hip', 12: 'right_hip', 13: 'left_knee', 14: 'right_knee', 15: 'left_ankle', 16: 'right_ankle'
          • 'id': instance id
          • 'bbox': instance bounding box location in the original image
              • x0, y0, width, height
          • 'image_id': image id
      • 'non_crowd': annotations of non-crowd instances in the original COCO image
          • 'keypoints'
          • 'id'
          • 'bbox'
          • 'image_id'
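
The loading sketch below is illustrative only: the file name and image directory are placeholders, and it assumes the annotation file is a JSON list of entries with the fields above, with 'crowd_box' stored as a flat [x0, y0, x1, y1] list.

    import json
    from PIL import Image

    ANN_FILE = 'coco_crowd_train.json'   # placeholder path
    COCO_IMG_DIR = 'train2014/'          # placeholder path to COCO images

    with open(ANN_FILE) as f:
        entries = json.load(f)

    entry = entries[0]
    x0, y0, x1, y1 = entry['crowd_box']
    # Crop the crowd region out of the original COCO image.
    crop = Image.open(COCO_IMG_DIR + entry['filename']).crop((x0, y0, x1, y1))

    for ann in entry['annotations']:
        # Keypoints follow the COCO [x, y, v] triplet layout (17 joints)
        # and are in original-image coordinates, so shift them into the crop.
        kps = ann['keypoints']
        shifted = [(kps[k] - x0, kps[k + 1] - y0, kps[k + 2])
                   for k in range(0, len(kps), 3)]
        print('instance', ann['id'], 'nose (crop coords):', shifted[0])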

For testing purposes, we provide the extracted validation/testing images here. Note that keypoint locations for training images share the coordinate space of the original COCO image, while keypoint locations for validation/testing images are given with respect to the cropped COCO-Crowd images (see the sketch after the following list). New instance and image ids are assigned, since multiple crops can be extracted from a single COCO image.

  • Validation/Testing annotations:
      • 'image_id': new id assigned to the COCO-Crowd image
      • 'image_file': image file name
      • 'annotations':
          • 'keypoints': keypoint locations on the COCO-Crowd image
          • 'iscrowd': crowd label for the instance (same as original COCO)
          • 'id': new id assigned to the COCO-Crowd instance
          • 'bbox': bounding box location
              • x0, y0, width, height
          • 'area'
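
For comparison, the sketch below (again with a placeholder file name, under the same JSON-list assumption) reads validation annotations, whose keypoints need no offset because they are already in the crop's coordinate space:

    import json

    with open('coco_crowd_val.json') as f:   # placeholder path
        val_entries = json.load(f)

    entry = val_entries[0]
    print(entry['image_file'], 'image_id:', entry['image_id'])
    for ann in entry['annotations']:
        if ann['iscrowd']:
            continue  # COCO-style crowd regions are handled separately
        # Keypoints are COCO [x, y, v] triplets already expressed in the
        # cropped COCO-Crowd image's coordinates -- no offset needed.
        kps = ann['keypoints']
        visible = [(kps[k], kps[k + 1])
                   for k in range(0, len(kps), 3) if kps[k + 2] > 0]
        print('instance', ann['id'], 'labeled keypoints:', len(visible))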

Evaluation

We follow the same evaluation metric as the original COCO keypoint evaluation; however, we provide separate evaluation code built on the original evaluation code due to the slightly different format of our dataset. The expected testing format is listed below, followed by a sketch of writing a result file.

  • Testing format (json)
      • 'image_id': int,
      • 'score': float,
      • 'keypoints': [x0,y0,v0, x1,y1,v1, ... ]
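
As an illustration, here is a minimal sketch of producing a result file in this format; the image ids, scores, and keypoint values are random dummies, and the resulting file would then be passed to the provided evaluation code:

    import json
    import random

    NUM_JOINTS = 17  # COCO keypoint order, as listed above
    results = []
    for image_id in range(3):  # dummy image ids
        keypoints = []
        for _ in range(NUM_JOINTS):
            keypoints += [random.uniform(0, 200),  # x
                          random.uniform(0, 200),  # y
                          1.0]                     # visibility flag
        results.append({'image_id': image_id,
                        'score': random.random(),  # instance confidence
                        'keypoints': keypoints})

    with open('results.json', 'w') as f:
        json.dump(results, f)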

Related Publication:

      • Serim Ryou, Pietro Perona. "Parsing Pose of People with Interaction." BMVC 2018. [pdf] [supp]
    @inproceedings{DBLP:conf/bmvc/RyouP18,
      author    = {Serim Ryou and
                   Pietro Perona},
      title     = {Parsing Pose of People with Interaction},
      booktitle = {British Machine Vision Conference 2018, {BMVC} 2018, Northumbria University,
                   Newcastle, UK, September 3-6, 2018},
      pages     = {216},
      year      = {2018},
      crossref  = {DBLP:conf/bmvc/2018},
      url       = {http://bmvc2018.org/contents/papers/0679.pdf},
      timestamp = {Mon, 17 Sep 2018 15:40:26 +0200},
      biburl    = {https://dblp.org/rec/bib/conf/bmvc/RyouP18},
      bibsource = {dblp computer science bibliography, https://dblp.org}
    }