DeepFashion2 Challenge

The DeepFashion2 challenge is based on DeepFashion1 and DeepFashion2, benchmark datasets proposed to study a wide spectrum of computer vision applications for fashion, including online shopping, personalized recommendation, and virtual try-on. Current techniques are still far from being adopted in real applications. For instance, the accuracy and efficiency of retrieving clothes from large collections of commercial images, given a user-taken photo, still have considerable room for improvement. The topics of this challenge are therefore being studied extensively in the computer vision community by research groups in both academia and industry. Some of the difficulties of fashion image understanding are rooted in the gap between existing benchmarks and practical scenarios. For example, the largest existing fashion dataset, DeepFashion, has drawbacks such as a single clothing item per image, sparse landmark and pose definitions (every clothing category shares the same definition of 4 to 8 keypoints), and no per-pixel mask annotations.

To address the above drawbacks, we present DeepFashion2, a large-scale benchmark with comprehensive tasks and annotations for fashion image understanding. It is a versatile benchmark covering four tasks: clothes detection, landmark estimation, segmentation, and retrieval. It has 801K clothing items, each with rich annotations such as style, scale, viewpoint, occlusion, bounding box, dense landmarks, and masks. There are also 873K commercial-consumer clothes pairs. It is the most comprehensive benchmark of its kind to date. Specifically, we host two of the four tasks of the DeepFashion2 dataset as challenges in this workshop:

  1. Landmark Estimation. In this task, there are 192K images for training, 32K images for validation, and 63K images for testing. We adopt the same evaluation metric as the COCO dataset. Unlike the COCO dataset, where only one category has keypoints, DeepFashion2 defines a total of 294 landmarks across 13 categories. Besides the coordinates of the 294 landmarks of a detected clothing item, its category should also be included in the final prediction files. The landmark estimation competition is hosted in DeepFashion2 Challenge Track 1.
  2. Clothes Retrieval. In this challenge, we provide a more realistic setting: instead of being given the ground-truth query clothing item, participants must detect clothing items in consumer images themselves. For each detected clothing item, participants need to submit the top-10 retrieved clothing items detected in shop images. Top-k retrieval accuracy is employed as the evaluation metric. We emphasize retrieval performance while still considering the influence of the detector: if a clothing item fails to be detected, that query item is counted as missed. In particular, we have 337K commercial-consumer clothes pairs in the training set. In the validation set, there are 10,844 consumer images with 12,377 query items, and 21,309 commercial images with 36,961 items in the gallery. In the test set, there are 20,681 consumer images with 23,390 query items, and 41,948 commercial images with 72,337 items in the gallery. The clothes retrieval competition is hosted in DeepFashion2 Challenge Track 2.
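
As a rough illustration of the retrieval metric described in item 2, the sketch below computes top-k accuracy over all ground-truth query items and counts any query without a submitted ranking as a miss. It is a simplified sketch, not the official evaluation script: the function name top_k_accuracy, the dictionary layout, and the string identifiers are hypothetical, and it assumes that a retrieved gallery item can be checked against a set of correct gallery identifiers per query. The official evaluation may apply further matching rules that are not modeled here.

```python
from typing import Dict, List, Optional, Set


def top_k_accuracy(
    retrieved: Dict[str, Optional[List[str]]],
    ground_truth: Dict[str, Set[str]],
    k: int = 10,
) -> float:
    """Fraction of ground-truth queries with a correct gallery item in the top k.

    retrieved maps a query id to its ranked gallery ids (hypothetical layout).
    A query that is absent, None, or empty is treated as an undetected item
    and therefore counted as a miss.
    """
    if not ground_truth:
        return 0.0

    hits = 0
    for query_id, positives in ground_truth.items():
        ranked = retrieved.get(query_id)
        if not ranked:
            # The detector failed on this query item: it stays in the
            # denominator but can never contribute a hit.
            continue
        if any(item in positives for item in ranked[:k]):
            hits += 1

    # Divide by the number of ground-truth queries, not by the number of
    # detected ones, so missed detections lower the score.
    return hits / len(ground_truth)


# Toy data: q1 is retrieved correctly at rank 2, q2 is retrieved wrongly,
# q3 and q4 were never detected.
toy_ground_truth = {"q1": {"g1"}, "q2": {"g7"}, "q3": {"g9"}, "q4": {"g2"}}
toy_retrieved = {"q1": ["g3", "g1"], "q2": ["g4", "g5"], "q3": None}

print(top_k_accuracy(toy_retrieved, toy_ground_truth, k=10))  # 0.25
print(top_k_accuracy(toy_retrieved, toy_ground_truth, k=1))   # 0.0
```

Dividing by the number of ground-truth queries rather than by the number of detected queries is what makes the detector's quality affect the final score.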

Final Results

The top three winning teams in each challenge are shown below.

Important Dates

  • Evaluation server for validation set begins: February 5th
  • Evaluation server for test set begins: March 10th
  • Test submission deadline: April 10th
  • Fact sheet deadline: April 30th
  • Final results announcement: May 10th


  • For questions regarding the DeepFashion2 challenge, please contact Yuying Ge.