Visual object recognition plays an essential role in human daily life. This ability is so efficient that we can recognize a face or an object seemingly without effort, even though it may vary in position, scale, pose, and illumination. In the field of computer vision, a large number of studies have been carried out to build a human-like object recognition system. Recently, deep neural networks have shown impressive progress in object classification performance and have been reported to surpass humans. Yet there is still a lack of thorough and fair comparisons between humans and artificial recognition systems. While some studies consider artificially degraded images, human recognition performance on the datasets widely used for deep neural networks has not been fully evaluated. The present paper carries out an extensive experiment to evaluate human classification accuracy on CIFAR10, a well-known dataset of natural images. This allows for a fair comparison with state-of-the-art deep neural networks. Our CIFAR10-based evaluations show very efficient object recognition by recent CNNs but, at the same time, demonstrate that they are still far from human-level capability of generalization. Moreover, a detailed investigation using multiple levels of difficulty reveals that images that are easy for humans may not be easy for deep neural networks. Such images form a subset of CIFAR10 that can be employed to evaluate and improve future neural networks.
Human recognition performance is obtained from 60 subjects and compared with the following deep neural networks: LeNet, Network-in-Network (NiN), Residual Network (RN), Residual Network + Cutout Regularization (RNC), and WideResNet + Cutout Regularization (WRNC).
This file contains the raw data of 60 subjects, each of whom classified 1000 images of the CIFAR10 testing set. The first column of the data matrix is the subject identifier, the second column the image index (in the CIFAR10 testing set), the third column the subject's recognition result, and the last column the ground-truth category.
For example, a row [2 2568 8 9] means that subject 2 classified image 2568 as category 8, while its true category is 9.
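As an illustration, the data matrix can be loaded and summarized with the following Python sketch (the file name human_raw_data.txt and the whitespace-separated plain-text format are assumptions; adjust them to the actual file provided here):

    import numpy as np

    # Assumed file name and format: a plain-text matrix with one row per trial,
    # [subject_id, image_index, human_label, ground_truth]; adjust as needed.
    data = np.loadtxt("human_raw_data.txt", dtype=int)

    subject_id   = data[:, 0]
    human_label  = data[:, 2]
    ground_truth = data[:, 3]

    # Overall human classification accuracy over all 60 subjects.
    print("Overall accuracy: %.3f" % np.mean(human_label == ground_truth))

    # Per-subject accuracy (each subject classified 1000 images).
    for sid in np.unique(subject_id):
        mask = subject_id == sid
        acc = np.mean(human_label[mask] == ground_truth[mask])
        print("Subject %d: %.3f (%d images)" % (sid, acc, mask.sum()))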
The easy subset (difficulty level 1, i.e., images that human subjects always recognize correctly) of the CIFAR10 testing set can be found here. This subset contains the indices of the easy images in the original CIFAR10 testing set. The indices start at 1; for example, a value of 2 indicates the second image in the original CIFAR10 testing set.
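For example, the easy subset can be used to slice the original CIFAR10 testing set as in the following sketch (the file name easy_subset.txt and the use of torchvision to load CIFAR10 are assumptions; only the 1-based indexing described above is taken from the data):

    import numpy as np
    from torchvision.datasets import CIFAR10

    # Assumed file name: a plain-text list of 1-based indices into the testing set.
    easy_idx = np.loadtxt("easy_subset.txt", dtype=int) - 1  # convert to 0-based

    # Load the CIFAR10 testing set and extract the easy images and their labels.
    test_set = CIFAR10(root="./data", train=False, download=True)
    easy_images = test_set.data[easy_idx]                 # (n_easy, 32, 32, 3) uint8
    easy_labels = np.array(test_set.targets)[easy_idx]    # category indices 0-9

    print("Easy subset: %d images" % len(easy_idx))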
More results and details are described in the following paper: