Specs on Faces (SoF) Dataset

Contents

  • Introduction
  • General information
  • Paper & Citation
  • Technical information
  • Download the dataset
  • Ground truth information
  • 5-fold cross validation
  • Contact us
  • Results
  • Acknowledgement

Welcome to the Specs on Faces (SoF) dataset, a collection of 42,592 (2,662×16) images of 112 persons (66 males and 46 females) who wear glasses under different illumination conditions. The dataset is FREE for reasonable academic fair use. It presents a new challenge for face detection and recognition by focusing on two difficulties: harsh illumination environments and face occlusions, both of which strongly affect face detection, recognition, and classification. Glasses are the common natural occlusion in all images of the dataset. In addition, two synthetic occlusions (nose and mouth) were added to each image. Moreover, three image filters that may fool face detectors and facial recognition systems were applied to each image. All generated images are categorized into three levels of difficulty (easy, medium, and hard). This brings the total number of images to 42,592 (26,112 male images and 16,480 female images). Each image has metadata that includes the subject ID, facial landmarks, face and glasses rectangles, gender and age labels, the year the photo was taken, facial emotion, glasses type, and more.

Introduction

In the literature, many reported cases show that severe illumination conditions degrade the performance of both face detection and recognition, even though most face detection algorithms normalize the contrast of the input image as a preprocessing step. Face occlusions also threaten the performance of many face detectors and facial recognition systems. For instance, Facebook's face detector is considered robust to dark images, whereas its accuracy, at least at the time of this writing, drops for people who wear glasses or scarves. As another example, the face detection accuracy of the Snapchat app, which supports augmented reality by adding visual effects to faces, is also, at least at the time of this writing, affected by glasses and bad illumination conditions.

Screenshots of the Snapchat app (version 9.39.5.0). Snapchat's face detector succeeds in the first two images of each row; however, the challenging conditions in the last image of each row foil it.

Thus, we propose the SoF dataset as a challenging dataset that can be used to evaluate face detection, recognition, and classification techniques. The synthetic images, which were generated after adding some image filters, are categorized into three levels of difficulty: easy, medium, and hard, representing the strength of the filters. The original set of images consists of two parts. The first part contains unconstrained frontal and near-frontal images of persons who wear glasses as an essential occlusion in the images. The second part contains face images that were captured under severe illumination conditions in a controlled environment. More details and results can be found below.

Paper & Citation

Download paper: https://arxiv.org/abs/1706.04277

If you use the SoF dataset, please cite as:

Mahmoud Afifi and Abdelrahman Abdelhamed, "AFIF4: Deep gender classification based on an AdaBoost-based fusion of isolated facial features and foggy faces". arXiv:1706.04277, arXiv 2017.

Bibtex

@article{afifi2017afif4,
  title={AFIF4: Deep Gender Classification based on AdaBoost-based Fusion of Isolated Facial Features and Foggy Faces},
  author={Afifi, Mahmoud and Abdelhamed, Abdelrahman},
  journal={arXiv preprint arXiv:1706.04277},
  year={2017}
}

General information

The SoF dataset was assembled to support testing and evaluation of face detection, recognition, and classification algorithms using standardized tests and procedures. The first version of the dataset was collected in April 2015 by capturing 242 images of 14 subjects who wear eyeglasses in a controlled environment. This set was extended with 92 images of 15 students from the Egyptian E-Learning University (EELU), Egypt. After that, many volunteers contributed their photos to build up the first part of the dataset. The images were captured in different countries, including Egypt, Canada, France, Germany, India, Japan, Kuwait, Malaysia, Taiwan, the United Arab Emirates, and the USA. The last image in this part was captured in October 2016. The second part of the dataset was filmed in September 2016 in the Multimedia laboratory, Assiut University, Egypt.

Technical information

As mentioned above, the original set of SoF images comes in two parts. The first part contains 757 frontal and near-frontal images (640 x 480 pixels) of 106 different persons whose head orientation is within approximately ±35° in yaw, pitch, and roll. Many subjects contributed both recent photos and old photos that were captured many years ago. Some of the images in the first part (242 images) were captured in a systematic way (i.e. the same facial expressions in the same environment). The rest are unconstrained images that were collected from many volunteers. Both indoor and outdoor lighting conditions are included in this part. The second part presents a challenging set of images (1,905 original images) that were captured under harsh lighting conditions. The subjects (12 persons) were filmed under a single lamp placed at arbitrary locations to emit light rays in random directions. The video was converted into a sequence of frames (640 x 480 pixels), which were filtered manually to keep only frames that differ from those already selected. Each frame differs in either the lighting conditions or the facial expression.

Samples of the second part of the SoF dataset.

For each image of the original set, 15 additional images are generated: 6 by the synthetic occlusions and 9 by the image filters. The synthetic occlusions cover the nose and the mouth with a white block. Three filters are applied: Gaussian noise, Gaussian blur, and image posterization using fuzzy logic (read the technical report of the image posterization technique). We categorize the generated images into three levels: easy, medium, and hard. To do that, we adopt the Viola-Jones algorithm as a predominant face detector. We try many values, in an incremental manner, for each synthetic occlusion and image filter, and test the face detector after each change. We assume that the face is successfully detected only if one of the bounding boxes has an Intersection-over-Union (IoU) overlap with the ground truth annotation of at least 50%. Based on these tests, we pick the appropriate values for each category.
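The image counts quoted on this page can be checked with a few lines of arithmetic. The sketch below assumes that each of the two occlusions and each of the three filters is applied once per difficulty level, which is our reading of the figures 6 and 9 rather than something stated explicitly. All constant names are illustrative.

```python
# Counts taken from the text of this page.
PART1_ORIGINALS = 757    # first part: frontal and near-frontal images
PART2_ORIGINALS = 1905   # second part: harsh-illumination frames

# Assumption: every occlusion/filter is generated at every difficulty level.
OCCLUSION_TYPES = 2      # nose, mouth
FILTER_TYPES = 3         # Gaussian noise, Gaussian blur, posterization
LEVELS = 3               # easy, medium, hard

originals = PART1_ORIGINALS + PART2_ORIGINALS
occlusion_variants = OCCLUSION_TYPES * LEVELS   # 6 extra images per original
filter_variants = FILTER_TYPES * LEVELS         # 9 extra images per original
images_per_original = 1 + occlusion_variants + filter_variants
total_images = originals * images_per_original

print(originals, images_per_original, total_images)  # 2662 16 42592
```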

For more details, read the technical report of the SoF dataset and the paper.

Download the dataset


  • All images [whole images.rar | 1.69 GB]: Download

All images of the dataset as (640 x 480) JPG files.

  • Original images [original images.rar | 56.9 MB]: Download

The original images of the dataset as (640 x 480) JPG files.

  • Metadata [metadata.rar | 616 KB]: Download

More details about the metadata are found below.

  • Gender classification 5-fold cross validation file [5-folds.rar | 111 KB]: Download
  • Emotion recognition 5-fold cross validation [5-folds.zip | 182 KB]: Download
  • Face recognition 5-fold cross validation [5-folds.zip | 21 KB]: Download

As a benchmark dataset for comparison, we suggest reporting performance as 5-fold cross validation. More details are found below.

Ground truth information

The ground truth information can be obtained from the metadata files. The metadata describes each subject from different aspects for many research areas, such as face detection, gender classification, and facial feature extraction. For each image, the metadata gives the subject ID (the first three letters of the first name followed by the first letter of the last name), 17 facial landmarks, face and glasses rectangles, gender and age labels, the year the photo was taken, and a facial emotion label (4 basic emotions: normal, happy, sad/angry/disgusted, and surprised/fearful). We provide the glasses/face rectangles as a quick way to access the glasses/face regions, so they are upright rectangles regardless of the head orientation. To get more accurate face rectangles, it is recommended to use the facial landmarks instead. A variable named “glassesType” gives the type of glasses (eyeglasses, semi-transparent sunglasses, opaque sunglasses, or others). A label called “illuminationQuality” indicates whether the face is well-illuminated or captured under poor illumination conditions. Poor illumination means that at least one facial point, i.e. landmark, is invisible due to the bad illumination conditions. In other words, a well-illuminated face is one whose non-occluded facial features are all recognizable by the naked eye. In addition, there is a set of yes/no variables: cropped, frontal, estimated points, indoor/outdoor lighting, and head scarf.
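One simple way to derive a face region from the landmarks is to take the tight box that encloses them, optionally enlarged by a margin. The sketch below is only an illustration: it assumes the landmarks have already been read from the metadata into (x, y) pairs and that rectangles are written as (x, y, width, height). Neither layout is specified on this page, and the function name and margin parameter are hypothetical.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (x, y, width, height): assumed layout


def landmarks_to_rect(landmarks: Iterable[Point], margin: float = 0.0) -> Rect:
    """Return the box enclosing all landmarks, grown by `margin` on each side.

    `margin` is a fraction of the box width/height (0.1 adds 10% per side).
    """
    points = list(landmarks)
    if not points:
        raise ValueError("at least one landmark is required")
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    width = max(xs) - min(xs)
    height = max(ys) - min(ys)
    return (
        min(xs) - margin * width,
        min(ys) - margin * height,
        width * (1 + 2 * margin),
        height * (1 + 2 * margin),
    )


# Three made-up landmark positions, for illustration only.
print(landmarks_to_rect([(10, 20), (30, 60), (20, 40)]))
```

Because the landmarks may not reach the outline of the face, a non-zero margin is likely needed in practice; the right value is left to the user.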

Note: the glasses/face rectangles of approximately 110 images are inaccurate, because of the head pose angles; get the image filenames (userID_imageSequence_*) from here. For more details, read the technical report of the SoF dataset.

Examples of facial landmarks, face and glasses rectangle in some images.

5-fold cross validation

As a benchmark for comparison, we suggest reporting performance as 5-fold cross validation.

Gender classification and emotion recognition

Each fold is represented by a txt file that contains the filenames of all images in this fold. Note that there are two groups of folds. The first group, for the original dataset, is named fold1Original.txt, fold2Original.txt, etc. The second group, for the whole dataset, is named fold1.txt, fold2.txt, etc. For gender classification, all folds contain the same distribution of males and females. For emotion recognition, all folds contain the same distribution of facial emotions.
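A minimal sketch of how the fold files could be turned into train/test splits is given below. It assumes that each txt file lists one image filename per line and that the five files sit in one directory; the function names and the directory argument are hypothetical.

```python
from pathlib import Path
from typing import Iterator, List, Tuple


def read_fold(path: Path) -> List[str]:
    """Return the non-empty lines of a fold file (assumed: one filename per line)."""
    return [line.strip() for line in path.read_text().splitlines() if line.strip()]


def cross_validation_splits(
    fold_dir: Path, original_only: bool = False, n_folds: int = 5
) -> Iterator[Tuple[List[str], List[str]]]:
    """Yield (train, test) filename lists, holding out one fold at a time."""
    suffix = "Original" if original_only else ""
    folds = [
        read_fold(fold_dir / f"fold{i}{suffix}.txt") for i in range(1, n_folds + 1)
    ]
    for held_out in range(n_folds):
        test = folds[held_out]
        train = [
            name
            for index, fold in enumerate(folds)
            if index != held_out
            for name in fold
        ]
        yield train, test
```

A typical use is to loop over `cross_validation_splits(Path("5-folds"))`, train on `train`, evaluate on `test`, and report the mean accuracy over the five rounds.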

Face, eyeglasses, and facial landmark detection

For face, eyeglasses, and facial landmark detection, we suggest using all images of the dataset to test your model. For training and testing, you can use the gender classification 5-fold cross validation files.

Face recognition

Each fold is represented by a txt file that contains the filenames of all images in this fold. Note that there are two groups of folds. The first group, for the original dataset, is named fold1Original.txt, fold2Original.txt, etc. The second group, for the whole dataset, is named fold1.txt, fold2.txt, etc. Each fold contains 10 images for each subject (original images ~ foldxOriginal.txt) and 50 images for each subject (whole dataset ~ foldx.txt). The folds cover 12 subjects that were selected from the 112 persons in the dataset.
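With 12 subjects, these figures imply 120 filenames in each original fold and 600 in each whole-dataset fold. The sketch below shows one way to sanity-check a fold after loading it. It assumes the subject ID is the part of the filename before the first underscore, following the userID_imageSequence_* pattern mentioned above; the helper name is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable

SUBJECTS = 12
EXPECTED_ORIGINAL_FOLD_SIZE = SUBJECTS * 10   # 120 images per original fold
EXPECTED_WHOLE_FOLD_SIZE = SUBJECTS * 50      # 600 images per whole-dataset fold


def images_per_subject(filenames: Iterable[str]) -> Dict[str, int]:
    """Count images per subject, taking the text before the first '_' as the ID."""
    return dict(Counter(name.split("_", 1)[0] for name in filenames))
```

For a fold read from foldxOriginal.txt, every value returned by `images_per_subject` should then be 10, and the dictionary should have 12 keys.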

Contact us

Questions and comments can be sent to:

m.afifi[at]aun[dot]edu[dot]eg or mafifi[at]eecs[dot]yorku[dot]ca

Results

Face/eyeglasses Detection

A detected region is accepted as a true-positive detection if its Intersection-over-Union (IoU) score with the ground-truth bounding box (bbox) exceeds 50%. Otherwise, the detected region is considered a false-positive detection.

The IoU is calculated by

IoU(A, B) = area(A ∩ B) / area(A ∪ B)

where A is the ground-truth bbox ROI and B is the detected bbox.
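The IoU criterion can be sketched in a few lines. The example below assumes upright boxes written as (x, y, width, height), which is an illustrative choice and not necessarily the layout used in the metadata files; the function names are hypothetical.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height): assumed layout


def iou(a: Box, b: Box) -> float:
    """Intersection area divided by union area of two upright boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # Width and height of the overlap; zero when the boxes do not intersect.
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    intersection = inter_w * inter_h
    union = aw * ah + bw * bh - intersection
    return intersection / union if union > 0 else 0.0


def is_true_positive(ground_truth: Box, detected: Box, threshold: float = 0.5) -> bool:
    """Strict comparison, following the wording 'exceeds 50%' in this section."""
    return iou(ground_truth, detected) > threshold


# Two 10x10 boxes shifted by 5 pixels: intersection 50, union 150.
print(iou((0, 0, 10, 10), (5, 0, 10, 10)))  # about 0.333, so not a true positive
```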

  • Results obtained using the SoF dataset for testing pre-trained models.
Face Detection (Results)

[1] Viola, Paul, and Michael Jones. "Rapid object detection using a boosted cascade of simple features." In Computer Vision and Pattern Recognition, 2001. CVPR 2001. Proceedings of the 2001 IEEE Computer Society Conference on, vol. 1, pp. I-I. IEEE, 2001.

[2] Yu, Xiang, Junzhou Huang, Shaoting Zhang, Wang Yan, and Dimitris N. Metaxas. "Pose-free facial landmark fitting via optimized part mixtures and cascaded deformable shape model." In Proceedings of the IEEE International Conference on Computer Vision, pp. 1944-1951. 2013.

[3] Mahmoud Afifi, Marwa Nasser, Mostafa Korashy, Katherine Rohde, Aly Abdelrahim. "Can We Boost the Power of the Viola-Jones Face Detector Using Pre-processing? An Empirical Study." arXiv preprint arXiv:1709.07720, (2017).

Gender Classification

Gender Classification (Results)

[1] Huang, Gary B., Manu Ramesh, Tamara Berg, and Erik Learned-Miller. Labeled faces in the wild: A database for studying face recognition in unconstrained environments. Vol. 1, no. 2. Technical Report 07-49, University of Massachusetts, Amherst, 2007.

[2] Eidinger, Eran, Roee Enbar, and Tal Hassner. "Age and gender estimation of unfiltered faces." IEEE Transactions on Information Forensics and Security 9, no. 12 (2014): 2170-2179.

[3] Gallagher, Andrew C., and Tsuhan Chen. "Understanding images of groups of people." In Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, pp. 256-263. IEEE, 2009.

[4] Phillips, P. Jonathon, Harry Wechsler, Jeffery Huang, and Patrick J. Rauss. "The FERET database and evaluation procedure for face-recognition algorithms." Image and vision computing 16, no. 5 (1998): 295-306.

[5] Afifi, Mahmoud, and Abdelhamed, Abdelrahman. "AFIF4: Deep Gender Classification based on AdaBoost-based Fusion of Isolated Facial Features and Foggy Faces." arXiv preprint arXiv:1706.04277 (2017).

We encourage researchers who have used our dataset to send us their published results so that we can make them available on this webpage.

Acknowledgement

343 images of the dataset were captured by Ali Hussien, Ebram K. William, and Mostafa Korashy; we thank them for their effort. Thanks to the administrators of the Faculty of Computers and Information who supported this work. Finally, thanks to all the volunteers who trusted us with their photos to accomplish this work.

© This page contains files that could be protected by copyright. They are provided here for reasonable academic fair use.