End-to-end Face Detection and Cast Grouping in Movies Using Erdős–Rényi Clustering

Demo

This demo video shows face detection and clustering on the first few minutes of the Hannah data set video. Many scenes in the data set contain a large number of faces and frequent occlusions, making it a challenging scenario for the face detection and clustering task. Each box color represents detections assigned to a different cluster ID. The cluster ID is written in the corner of each face bounding box.

Abstract
We present an end-to-end system for detecting and clustering faces by identity in full-length movies. Unlike works that start with a predefined set of detected faces, we consider the end-to-end problem of detection and clustering together. We make three separate contributions. First, we combine a state-of-the-art face detector with a generic tracker to extract high-quality face tracklets. We then introduce a novel clustering method, motivated by the classic graph theory results of Erdős and Rényi. It is based on the observations that large clusters can be fully connected by joining just a small fraction of their point pairs, while just a single connection between two different people can lead to poor clustering results. This suggests clustering using a verification system with very few false positives but perhaps moderate recall. We introduce a novel verification method, rank-1 counts verification, that has this property, and use it in a link-based clustering scheme. Finally, we define a novel end-to-end detection and clustering evaluation metric, allowing us to assess the accuracy of the entire end-to-end system. We present state-of-the-art results on multiple video data sets and also on standard face databases.
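The two observations behind the clustering method can be illustrated with a small simulation. The sketch below is not the system described above: it replaces rank-1 counts verification with a hypothetical simulated verifier (`simulated_links`) that never links two different identities and accepts only 10% of same-identity pairs, and it uses a plain union-find for the link-based clustering step (`link_cluster`). The cluster sizes, the recall value, and all function names are assumptions chosen for illustration.

```python
import itertools
import random


def link_cluster(n, links):
    """Link-based clustering: merge items i and j for every accepted link.

    Implemented with union-find; returns one cluster label per item.
    """
    parent = list(range(n))

    def find(x):
        # Follow parent pointers to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in links:
        parent[find(i)] = find(j)
    return [find(i) for i in range(n)]


def simulated_links(identities, recall, rng):
    """Hypothetical stand-in for a verifier (assumption, not rank-1 counts).

    It produces no false positives and accepts each same-identity pair
    independently with probability `recall`.
    """
    n = len(identities)
    return [
        (i, j)
        for i, j in itertools.combinations(range(n), 2)
        if identities[i] == identities[j] and rng.random() < recall
    ]


rng = random.Random(0)
identities = [0] * 200 + [1] * 200  # two people, 200 faces each

# Only about 10% of the true pairs are linked, yet each person's faces
# end up in a single connected cluster.
links = simulated_links(identities, recall=0.1, rng=rng)
labels = link_cluster(len(identities), links)
num_clusters = len(set(labels))

# A single false positive between the two people merges everything.
labels_fp = link_cluster(len(identities), links + [(0, 200)])
num_clusters_fp = len(set(labels_fp))

print(num_clusters, num_clusters_fp)
```

With these settings the link probability is well above the Erdős–Rényi connectivity threshold of roughly ln(n)/n for n = 200, which is why a verifier with modest recall is enough to connect each identity, while one false link is enough to collapse the two identities into one cluster.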



Publications
SouYoung Jin, Hang Su, Chris Stauffer, and Erik Learned-Miller. End-to-end face detection and cast grouping in movies using Erdős–Rényi clustering. International Conference on Computer Vision (ICCV), 10 pages, 2017. Spotlight. [pdf] [supp]



Acknowledgement
This research is based in part upon work supported by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA) under contract number 2014-14071600010. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of ODNI, IARPA, or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon.