Face Tracking Project

Tracking Persons-of-Interest via Unsupervised Representation Adaptation

Shun Zhang1, Jia-Bin Huang2, Jongwoo Lim3, Yihong Gong4, Jinjun Wang4, Narendra Ahuja5 and Ming-Hsuan Yang6

1Northwestern Polytechnical University, 2Virginia Tech, 3Hanyang University, 4Xi'an Jiaotong University, 5University of Illinois, Urbana-Champaign, 6University of California, Merced

Abstract

Multi-face tracking in unconstrained videos is a challenging problem as faces of one person often appear drastically different in multiple shots due to significant variations in scale, pose, expression, illumination, and make-up. Existing multi-target tracking methods often use low-level features which are not sufficiently discriminative for identifying faces with such large appearance variations. In this paper, we tackle this problem by learning discriminative, video-specific face representations using convolutional neural networks (CNNs). Unlike existing CNN-based approaches which are only trained on large-scale face image datasets offline, we use the contextual constraints to generate a large number of training samples for a given video, and further adapt the pre-trained face CNN to specific videos using discovered training samples. Using these training samples, we optimize the embedding space so that the Euclidean distances correspond to a measure of semantic face similarity via minimizing a triplet loss function. With the learned discriminative features, we apply the hierarchical clustering algorithm to link tracklets across multiple shots to generate trajectories. We extensively evaluate the proposed algorithm on two sets of TV sitcoms and YouTube music videos, analyze the contribution of each component, and demonstrate significant performance improvement over existing techniques.

Multi-face Tracking

We tackle the problem of tracking multiple faces of people while maintaining their identities in unconstrained videos. Such videos consist of many shots from different cameras. The main challenge is to address large appearance variations of faces from different shots due to changes in pose, view angle, scale, makeup, illumination, camera motion and heavy occlusions.

Algorithm Outline

Our multi-face tracking algorithm has four main steps: (a) Pre-training a CNN on a large-scale face recognition dataset to learn identity-preserving features, (b) Generating face pairs or face triplets from the tracklets in a specific video with the proposed spatio-temporal constraints and contextual constraints, (c) Adapting the pre-trained CNN to learn video-specific features from the automatically generated training samples, and (d) Linking tracklets within each shot and then across shots to form the face trajectories.
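Step (c) adapts the pre-trained CNN by minimizing a triplet loss so that Euclidean distances in the embedding space reflect face similarity. A minimal NumPy sketch of the hinge-style loss for a single triplet follows; the margin value and the toy embeddings are illustrative, not the paper's settings:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.5):
    """Hinge-style triplet loss on embedding vectors.

    Encourages the anchor-positive distance to be smaller than the
    anchor-negative distance by at least `margin` (illustrative value).
    """
    d_ap = np.sum((anchor - positive) ** 2)   # squared distance to same identity
    d_an = np.sum((anchor - negative) ** 2)   # squared distance to other identity
    return max(0.0, d_ap - d_an + margin)

# Toy 2-D embeddings; in the paper these come from the adapted CNN.
a = np.array([1.0, 0.0])
p = np.array([0.9, 0.1])   # same identity, nearby
n = np.array([0.0, 1.0])   # different identity, far away
print(triplet_loss(a, p, n))  # 0.0: the constraint is already satisfied
```

In practice the loss is summed over many triplets mined from the tracklet constraints and back-propagated through the network; a violated triplet (e.g. swapping `p` and `n` above) yields a positive loss.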

Contextual Constraints

Here, we label the faces in T1 and T3 as the same identity because the contextual features of T1 and T3 are sufficiently similar. With this additional constraint, we can propagate the constraints transitively and derive that the faces in T1 and T4 (or T5, T6) in fact belong to different identities, and that the faces in T3 and T2 are from different people.
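This transitive propagation can be sketched with a small union-find over tracklets: must-links (high contextual similarity) merge groups, and a cannot-link between any members of two groups separates the whole groups. The specific links below mirror the T1–T6 example but are otherwise illustrative:

```python
def find(parent, x):
    # Path-halving find for the union-find structure.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def union(parent, x, y):
    parent[find(parent, x)] = find(parent, y)

tracklets = ["T1", "T2", "T3", "T4", "T5", "T6"]
parent = {t: t for t in tracklets}

must_link = [("T1", "T3")]                                  # contextual constraint
cannot_link = {("T1", "T2"), ("T3", "T4"), ("T3", "T5"), ("T3", "T6")}  # co-occurrence

for u, v in must_link:
    union(parent, u, v)

def same_identity(u, v):
    return find(parent, u) == find(parent, v)

def different_identity(u, v):
    # Any cannot-link between the two merged groups separates them.
    ru, rv = find(parent, u), find(parent, v)
    return any((find(parent, x) == ru and find(parent, y) == rv) or
               (find(parent, x) == rv and find(parent, y) == ru)
               for x, y in cannot_link)

print(different_identity("T1", "T4"))  # True, propagated via T1~T3
print(different_identity("T3", "T2"))  # True, propagated via T1~T3
```

The derived must-link and cannot-link pairs then supply the positives and negatives for the triplet-based adaptation.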

Clustering Performance

Clustering purity versus the number of clusters for different features on the YouTube music video, Big Bang Theory, and BUFFY datasets. The ideal line indicates that all faces are correctly grouped into ideal clusters, with a weighted purity of 1. A more effective feature approaches a purity of 1 faster as the number of clusters increases. The legend lists each feature's purity at the ideal number of clusters.
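Weighted purity can be computed as below: each cluster contributes the fraction of its faces carrying the cluster's majority identity, weighted by cluster size. The cluster assignments in the example are a hypothetical toy case, not results from the paper:

```python
from collections import Counter

def weighted_purity(clusters):
    """Weighted clustering purity.

    `clusters` is a list of clusters, each a list of ground-truth
    identity labels for the faces assigned to that cluster.
    Returns 1.0 exactly when every cluster is pure.
    """
    total = sum(len(c) for c in clusters)
    # Majority-label count per cluster, summed, then normalized by size.
    return sum(Counter(c).most_common(1)[0][1] for c in clusters) / total

# Toy example: four faces of identities A and B in two clusters.
print(weighted_purity([["A", "A", "B"], ["B"]]))  # 0.75
print(weighted_purity([["A", "A"], ["B", "B"]]))  # 1.0 (ideal clustering)
```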

Videos: T-ara, Pussycat Dolls, Bruno Mars, Hello Bubble, Darling, Apink, Westlife, Girls Aloud, BUFFY02, BUFFY05, BUFFY06, BBT01, BBT02, BBT03, BBT04, BBT05, BBT06, BBT07

Papers, Supplementary materials, Codes, and Datasets

Shun Zhang, Jia-Bin Huang, Jongwoo Lim, Yihong Gong, Jinjun Wang, Narendra Ahuja and Ming-Hsuan Yang, "Tracking Persons-of-Interest via Unsupervised Representation Adaptation", accepted to IJCV, 2019. [paper] [supp] [video_demo] [data: Dropbox or BaiduYun] [Code coming soon]

Last updated: Aug. 16, 2019