EVA-GCN: Head Pose Estimation Based on Graph Convolutional Networks

Miao Xin* (Institute of Automation (CASIA), Chinese Academy of Sciences)

Shentong Mo (Carnegie Mellon University)

Yuanze Lin (Beihang University)

Head pose estimation is an important task in many real-world applications. Because facial landmarks commonly serve as an input shared by multiple downstream tasks, using them to obtain high-precision head pose estimates is of practical value. However, existing landmark-based methods are limited in model expressive power, which makes it hard for them to match the performance of landmark-free methods. In this paper, we propose a strong baseline method that treats head pose estimation as a graph regression problem. We construct a landmark-connection graph and leverage Graph Convolutional Networks (GCN) to model the complex nonlinear mappings between the graph topologies and the head pose angles. Specifically, we design a novel GCN architecture that uses a joint Edge-Vertex Attention (EVA) mechanism to cope with unstable landmark detection. Moreover, we introduce Adaptive Channel Attention (ACA) and a Densely-Connected Architecture (DCA) to boost performance further. We evaluate the proposed method on three challenging benchmark datasets. Experimental results demonstrate that our method outperforms state-of-the-art landmark-based and landmark-free methods. The source code will be made publicly available.

Introduction

Existing landmark-based head pose estimation methods cannot match the performance of state-of-the-art landmark-free methods. Limited model expressive power is argued to be the main reason. The principle of landmark-based methods is to infer 3D angle information from the distribution of landmarks. It is therefore crucial to model the complex nonlinear relationships between the geometric distribution of landmarks and head poses robustly and efficiently. However, current methods lack designs dedicated to this objective, which results in the current performance bottleneck. Hence, it is natural to ask: can models designed specifically for landmark-based head pose estimation improve accuracy further? To answer this question, we provide a strong baseline method.

In this work, we propose to leverage graph convolutional networks (GCN) to improve the performance of landmark-based head pose estimation. We propose a landmark-connection graph that takes the selected facial landmarks as vertices and connects them via the k-Nearest Neighbor method. We use a spatial GCN to regress the pose angles in three directions. Specifically, we introduce joint edge-vertex attention, adaptive channel attention, and a densely-connected architecture into the graph convolutional network. These designs boost performance significantly.
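To make the graph construction concrete, the following is a minimal sketch of how a landmark-connection graph could be built with the k-Nearest Neighbor rule. It is an illustration under stated assumptions, not the implementation used in this work: the function name `knn_graph`, the use of 2D Euclidean distance, the symmetrization of the links, and the toy coordinates are all hypothetical choices.

```python
import math


def knn_graph(landmarks, k):
    """Build a symmetric k-nearest-neighbor adjacency matrix.

    landmarks: list of (x, y) tuples, one per facial landmark (vertex).
    k: number of nearest neighbors each vertex links to.
    Returns an n x n list-of-lists matrix with 0/1 entries.
    """
    n = len(landmarks)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of landmarks")
    adj = [[0] * n for _ in range(n)]
    for i, p in enumerate(landmarks):
        # Rank all other vertices by Euclidean distance to vertex i.
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(p, landmarks[j]),
        )
        for j in others[:k]:
            # Assumption: every k-NN link is treated as an undirected edge.
            adj[i][j] = 1
            adj[j][i] = 1
    return adj


# Hypothetical toy input: five 2D points, not real landmark detections.
toy_landmarks = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
toy_adj = knn_graph(toy_landmarks, k=2)
```

Because the links are symmetrized, a vertex may end up with more than k neighbors; a directed variant would keep exactly k outgoing edges per vertex. Which variant suits a given GCN layer is a design choice that this sketch leaves open.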

Our main contributions can be summarized as follows:

We propose a graph convolutional network architecture that regresses the 3D head pose angles. To the best of our knowledge, this is the first method that introduces GCN into head pose estimation.
We introduce a joint edge-vertex attention mechanism into the vanilla GCN architecture, forming a strong baseline. Furthermore, we introduce adaptive channel attention and a densely-connected architecture into the model, improving performance significantly.
We evaluate the proposed method comprehensively on three challenging datasets. Our method achieves state-of-the-art performance among landmark-based methods and outperforms currently published landmark-free methods. We also provide a detailed ablation analysis, a discussion of the results, and the theoretical performance bound of our method.