IMUSIC: IMU-based Facial Expression Capture
Youjia Wang1,2 Yiwen Wu1,2 Ruiqian Li1 Hengan Zhou1,2 Hongyang Lin1,3
Yingwenqi Jiang1 Yingsheng Zhu1 Guanpeng Long1,4 Jingya Wang1 Lan Xu1 Jingyi Yu1
1ShanghaiTech University 2LumiAni Technology 3Deemos Technology 4ElanTech Co., Ltd
Abstract
For facial motion capture and analysis, the dominant solutions are generally based on visual cues, which compromise privacy and are vulnerable to occlusions. Inertial measurement units (IMUs) offer a promising alternative, yet they have mainly been adopted for full-body motion capture. In this paper, we propose IMUSIC to fill this gap: a novel path for facial expression capture using purely IMU signals, departing significantly from previous visual solutions. The key design in our IMUSIC is a trilogy. We first design micro-IMUs suited to facial capture, accompanied by an anatomy-driven IMU placement scheme. Then, we contribute a novel IMU-ARKit dataset, which provides rich paired IMU/visual signals for diverse facial expressions and performances. This unique multi-modality opens up future directions such as IMU-based facial behavior analysis. Moreover, utilizing IMU-ARKit, we introduce a strong baseline approach to accurately predict facial blendshape parameters from purely IMU signals. Specifically, we tailor a Transformer diffusion model with a two-stage training strategy for this novel tracking task. The IMUSIC framework empowers us to perform accurate facial capture in scenarios where visual methods falter, while simultaneously safeguarding user privacy. We conduct extensive experiments on both the IMU configuration and the technical components to validate the effectiveness of our IMUSIC approach. Notably, IMUSIC enables various novel applications, e.g., privacy-preserving facial capture, hybrid capture against occlusions, and detecting minute facial movements that are often invisible to visual cues. We will release our dataset and implementations to enrich the possibilities of facial capture and analysis in our community.
Overview
We first introduce the hardware design and the data acquisition pipeline. We then delve into the data calibration process and the methodology for recovering facial motion from IMU signals. Finally, after deploying IMUSIC, we demonstrate its effectiveness through various applications, underlining its precision and portability.