Bilevel Online Adaptation for Out-of-Domain Human Mesh Reconstruction

Shanyan Guan*, Jingwei Xu*, Yunbo Wang, Bingbing Ni, Xiaokang Yang

MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, China

Overview

The high-definition video below gives a brief overview of our paper. It first presents the problem setup and motivation (what task we address and what makes it difficult), then introduces the proposed solution, Bilevel Online Adaptation, and shows the results we achieved. Finally, it compares our method with the state-of-the-art method VIBE to demonstrate its advantages in a new test domain.

supp-novoice.mp4

HD video with the full demonstration. It shows finer detail than the compressed video in the supplementary materials.

Problem

What problem does this paper solve? We aim to solve the domain shift problem. Although performing well in the training domain, most previous methods based on the SMPL model underperform in new domains with unexpected, domain-specific attributes, such as camera parameters, limb ratios, and backgrounds.

Visualization of the training set and the test set. Several domain gaps separate the two, such as occlusions and the environment (indoor vs. in-the-wild).

A reconstruction sample on the test set. The domain gaps lead to an inaccurate result with inconsistent estimations of camera and global orientation. Note that the result is produced by a model trained on Human3.6M (the training set).

Solution

How do we tackle the above problem? We propose an algorithm named Bilevel Online Adaptation (BOA). The goal of our approach is to fine-tune a model on test instances with carefully designed, unsupervised constraints, so that it greatly mitigates the domain gap described above. In particular, we introduce an additional loss function on temporal consistency. Furthermore, to avoid conflicts between the objectives while still benefiting from training toward all of them, we integrate bilevel optimization into the online adaptation framework.
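To make the idea of a temporal consistency term concrete, here is a minimal sketch in plain Python. It is not the loss used in the paper, whose exact form is not given on this page. The function name `temporal_consistency_loss` and the choice of a mean squared joint displacement between two frames are assumptions made purely for illustration.

```python
def temporal_consistency_loss(joints_prev, joints_curr):
    """Hypothetical temporal term: mean squared displacement per joint.

    Each argument is a list of joints for one frame, and each joint is a
    tuple of coordinates (for example x, y, z). This is an illustrative
    stand-in, not the loss defined in the paper.
    """
    if len(joints_prev) != len(joints_curr):
        raise ValueError("both frames must contain the same number of joints")
    if not joints_curr:
        raise ValueError("frames must contain at least one joint")
    total = 0.0
    for prev, curr in zip(joints_prev, joints_curr):
        # Squared Euclidean distance between the same joint in both frames.
        total += sum((c - p) ** 2 for p, c in zip(prev, curr))
    return total / len(joints_curr)


# Two joints; only the second one moves by one unit along the last axis.
frame_a = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
frame_b = [(0.0, 0.0, 0.0), (1.0, 1.0, 2.0)]
loss = temporal_consistency_loss(frame_a, frame_b)
```

A term like this penalizes real motion as well as jitter, so in practice it would have to be balanced against the frame-wise constraints. That tension between objectives is what the bilevel scheme is meant to handle.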

Framework: For each frame arriving from the stream, the BOA framework performs two updates starting from the model obtained at the previous step: (1) a lower-level weight probe, which finds plausible parameters under the frame-wise pose constraints, and (2) an upper-level weight update, which uses the probed lower-level parameters to seek a feasible solution to the upper-level optimization of the overall space-time objectives.
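The two-step update can be sketched on a toy problem. The example below is a simplified, first-order illustration under stated assumptions, not the authors' implementation. The model is a single scalar weight, both losses are hypothetical quadratics, and the upper-level gradient is evaluated at the probed weights and then applied to the previous-step weights. All names (`boa_step`, `adapt_stream`, `lr_lower`, `lr_upper`) are made up for this sketch.

```python
def grad_frame(w, frame_target):
    # Gradient of the hypothetical frame-wise loss 0.5 * (w - frame_target)^2.
    return w - frame_target


def grad_temporal(w, temporal_target):
    # Gradient of the hypothetical temporal loss 0.5 * (w - temporal_target)^2.
    return w - temporal_target


def boa_step(w, frame_target, temporal_target, lr_lower=0.1, lr_upper=0.1):
    """One bilevel update starting from the previous-step weight w."""
    # (1) Lower-level weight probe: a trial step under the frame-wise loss only.
    w_probe = w - lr_lower * grad_frame(w, frame_target)
    # (2) Upper-level weight update: gradient of the overall objective
    #     (frame-wise + temporal) evaluated at the probe, applied to w.
    g_upper = grad_frame(w_probe, frame_target) + grad_temporal(w_probe, temporal_target)
    return w - lr_upper * g_upper


def adapt_stream(w, frame_targets, lr_lower=0.1, lr_upper=0.1):
    """Run the update once per streaming frame and record the weights."""
    history = []
    prev_target = frame_targets[0]  # the first frame has no predecessor
    for target in frame_targets:
        w = boa_step(w, target, prev_target, lr_lower, lr_upper)
        history.append(w)
        prev_target = target
    return history


weights = adapt_stream(0.0, [1.0] * 50)
```

The probe weight is discarded after each frame. Only the upper-level step changes the model carried to the next frame, which matches the description that the model from the last step is updated twice from the same starting point.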

Results

Visualization Results: We present three samples in various scenarios. By using the proposed BOA, we are able to get accurate and consistent reconstructions of streaming data in new domains.

clip1.mp4

Sample 1

clip5.mp4

Sample 2

clip3.mp4

Sample 3

Comparison with VIBE: The meshes reconstructed by BOA (ours) are more accurate, show fewer shaking artifacts, and remain more reliable when the lower body is occluded. Note that VIBE uses more training data, drawn from both indoor and in-the-wild scenarios.

boa-vibe.mp4