Bilevel Online Adaptation for Out-of-Domain Human Mesh Reconstruction
MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, China
Overview
The high-definition video below gives a brief walkthrough of our paper. We first present the problem setup and the motivation (what task we address and what makes it difficult). We then introduce the proposed solution, Bilevel Online Adaptation, and show the results we achieved. Finally, we compare against the state-of-the-art method VIBE to verify the advantages of our method in a new test domain.
HD video with a full demonstration. It shows clearer details than the compressed video in the supplementary materials.
Problem
What problem does this paper solve? We aim to solve the domain shift problem. Although they perform well in the training domain, most previous methods based on the SMPL model underperform in new domains with unexpected, domain-specific attributes, such as camera parameters, limb ratios, and backgrounds.
Visualization of the training set and the test set. There are various domain gaps between the training set and the test set, such as occlusions and the environment (indoor vs. in-the-wild).
A reconstruction sample on the test set. The domain gaps lead to an inaccurate result with inconsistent estimations of camera and global orientation. Note that the result is produced by a model trained on Human3.6M (the training set).
Solution
How do we tackle the above problem? We propose an algorithm named Bilevel Online Adaptation (BOA). The goal of our approach is to fine-tune a model on test instances with carefully designed, unsupervised constraints, so that it greatly mitigates the domain gap described above. In particular, we introduce an additional loss function on temporal consistency. Furthermore, to avoid conflicts among the objectives and still benefit from training towards all of them, we integrate bilevel optimization into the online adaptation framework.
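As a rough illustration of what a temporal-consistency term can look like, the sketch below penalizes the mismatch between the predicted 2D keypoint motion and the observed 2D keypoint motion across two frames. The function names, the use of 2D keypoints, and the squared-error form are assumptions made for illustration; they are not taken from the paper and may differ from the loss actually used in BOA.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def motion(curr: List[Point], prev: List[Point]) -> List[Point]:
    """Per-keypoint displacement between two frames."""
    return [(cx - px, cy - py) for (cx, cy), (px, py) in zip(curr, prev)]


def temporal_consistency_loss(
    pred_curr: List[Point],
    pred_prev: List[Point],
    obs_curr: List[Point],
    obs_prev: List[Point],
) -> float:
    """Mean squared gap between predicted and observed keypoint motion.

    Hypothetical form: the loss is zero when the predicted keypoints move
    exactly as the observed keypoints do, even if their absolute positions
    differ.
    """
    sizes = {len(pred_curr), len(pred_prev), len(obs_curr), len(obs_prev)}
    if len(sizes) != 1:
        raise ValueError("all keypoint lists must have the same length")
    if not pred_curr:
        return 0.0

    pred_motion = motion(pred_curr, pred_prev)
    obs_motion = motion(obs_curr, obs_prev)

    total = 0.0
    for (pdx, pdy), (odx, ody) in zip(pred_motion, obs_motion):
        total += (pdx - odx) ** 2 + (pdy - ody) ** 2
    return total / len(pred_curr)
```

A term of this kind constrains how predictions change over time rather than where they sit in any single frame, which is why it can pull in a different direction from frame-wise pose constraints.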
Framework: For each frame arriving in the data stream, the BOA framework updates the model from the previous step in two stages: (1) a lower-level weight probe, which probes for rational parameters under frame-wise pose constraints, and (2) an upper-level weight update, which builds on the obtained lower-level parameters to seek a feasible solution to the upper-level optimization of the overall multi-objective loss in space-time.
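The two-stage update described above can be sketched on a toy parameter vector. This is a minimal sketch, not the released implementation: the function names are hypothetical, gradients are taken by finite differences instead of backpropagation, and the upper level uses a first-order approximation (the gradient of the overall loss is evaluated at the probed weights and applied to the previous weights), which is an assumption about how the two levels are coupled.

```python
from typing import Callable, List

Vector = List[float]
Loss = Callable[[Vector], float]


def numerical_grad(loss: Loss, theta: Vector, eps: float = 1e-6) -> Vector:
    """Central-difference gradient; stands in for backpropagation in this toy."""
    grad = []
    for i in range(len(theta)):
        up = list(theta)
        down = list(theta)
        up[i] += eps
        down[i] -= eps
        grad.append((loss(up) - loss(down)) / (2.0 * eps))
    return grad


def sgd_step(theta: Vector, grad: Vector, lr: float) -> Vector:
    """One plain gradient-descent step."""
    return [p - lr * g for p, g in zip(theta, grad)]


def bilevel_step(
    theta_prev: Vector,
    frame_loss: Loss,
    overall_loss: Loss,
    lower_lr: float,
    upper_lr: float,
) -> Vector:
    """One online adaptation step on the current frame (hypothetical sketch).

    Lower level: probe weights using only the frame-wise loss.
    Upper level: evaluate the overall (frame-wise + temporal) loss at the
    probed weights and use its gradient to update the previous weights.
    """
    # (1) Lower-level weight probe; the probed weights are temporary.
    probe = sgd_step(theta_prev, numerical_grad(frame_loss, theta_prev), lower_lr)
    # (2) Upper-level weight update, applied to theta_prev (first-order assumption).
    upper_grad = numerical_grad(overall_loss, probe)
    return sgd_step(theta_prev, upper_grad, upper_lr)


if __name__ == "__main__":
    # Toy objective with its minimum at theta[0] == 1.0.
    toy_loss = lambda th: (th[0] - 1.0) ** 2
    theta = [0.0]
    for _ in range(5):
        theta = bilevel_step(theta, toy_loss, toy_loss, lower_lr=0.25, upper_lr=0.25)
    print(theta)
```

In an actual streaming setting, `frame_loss` and `overall_loss` would be rebuilt for every incoming frame, and the weights returned for one frame would serve as `theta_prev` for the next.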
Results
Visualization Results: We present three samples from various scenarios. With the proposed BOA, we obtain accurate and consistent reconstructions of streaming data in new domains.
Sample 1
Sample 2
Sample 3
Comparison with VIBE: The meshes reconstructed by BOA (ours) are more accurate, show fewer shaking artifacts, and hold up better when the lower body is occluded. Note that VIBE uses more training data from both indoor and in-the-wild scenarios.