Proactive Multi-Camera Collaboration for 3D Human Pose Estimation

 Hai Ci*, Mickel Liu*, Xuehai Pan*, Fangwei Zhong, Yizhou Wang

*Equal Contribution

Peking University & Beijing Institute of General Artificial Intelligence (BIGAI) 

 International Conference on Learning Representations (ICLR) 2023

Motivations - Why Active and Mobile?

Dynamic Occlusions lead to failed pose estimations

Fixed cameras cannot capture the motion of an unconstrained target

A failure case for the five fixed-camera baseline

New Simulation Environment - UnrealPose

ICLR 2023 Paper 1361 Demo - UnrealPose.mp4


ICLR 2023 Paper 1361 - Visualization Tool.mp4

Dedicated Visualization Tool

Demo Videos

Ours (MAPPO + World Dynamics Learning + CTCR)

ICLR 2023 Paper 1361 Demo Video.mp4

3D Evaluation


2D+3D Evaluation in SchoolGym


2D+3D Evaluation in UrbanStreet

Baselines (Passive + Active)

ICLR 2023 Paper 1361 Demo - Fixed-Cameras Pentagon Formation.mp4

Fixed-Cameras Baseline (Pentagon Formation)

ICLR 2023 Paper 1361 Demo - MAPPO Baseline.mp4

MAPPO Baseline (No smoothing)

Collaborative Triangulation Contribution Reward (CTCR)

CTCR is inspired by the concept of the Shapley value. The function r in the above equation is the accuracy of the triangulated human pose. In essence, CTCR measures the weighted average marginal contribution of a camera agent to every valid sub-formation that contains this agent. A sub-formation is valid only if it has at least two cameras, since multi-view 3D triangulation requires at least two views.

The main idea is that overall optimality must also account for the optimality of every possible sub-formation (S). For a camera agent to receive the highest possible reward, its current position and view must be optimal with respect to both the full formation and every possible sub-formation that contains it.
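The weighted marginal-contribution idea can be illustrated with a minimal Python sketch of the standard Shapley value. This is not the authors' implementation: the function name `shapley_contributions`, the camera labels, and the accuracy numbers in `toy_scores` are all hypothetical, and any scaling the paper applies on top of the Shapley value is omitted. The sketch assumes that a formation with fewer than two cameras scores zero, mirroring the validity rule described above.

```python
from itertools import combinations
from math import factorial


def shapley_contributions(cameras, accuracy):
    """Return a Shapley-style contribution for each camera.

    `accuracy` maps a frozenset of camera ids to a score. Formations with
    fewer than two cameras are scored 0.0 (assumption: no triangulation).
    """
    n = len(cameras)

    def r(subset):
        return accuracy(subset) if len(subset) >= 2 else 0.0

    contributions = {}
    for cam in cameras:
        others = [c for c in cameras if c != cam]
        total = 0.0
        for k in range(len(others) + 1):
            # Standard Shapley weight for a coalition of size k.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for combo in combinations(others, k):
                s = frozenset(combo)
                # Marginal gain from adding `cam` to sub-formation `s`.
                total += weight * (r(s | {cam}) - r(s))
        contributions[cam] = total
    return contributions


# Hypothetical accuracy scores for a three-camera team (illustrative only).
toy_scores = {
    frozenset("AB"): 0.6,
    frozenset("AC"): 0.4,
    frozenset("BC"): 0.2,
    frozenset("ABC"): 0.9,
}

phi = shapley_contributions(["A", "B", "C"], lambda s: toy_scores[s])
for cam, value in phi.items():
    print(f"camera {cam}: {value:.3f}")
```

With these made-up scores, the three contributions sum to the accuracy of the full formation, which is the efficiency property of the Shapley value. Camera A receives the largest share because it improves every sub-formation it joins by the largest margin.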

The figure below shows an example of calculating CTCR for every camera agent in a three-camera team.

World Dynamics Learning (WDL) Objectives

The loss equations for the five WDL objectives are shown below (see Appendix A for the full pseudo-code):

Execution Pipeline




title={Proactive Multi-Camera Collaboration for 3D Human Pose Estimation},

author={Hai Ci and Mickel Liu and Xuehai Pan and Fangwei Zhong and Yizhou Wang},

booktitle={The Eleventh International Conference on Learning Representations},