Real Quadrotor

This task infers the action sequence of a quadrotor from a real-world video of its motion. Below we show three results: one from our method (RISP) and two from the state-of-the-art approach (GradSim), in its original and enhanced forms. Each result is a side-by-side video comparison between the motion reconstructed from the inferred action sequence (left) and the input video (right, identical across all three results).

RISP Result

Our method reconstructs a dynamic motion similar to the one in the input video without knowing the exact rendering configuration. The lighting and material configurations that RISP uses (left) are intentionally chosen to differ from those of the input video.

GradSim Result

The setup is identical to that of RISP except that the loss function is replaced with the pixel-wise difference defined in the original GradSim paper. The reconstructed motion fails to resemble the motion in the input video because the rendering configuration is substantially different.
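To illustrate why a pixel-wise loss is sensitive to the rendering configuration, here is a minimal toy sketch. It is not GradSim's actual loss or renderer: the `render` and `pixelwise_loss` functions, the one-row "image", and the brightness values are all hypothetical and chosen only for illustration. The sketch renders the same motion under two lighting settings and shows that the pixel-wise loss is nonzero even though the motion is identical.

```python
def pixelwise_loss(rendered, target):
    """Mean squared per-pixel difference between two equally sized frame sequences."""
    total, count = 0.0, 0
    for frame_r, frame_t in zip(rendered, target):
        for row_r, row_t in zip(frame_r, frame_t):
            for p_r, p_t in zip(row_r, row_t):
                total += (p_r - p_t) ** 2
                count += 1
    return total / count


def render(position, brightness, width=8):
    """Hypothetical toy renderer: one lit pixel at `position` on a dark one-row image."""
    return [[brightness if x == position else 0.0 for x in range(width)]]


# The same motion (object positions over time) rendered under different lighting.
motion = [1, 2, 3, 4]
target_video = [render(p, brightness=1.0) for p in motion]
matched_light = [render(p, brightness=1.0) for p in motion]
mismatched_light = [render(p, brightness=0.5) for p in motion]

loss_matched = pixelwise_loss(matched_light, target_video)
loss_mismatched = pixelwise_loss(mismatched_light, target_video)

print(loss_matched)     # zero: same motion, same rendering configuration
print(loss_mismatched)  # positive: same motion, different rendering configuration
```

In this toy setting the loss cannot reach zero under the mismatched lighting no matter how well the motion is recovered, which is the kind of bias that can mislead an optimizer driven purely by pixel differences.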

GradSim-Enhanced Result

We improved the performance of GradSim by (1) manually tuning the renderer's configuration to match the input video and (2) providing a good initial guess of the action sequence, computed from the ground-truth trajectory recorded by a motion capture system. Note that such an initial guess from motion capture data is not used in the RISP and GradSim results above and is generally unavailable in real-world applications. Even with this additional help, GradSim mimics the motion at the beginning of the video but still fails to replicate the full trajectory reliably.