Ablation Study 1: MonoFlex. The number of 3D keypoints regressed is a hyperparameter that can be optimized. Models regressing 8, 14, 22, and 89 keypoints are trained and their AP is compared in Table 4. Depth follows the same trend. Performance clearly plateaus beyond 22 keypoints. Additionally, the different losses described are compared, and a combination of inverse sigmoid loss and L1 loss yielded the best results.
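As a rough illustration of how an inverse sigmoid term and an L1 term can be combined, the sketch below decodes a raw network output into a positive depth with the inverse sigmoid mapping 1/sigmoid(x) - 1 and adds an L1 penalty on the regressed keypoint values. The function names, the weights `w_depth` and `w_kp`, and the choice of which quantity each term applies to are assumptions made for illustration only; they are not taken from the experiments reported here.

```python
import math


def sigmoid(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def decode_depth(raw):
    """Inverse sigmoid mapping: raw output -> positive depth (assumed encoding)."""
    return 1.0 / sigmoid(raw) - 1.0


def l1(pred, target):
    """Mean absolute error between two equally sized sequences."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def combined_loss(raw_depth, gt_depth, kp_pred, kp_gt, w_depth=1.0, w_kp=1.0):
    """Weighted sum of a depth term and a keypoint term.

    Hypothetical combination: L1 on inverse-sigmoid-decoded depth plus
    plain L1 on keypoint values. The weights are illustrative defaults.
    """
    depth_pred = [decode_depth(r) for r in raw_depth]
    return w_depth * l1(depth_pred, gt_depth) + w_kp * l1(kp_pred, kp_gt)


# Example: one object, raw depth output 0.0 decodes to a depth of 1.0.
loss = combined_loss([0.0], [1.0], [2.0], [2.5])
```

In this toy example the depth term is zero and the keypoint term contributes the entire loss; in practice the two weights would need tuning.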
Ablation Study 2: MViT. Initially, L1 loss was used for regression, which yielded very poor performance. After switching to RMSE and SSID losses, depths were regressed at intervals of 2 and 4 frames, and the results are shown in Table 5. The model clearly predicts velocities better at a 2-frame interval than at a 4-frame interval.
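To make the frame-interval comparison concrete, the sketch below computes an RMSE loss and derives a velocity from depths sampled `k` frames apart using a finite difference. This is only one plausible reading of the setup: the helper names, the frame rate `fps`, and the assumption that velocity is obtained from depth differences are all hypothetical and are not confirmed by the experiments described here.

```python
import math


def rmse(pred, target):
    """Root mean squared error between two equally sized sequences."""
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    return math.sqrt(mse)


def velocities_from_depth(depths, k, fps):
    """Finite-difference velocity from depths sampled k frames apart.

    Assumption: depths are in metres, one value per frame, and fps is the
    (hypothetical) camera frame rate, so k frames span k / fps seconds.
    """
    dt = k / fps
    return [(depths[i + k] - depths[i]) / dt for i in range(len(depths) - k)]


# Example: an object approaching at a constant 1 m per frame at 10 fps.
depths = [10.0, 9.0, 8.0, 7.0, 6.0]
v2 = velocities_from_depth(depths, k=2, fps=10.0)  # 2-frame interval
v4 = velocities_from_depth(depths, k=4, fps=10.0)  # 4-frame interval
```

For constant motion both intervals give the same velocity. With real, non-uniform motion a longer interval averages over more change, which is one possible reason a 2-frame interval could be easier to predict.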