Note on the prediction videos: in an n-step prediction video, the model input is reset to the ground truth every n frames. In a 5-step prediction sequence, for example, the input is reset after every fifth time step. In a 1-step prediction sequence it is reset after every prediction step. If T-1 (where T is the total number of time steps) is not evenly divisible by n, the leftover time steps at the end of that episode are discarded.
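The reset scheme above can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes a hypothetical one-step predictor `step` that maps the current input to the next predicted frame, and a ground-truth episode `frames` of length T whose first frame is the initial input. Real models may carry more state than a single frame. The function name `n_step_rollout` is likewise hypothetical.

```python
from typing import Callable, List, Sequence, TypeVar

S = TypeVar("S")  # type of a frame / model input (assumed)


def n_step_rollout(frames: Sequence[S], step: Callable[[S], S], n: int) -> List[S]:
    """Predict frames[1:] in chunks of n steps, resetting to ground truth per chunk.

    `step` is a hypothetical one-step predictor. The returned predictions line
    up with frames[1 : 1 + num_chunks * n]. If (T - 1) is not divisible by n,
    the leftover (T - 1) % n time steps are dropped.
    """
    if n < 1:
        raise ValueError("n must be >= 1")
    T = len(frames)
    num_chunks = (T - 1) // n  # whole n-step chunks that fit in T - 1 steps
    predictions: List[S] = []
    for k in range(num_chunks):
        state = frames[k * n]  # reset the model input to the ground truth
        for _ in range(n):
            state = step(state)  # feed the model its own prediction
            predictions.append(state)
    return predictions


if __name__ == "__main__":
    # Toy episode with T = 8 (so T - 1 = 7) and a toy "model" that adds 1.
    episode = [0, 10, 20, 30, 40, 50, 60, 70]
    print(n_step_rollout(episode, lambda x: x + 1, 3))  # [1, 2, 3, 31, 32, 33]
    print(n_step_rollout(episode, lambda x: x + 1, 1))  # [1, 11, 21, 31, 41, 51, 61]
```

With n = 3 and T - 1 = 7, only two full 3-step chunks fit, so the seventh step is discarded. With n = 1, every prediction restarts from the ground-truth frame.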
[Prediction videos: six sequences comparing Baseline, AP (no f_interact), and AP (with f_interact), plus two additional AP (with f_interact) sequences]