More Results

Each row shows a video demonstration alongside the affordance prediction produced by our model: the predicted interactable region and the action label (upper: predicted; lower: ground truth).

[Video gallery: 62 demonstration videos]