Simultaneously Learning Vision and Feature-based Control Policies for Real-world Ball-in-a-Cup

simulation_pixels_all_cams.mp4

Experiments in simulation

Evaluation of the sparse catch pixels task (fixed policy) learned in simulation.

Left column: original resolution images.

Right column: down-sampled raw images that are given as input to the controller (see the sketch after these captions).

Top row: front camera.

Bottom row: side camera.
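
A minimal sketch (not the authors' code) of how a raw camera frame could be down-sampled to the low-resolution observation given to the controller. The 84x84 target matches the resolution quoted for the real-robot experiments; the use of OpenCV area interpolation is an illustrative assumption.

# Minimal sketch, assuming OpenCV is used for resizing and an 84x84 target.
import cv2
import numpy as np

def downsample_frame(frame: np.ndarray, size: int = 84) -> np.ndarray:
    """Resize an (H, W, 3) uint8 camera image to (size, size, 3)."""
    return cv2.resize(frame, (size, size), interpolation=cv2.INTER_AREA)

# Example with a dummy full-resolution frame from one camera.
raw = np.zeros((480, 640, 3), dtype=np.uint8)
obs = downsample_frame(raw)  # shape (84, 84, 3)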

sawyer_bic_compilation.mp4

Experiments with the real robot

The video shows the ball-in-a-cup task being trained directly on a real robot: first some of the earlier training stages, and then the behaviour after training finished (about 8,000 episodes, approximately 3 days of training).

This compilation demonstrates how robust the learned control policy is.

The learned control policy is robust to human intervention: it recovers after the ball is repeatedly intercepted, and it immediately recovers when the ball is removed from the cup.

The video also demonstrates that the robot has learned to use the images from both cameras (84x84 pixels each) to close the control loop. The cameras are blocked one at a time, which degrades performance; the robot quickly recovers once the camera is unblocked.
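
As an illustration of how the two camera streams could be combined into a single policy input, here is a hedged sketch; channel-wise concatenation of the two 84x84 frames is an assumption made for illustration, since the text only states that both cameras are used to close the control loop.

# Illustrative sketch, not the paper's implementation.
import numpy as np

def make_observation(front: np.ndarray, side: np.ndarray) -> np.ndarray:
    """Concatenate two (84, 84, 3) camera frames along the channel axis -> (84, 84, 6)."""
    assert front.shape == side.shape == (84, 84, 3)
    return np.concatenate([front, side], axis=-1)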

sawyer_bic_sparse_catch_pixels_10ep_clean.mp4

Performance evaluation with real robot

Evaluation of the sparse catch pixels task (fixed policy) after 8,360 training episodes.

This video demonstrates how consistent the learned control policy is.

The video shows 10 uncut, consecutive episodes. We have also run 300 consecutive episodes, achieving a 100% success rate with an average catch time of 2.39 seconds.
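
The reported evaluation statistics amount to simple bookkeeping over consecutive episodes; the sketch below shows one way such numbers could be computed. The per-episode field names are hypothetical.

# Hedged sketch of the evaluation bookkeeping: success rate and mean catch time.
from statistics import mean

episodes = [
    {"caught": True, "catch_time_s": 2.1},
    {"caught": True, "catch_time_s": 2.7},
    # ... one entry per evaluation episode
]

success_rate = sum(ep["caught"] for ep in episodes) / len(episodes)
avg_catch_time = mean(ep["catch_time_s"] for ep in episodes if ep["caught"])
print(f"success rate: {success_rate:.0%}, average catch time: {avg_catch_time:.2f} s")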