Simultaneously Learning Vision and Feature-based Control Policies for Real-world Ball-in-a-Cup
Experiments in simulation
Evaluation of the sparse catch pixels task (fixed policy) learned in simulation.
Left column: original resolution images.
Right column: down-sampled raw images that are given as input to the controller.
Top row: front camera.
Bottom row: side camera.
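As a rough illustration of the observation pipeline described above, the sketch below block-averages a full-resolution frame from each camera down to 84x84 and stacks the two views into a single policy input. The input resolution, channel stacking, and the block-averaging method are assumptions for illustration, not the exact preprocessing used in the paper.

```python
import numpy as np

def downsample(frame: np.ndarray, out_hw=(84, 84)) -> np.ndarray:
    """Block-average an HxWxC frame to out_hw (assumes exact divisibility)."""
    h, w, c = frame.shape
    oh, ow = out_hw
    assert h % oh == 0 and w % ow == 0
    return frame.reshape(oh, h // oh, ow, w // ow, c).mean(axis=(1, 3))

# Hypothetical full-resolution frames from the front and side cameras.
front = np.random.randint(0, 256, (420, 420, 3)).astype(np.float32)
side = np.random.randint(0, 256, (420, 420, 3)).astype(np.float32)

# Concatenate both 84x84 views along the channel axis as one observation.
obs = np.concatenate([downsample(front), downsample(side)], axis=-1)
print(obs.shape)  # (84, 84, 6)
```

Feeding both low-resolution views jointly is what lets the controller keep working when one camera is temporarily blocked, as shown in the real-robot video below.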
Experiments with the real robot
The video shows training the ball-in-a-cup task directly on a real robot: first some early training stages, then the behaviour after training finished (about 8,000 episodes, roughly 3 days of training).
This compilation demonstrates how robust the learned control policy is.
The learned control policy is robust to human intervention, recovering after the ball is repeatedly intercepted and immediately recovering when the ball is removed from the cup.
The video also demonstrates that the robot has learned to use images from both cameras (84x84 pixels each) to close the control loop. Each camera is blocked in turn, which degrades performance; the robot quickly recovers once the cameras are unblocked.
Performance evaluation with real robot
Evaluation of the sparse catch pixels task (fixed policy) learned after 8,360 episodes.
This video demonstrates how consistent the learned control policy is.
The video shows 10 consecutive, uncut episodes; we also ran 300 consecutive episodes and achieved a 100% success rate with an average catch time of 2.39 seconds.