ARDOP Robotic Arm Manipulation

A Deep RL approach to Robotic Grasping using Random Image features

Reinforcement Learning is a fast-growing field that has accomplished much in recent years. From specialized policy representations and imitation of human behavior to generalized policies using Neural Networks as non-linear function approximators, much work has been done on learning policies from scratch to help robots perform manipulation tasks. Several state-of-the-art approaches have combined efficient representation and feature learning methods with embedded control policies to achieve surprisingly good performance on manipulation tasks. In this study, we propose an approach for training an agent to pick up an arbitrary object with a robotic arm. The proposed approach, Proximal Policy Optimization (PPO) trained on feature vectors with random image features as the feature learning method, has been demonstrated to perform surprisingly well, achieving an accuracy of around 70%. Our results indicate that the simple step of adding random image features to popular algorithms such as PPO can provide a surprising improvement on the task of picking up (getting a stable grip on) an arbitrary object. This study also investigates the challenges that accompany other approaches applying deep reinforcement learning to robotic grasping, and empirically concludes that using a model pre-trained on an easier environment can be a promising strategy to augment training data for such tasks.

Paper: Under revision at Robotics and Autonomous Systems, Elsevier Journals [here]

Authors: Ashutosh Tiwari, N Sandeep Varma, Harish V M

Results

The initial proof-of-concept model used the 3D PyBullet Gym robotic arm (KUKA): a robotic arm, fixed to a table, with the task of picking up (getting a stable grip on) an arbitrary object from a bin. The robot is equipped with a camera that provides RGB-D images. The reward is binary and is provided at the last step, with r(s_t, a_t) = 1 for a successful grasp and 0 for a failed grasp. At timestep t, the observed state s_t comprises the current RGB-D image from the robot's camera viewpoint. The arm moves via position control of the vertically-oriented gripper. The gripper automatically closes when it moves below a fixed height threshold, and the episode ends. At the start of each episode, object positions and rotations are randomized within the bin.
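The sketch below shows how this setup can be reproduced with PyBullet's bundled KukaDiverseObjectEnv; the exact environment configuration used in the study is an assumption here, and the random policy is purely for illustration.

```python
from pybullet_envs.bullet.kuka_diverse_object_gym_env import KukaDiverseObjectEnv

# Bin-picking task: vertically-oriented gripper, camera observation,
# binary reward delivered at the end of the episode.
env = KukaDiverseObjectEnv(
    renders=False,           # set True for the PyBullet GUI
    isDiscrete=False,        # continuous position control of the gripper
    removeHeightHack=False,  # gripper auto-closes below the height threshold
)

obs = env.reset()            # image from the robot's camera viewpoint
done, total_reward = False, 0.0
while not done:
    action = env.action_space.sample()         # random policy, for illustration
    obs, reward, done, info = env.step(action)
    total_reward += reward                     # 1 for a successful grasp, else 0
print("grasp succeeded" if total_reward > 0 else "grasp failed")
```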

As a baseline, we trained an agent for 50,000 episodes with DDPG using only the feature vector as input. This yielded relatively poor results, achieving only 10-15% accuracy in the environment.
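A minimal sketch of such a baseline, assuming stable-baselines3 1.x (Gym API) and treating a flattened, normalized image as the feature vector; the study's exact feature extraction is not reproduced here.

```python
import gym
import numpy as np
from stable_baselines3 import DDPG
from pybullet_envs.bullet.kuka_diverse_object_gym_env import KukaDiverseObjectEnv

class FeatureVectorWrapper(gym.ObservationWrapper):
    """Flatten the camera image into a feature vector for the MLP policy."""
    def __init__(self, env):
        super().__init__(env)
        size = int(np.prod(env.observation_space.shape))
        self.observation_space = gym.spaces.Box(
            low=0.0, high=1.0, shape=(size,), dtype=np.float32)

    def observation(self, obs):
        return np.asarray(obs, dtype=np.float32).ravel() / 255.0

env = FeatureVectorWrapper(KukaDiverseObjectEnv(isDiscrete=False, renders=False))
model = DDPG("MlpPolicy", env, buffer_size=100_000, verbose=1)
model.learn(total_timesteps=400_000)  # roughly 50,000 episodes at this episode length
```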

Continuous action spaces give the agent an infinite number of possible actions, but there could be a way to 'guide' the model in the right direction. We used the prediction value of a pre-trained DDQN, concatenated with the feature vector, as input to DDPG. This slightly increased accuracy and performance. Training PPO on only the feature vector gave an accuracy of 54%.
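A hedged sketch of this guidance step; the helper below and the use of the maximum Q-value as the appended scalar are illustrative assumptions, not the study's exact construction.

```python
import numpy as np
import torch

def augment_with_ddqn(feature_vec: np.ndarray, ddqn: torch.nn.Module) -> np.ndarray:
    """Append a pre-trained DDQN's best Q-value to the feature vector as a 'hint'."""
    with torch.no_grad():
        q_values = ddqn(torch.as_tensor(feature_vec, dtype=torch.float32))
    guidance = q_values.max().item()  # scalar prediction from the easier task
    return np.concatenate([feature_vec, [guidance]])
```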

In the end, by concatenating random features with the original feature vector, we were able to achieve more than 70% accuracy when training with PPO.
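One common way to obtain random image features is a frozen, randomly-initialized convolutional encoder; the sketch below follows that pattern, with the architecture and embedding size being assumptions rather than the paper's exact network.

```python
import numpy as np
import torch
import torch.nn as nn

torch.manual_seed(0)  # fix the random projection so features are reproducible
random_encoder = nn.Sequential(
    nn.Conv2d(4, 16, kernel_size=5, stride=2), nn.ReLU(),   # 4 channels: RGB-D
    nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
    nn.AdaptiveAvgPool2d(4),
    nn.Flatten(),                  # 32 * 4 * 4 = 512-dim random embedding
).eval()

def ppo_input(rgbd_image: np.ndarray, feature_vec: np.ndarray) -> np.ndarray:
    """Concatenate random image features with the original feature vector."""
    x = torch.as_tensor(rgbd_image, dtype=torch.float32).permute(2, 0, 1)[None]
    with torch.no_grad():          # features stay random: the encoder is never trained
        rand_feats = random_encoder(x).squeeze(0).numpy()
    return np.concatenate([feature_vec, rand_feats])
```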

The plots on the left show the average reward when training PPO for 50,000 episodes with only the feature vector as input versus with random features added as input. The plots on the right show the average reward for DDPG trained for 50,000 episodes with only feature vectors as input versus with the DDQN prediction added as input.

An important inference from the results is that using a model pre-trained in an easier environment augments training data and increases training speed.

Extension

ARDOP 3.0 Humanoid Robot with Two 6DoF Arms

Coordinate frames for ARDOP 3.0

RViz and Gazebo simulations

ARDOP 3.0 is an integration of two systems: the perception system and the manipulation system. The Kinect camera is central to the perception system and is used to detect objects and localize them in the camera frame. The manipulation system consists of two 6DOF arms actuated by servo motors; it performs object manipulation and comprises a kinematic solver and a trajectory planner. The mechanical frame is simple but strong and robust; it is built from aluminium and powder-coated. The accuracy of the kinematic and perception systems was tested through a pick-and-place experiment.
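As a hedged illustration of the perception-to-manipulation handoff, the sketch below transforms a detected point from the camera frame into the arm's base frame with ROS tf2; the frame names ("camera_link", "base_link") and coordinates are placeholders, not ARDOP's actual identifiers.

```python
import rospy
import tf2_ros
import tf2_geometry_msgs  # registers PointStamped with tf2's transform()
from geometry_msgs.msg import PointStamped

rospy.init_node("camera_to_base_demo")
buf = tf2_ros.Buffer()
listener = tf2_ros.TransformListener(buf)

# Object position reported by the perception system, in the camera frame
pt = PointStamped()
pt.header.frame_id = "camera_link"   # placeholder frame id
pt.header.stamp = rospy.Time(0)      # latest available transform
pt.point.x, pt.point.y, pt.point.z = 0.1, 0.0, 0.8

# Express the point in the manipulator's base frame for the kinematic solver
pt_base = buf.transform(pt, "base_link", rospy.Duration(1.0))
rospy.loginfo("object in base frame: %s", pt_base.point)
```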