AI-based robotic grasping
Robotic grasping using neural networks and stochastic machine learning methods
- Research Background
Logistics systems cover a wide variety of objects. Therefore, technology that can handle many kinds of objects is needed. However, because of a lack of object manipulation technology, many workers are still needed for transportation, packaging, and other tasks.
- Research Objectives
Enable a robot to handle various types of objects using artificial intelligence techniques.
- Research Output
1. Object detection
Recognizing objects using Mask R-CNN on given RGB camera images.
* Mask R-CNN
1) Detects bounding boxes based on Faster R-CNN and additionally segments objects
2) Uses ResNet and an RPN (Region Proposal Network)
3) ResNet extracts the feature map of the given RGB camera images.
4) The RPN finds the bounding boxes of objects.
< Example of Mask R-CNN >
< Structure of Mask R-CNN >
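The following is a minimal inference sketch using the pre-trained Mask R-CNN (ResNet-50 FPN backbone) available in torchvision; the image path and the 0.5 confidence threshold are illustrative assumptions, not the settings of the actual system.

```python
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

# Pre-trained Mask R-CNN with a ResNet-50 FPN backbone (ResNet features + RPN proposals)
model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
model.eval()

image = to_tensor(Image.open("rgb_image.png").convert("RGB"))  # placeholder path

with torch.no_grad():
    outputs = model([image])[0]  # dict with 'boxes', 'labels', 'scores', 'masks'

# Keep detections above a confidence threshold (0.5 is an arbitrary choice here)
keep = outputs["scores"] > 0.5
boxes = outputs["boxes"][keep]        # (N, 4) bounding boxes
masks = outputs["masks"][keep] > 0.5  # (N, 1, H, W) binary instance masks
```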
* Fine-tuned Mask R-CNN
1) 45 kinds of objects
< Examples of target objects: ACRV picking benchmark dataset >
2) Demonstration
< Fine-tuned Mask R-CNN>
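A minimal sketch of how the detection and mask heads can be replaced to fine-tune Mask R-CNN for the 45 target object classes (plus background), following the standard torchvision fine-tuning pattern; the hidden layer size of 256 is illustrative.

```python
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

num_classes = 46  # 45 object categories + background

model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)

# Replace the box classification head for the new number of classes
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

# Replace the mask prediction head for the new number of classes
in_features_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels
model.roi_heads.mask_predictor = MaskRCNNPredictor(in_features_mask, 256, num_classes)
# The model is then fine-tuned on the ACRV picking benchmark images as usual.
```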
2. Object Grasping based on ANN
* Learning the grasping pose of the robot
* Disadvantages of collecting training data in a real working environment:
1) A lot of time
2) Robot operation cost
3) Supervision
* Implementing a realistic working environment on the simulator
1) Collecting large amounts of data through simulation
< System configuration: (a) robot in simulation and (b) real robot >
* Learning to grasp based on AlexNet
1) Input: image of the detected object
2) Output: pose of gripper with the highest grasping success rate
< Structure of fine-tuned AlexNet >
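A minimal sketch of the grasp network, assuming the gripper pose is discretized into a fixed number of orientation bins and a fine-tuned AlexNet scores each candidate; the 18 orientation bins and the pre-trained backbone are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torchvision

NUM_ANGLE_BINS = 18  # illustrative discretization of gripper orientation

# Fine-tuned AlexNet: replace the last fully connected layer so it scores
# the grasping success of each candidate gripper pose for the input object image.
model = torchvision.models.alexnet(pretrained=True)
model.classifier[6] = nn.Linear(model.classifier[6].in_features, NUM_ANGLE_BINS)

def predict_grasp_pose(object_image: torch.Tensor) -> int:
    """object_image: (3, 224, 224) crop of the detected object."""
    model.eval()
    with torch.no_grad():
        scores = model(object_image.unsqueeze(0))        # (1, NUM_ANGLE_BINS)
        probs = torch.softmax(scores, dim=1).squeeze(0)  # grasp success probability per pose
    return int(torch.argmax(probs))  # index of the pose with the highest success rate
```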
* The ensemble learning method combines various classifiers to achieve better performance than a single classifier.
1) Combining the results of learning models
2) Obtaining more reliable results than a single learning model
< Concept of ensemble learning >
* Ensemble learning with 4 classifiers, each trained on a different data set:
1) Real data
2) Simulation data
3) Another set of simulation data
4) Real data + simulation data
< Application of ensemble learning to grasp >
< Example of object grasping >
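A minimal sketch of the ensemble step, assuming each of the four classifiers outputs a per-pose grasp success probability and the ensemble averages them; the averaging rule is an illustrative choice rather than the exact combination used in the system.

```python
import torch

def ensemble_grasp_pose(object_image: torch.Tensor, classifiers) -> int:
    """classifiers: the four models trained on real, simulation,
    another simulation, and real + simulation data."""
    probs = []
    for clf in classifiers:
        clf.eval()
        with torch.no_grad():
            logits = clf(object_image.unsqueeze(0))
            probs.append(torch.softmax(logits, dim=1))
    mean_probs = torch.mean(torch.stack(probs), dim=0)  # combine the four predictions
    return int(torch.argmax(mean_probs))                # more reliable pose estimate
```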
3. Object Grasping based on stochastic machine learning
* Object pose (x, y, θ) is necessary to grasp objects.
* Location (x, y) can be obtained using Mask R-CNN.
* Angle (θ) is predicted using the masks of Mask R-CNN and PCA (principal component analysis).
* Extracting depth images using masks
< Example of extracted depth image using mask >
* Grasping pose estimation based on PCA
1) Predicting the shortest direction (minor axis) of the object at the center of the depth image
< Result of predicted grasping pose >
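A minimal sketch of the PCA step, assuming the binary instance mask from Mask R-CNN is available as a NumPy array: the principal axes of the mask pixels are obtained from the covariance eigenvectors, and the minor axis gives the grasp angle θ at the object center.

```python
import numpy as np

def grasp_angle_from_mask(mask: np.ndarray):
    """mask: (H, W) binary instance mask from Mask R-CNN."""
    ys, xs = np.nonzero(mask)
    points = np.stack([xs, ys], axis=1).astype(np.float64)
    center = points.mean(axis=0)                      # object center (x, y)

    # PCA: eigenvectors of the covariance of the mask pixel coordinates
    cov = np.cov((points - center).T)
    eigvals, eigvecs = np.linalg.eigh(cov)            # eigenvalues in ascending order
    minor_axis = eigvecs[:, 0]                        # shortest direction of the object

    theta = np.arctan2(minor_axis[1], minor_axis[0])  # grasp angle θ
    return center, theta
```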
* Grasping demonstration
< Example of object grasping >
4. Bin picking demonstration for 3 scenarios
* Target objects: 20 known objects + 5 unknown objects
* Known objects: 20 objects = 7 foods + 7 toys + 6 tools
* Scenarios
- Stage 1: Objects in blue boxes
- Stage 2: All objects
- Stage 3: Objects in red boxes + 5 unknown objects
* Demonstration for each stage
AI-based robotic assembly
- Research Background
Recent work on robot learning with deep reinforcement learning learns various robot tasks through DNNs without using task-specific control or recognition algorithms. However, it is difficult to apply this learning method to contact tasks of a robot, because the random exploration of reinforcement learning can generate excessive contact force. Therefore, it is necessary to handle the contact problem with an existing force controller when applying reinforcement learning to contact tasks.
- Research Objective
Trajectory generation based on reinforcement learning algorithm for force-based robotic assembly
- Research Outputs
1) Reinforcement learning method based on DMP and PoWER
2) Reinforcement learning method based on NNMP and DDPG
- Reinforcement learning method based on DMP and PoWER
DMP is used to create complex trajectories that generate the contact force required for assembly through a force controller. Then, PoWER is applied to optimize the DMP-based trajectory for the assembly task.
< Control system using DMP & PoWER >
* Dynamic movement primitive (DMP)
1) A motor primitive based on the dynamical-system formulation proposed by Stefan Schaal
2) DMP can generate complex trajectories using a minimal number of linear parameters.
3) The shape of the trajectory is determined by the linear parameters. → Suitable for reinforcement learning
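A minimal sketch of a one-dimensional discrete DMP in the Schaal-style formulation, where the linear weights of the radial basis functions determine the trajectory shape; the number of basis functions, gains, and time step are illustrative, and the coupling to the force controller used in the actual system is omitted.

```python
import numpy as np

def dmp_rollout(w, y0, goal, tau=1.0, dt=0.002, alpha_y=25.0, beta_y=6.25, alpha_x=8.0):
    """w: linear weights of the radial basis functions (the learned shape parameters)."""
    w = np.asarray(w, dtype=np.float64)
    n_basis = len(w)
    c = np.exp(-alpha_x * np.linspace(0.0, 1.0, n_basis))  # basis centers along the phase
    h = n_basis / c                                         # basis widths (heuristic)

    x, y, dy = 1.0, float(y0), 0.0                          # phase, position, velocity
    trajectory = []
    for _ in range(int(tau / dt)):
        psi = np.exp(-h * (x - c) ** 2)
        forcing = (psi @ w) / (psi.sum() + 1e-10) * x * (goal - y0)
        ddy = alpha_y * (beta_y * (goal - y) - dy) + forcing  # transformation system
        dy += ddy * dt / tau
        y += dy * dt / tau
        x += -alpha_x * x * dt / tau                          # canonical system (phase decay)
        trajectory.append(y)
    return np.array(trajectory)
```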
* Policy learning by weighting exploration with the returns (PoWER)
1) Episode-based reinforcement learning algorithm applicable to linear deterministic policy functions
2) Reinforcement learning algorithm using expectation maximization (EM) → No learning rate is required.
3) PoWER generally has an excellent learning speed, but the applicable form of the policy function is limited.
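A minimal sketch of the PoWER update rule: the DMP weights are perturbed with exploration noise in each episode, and the new parameters are the return-weighted average of those explorations (an EM-style update, so no learning rate appears). The importance-sampling heuristic of keeping only the best episodes and its size are illustrative assumptions.

```python
import numpy as np

def power_update(theta, returns, explorations, n_best=10):
    """theta: current DMP weights; returns: episode returns R_k;
    explorations: noise eps_k that was added to theta in episode k."""
    # Importance sampling: keep only the highest-return episodes (a common PoWER heuristic)
    best = np.argsort(returns)[-n_best:]
    R = np.asarray(returns, dtype=np.float64)[best]        # (n_best,)
    eps = np.asarray(explorations, dtype=np.float64)[best]  # (n_best, dim)

    # EM-style update: return-weighted average of the exploration noise
    theta_new = theta + (R[:, None] * eps).sum(axis=0) / (R.sum() + 1e-10)
    return theta_new
```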
* Demonstration
1) Robot: SCORA-V (Safe Collaborative Robot Arm – Vertical type) developed in the laboratory
2) Control system: PC-based controller with a current control cycle of 1 ms through EtherCAT communication
3) Force control algorithm: torque-based impedance controller
4) Assembly parts: square peg-in-hole, size: 50.0 x 50.0 x 30.0 mm, tolerance: 0.1 mm
< Assembly demo before and after learning >
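A minimal sketch of a torque-based Cartesian impedance law of the form τ = Jᵀ(K e + D ė) + g(q) with gravity compensation, shown only to illustrate the force control layer; the stiffness and damping values are placeholders and the actual controller may include additional terms.

```python
import numpy as np

def impedance_torque(J, gravity_torque, x_des, x, dx_des, dx,
                     K=np.diag([800.0, 800.0, 800.0]),
                     D=np.diag([40.0, 40.0, 40.0])):
    """J: task-space Jacobian; gravity_torque: g(q);
    x/dx: measured Cartesian position/velocity; x_des/dx_des: desired values."""
    e = x_des - x                       # position error
    de = dx_des - dx                    # velocity error
    f = K @ e + D @ de                  # desired Cartesian force (impedance behavior)
    tau = J.T @ f + gravity_torque      # joint torques with gravity compensation
    return tau
```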
- Reinforcement learning method based on NNMP and DDPG
NNMP is used to create complex trajectories that generate the contact force required for assembly through a force controller. Then, DDPG is applied to optimize the trajectory generated by the NNMP for the assembly task.
< Control system using NNMP & DDPG >
* Neural network-based movement primitive (NNMP)
1) A DNN is used to generate complex trajectories from various input signals (measured force and position).
2) The velocity and position are calculated by integrating the acceleration to generate a continuous trajectory.
3) The size and motion time of the trajectory can be changed by adjusting the normalization matrix.
4) A DAgger-based imitation learning algorithm for the proposed NNMP is developed.
< Neural network-based movement primitive >
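A minimal sketch of the NNMP idea, assuming a small fully connected network maps the current inputs (measured force and position) to an acceleration that is then integrated to obtain velocity and position; the layer sizes, input dimension, and identity normalization matrix are illustrative assumptions.

```python
import torch
import torch.nn as nn

class NNMP(nn.Module):
    """Neural network-based movement primitive (illustrative structure)."""
    def __init__(self, input_dim=12, hidden=64, dof=6):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, dof),             # the network outputs an acceleration
        )
        # Normalization matrix: scales the output to change trajectory size / motion time
        self.register_buffer("norm", torch.eye(dof))

    def step(self, inputs, vel, pos, dt=0.001):
        acc = self.norm @ self.net(inputs)      # normalized acceleration
        vel = vel + acc * dt                    # integrate acceleration -> velocity
        pos = pos + vel * dt                    # integrate velocity -> position
        return acc, vel, pos
```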
* Deep deterministic policy gradient (DDPG) for NNMP
1) The measured force and position are added to the state of the NNMP to reflect the contact state.
2) To apply DDPG, the neural network of the NNMP is regarded as the actor network.
3) Ornstein–Uhlenbeck (OU) noise is added to the action for exploration in reinforcement learning.
< Structure of robot system for reinforcement learning with force controller and NNMP >
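A minimal sketch of the Ornstein–Uhlenbeck noise process added to the actor's action for exploration; the θ, σ, and dt parameters below are typical DDPG defaults rather than the values used in the actual experiments.

```python
import numpy as np

class OUNoise:
    """Temporally correlated exploration noise for DDPG."""
    def __init__(self, dim, mu=0.0, theta=0.15, sigma=0.2, dt=0.01):
        self.mu, self.theta, self.sigma, self.dt = mu, theta, sigma, dt
        self.x = np.full(dim, mu)

    def sample(self):
        dx = self.theta * (self.mu - self.x) * self.dt \
             + self.sigma * np.sqrt(self.dt) * np.random.randn(*self.x.shape)
        self.x = self.x + dx
        return self.x

# During training, the exploratory action is: action = actor(state) + ou_noise.sample()
```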
* Demonstration
1) Robot: SCORA-V (Safe Collaborative Robot Arm – Vertical type) developed in the laboratory
2) Control system: PC-based controller with a current control cycle of 1 ms through EtherCAT communication
3) Force control algorithm: torque-based impedance controller
4) Assembly parts: square peg-in-hole, size: 50.0 x 50.0 x 30.0 mm, tolerance: 0.1 mm
< Assembly demo before and after learning >
Simulator-to-real-world transfer of manipulation policy
- Research Background
Reinforcement learning of a robot's manipulation policy with deep-learning-based approximation requires expensive data. To overcome this, simulators that reflect the real-world robot and its surrounding environment have been used to generate Markov transitions for training deep-RL networks.
However, the discrepancy between the simulator and the real world (e.g., friction model, visual rendering, robot dynamics model) deters the direct transfer of simulator-trained deep-RL networks to a real-world robot agent. We aim to solve this issue while preserving the sample efficiency of using the simulator and without deteriorating the trained networks' generalization error on various manipulation tasks.
- Research Objective
Create an effective simulated environment and training method to transfer deep-RL networks trained in the simulator to a real-world robot agent.
- Research Outputs
A demonstration data collection system for aggregating the robot's motion Markov transitions
Asymmetric Actor-Critic-based learning for manipulating a real-world robot in POMDP1)
Adoption of recent successful extensions in deep RL (e.g., PER2), n-step transition3))
* A demonstration data collection system for aggregating the robot's motion Markov transitions
< Transition data collection via velocity kinematics >
We are currently using the Gazebo simulator, which is fully compatible with the Sawyer robotic arm through ROS communication.
However, to our knowledge, recent robot dynamics simulators (e.g., Gazebo, V-REP, MuJoCo) are not equipped with built-in inverse velocity kinematics.
From the viewpoint of reinforcement learning in robotics, defining joint velocity/torque commands is essential to achieving end-to-end deep RL, which makes it also important to aggregate Markov transitions incorporating velocity/torque commands as actions. Inspired by this, we incorporated our closed-loop robot control system into the simulator, which currently enables us to exploit velocity command data as actions in the Markov transitions.
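Since the simulators we use do not provide built-in inverse velocity kinematics, a resolved-rate scheme of the form q̇ = J⁺(q) ẋ can be used to convert a desired end-effector velocity into a joint velocity command. A minimal sketch, assuming the geometric Jacobian is available from the robot model; the damping value is an illustrative choice.

```python
import numpy as np

def joint_velocity_command(jacobian: np.ndarray, ee_velocity: np.ndarray,
                           damping: float = 1e-4) -> np.ndarray:
    """Damped least-squares (pseudoinverse) inverse velocity kinematics.
    jacobian: (6, n_joints) geometric Jacobian at the current configuration.
    ee_velocity: desired (6,) end-effector twist [vx, vy, vz, wx, wy, wz]."""
    J = jacobian
    JJt = J @ J.T + damping * np.eye(J.shape[0])   # damping avoids singular configurations
    q_dot = J.T @ np.linalg.solve(JJt, ee_velocity)
    return q_dot  # joint velocity command, recorded as the action in the Markov transition
```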
* Asymmetric Actor-Critic-based learning for manipulating a real-world robot in POMDP
Inspired by the work of [Pinto, Lerrel, et al., 2017], we aim to implement a learning system that utilizes both the simulator's fully observable characteristic and off-policy learning algorithms' independence in sampling Markov transition data. Therefore, we have modified several recently successful off-policy learning algorithms (e.g., DQN, DDPG, ACER) to fit the learning environment described above.
It remains a challenge to resolve the Actor network's (in DDPG) POMDP characteristic so that the visual observation it receives is enough to infer the latent states of the robot (e.g., the joint positions, velocities, and efforts that the Critic network receives). In addition to this, we tackle effective exploration methods for the actor to explore the environment through joint velocity commands.
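A minimal sketch of the asymmetric actor-critic idea in the spirit of [Pinto, Lerrel, et al., 2017]: the actor sees only the image observation available on the real robot, while the critic is trained on the simulator's full state; the network sizes, state dimension, and 7-DoF action are illustrative assumptions.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Policy network: receives only the visual observation (available in the real world)."""
    def __init__(self, action_dim=7):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, 8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        self.head = nn.LazyLinear(action_dim)   # joint velocity command

    def forward(self, image):
        return torch.tanh(self.head(self.cnn(image)))

class Critic(nn.Module):
    """Q network: receives the full simulator state (joint positions, velocities, efforts)."""
    def __init__(self, state_dim=21, action_dim=7):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))
```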