Inverse Reinforcement Learning Framework for Transferring Task Sequencing Policies from Humans to Robots in Manufacturing Applications
Abstract
In this work, we present an inverse reinforcement learning approach for solving the problem of task sequencing for robots in complex manufacturing processes. Our proposed framework is adaptable to variations in process and can perform sequencing for completely new parts. We prescribe an approach to capture feature interactions in a demonstration dataset based on a metric that computes feature interaction coverage. We then actively learn the expert's policy by keeping the expert in the loop. Our training and testing results reveal that our model can successfully learn the expert's policy. We demonstrate the performance of our method on a real-world manufacturing application where we transfer the policy for task sequencing to a manipulator. Our experiments show that the robot can perform these tasks to produce human-competitive performance.
Video
Real Dataset
Real Training Dataset
We use 6 tools for training and record the expert's sequence on each of them.
Real Testing Dataset
We use the trained model to predict sequences for the tools in the testing dataset. We evaluate the model by comparing the model-generated sequences with the expert's desired sequences for these testing tools.
Synthetic Dataset
In the synthetic dataset, we have 10 tools, of which 6 are used for training and 4 for testing. We set the weight array w* such that it has unit norm and all of its values are equal. We then compute the lowest-cost sequence for each tool using this w*. We train a model on the training dataset and use the trained model to predict sequences for the testing dataset.
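The construction of w* and the search for a lowest-cost sequence can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes one weight per listed feature (10 in total), assumes that the cost of a sequence is the sum of w · f over its steps, and uses a brute-force search. The names `sequence_cost`, `lowest_cost_sequence`, and `feature_fn` are hypothetical.

```python
import itertools

import numpy as np

NUM_FEATURES = 10  # assumption: one weight per feature in the feature list

# Equal-valued weight vector with unit L2 norm: every entry is 1/sqrt(n).
w_star = np.ones(NUM_FEATURES) / np.sqrt(NUM_FEATURES)


def sequence_cost(sequence, feature_fn, w):
    """Sum of w . f(processed, next_region) over the steps of a sequence.

    feature_fn is a hypothetical callable that returns the feature vector
    for processing `region` after the regions in `processed`.
    """
    cost = 0.0
    for i, region in enumerate(sequence):
        processed = sequence[:i]
        cost += float(w @ feature_fn(processed, region))
    return cost


def lowest_cost_sequence(regions, feature_fn, w):
    """Brute-force search over all orderings (assumed; only viable for few regions)."""
    return min(
        itertools.permutations(regions),
        key=lambda seq: sequence_cost(seq, feature_fn, w),
    )
```

Because every entry of w* is equal, all features contribute equally to the cost, so the synthetic expert has no built-in preference for any single feature.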
Synthetic Training Dataset
Synthetic Testing Dataset
Feature Information
List of Features
Constrained Internal Edge: Length of the edges that are not on the boundary and are a part of the regions that are already processed
Unconstrained Internal Edge: Length of the edges that are internal and not a part of the region that is already processed
Constrained Boundary Edge: Length of the edges that are on the boundary of the tool and part of a region that is already processed
Unconstrained Boundary Edge: Length of the edges that are on the boundary and not a part of the region that is already processed
Relative Z-Height: Difference between the average z-height of the processed regions and the z-height of the region chosen to be processed next. For example, in the image, when only B is processed and the next region to be processed is E, the z-height feature is the absolute difference between the z-heights of B and E
Curvature: This is the absolute value of curvature of the given region. This value is scaled to be always positive for concave, convex, and flat regions
Convex Hull of processed regions: This feature is the average of the convex hull of the already processed regions
Convex Hull of unprocessed regions: This feature is the average of the convex hull of unprocessed regions
Relative Orientation: This feature is the relative orientation of the surface normal of a specific region with respect to a fixed axis
Region Proximity: This is the distance between the center of the region that is selected to be processed and the center of the closest region that is already processed. The idea is to process regions that are adjacent to already processed regions
NOTE: All features are normalized between 0 and 1
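As an illustration of two of the features above and of the normalization step, the following sketch computes Relative Z-Height and Region Proximity and then rescales a set of raw feature values to the range 0 to 1. The function names are hypothetical. Min-max scaling and the value returned when no region has been processed yet are assumptions, since the page does not state how either is handled.

```python
import numpy as np


def relative_z_height(processed_z, candidate_z):
    """Absolute difference between the mean z-height of the processed
    regions and the z-height of the candidate region."""
    if len(processed_z) == 0:
        return 0.0  # assumption: nothing processed yet, so no height difference
    return abs(float(np.mean(processed_z)) - float(candidate_z))


def region_proximity(processed_centers, candidate_center):
    """Distance from the candidate region's center to the closest
    center among the already processed regions."""
    if len(processed_centers) == 0:
        return 0.0  # assumption: nothing processed yet, so no proximity penalty
    diffs = np.asarray(processed_centers, dtype=float) - np.asarray(
        candidate_center, dtype=float
    )
    return float(np.min(np.linalg.norm(diffs, axis=1)))


def min_max_normalize(values):
    """Rescale raw feature values to [0, 1] (min-max scaling, assumed)."""
    values = np.asarray(values, dtype=float)
    span = values.max() - values.min()
    if span == 0:
        return np.zeros_like(values)  # all values equal: map everything to 0
    return (values - values.min()) / span
```

For the example given under Relative Z-Height, calling `relative_z_height([z_B], z_E)` with only B processed reduces to the absolute difference between the z-heights of B and E.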