Graph-based Reinforcement Learning meets Mixed Integer Programs: An application to 3D robot assembly discovery

Niklas Funk, Svenja Menzenbach, Georgia Chalvatzaki, Jan Peters

Accepted as Extended Abstract at the First Learning on Graphs Conference (LoG 2022)

Learning on Graphs Conference - Extended Abstract

Full Paper (IROS 2022)

Robot assembly discovery is a challenging problem that lives at the intersection of resource allocation and motion planning. The goal is to combine a predefined set of objects to form something new while considering task execution with the robot-in-the-loop. In this work, we tackle the problem of building arbitrary, predefined target structures entirely from scratch using a set of Tetris-like building blocks and a robot. Our novel hierarchical approach aims at efficiently decomposing the overall task into three feasible levels that benefit mutually from each other. On the high level, we run a classical mixed-integer program for global optimization of blocktype selection and the blocks’ final poses to recreate the desired shape. Its output is then exploited as a prior to efficiently guide the exploration of an underlying reinforcement learning (RL) policy handling decisions regarding structural stability and robotic feasibility. This RL policy draws its generalization properties from a flexible graph-based neural network that is learned through Q-learning and can be refined with search. Lastly, a grasp and motion planner transforms the desired assembly commands into robot joint movements. We demonstrate our proposed method’s performance on a set of competitive simulated robot assembly discovery environments and report performance and robustness gains compared to an unstructured graph-based end-to-end approach.

Additional Videos

I. Video underlining effectiveness of MILP-GNN by comparing against task sequencing using the heuristic

TaskSeqMatters.mp4

II. Demonstrating zero-shot real-world transfer of our proposed MILP-GNN-MCTS approach!

New_visual_all_real_exps_x_16.mp4

Remark: The above video is speedup by a factor of 16