DexPBT: Scaling up Dexterous Manipulation for Hand-Arm Systems with Population Based Training


Aleksei Petrenko, Arthur Allshire, Gavriel State, Ankur Handa, Viktor Makoviychuk

NVIDIA & USC RESL
RSS2023 (Daegu, Republic of Korea)

Abstract

In this work we propose algorithms and methods that enable learning dexterous object manipulation using simulated one- or two-armed robots equipped with multi-fingered hand end-effectors. Using a parallel GPU-accelerated physics simulator (Isaac Gym), we implement challenging tasks for these robots including regrasping, grasp-and-throw, and object reorientation. To solve these problems we introduce a decentralized Population-Based Training (PBT) algorithm that allows us to massively amplify the exploration capabilities of deep reinforcement learning. We find that this method significantly outperforms regular end-to-end learning and is able to discover robust control policies even for the most challenging tasks.
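As a rough illustration of the general idea behind Population-Based Training, the sketch below shows one generic exploit/explore round: the weakest agents copy weights and hyperparameters from the strongest ones and then randomly perturb the copied hyperparameters. This is a minimal sketch and not the decentralized algorithm from the paper. The `Agent` class, the `perturb` and `pbt_step` functions, the 25% replacement fraction, and the 1.2 perturbation factor are all assumptions made for illustration.

```python
import copy
import random
from dataclasses import dataclass


@dataclass
class Agent:
    """Hypothetical population member: weights, hyperparameters, latest score."""
    weights: list
    hparams: dict
    score: float = 0.0


def perturb(hparams, rng, factor=1.2):
    """Explore: scale each hyperparameter up or down by `factor` at random."""
    return {name: value * (factor if rng.random() < 0.5 else 1.0 / factor)
            for name, value in hparams.items()}


def pbt_step(population, rng, frac=0.25):
    """Run one exploit/explore round; agents are modified in place.

    The bottom `frac` of agents (by score) each copy a randomly chosen
    agent from the top `frac`, then perturb the copied hyperparameters.
    """
    if len(population) < 2 or not 0.0 < frac <= 0.5:
        return population  # nothing to exchange, or top/bottom would overlap
    ranked = sorted(population, key=lambda a: a.score, reverse=True)
    n = max(1, int(len(ranked) * frac))
    top, bottom = ranked[:n], ranked[-n:]
    for agent in bottom:
        donor = rng.choice(top)
        agent.weights = copy.deepcopy(donor.weights)   # exploit
        agent.hparams = perturb(donor.hparams, rng)    # explore
        agent.score = donor.score  # placeholder until the agent is re-evaluated
    return population


if __name__ == "__main__":
    rng = random.Random(0)
    # Illustrative population of 16 agents with made-up scores.
    population = [Agent(weights=[float(i)], hparams={"lr": 1e-3}, score=float(i))
                  for i in range(16)]
    pbt_step(population, rng)
    for agent in population:
        print(agent.score, agent.hparams)
```

In a real training setup, each agent would train between rounds and `score` would come from task performance. Here the scores are fixed placeholders so that the exchange step can be inspected in isolation.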

Citation

@article{petrenko2023dexpbt,
  author       = {Aleksei Petrenko and
                  Arthur Allshire and
                  Gavriel State and
                  Ankur Handa and
                  Viktor Makoviychuk},
  title        = {DexPBT: Scaling up Dexterous Manipulation for Hand-Arm Systems with Population Based Training},
  journal      = {CoRR},
  volume       = {abs/2305.12127},
  year         = {2023},
  url          = {https://doi.org/10.48550/arXiv.2305.12127},
  eprinttype   = {arXiv},
  eprint       = {2305.12127},
}


Supplementary Video

Additional Agent Demonstrations

Single-Arm Tasks

Dual-Arm Tasks

Reorientation - Alternative Behavior

This policy, trained with Population-Based Training using a population of 16 agents, discovers an alternative strategy. It avoids some in-hand rotations and instead puts the object back on the table and rotates it there, which removes the risk of dropping it.