Jay Jacob1,2, Shizhe Cai1, Paulo Borges2, Tirthankar Bandyopadhyay2, Fabio Ramos1,3
1The University of Sydney, 2Data61, CSIRO, 3NVIDIA, USA
8th Annual Conference on Robot Learning (CoRL 2024), Munich, Germany
Learning to interact with deformable tree branches with minimal damage is challenging due to their intricate geometry and inscrutable dynamics. Furthermore, traditional vision-based modeling systems suffer from implicit occlusions in dense foliage, severely changing lighting conditions, and a limited field of view, and they carry a significant computational burden that prevents real-time deployment. In this work, we simulate a procedural forest with realistic, self-similar branching structures derived from a parametric L-system model and actuated with crude spring abstractions, mirroring real-world variations through domain randomisation over their morphological and dynamic attributes. We then train a novel Proprioceptive Contact-Aware Policy (PCAP) for a reach task using reinforcement learning, aided by a whole-arm contact detection classifier and reward engineering, without external vision, tactile, or torque sensing. The agent deploys novel strategies to evade contact and mitigate its impact, favouring a reactive exploration of the task space. Finally, we demonstrate that the learned behavioural patterns transfer zero-shot from simulation to the real world, allowing the arm to navigate around real branches with unseen topology and variable occlusions while minimising contact forces and expected ruptures.
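To make the role of reward engineering concrete, here is a minimal sketch of what a contact-aware reach reward could look like. It is an illustrative assumption, not the paper's reward formulation: the function name `reach_reward`, the weights, the 0.5 threshold, and the inputs `contact_prob` (assumed to be the contact classifier's output) and `contact_force` (assumed to be an impact-force estimate available in simulation) are all hypothetical.

```python
import math


def reach_reward(ee_pos, target_pos, contact_prob, contact_force,
                 w_dist=1.0, w_contact=0.5, w_force=0.01, threshold=0.5):
    """Hypothetical contact-aware reward: a reach term minus contact penalties.

    ee_pos, target_pos: (x, y, z) positions in metres.
    contact_prob: contact classifier output in [0, 1] (assumed interface).
    contact_force: estimated contact impact magnitude (assumed available in sim).
    All weights and the threshold are illustrative placeholders.
    """
    # Dense reach term: negative Euclidean distance to the target.
    r_reach = -w_dist * math.dist(ee_pos, target_pos)

    # Penalties apply only when the classifier reports contact.
    in_contact = contact_prob >= threshold
    r_contact = -w_contact if in_contact else 0.0
    r_force = -w_force * contact_force if in_contact else 0.0
    return r_reach + r_contact + r_force
```

Setting `w_contact` and `w_force` to zero recovers a plain reach reward, analogous to the PPO baseline with no rewards for avoiding contact; the penalty terms are what would push a policy toward evading contact or reducing its impact.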
Comparison between the baseline policy (PPO with no rewards for avoiding contact) and our contact-aware policy (PCAP) in simulation.
The agent deploys novel and unexpected strategies (i.e., non-obvious side effects of our reward formulation) in both simulation and the real world to evade contacts and to reduce contact impact forces.
Comparison between the baseline policy (PPO with no rewards for avoiding contact) and our contact-aware policy (PCAP) in the real world with actual branches.
Strategies exhibited by our Proprioceptive Contact-Aware Policy in the real world, operating on multiple tree branches of varying species.
A simple real-world comparison of the two policies, using a deformable ruler to provide nominal contact.
We extend parametric L-system rules [1] from turtle graphics to simulation, generating realistic, self-similar branching structures that model the occlusion patterns found in the real world.
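The core of a parametric L-system is parallel string rewriting: every module carries numeric parameters, and each production maps a module to a sequence of new modules computed from those parameters. The sketch below illustrates this with a hypothetical ternary rule; the rule, the length ratio `R`, and the angle `ANGLE` are assumptions for illustration, not the paper's grammar. The symbols follow the usual turtle conventions of [1]: `F` draws a segment, `[`/`]` push and pop the turtle state, `/` rolls, and `&` pitches down.

```python
from typing import Callable, Dict, List, Tuple

Module = Tuple[str, Tuple[float, ...]]            # (symbol, parameters)
Rule = Callable[[Tuple[float, ...]], List[Module]]


def rewrite(axiom: List[Module], rules: Dict[str, Rule], n: int) -> List[Module]:
    """Apply parametric productions to every module in parallel, n times."""
    word = list(axiom)
    for _ in range(n):
        nxt: List[Module] = []
        for sym, params in word:
            rule = rules.get(sym)
            # Modules without a production are copied unchanged.
            nxt.extend(rule(params) if rule else [(sym, params)])
        word = nxt
    return word


# Hypothetical parameters: child length ratio and branching angle (degrees).
R = 0.6
ANGLE = 30.0


def apex_rule(params: Tuple[float, ...]) -> List[Module]:
    """A(l) -> F(l) followed by three bracketed child apices A(l * R).

    Each child is rolled 0, 120 or 240 degrees about the parent axis and then
    pitched down by ANGLE, giving a ternary branching pattern.
    """
    (length,) = params
    out: List[Module] = [("F", (length,))]
    for k in range(3):
        out += [("[", ()), ("/", (120.0 * k,)), ("&", (ANGLE,)),
                ("A", (length * R,)), ("]", ())]
    return out


rules: Dict[str, Rule] = {"A": apex_rule}
word = rewrite([("A", (1.0,))], rules, 3)
```

After three iterations the word contains 1 + 3 + 9 = 13 segments `F` and 27 unexpanded apices `A`; randomising `R`, `ANGLE`, or the iteration count between episodes is one way to vary the branching structure.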
[1]: P. Prusinkiewicz and A. Lindenmayer. The Algorithmic Beauty of Plants. Springer Science & Business Media, 2012.
While a variety of morphological models can be generated with the L-system formalism and our procedural forest generator, we experiment with four classes (a, b, c, d) of ternary branching structures. Above are their implementations in the Isaac Gym simulator.
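A ternary branching structure can also be turned directly into 3D geometry. The sketch below assumes each segment spawns three children spaced 120 degrees apart around the parent axis and tilted by a fixed branching angle; recursing this way is equivalent to a 3D turtle interpreting a bracketed string. The names `ternary_tree` and `rotate` and the parameters `ratio` and `angle_deg` are hypothetical, and this is not the authors' Isaac Gym implementation; distinct parameter sets could stand in for structure classes such as (a, b, c, d).

```python
import numpy as np


def rotate(v, axis, angle_rad):
    """Rotate vector v about axis by angle_rad using Rodrigues' formula."""
    axis = axis / np.linalg.norm(axis)
    return (v * np.cos(angle_rad)
            + np.cross(axis, v) * np.sin(angle_rad)
            + axis * np.dot(axis, v) * (1.0 - np.cos(angle_rad)))


def ternary_tree(depth, length=1.0, ratio=0.6, angle_deg=30.0,
                 base=None, heading=None):
    """Return a list of (start, end) segments for a ternary branching tree.

    depth: number of branching levels below the trunk.
    ratio: child length / parent length (hypothetical parameter).
    angle_deg: tilt of each child away from its parent's axis.
    """
    base = np.zeros(3) if base is None else base
    heading = np.array([0.0, 0.0, 1.0]) if heading is None else heading
    end = base + length * heading
    segments = [(base, end)]
    if depth == 0:
        return segments

    # Any reference vector not parallel to the heading yields a perpendicular axis.
    ref = np.array([1.0, 0.0, 0.0]) if abs(heading[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    pitch_axis = np.cross(heading, ref)
    for k in range(3):
        # Spin the tilt axis around the parent axis by 0, 120 or 240 degrees.
        axis_k = rotate(pitch_axis, heading, np.deg2rad(120.0 * k))
        child = rotate(heading, axis_k, np.deg2rad(angle_deg))
        segments += ternary_tree(depth - 1, length * ratio, ratio, angle_deg, end, child)
    return segments
```

A tree of depth d yields (3^(d+1) - 1) / 2 segments; one option for simulation would be to instantiate each segment as a rigid link joined to its parent by a spring-damper joint.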
During policy training, we randomise the L-system formal parameters (varying the branching structures), the dynamics parameters (stiffness and damping), the reach target, the part of the tree the robot has access to, and the measured contact impact forces.
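A per-episode domain randomisation step might look like the sketch below, together with the linear spring-damper torque that a "crude spring abstraction" of a branch joint typically denotes. Every range in `RANGES`, the target box, the dataclass `EpisodeParams`, and the functions `sample_episode`, `branch_joint_torque`, and `perturbed_force` are hypothetical placeholders. "The part of the tree the robot has access to" is modelled here, as one possible interpretation, by a random yaw of the tree about its base.

```python
import random
from dataclasses import dataclass
from typing import Tuple

# Hypothetical ranges; not the values used in the paper.
RANGES = {
    "depth": (2, 4),                # L-system iterations (integer)
    "length_ratio": (0.5, 0.8),     # child / parent segment length
    "branch_angle_deg": (20.0, 45.0),
    "stiffness": (5.0, 50.0),       # joint spring constant
    "damping": (0.1, 2.0),          # joint damping coefficient
    "tree_yaw_deg": (0.0, 360.0),   # which side of the tree faces the robot
    "force_scale": (0.8, 1.2),      # multiplicative noise on measured forces
}
TARGET_BOX = ((0.3, 0.7), (-0.3, 0.3), (0.2, 0.8))  # hypothetical reach volume (m)


@dataclass
class EpisodeParams:
    depth: int
    length_ratio: float
    branch_angle_deg: float
    stiffness: float
    damping: float
    tree_yaw_deg: float
    force_scale: float
    target: Tuple[float, float, float]


def sample_episode(rng: random.Random) -> EpisodeParams:
    """Draw one set of randomised morphology, dynamics and task parameters."""
    def u(key: str) -> float:
        return rng.uniform(*RANGES[key])

    target = tuple(rng.uniform(lo, hi) for lo, hi in TARGET_BOX)
    return EpisodeParams(
        depth=rng.randint(*RANGES["depth"]),
        length_ratio=u("length_ratio"),
        branch_angle_deg=u("branch_angle_deg"),
        stiffness=u("stiffness"),
        damping=u("damping"),
        tree_yaw_deg=u("tree_yaw_deg"),
        force_scale=u("force_scale"),
        target=target,
    )


def branch_joint_torque(q, qdot, q_rest, stiffness, damping):
    """Linear spring-damper torque pulling a branch joint back to rest."""
    return -stiffness * (q - q_rest) - damping * qdot


def perturbed_force(true_force: float, params: EpisodeParams) -> float:
    """Apply the episode's randomised scale to a measured contact force."""
    return true_force * params.force_scale
```

Sampling with a seeded `random.Random` keeps episodes reproducible, and drawing stiffness and damping independently per episode exposes the policy to both soft, springy branches and stiff, heavily damped ones.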
Left: Confusion matrices and ROC curves for our collision detection classifier, one using a neural network (NN) model and the other a random forest (RF).
Top: A sample prediction from the classifier. The input (blue) is the time series of joint torques for the six joints. Red solid lines mark the ground truth, i.e., the start and end of the obstruction. Yellow dashed lines mark the predicted contact at each timestep.
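The metrics in these panels can be reproduced from per-timestep labels with a few lines of NumPy. The sketch below computes a binary confusion matrix, an ROC curve and its area, and converts per-timestep contact predictions into start/end intervals comparable to the ground-truth obstruction markers. The function names are hypothetical, labels are assumed to be 0 (free motion) and 1 (contact), and `roc_curve` assumes both classes are present.

```python
import numpy as np


def confusion_matrix(y_true, y_pred):
    """Binary confusion matrix [[TN, FP], [FN, TP]] (0 = free motion, 1 = contact)."""
    cm = np.zeros((2, 2), dtype=int)
    np.add.at(cm, (np.asarray(y_true, dtype=int), np.asarray(y_pred, dtype=int)), 1)
    return cm


def roc_curve(y_true, scores):
    """False/true positive rates from sweeping a threshold over the scores."""
    y_true = np.asarray(y_true, dtype=int)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores, kind="mergesort")
    y_sorted, s_sorted = y_true[order], scores[order]
    # Keep only the last index of each run of tied scores, so ties form one point.
    idx = np.r_[np.flatnonzero(np.diff(s_sorted)), y_sorted.size - 1]
    tps = np.cumsum(y_sorted)[idx]
    fps = (idx + 1) - tps
    tpr = np.r_[0.0, tps / tps[-1]]
    fpr = np.r_[0.0, fps / fps[-1]]
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))


def contact_intervals(pred):
    """Convert per-timestep 0/1 predictions to (start, end) pairs, end exclusive."""
    edges = np.diff(np.r_[0, np.asarray(pred, dtype=int), 0])
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)
    return list(zip(starts.tolist(), ends.tolist()))
```

For example, `contact_intervals([0, 1, 1, 0, 0, 1, 1, 1])` returns `[(1, 3), (5, 8)]`, which can be compared against the ground-truth start and end of each obstruction to measure detection latency.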