Jayadeep Jacob 1,2, Wenzheng Zhang 1, Houston Warren 1, Paulo Borges 3, Tirthankar Bandyopadhyay 3, Fabio Ramos 1,4
1The University of Sydney, 2Data61, CSIRO, 3Orica, 4NVIDIA, USA
(To be presented) 2026 IEEE International Conference on Robotics and Automation (ICRA 2026), Vienna, Austria
Manipulating clusters of deformable objects presents a substantial challenge with widespread applicability, but requires contact-rich whole-arm interactions. A potential solution must address the limited capacity for realistic model synthesis, high uncertainty in perception, and the lack of efficient spatial abstractions, among others. We propose a novel framework for learning model-free policies integrating two modalities: 3D point clouds and proprioceptive touch indicators, emphasising manipulation with full body contact awareness, going beyond traditional end-effector modes. Our reinforcement learning framework leverages a distributional state representation, aided by kernel mean embeddings, to achieve improved training efficiency and real-time inference. Furthermore, we propose a novel context-agnostic occlusion heuristic to clear deformables from a target region for exposure tasks. We deploy the framework in a power line clearance scenario and observe that the agent generates creative strategies leveraging multiple arm links for de-occlusion. Finally, we perform zero-shot sim-to-real policy transfer, allowing the arm to clear real branches with unknown occlusion patterns, unseen topology, and uncertain dynamics.
Illustrative applications of our framework include: Autonomous clearing of overhanging branches from power lines (middle), our primary focus. Exposing fruits hidden by foliage to an external inspection camera (left), our secondary focus. Restoring camera line-of-sight for back panel inspection in data centers (right).
Our simulation environment, featuring a parametric L-system ternary structure mimicking the power line clearance scenario, is shown on the left; followed by sample real-world laboratory settings on the right, where the power line exposure policy is executed on branches of various tree species. On the extreme right is a dedicated simulation setup mimicking the agriculture exposure scenario.
The agent deploys novel strategies to clear the deformable tree branches around the power line using the whole arm.
The agent uses novel strategies to clear deformable branches in an agricultural setup, allowing an external inspection camera to view a partially or fully occluded target (e.g., fruit, diseased regions, or a navigation lookout).
Note: For clarity, view all videos in full screen at HD quality, using the controls at the bottom right of the video: Settings (Gear Icon) >> Quality >> 1080p HD
Note: For all videos, AI generated voice narration is used.
Strategies exhibited by our approach in the real world operating on multiple tree branches of varying species.
Another real-world strategy is shown alongside the segmented point clouds and the real-time change in the occlusion heuristic and contact probabilities.
We borrow the parametric L-system rules, dynamics parameters, and actuation details from [1][2] to generate realistic, self-similar branching structures.
We run thousands of parallel tree models in the NVIDIA Isaac Gym physics simulator, incorporating diverse geometric structures, dynamic behaviors, and contact patterns via domain randomization.
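As a rough illustration of how a parametric L-system yields self-similar branching, the sketch below expands a bracketed rewriting rule and interprets the result as 2D line segments. The rule `F -> F[+F][-F]`, the branching angle, and the function names `expand_lsystem` and `interpret` are hypothetical choices for illustration; they are not the rules or parameters taken from [1][2].

```python
import math

def expand_lsystem(axiom, rules, iterations):
    """Apply the rewriting rules to every symbol, `iterations` times."""
    s = axiom
    for _ in range(iterations):
        s = "".join(rules.get(ch, ch) for ch in s)
    return s

def interpret(commands, step=1.0, angle_deg=25.0):
    """Turtle interpretation: F draws a segment, +/- turn, [ ] push/pop a branch."""
    x, y, heading = 0.0, 0.0, math.pi / 2  # start at the origin, pointing up
    stack, segments = [], []
    for ch in commands:
        if ch == "F":
            nx = x + step * math.cos(heading)
            ny = y + step * math.sin(heading)
            segments.append(((x, y), (nx, ny)))
            x, y = nx, ny
        elif ch == "+":
            heading += math.radians(angle_deg)
        elif ch == "-":
            heading -= math.radians(angle_deg)
        elif ch == "[":
            stack.append((x, y, heading))  # remember the branching point
        elif ch == "]":
            x, y, heading = stack.pop()    # return to the branching point
    return segments

# Hypothetical rule: every segment sprouts a left and a right child branch.
RULES = {"F": "F[+F][-F]"}
tree = expand_lsystem("F", RULES, iterations=2)
segments = interpret(tree)
```

Randomizing quantities such as the angle, step length, or iteration count per environment is one plausible way to obtain geometric diversity across parallel simulations, although the actual randomization scheme is not specified here.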
We present a multi-modal, whole-arm contact policy learning framework to manipulate deformable clusters, combining both point clouds (from an RGB-D camera) and touch detection inputs (from proprioceptive sensors).
We present and experimentally validate a distributional state representation of the complex deformable cluster geometry that is efficient for both RL training and inference.
We leverage a novel context-agnostic occlusion heuristic and reward strategy to clear deformables from a target region. Our implementation and primary focus is clearing overhanging tree branches from power lines (validated both in simulation and in the real world), but the approach is generalisable to other applications.
We demonstrate zero-shot sim-to-real policy transfer that handles unseen branch topology and displays novel clearance patterns in the real world.
The terminal state poses from both real and simulated trajectories demonstrate whole-arm utilisation in shielding the power line.
We assume the point cloud constitutes samples from a distribution that can be projected into a Reproducing Kernel Hilbert Space (RKHS). This input distribution can be visualized using Kernel Density Estimation (KDE) to observe the transformative effects of Random Fourier Features (RFF). For our experiments, we employ a 2D L-System tree.
In the RFF approximations, the value of R indicates the information content; as R increases, the density estimate approaches the exact KDE. We also note that the kernel bandwidth is a critical hyperparameter that strongly influences the nature of the estimated distribution. Note that what we finally pass to the RL agent as observations is the RFF-approximated feature map representing the distribution (contours below) projected into the RKHS.
RBF σ = 0.25
RBF σ = 0.1
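A minimal sketch of an RFF-approximated kernel mean embedding for a point cloud, assuming an RBF kernel k(x, y) = exp(-||x - y||^2 / (2σ^2)) and assuming that R denotes the number of random features. The function names and the synthetic 2D cloud are illustrative and not taken from the authors' implementation.

```python
import numpy as np

def sample_rff_params(dim, num_features, sigma, rng):
    """Draw random frequencies and phases for an RBF kernel with bandwidth sigma."""
    W = rng.normal(0.0, 1.0 / sigma, size=(dim, num_features))  # w ~ N(0, I / sigma^2)
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
    return W, b

def rff_features(points, W, b):
    """Map points of shape (N, dim) to random Fourier features of shape (N, R)."""
    num_features = W.shape[1]
    return np.sqrt(2.0 / num_features) * np.cos(points @ W + b)

def mean_embedding(points, W, b):
    """Approximate kernel mean embedding: the average feature vector of the cloud."""
    return rff_features(points, W, b).mean(axis=0)

rng = np.random.default_rng(0)
cloud = rng.uniform(-1.0, 1.0, size=(500, 2))  # stand-in for a 2D tree point cloud
W, b = sample_rff_params(dim=2, num_features=256, sigma=0.25, rng=rng)
z = mean_embedding(cloud, W, b)                # fixed-size vector of length 256
```

The embedding has length R regardless of how many points the cloud contains, which is one reason such a representation suits a policy network with a fixed input size. The inner product of two feature vectors approximates the kernel value, with the approximation tightening as R grows.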
We borrow the proprioceptive contact classifier to detect robot-branch interaction from [2]. Our classifier evaluation metric is provided on the top.
The features used to train the classifier and the label are listed on the right.
Classifier Features
Joint torque values over a sliding window of the last 10 time steps
Minimum torque value for each of the six joints
Maximum torque value for each of the six joints
Mean torque for each of the six joints
Variance of torque for each of the six joints
Skewness of torque for each of the six joints
Kurtosis of torque for each of the six joints
Commanded joint velocity values
Executed joint velocity values
Raw torque measurements for each joint
Classifier Label
1 = contact (collision detected)
0 = no contact
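The feature list above could be assembled along the following lines. This is a sketch under stated assumptions: a torque window of shape (10, 6), a single current sample for each of the commanded and executed joint velocities, the latest torque reading as the raw measurement, and non-excess (Pearson) kurtosis. The function name `contact_features` and the feature ordering are hypothetical; the classifier in [2] may differ.

```python
import numpy as np

def contact_features(torque_window, cmd_vel, exec_vel):
    """Build one feature vector from a (10, 6) torque window and (6,) velocities."""
    mean = torque_window.mean(axis=0)
    var = torque_window.var(axis=0)
    std = np.sqrt(var)
    safe_std = np.where(std > 0, std, 1.0)      # avoid division by zero on flat signals
    z = (torque_window - mean) / safe_std
    skew = (z ** 3).mean(axis=0)                # per-joint skewness
    kurt = (z ** 4).mean(axis=0)                # per-joint Pearson kurtosis (assumption)
    return np.concatenate([
        torque_window.ravel(),                  # torques over the last 10 time steps
        torque_window.min(axis=0),
        torque_window.max(axis=0),
        mean,
        var,
        skew,
        kurt,
        cmd_vel,                                # commanded joint velocities
        exec_vel,                               # executed joint velocities
        torque_window[-1],                      # latest raw torque per joint
    ])

rng = np.random.default_rng(0)
window = rng.normal(size=(10, 6))               # synthetic torques, illustration only
features = contact_features(window, np.zeros(6), np.zeros(6))
```

Under these assumptions the vector has 60 + 6 × 6 + 6 + 6 + 6 = 114 entries, which a binary classifier would map to the 0/1 contact label.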
A sample prediction given by the classifier [2]. The input in blue is the average time series of joint torques over the robot's 6 DoFs. Red solid lines represent the ground truth, i.e., the start and end of the contact. Yellow dashed lines represent the predicted contact at each timestep.
[1]: P. Prusinkiewicz and A. Lindenmayer. The Algorithmic Beauty of Plants. Springer Science & Business Media, 2012.
[2]: J. Jacob et al. Gentle manipulation of tree branches: A contact-aware policy learning approach. 8th Annual Conference on Robot Learning, 2024.