Points2Plans: From Point Clouds to Long-Horizon Plans with Composable Relational Dynamics
Yixuan Huang 1,2, Christopher Agia 1, Jimmy Wu 3, Tucker Hermans 2,4, Jeannette Bohg 1
1 Stanford University, 2 University of Utah, 3 Princeton University, 4 NVIDIA Research
IEEE International Conference on Robotics and Automation (ICRA) 2025
We present Points2Plans, a framework for composable planning with a relational dynamics model that enables robots to solve long-horizon manipulation tasks from partial-view point clouds. Given a language instruction and a point cloud of the scene, our framework initiates a hierarchical planning procedure, whereby a language model generates a high-level plan and a sampling-based planner produces constraint-satisfying continuous parameters for manipulation primitives sequenced according to the high-level plan. Key to our approach is the use of a relational dynamics model as a unifying interface between the continuous and symbolic representations of states and actions, thus facilitating language-driven planning from high-dimensional perceptual input such as point clouds. Whereas previous relational dynamics models require training on datasets of multi-step manipulation scenarios that align with the intended test scenarios, Points2Plans uses only single-step simulated training data while generalizing zero-shot to a variable number of steps during real-world evaluations. We evaluate our approach on tasks involving geometric reasoning, multi-object interactions, and occluded object reasoning in both simulated and real-world settings. Results demonstrate that Points2Plans offers strong generalization to unseen long-horizon tasks in the real world, where it solves over 85% of evaluated tasks while the next best baseline solves only 50%.
➡️ Problem statement: How can one system generalize to a variety of unseen, long-horizon tasks from a partial-view point cloud of the scene, and how can it do so without long-horizon training data?
➡️ Points2Plans: Use a transformer-based relational dynamics model [1] to learn the symbolic and geometric effects of robot skills! Then, generate a long-horizon symbolic (what to do) and geometric (how to do it) plan.
Left: In simulation, we sample an environment state and execute a manipulation primitive at random to generate a dataset of single-step environment transitions and train a relational dynamics model. Middle: At planning time, Points2Plans receives a language instruction and a partial-view, segmented point cloud of the scene and then performs long-horizon planning in a hierarchical fashion with a task planner (e.g., a language model) and the learned relational dynamics model. If planning is successful, Points2Plans returns a sequence of manipulation primitives (what to execute) and their associated continuous parameters (how to execute them) for the given task. Right: Points2Plans executes its plan to solve a variety of unseen long-horizon tasks in the real world.
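The hierarchical planning loop described above can be sketched in code. This is an illustrative sketch only, not the authors' actual implementation: all names (`TaskPlanner`, `dynamics.encode`, `satisfies_constraints`, etc.) are hypothetical stand-ins for the components the pipeline description names (a language-model task planner, a learned relational dynamics model, and a sampling-based planner over continuous primitive parameters).

```python
def hierarchical_plan(instruction, point_cloud, task_planner, dynamics,
                      num_samples=100):
    """Return (skills, params) or None if no constraint-satisfying plan is found.

    Hypothetical sketch of the two-level planner: a task planner proposes a
    primitive sequence, then a sampling-based planner searches for continuous
    parameters whose predicted rollout satisfies the task's constraints.
    """
    # 1) High level ("what to do"): the task planner (e.g., a language model)
    #    maps the instruction and scene to a sequence of manipulation primitives.
    skill_sequence = task_planner.propose(instruction, point_cloud)

    # 2) Low level ("how to do it"): sample continuous parameters for each
    #    primitive and roll out the learned relational dynamics model, composing
    #    single-step predictions. Keep the first constraint-satisfying rollout.
    for _ in range(num_samples):
        state = dynamics.encode(point_cloud)  # latent scene representation
        params, feasible = [], True
        for skill in skill_sequence:
            theta = skill.sample_parameters(state)
            next_state = dynamics.predict(state, skill, theta)
            if not dynamics.satisfies_constraints(next_state):  # e.g., collisions
                feasible = False
                break
            params.append(theta)
            state = next_state
        if feasible:
            return skill_sequence, params
    return None  # planning failed for this primitive sequence
```

Because the dynamics model is only ever trained on single-step transitions, composing its predictions inside this loop is what lets the planner generalize zero-shot to longer horizons at test time.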
Task 1: Constrained Packing
Description: The robot is tasked with shelving multiple objects in a spatially constrained environment (e.g., a kitchen cupboard). To succeed, the robot must carefully plan the placement positions of the objects to avoid collisions. All videos are at 4x speed.
🎯 Task difficulty (easy): "Pack three cups in the cupboard."
⚖️ Result: Even with three cups, ample space remains in the cupboard. Thus, all methods successfully plan collision-free placements.
Points2Plans (Ours)
Points2Plans-Delta
eRDTransformer [1]
🎯 Task difficulty (hard): "Pack five cups in the cupboard."
⚖️ Result: It is difficult to fit five cups in the small, constrained cupboard. To do so, the planner relies on accurate predictions of future point cloud states to plan collision-free placements. Here, Points2Plans succeeds, while the two baselines' predictions are simply not accurate enough.
Points2Plans (Ours)
Points2Plans-Delta
eRDTransformer [1]
Task 2: Constrained Retrieval
Description: The robot is tasked with retrieving target objects in a constrained environment. To succeed, the robot must identify and remove objects that occlude the target objects before retrieving them.
🎯 Task: "Serve the ice cream in the blue bowl."
⚖️ Result: Points2Plans succeeds in determining the correct sequence of operations, removing obstacles in front of the blue bowl and ice cream tub. In contrast, the baseline fails by directly reaching for the blue bowl before removing obstacles, causing a collision.
Points2Plans (Ours)
Points2Plans-Feasibility
Task 3: Multi-Object Retrieval
Description: The robot is tasked with retrieving an object inside a container (e.g., a bowl) in a constrained environment. To achieve this task, the robot must first pick and place the container in an open area before grasping the object from inside the container (to avoid collisions). This task tests our planner's ability to reason about multi-object interactions and nested geometric dependencies.
🎯 Task: "I'm hungry! Please get me an apple from the blue bowl." 🎖️ Points2Plans successfully retrieves the apple from the bowl on the shelf.
Points2Plans (Ours)
Task 4: Occluded Object Retrieval
Description: The robot is tasked with retrieving objects in a dark environment, given a history of how the objects became occluded. To succeed, the robot must memorize the positions of the objects and predict their movements while opening the drawer. This task evaluates whether our model can memorize the positions and predict the movements of occluded objects.
🎯 Task: "Please set out grey socks for me in the morning." 🎖️ Points2Plans successfully plans from its memory of object positions and relations (i.e., without perception) to retrieve the occluded, grey pair of socks inside the drawer.
Points2Plans (Ours)
Even if Points2Plans finds a goal-reaching plan, failures can occur when plans are executed in open-loop fashion. For example, imperfections in the underlying manipulation primitives can cause the robot to unstably place (left) or incorrectly grasp (right) an object. 🔍 Therefore, investigating closed-loop replanning strategies to recover from execution failures is a promising direction for future work, among others!
Points2Plans (Ours) - Unstable Placement
Points2Plans (Ours) - Grasp Failure
Citation
If you found this work interesting, please consider citing:
@inproceedings{HuangAgiaEtAl2025,
title = {Points2Plans: From Point Clouds to Long-Horizon Plans with Composable Relational Dynamics},
author = {Huang, Yixuan and Agia, Christopher and Wu, Jimmy and Hermans, Tucker and Bohg, Jeannette},
booktitle = {2025 IEEE International Conference on Robotics and Automation (ICRA)},
year = {2025}
}
Related research
[1] eRDTransformer | Huang, Y., Taylor, N. C., Conkey, A., Liu, W., & Hermans, T. (2024). IEEE Transactions on Robotics.
[2] Out of Sight, Still in Mind | Huang, Y., Yuan, J., Kim, C., Pradhan, P., Chen, B., Li, F., & Hermans, T. (2024). IEEE International Conference on Robotics and Automation (ICRA).
[3] Text2Motion | Lin, K., Agia, C., Migimatsu, T., Pavone, M., & Bohg, J. (2023). Autonomous Robots, 47(8), 1345-1365.
[4] STAP: Sequencing Task-Agnostic Policies | Agia, C., Migimatsu, T., Wu, J., & Bohg, J. (2023). IEEE International Conference on Robotics and Automation (ICRA) (pp. 7951-7958).
Acknowledgements