Ctrl-World:
A Controllable Generative World Model for Robot Manipulation
Ctrl-World is designed for policy-in-the-loop rollouts with generalist robot policies. It generates joint multi-view predictions (including wrist views), enforces fine-grained action control via frame-level conditioning, and sustains coherent long-horizon dynamics through pose-conditioned memory retrieval.
These components enable (1) accurate policy evaluation in imagination, with alignment to real-world rollouts, and (2) targeted policy improvement through synthetic trajectories.
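The policy-in-the-loop rollout described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the authors' implementation: `ToyWorldModel`, `toy_policy`, `retrieve_memory`, and the view names are hypothetical, the "world model" simply integrates pose deltas instead of generating video, and memory retrieval is approximated by a nearest-pose lookup over past frames.

```python
import numpy as np

VIEWS = ("third_person", "wrist")  # hypothetical view names


def retrieve_memory(history, query_pose, k=2):
    """Return the k past observations whose poses are closest to query_pose."""
    poses = np.stack([obs["pose"] for obs in history])
    dists = np.linalg.norm(poses - query_pose, axis=1)
    nearest = np.argsort(dists)[:k]
    return [history[i] for i in nearest]


class ToyWorldModel:
    """Stand-in for a learned video model (assumption: poses are 3-D vectors)."""

    def predict(self, obs, memory, actions):
        # One predicted multi-view frame per action (frame-level conditioning).
        # `memory` is accepted but unused by this toy model.
        pose = obs["pose"].copy()
        frames = []
        for action in actions:
            pose = pose + action
            views = {name: np.zeros((4, 4)) for name in VIEWS}  # dummy images
            frames.append({"pose": pose.copy(), "views": views})
        return frames


def toy_policy(obs, goal, chunk=4, step=0.1):
    """Hypothetical policy: emit a chunk of pose deltas heading toward goal."""
    direction = goal - obs["pose"]
    dist = np.linalg.norm(direction)
    if dist < 1e-8:
        return np.zeros((chunk, 3))
    delta = direction / dist * min(step, dist / chunk)
    return np.tile(delta, (chunk, 1))


def rollout_in_imagination(world_model, policy, init_obs, goal, n_chunks=5):
    """Alternate between policy action chunks and world-model predictions."""
    history = [init_obs]
    obs = init_obs
    for _ in range(n_chunks):
        actions = policy(obs, goal)
        memory = retrieve_memory(history, obs["pose"])
        predicted = world_model.predict(obs, memory, actions)
        history.extend(predicted)
        obs = predicted[-1]  # the policy next sees the imagined frame
    return history


init_obs = {"pose": np.zeros(3), "views": {name: np.zeros((4, 4)) for name in VIEWS}}
goal = np.array([1.0, 0.0, 0.0])
trajectory = rollout_in_imagination(ToyWorldModel(), toy_policy, init_obs, goal)
print(len(trajectory), trajectory[-1]["pose"])
```

The loop structure is the point of the sketch: the policy only ever sees frames produced by the world model, so the same loop could in principle serve both for evaluating a policy in imagination and for collecting synthetic trajectories.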
Figure 3: Comparisons with baseline
Tasks: (1) Push towel. (2) Replace bowl.
Panels: Ground Truth; Ctrl-World Multi-view Joint Prediction (Ours); Baseline.
Annotations: Ctrl-World is aligned with ground truth; the baseline failed to move the object.
Figure 4: Controllability of Ctrl-World
Panels: Initial frame; different actions lead to different futures.
Ablations: (1) Remove memory. (2) Remove frame-level pose condition.
Figure 6: Comparisons of rollouts in the real world and in Ctrl-World (Pi05, same initial observation)
Each task is shown as a Real World Execution next to a World Model Rollout.
Tasks: (1) Pick blue block and place on white plate. (2) Fold the towel into half. (3) Place sponge in drawer. (4) Close the laptop. (5) Wipe table with towel from left to right. (6) Pull one tissue out of the box. (7) Stack blue block on red block.
Figure 8: Synthetic data used for finetuning the policy
Novel object.
Folding towel from desired direction.
Spatial understanding (e.g., left, right, top right, bottom side).
Shape understanding (e.g., smaller/larger block).
Figure 9: Qualitative results of Policy Improvement
Language instructions:
1. Pick the object in top left side and place in box.
2. Pick glove and place in box.
3. Grasp the bigger red block and place in box.
4. Fold the towel from left to right.
Base Policy: wrong instruction following :(
Post-training on synthetic data: success :)