Model Tensor Planning

Sampling-based model predictive control (MPC) offers strong performance in nonlinear and contact-rich robotic tasks, yet often suffers from poor exploration due to locally greedy sampling schemes. We propose Model Tensor Planning (MTP), a novel sampling-based MPC framework that introduces high-entropy control trajectory generation through structured tensor sampling. By sampling over randomized multipartite graphs and interpolating control trajectories with B-splines and Akima splines, MTP ensures smooth and globally diverse control candidates. We further propose a simple β-mixing strategy that blends local exploitative and global exploratory samples within the Cross-Entropy Method (CEM) update, balancing control refinement and exploration. Theoretically, we show that MTP achieves asymptotic path coverage and maximum entropy in the control trajectory space in the limit of infinite tensor depth and width.

Our implementation is fully vectorized in JAX and compatible with MuJoCo XLA, supporting just-in-time (JIT) compilation and batched rollouts for real-time control with online domain randomization. In experiments on a variety of challenging robotics tasks, ranging from dexterous in-hand manipulation to humanoid locomotion, we demonstrate that MTP outperforms standard MPC and evolutionary strategy baselines in task success and control robustness. Design and sensitivity ablations confirm the effectiveness of MTP’s tensor sampling structure, spline interpolation choices, and mixing strategy. Altogether, MTP offers a scalable framework for robust exploration in model-based planning and control.
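To make the sampling idea concrete, the following is a minimal sketch of tensor sampling with β-mixing, not the authors' implementation. It uses NumPy rather than JAX for brevity, and it substitutes plain linear interpolation (`np.interp`) for the B-spline and Akima interpolants. All function names, the uniform node distribution, and the reading of β as the fraction of globally sampled candidates in the batch are assumptions made for illustration.

```python
import numpy as np

def sample_tensor_paths(rng, num_paths, num_layers, nodes_per_layer, u_min, u_max):
    """Pick one random node per layer of a random multipartite graph (assumed scheme)."""
    dim = u_min.shape[0]
    # Nodes of each layer are drawn uniformly inside the control limits.
    nodes = rng.uniform(u_min, u_max, size=(num_layers, nodes_per_layer, dim))
    # Every path chooses one node index in every layer.
    idx = rng.integers(0, nodes_per_layer, size=(num_paths, num_layers))
    # Gather the chosen nodes: result has shape (num_paths, num_layers, dim).
    return nodes[np.arange(num_layers)[None, :], idx]

def interpolate_linear(knots, horizon):
    """Interpolate control knots to a full horizon (linear stand-in for splines)."""
    num_paths, num_layers, dim = knots.shape
    t_knots = np.linspace(0.0, 1.0, num_layers)
    t_query = np.linspace(0.0, 1.0, horizon)
    out = np.empty((num_paths, horizon, dim))
    for p in range(num_paths):
        for d in range(dim):
            out[p, :, d] = np.interp(t_query, t_knots, knots[p, :, d])
    return out

def beta_mixed_candidates(rng, mean, std, beta, batch,
                          num_layers, nodes_per_layer, u_min, u_max):
    """Blend local Gaussian samples around `mean` with global tensor samples."""
    horizon, dim = mean.shape
    num_global = int(round(beta * batch))  # assumed meaning of beta
    num_local = batch - num_global
    knots = sample_tensor_paths(rng, num_global, num_layers,
                                nodes_per_layer, u_min, u_max)
    global_u = interpolate_linear(knots, horizon)
    # Local exploitative samples: Gaussian perturbations of the current mean.
    local_u = mean + std * rng.standard_normal((num_local, horizon, dim))
    local_u = np.clip(local_u, u_min, u_max)
    return np.concatenate([local_u, global_u], axis=0)
```

In a CEM-style loop, the returned batch would be rolled out, ranked by cost, and the elite candidates used to update `mean` and `std`. With `beta = 0` the sketch reduces to purely local Gaussian sampling, and with `beta = 1` to purely global sampling.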

I. Motivating Example

Here, we compare the exploration capacity of MTP and the baselines on the Navigation environment, in which a point-mass agent is driven by an axis-aligned 2-dimensional velocity controller. We compare MTP-Akima against evolutionary algorithms and standard MPC baselines, each at its maximum sampling-noise setting. In particular, given control limits of [−1, 1] on the x and y axes, we set the standard deviation σ = 1 for both the sampling noise of MPPI and the population-generation noise of OpenAI-ES and DE. MTP-Akima rollouts reach the green goal very early thanks to high-entropy tensor sampling, whereas the baselines, including the evolutionary algorithms, struggle to generate a rollout that finds the way out of large local minima.
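The maximum-noise baseline setting described above can be sketched as follows. This is an illustrative NumPy sketch only: the Euler-integrated point-mass dynamics, the time step `dt`, and the function names are assumptions, and the actual Navigation environment and MPPI update are not reproduced.

```python
import numpy as np

def sample_max_noise_controls(rng, batch, horizon, sigma=1.0, u_lim=1.0):
    """Zero-mean Gaussian velocity commands with sigma = 1, clipped to [-u_lim, u_lim]."""
    noise = sigma * rng.standard_normal((batch, horizon, 2))
    return np.clip(noise, -u_lim, u_lim)

def rollout_point_mass(x0, controls, dt=0.1):
    """Integrate x-y velocity commands from x0 (assumed Euler point-mass model)."""
    # Cumulative displacement after each control step: (batch, horizon, 2).
    steps = np.cumsum(controls * dt, axis=1)
    start = np.broadcast_to(x0, (controls.shape[0], 1, 2))
    # Positions include the initial state: (batch, horizon + 1, 2).
    return np.concatenate([start, x0 + steps], axis=1)
```

For example, `rollout_point_mass(np.zeros(2), sample_max_noise_controls(np.random.default_rng(0), 128, 50))` yields 128 candidate position trajectories of 51 points each, all starting at the origin.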

Panels: MTP-Akima, MPPI-MaxStd, OpenAI-ES-MaxStd, DE-NumGen=128.

II. Comparison Experiments

The tables below give the details of each task, its cost definition, and its domain randomization. The tasks are implemented using the motion capture sensors available in MuJoCo. For this paper, we deliberately keep the task costs simple and the planning horizons short in order to benchmark the exploratory capacity of the algorithms. In practice, one may design dense guiding costs to make the tasks easier.