Teng Xue, Amirreza Razmjoo, Suhan Shetty, and Sylvain Calinon
Idiap Research Institute and EPFL
Contact-rich manipulation plays a vital role in daily human activities, yet uncertain physical parameters pose significant challenges for both model-based and model-free planning and control. A promising approach to address this challenge is to develop policies robust to a wide range of parameters. Domain adaptation and domain randomization are commonly used to achieve such policies but often compromise generalization to new instances or perform conservatively due to neglecting instance-specific information. \textit{Explicit motor adaptation} addresses these issues by estimating system parameters online and then retrieving the parameter-conditioned policy from a parameter-augmented base policy. However, it typically relies on precise system identification or additional high-quality policy retraining, presenting substantial challenges for contact-rich tasks with diverse physical parameters. In this work, we propose \textit{implicit motor adaptation}, which leverages tensor factorization as an implicit representation of the base policy. Given a roughly estimated parameter distribution, the parameter-conditioned policy can be efficiently derived by exploiting the separable structure of tensor cores from the base policy. This framework eliminates the need for precise system estimation and policy retraining while preserving optimal behavior and strong generalization. We provide a theoretical analysis validating this method, supported by numerical evaluations on three contact-rich manipulation primitives. Both simulation and real-world experiments demonstrate its ability to generate robust policies for diverse instances.
Determining the exact physical parameters in real-world contact-rich manipulation tasks is typically infeasible, making it difficult to design a controller that is both robust to diverse instances and capable of delivering optimal performance for each case. However, there are two key points we can leverage:
1) Free access to a simulator: In simulation, all parameters are known, enabling us to learn the optimal control policy for each specific instance with its corresponding parameters.
2) Coupling between proprioceptive history and system parameters: In both simulation and the real world, the proprioceptive history (state-action history) of the dynamical system is inherently coupled with the physical parameters. This dependency enables online estimation of the physical parameters and, in turn, selection of the corresponding parameter-specific control policy learned in simulation (see the sketch after this list).
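As an illustration of the second point, the sketch below shows how a probabilistic parameter estimate can be maintained online from observed transitions. It is a minimal, hypothetical example with toy 1D dynamics and a discretized friction grid, not the exact adaptation module of our pipeline:

```python
import numpy as np

# Hypothetical discretized parameter grid (e.g., candidate friction values).
thetas = np.linspace(0.1, 1.0, 10)

def sim_step(s, a, theta):
    """Stand-in simulator: toy 1D dynamics with a friction-like parameter."""
    return s + a - 0.05 * theta * np.sign(s)

def update_belief(belief, s, a, s_next, noise_std=0.01):
    """Bayesian update of the parameter belief from one observed transition:
    compare the real next state with each parameter's predicted next state."""
    preds = np.array([sim_step(s, a, th) for th in thetas])
    lik = np.exp(-0.5 * ((s_next - preds) / noise_std) ** 2)
    belief = belief * lik
    return belief / belief.sum()

belief = np.ones_like(thetas) / len(thetas)  # broad prior over parameters
```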
The problem then becomes one of sim-to-real transfer. If the exact physical parameters of the real world were known, the issue would be resolved; however, they generally cannot be captured precisely. In this work, we investigate whether the control policy can be retrieved from only a rough probabilistic estimate of the physical parameters, which is achievable and practical in real-world scenarios.
Our proof demonstrates that this retrieval is feasible, but not as a simple direct weighted sum of parameter-specific policies. Instead, the policy must be represented implicitly, as the maximizer of a weighted sum of parameter-specific advantage functions, namely

\[
\pi^*(\boldsymbol{s}) = \arg\max_{\boldsymbol{a}} \; \mathbb{E}_{\boldsymbol{\theta} \sim p(\boldsymbol{\theta})}\big[ A^*(\boldsymbol{\theta}, \boldsymbol{s}, \boldsymbol{a}) \big],
\]

where \(p(\boldsymbol{\theta})\) denotes the estimated parameter distribution and \(A^*\) the optimal advantage function of the parameter-augmented base policy.
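On a discretized grid, this retrieval amounts to averaging the advantage tensor over the parameter belief and taking the argmax over actions. The sketch below illustrates this with a random placeholder advantage table; all names and sizes are illustrative:

```python
import numpy as np

# Placeholder for a tabulated parameter-augmented advantage A*[theta, s, a].
n_theta, n_s, n_a = 10, 50, 30
A = np.random.randn(n_theta, n_s, n_a)
p_theta = np.ones(n_theta) / n_theta          # rough parameter belief

def retrieve_policy(A, p_theta):
    """pi*(s) = argmax_a  E_{theta ~ p(theta)}[ A*(theta, s, a) ]."""
    A_bar = np.tensordot(p_theta, A, axes=(0, 0))   # expected advantage, (n_s, n_a)
    return A_bar.argmax(axis=1)                     # greedy action index per state

# Note: averaging the per-parameter greedy actions A[k].argmax(axis=1) would
# in general yield a different, suboptimal policy.
greedy_actions = retrieve_policy(A, p_theta)
```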
However, computing the parameter-conditioned advantage function and retrieving the policy can be computationally expensive, due to the combinatorial size of the augmented space and the need for an argmax operation over an arbitrary function. We address this issue by leveraging the separable structure of the Tensor Train (TT) format, which supports efficient algebraic operations and enables finding optimal solutions of functions represented in TT format.
TT decomposition generalizes matrix decomposition techniques to higher-dimensional arrays. In TT format, an element of a tensor is obtained by multiplying the corresponding slices of its core tensors. The figure presents examples of 2nd-, 3rd-, and 4th-order tensors.
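To make the element-access rule concrete, the snippet below builds a random 3rd-order tensor in TT format and evaluates one entry by multiplying the selected core slices (a minimal sketch; the core sizes are arbitrary):

```python
import numpy as np

# Random TT cores of a 3rd-order tensor: shapes (1, n1, r1), (r1, n2, r2), (r2, n3, 1).
n1, n2, n3, r1, r2 = 8, 9, 10, 4, 5
G1 = np.random.randn(1, n1, r1)
G2 = np.random.randn(r1, n2, r2)
G3 = np.random.randn(r2, n3, 1)

def tt_element(cores, idx):
    """T[i1, ..., id]: multiply the (r_{k-1} x r_k) slices picked by the indices."""
    out = np.eye(1)
    for G, i in zip(cores, idx):
        out = out @ G[:, i, :]
    return out.item()

value = tt_element([G1, G2, G3], (2, 5, 7))
```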
In the parameter-conditioned policy retrieval stage, the parameter-augmented advantage function in TT format contains separate 3rd-order cores for each dimension of the parameter, state, and action spaces. Given a probabilistic parameter distribution, we can retrieve the policy by contracting the parameter distribution with the corresponding TT cores, as shown below:
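A minimal sketch of this contraction, assuming the parameter cores come first in the TT and the belief factorizes across parameter dimensions as p(θ) = ∏_k p_k(θ_k); the result is a smaller TT representing the expected advantage over states and actions:

```python
import numpy as np

def contract_parameters(cores_theta, cores_sa, p_list):
    """Marginalize the parameter cores under a factorized belief, returning
    TT cores of the expected advantage A_bar(s, a)."""
    B = np.eye(1)
    for G, p in zip(cores_theta, p_list):
        # Belief-weighted sum of slices: sum_i p[i] * G[:, i, :].
        B = B @ np.tensordot(p, G, axes=(0, 1))
    # Absorb the resulting (1 x r) boundary matrix into the first state core.
    first = np.tensordot(B, cores_sa[0], axes=(1, 0))
    return [first] + list(cores_sa[1:])

# Example with one parameter core, one state core, and one action core.
Gp = np.random.randn(1, 10, 3)
Gs = np.random.randn(3, 50, 4)
Ga = np.random.randn(4, 30, 1)
p = np.ones(10) / 10
cores_sa_bar = contract_parameters([Gp], [Gs, Ga], [p])   # TT of A_bar(s, a)
```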
The whole pipeline includes three modules: parameter-augmented base policy learning, probabilistic system adaptation with proprioceptive history, and parameter-conditioned policy retrieval.
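The toy loop below sketches how the three modules interact at run time, combining the belief update and policy retrieval illustrated above on a 1D system with an unknown friction parameter; everything here is illustrative rather than the actual implementation:

```python
import numpy as np

thetas = np.linspace(0.1, 1.0, 10)            # candidate parameter values
states = np.linspace(-1.0, 1.0, 50)           # discretized state grid
actions = np.linspace(-0.2, 0.2, 30)          # discretized action grid

def step(s, a, theta):
    return s + a - 0.05 * theta * np.sign(s)  # toy parametric dynamics

# Module 1 stand-in: a "learned" base advantage preferring actions that drive s to 0.
A = np.array([[[-abs(step(s, a, th)) for a in actions]
               for s in states] for th in thetas])

belief = np.ones(len(thetas)) / len(thetas)   # module 2: prior parameter belief
s, true_theta = 0.8, 0.65                     # true parameter, unknown to the controller
for t in range(20):
    si = np.abs(states - s).argmin()
    a = actions[(belief @ A[:, si, :]).argmax()]   # module 3: policy retrieval
    s_next = step(s, a, true_theta)                # the "real" system
    preds = step(s, a, thetas)                     # module 2: Bayesian adaptation
    belief *= np.exp(-0.5 * ((s_next - preds) / 0.01) ** 2)
    belief /= belief.sum()
    s = s_next
```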
We compare our proposed approach, Implicit Motor Adaptation (IMA), with the widely used Explicit Motor Adaptation (EMA) approach on three contact-rich manipulation tasks: hitting, pushing, and reorientation. All tasks rely heavily on the physical parameters of the object, the robot, and their surroundings.
The above images provide an overview of the domain setup. The images in the corners represent the state, action, and parameter spaces, shown in black, blue, and red, respectively. Each primitive includes a variety of instances defined by parameters, where m represents mass, μ denotes the friction coefficient, r is the radius, and l is the length.
Hit
IMA: football, grass
IMA: bowling, wood
IMA: bowling, grass
IMA: golf, wood
EMA: football, grass
EMA: bowling, wood
EMA: bowling, grass
EMA: golf, wood
Reorientation
IMA: spoon
IMA: pen
IMA: fork
IMA: cylinder
EMA: spoon
EMA: pen
EMA: fork
EMA: cylinder
Planar Push
IMA: sugar box
IMA: mustard bottle
EMA: sugar box
EMA: mustard bottle
Bleach bottle, Metal surface, w/ disturbance
Sugar box, Plywood surface, w/ disturbance