Po-Yen Wu, Cheng-Yu Kuo, Yuki Kadokawa, and Takamitsu Matsubara
When lifespan is not considered (Before), the learned policy does not account for structural variations across the tool, often applying stress to weaker regions and causing early failure. By integrating lifespan estimation (via FEA) into reinforcement learning, the agent receives a life reward that guides it toward structurally robust regions (After). The resulting lifespan-guided tool-use policy balances the task reward and the life reward to achieve both task success and an extended tool lifespan.
In inaccessible environments with uncertain task demands, robots often rely on general-purpose tools that lack predefined usage strategies. These tools are not tailored for particular operations, making their longevity highly sensitive to how they are used. This creates a fundamental challenge: how can a robot learn a tool-use policy that both completes the task and prolongs the tool’s lifespan? In this work, we address this challenge by introducing a reinforcement learning (RL) framework that incorporates tool lifespan as a factor during policy optimization. Our framework leverages Finite Element Analysis (FEA) and Miner’s Rule to estimate Remaining Useful Life (RUL) based on accumulated stress, and integrates the RUL into the RL reward to guide policy learning toward lifespan-guided behavior. To handle the fact that the RUL can only be estimated after task execution, we introduce an Adaptive Reward Normalization (ARN) mechanism that dynamically adjusts reward scaling based on estimated RULs, ensuring stable learning signals. We validate our method across simulated and real-world tool-use tasks, including Object-Moving and Door-Opening with multiple general-purpose tools. The learned policies consistently prolong tool lifespan (up to 8.01× in simulation) and transfer effectively to real-world settings, demonstrating the practical value of learning lifespan-guided tool-use strategies.
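As a rough, self-contained illustration of the RUL estimate, the Python sketch below applies Miner’s Rule to per-cycle stress amplitudes extracted from an FEA stress history (e.g., via rainflow counting). The Basquin coefficients `sigma_f` and `b` are illustrative placeholders for material-dependent fatigue parameters, not the exact values used in our experiments.

```python
import numpy as np

def estimate_rul(stress_amplitudes, sigma_f=900.0, b=-0.1):
    """Rough Miner's-rule RUL estimate from one episode of tool use.

    stress_amplitudes: per-cycle stress amplitudes (MPa) extracted from the
        FEA stress history (e.g., via rainflow counting).
    sigma_f, b: illustrative Basquin S-N coefficients (material-dependent).
    Returns the estimated number of identical episodes until failure.
    """
    amps = np.asarray(stress_amplitudes, dtype=float)
    amps = amps[amps > 0]  # ignore zero-amplitude cycles
    if amps.size == 0:
        return np.inf  # no damaging cycles in this episode
    # Basquin's relation: sigma_a = sigma_f * (2 N_f)^b  =>  N_f = 0.5 * (sigma_a / sigma_f)^(1 / b)
    cycles_to_failure = 0.5 * (amps / sigma_f) ** (1.0 / b)
    damage_per_episode = np.sum(1.0 / cycles_to_failure)  # Miner's rule: sum of n_i / N_i
    return 1.0 / damage_per_episode
```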
Overview of the proposed method integrating lifespan-guided reward into reinforcement learning. During each rollout, the agent interacts with the environment, collecting state, action, force, and task reward information. At the end of each episode, the stress history is calculated via finite element analysis (FEA) and processed with Miner’s rule to estimate the remaining useful life (RUL). The RUL value is stored in a history buffer, which is used by the adaptive reward normalization (ARN) mechanism to determine dynamic upper and lower bounds. These bounds are applied to normalize the life reward for subsequent episodes, ensuring stable and meaningful reward signals for policy learning.
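The ARN step can be sketched as follows, assuming a fixed-size buffer of past RUL estimates and percentile-based bounds; the buffer size, percentiles, and clipping range here are illustrative choices rather than the exact settings of our implementation.

```python
from collections import deque
import numpy as np

class AdaptiveRewardNormalizer:
    """Sketch of adaptive reward normalization (ARN) for episode-level life rewards."""

    def __init__(self, buffer_size=100, low_pct=5.0, high_pct=95.0):
        self.rul_history = deque(maxlen=buffer_size)  # recent RUL estimates
        self.low_pct = low_pct
        self.high_pct = high_pct

    def update(self, rul):
        """Store the RUL estimated at the end of an episode."""
        self.rul_history.append(float(rul))

    def life_reward(self, rul):
        """Normalize an RUL estimate into a bounded life reward in [0, 1]."""
        if len(self.rul_history) < 2:
            return 0.0  # not enough history to set meaningful bounds yet
        lower = np.percentile(self.rul_history, self.low_pct)
        upper = np.percentile(self.rul_history, self.high_pct)
        if upper - lower < 1e-8:
            return 0.0  # degenerate bounds; skip normalization
        # Longer estimated lifespan maps to a larger life reward.
        return float(np.clip((rul - lower) / (upper - lower), 0.0, 1.0))
```

In such a sketch, the normalized life reward would be combined with the task reward (for example, as a weighted sum at the end of each episode) to form the return used for policy updates.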
In the Object-Moving task, a UR5e robot with a tool pushes a cylindrical object toward a target location on a planar surface with obstacles.
In the Door-Opening task, the robot uses a tool to press down and rotate a door handle, then pull the door open to 30 degrees.
Object-Moving
Demonstration of the policies learned by our proposed method on four different tools, compared with the baseline methods, highlighting the strategy used by each policy, the corresponding stress variation, and the resulting tool RUL after execution.
Tool 1: Ours, Baseline, Ours w/o ARN, Torque
Tool 2: Ours, Baseline, Ours w/o ARN, Torque
Tool 3: Ours, Baseline, Ours w/o ARN, Torque
Tool 4: Ours, Baseline, Ours w/o ARN, Torque
Demonstration of the policies learned by our proposed method on two different tools, compared with the baseline method.
Tool 1: Ours, Baseline
Tool 2: Ours, Baseline
Demonstration of the policies learned in simulation by our proposed method, compared with the baseline method in terms of the number of trials until tool failure.
Object-Moving
  Tool 1 - Ours: 1609 trials before tool failure; Baseline: 879 trials
  Tool 2 - Ours: more than 900 trials before tool failure; Baseline: 244 trials
Door-Opening
  Tool 1 - Ours: more than 2100 trials before tool failure; Baseline: 600 trials
  Tool 2 - Ours: 847 trials before tool failure; Baseline: 471 trials