Caregiving is a vital role for domestic robots; repositioning care in particular has immense societal value, as it critically improves the health and quality of life of individuals with limited mobility.
However, the repositioning task is a challenging research area, as it requires robots to adapt their motions while interacting flexibly with patients.
The task involves several key challenges: (1) applying appropriate force to specific target areas; (2) performing multiple actions seamlessly, each requiring a different force application policy; and (3) adapting motions under uncertain positional conditions.
To address these, we propose a deep neural network (DNN)-based architecture utilizing proprioceptive and visual attention mechanisms, along with impedance control to regulate the robot’s movements.
Using the dual-arm humanoid robot Dry-AIREC, the proposed model successfully generated motions to insert the robot's hand between the bed and a mannequin's back without applying excessive force, and it supported the transition from a supine to a lifted-up position.
Motion generation with different, unknown bed heights
Objective and Approach
Supine-to-sitting repositioning involves two key tasks: reaching (extending the arm to the target point) and lifting (assisting in raising the upper body).
Each task requires a distinct force application strategy; during reaching, even a small force applied to non-target areas (e.g., the shoulder or head) can cause unintended shifts in the target's position and make the target uncomfortable.
In addition, the motion requires inserting the hand into the narrow space between the bed and the target's back while moving it along the bed surface, which demands flexible motion.
In contrast, lifting requires applying the correct force direction and magnitude to achieve proper repositioning; otherwise, insufficient or incorrect force results in no change in posture.
Additionally, reaching behind the target causes occlusions, complicating motion execution.
Our proposed model integrates visual and proprioceptive attention mechanisms, which dynamically adjust the focus areas of vision, joint angles, and torques, enabling automatic switching between force-application and non-application policies.
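The abstract mentions impedance control for regulating the robot's movements alongside the learned policy. The paper does not publish the controller or gains, so the following is only a minimal joint-space impedance sketch with placeholder values: a compliant law of the form tau = K(q_des - q) - D q̇, where low stiffness lets the arm yield when sliding between the bed and the target's back.

```python
import numpy as np

def impedance_torque(q, dq, q_des, stiffness, damping):
    """Joint-space impedance law: tau = K (q_des - q) - D dq.

    q, dq     : current joint angles and velocities (rad, rad/s)
    q_des     : commanded joint angles, e.g. from a learned model
    stiffness, damping : per-joint gains K, D (illustrative placeholders,
                         not the values used on Dry-AIREC)
    """
    K = np.diag(stiffness)
    D = np.diag(damping)
    return K @ (q_des - q) - D @ dq

# With low stiffness, a position error produces only a gentle corrective
# torque, so contact with the bed or the person deflects the arm instead
# of being fought by the controller.
q = np.zeros(7)
dq = np.zeros(7)
q_des = np.full(7, 0.1)
tau = impedance_torque(q, dq, q_des, stiffness=[20.0] * 7, damping=[2.0] * 7)
```

Raising the stiffness gains would trade compliance for tracking accuracy, which is the knob such a controller exposes between the reaching and lifting phases.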
Collecting training data with teleoperation
We collect sequential vision, torque, and joint-angle data by teleoperating the robot via a motion capture system with IMU sensors.
Proposed deep predictive learning model
The proposed model builds upon EIPL; we integrate proprioceptive attention via a Selective Kernel Network (SKNet), allowing dynamic adaptation of joint-angle and torque feature selection.
The left-eye camera image, the right-eye camera image at the initial step, and torque and joint-angle data are input to the model.
The model, based on a long short-term memory (LSTM) network, learns to predict the next-step left-eye image and joint torques/angles.
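The core of the proprioceptive attention is SKNet's selective-kernel idea: two convolution branches with different receptive fields (here 3×1 and 5×1) are fused by softmax weights computed from the pooled signal, so the network can choose per channel between local and broader features. The sketch below is a minimal 1D NumPy version under assumed shapes; the projection matrices `W_a`/`W_b` and depthwise convolutions are illustrative simplifications, not the paper's exact architecture.

```python
import numpy as np

def softmax(z, axis=0):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def selective_kernel_1d(x, w3, w5, W_a, W_b):
    """SKNet-style selection between a 3-tap and a 5-tap branch.

    x        : (C, T) proprioceptive features (joint angles / torques)
    w3, w5   : per-channel 3x1 and 5x1 depthwise kernels, shapes (C, 3), (C, 5)
    W_a, W_b : (C, C) projections producing per-channel branch logits
    Returns the fused features and branch weights (a, b), with a + b = 1.
    """
    C = x.shape[0]
    u3 = np.stack([np.convolve(x[c], w3[c], mode="same") for c in range(C)])
    u5 = np.stack([np.convolve(x[c], w5[c], mode="same") for c in range(C)])
    s = (u3 + u5).mean(axis=1)             # global pooling over time -> (C,)
    logits = np.stack([W_a @ s, W_b @ s])  # (2, C): one logit per branch/channel
    ab = softmax(logits, axis=0)           # softmax across the two branches
    a, b = ab[0], ab[1]
    fused = a[:, None] * u3 + b[:, None] * u5
    return fused, a, b

# Toy invocation with random weights, just to show the shapes involved.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 16))
fused, a, b = selective_kernel_1d(
    x,
    w3=rng.normal(size=(4, 3)),
    w5=rng.normal(size=(4, 5)),
    W_a=rng.normal(size=(4, 4)),
    W_b=rng.normal(size=(4, 4)),
)
```

The weights (a, b) are exactly the quantities visualized in the proprioceptive-attention analysis below: a large `b` means the 5×1 (broader) branch dominates for that channel.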
Results and discussion
The success rates of the motion generation
The model with SKNet (ours) achieved a high success rate in completing the motion, even under unknown conditions. With our model, the robot successfully extended its hand to the back of the mannequin without excessive contact while providing sufficient support for postural adjustment. In contrast, the baseline model without SKNet (EIPL) consistently failed to complete the motion, as it frequently made unintended contact with the mannequin's head.
Without SKNet, the system was unable to properly distinguish between the target regions, leading to confusion between collision avoidance and the application of supportive force. These results confirm that proprioceptive attention using SKNet is essential for generating effective repositioning motions.
Proprioceptive attention
Sequential flow of the proprioceptive attention weights (a, b) of SKNet. The tensors a and b correspond to the latent values associated with the 3×1- and 5×1-kernel convolutions, respectively. These values reveal whether the model prioritizes fine-grained local features or broader contextual information in different modalities.
SKNet learned to focus on large-region features for joint angles and small-region features for torques. We infer that large-region features were necessary for joint angles because a large number of joints must be coordinated to perform the repositioning task. Conversely, small-region features were preferred for torque data because only a few joints were mainly involved in supporting the mannequin's weight.
However, we observed fluctuations after a stronger torque was applied, i.e., when the lifting motion began. This suggests that proprioceptive attention contributes to identifying when to start lifting, realizing a seamless transition between task phases and force-application policies.
Visual attention
The blue circle points and the red x-shaped points indicate the current and predicted attention points, respectively.
The attention mechanism focused not only on the mannequin's neck region but also on the left handrail of the bed, which enabled adaptation to different bed heights.
Notably, attention was concentrated around Dry-AIREC’s hand, particularly during lifting.
This suggests that the visual attention mechanism plays a crucial role in detecting phase transitions and coordinating vision-proprioceptive integration for smooth and adaptive motion execution.
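Attention points like those plotted above are commonly extracted with a spatial softmax: each feature heatmap is normalized into a probability map, and the expected pixel coordinate becomes one keypoint. The paper does not spell out its extraction step, so this is only a generic sketch of that standard mechanism, with hypothetical shapes.

```python
import numpy as np

def attention_points(heatmaps, temperature=1.0):
    """Spatial-softmax keypoint extraction from feature heatmaps.

    heatmaps : (K, H, W) activation maps, one per attention point
    Returns (K, 2) expected (x, y) coordinates, normalized to [0, 1].
    """
    K, H, W = heatmaps.shape
    flat = heatmaps.reshape(K, -1) / temperature
    p = np.exp(flat - flat.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)           # probability over pixels
    ys, xs = np.mgrid[0:H, 0:W]
    xs = xs.ravel() / (W - 1)
    ys = ys.ravel() / (H - 1)
    return np.stack([p @ xs, p @ ys], axis=1)   # expected (x, y) per map

# A single sharp peak yields a point at that pixel: row 2, column 3 of a
# 5x5 map maps to (x, y) = (0.75, 0.5) in normalized coordinates.
hm = np.zeros((1, 5, 5))
hm[0, 2, 3] = 50.0
pts = attention_points(hm)
```

Because the output is differentiable in the heatmaps, such points can be predicted one step ahead together with the joint angles and torques, which is what the current-vs-predicted markers in the figure depict.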
This work was supported by JST Moonshot R&D, Grant No. JPMJMS2031.
BibTex (IROS2025)
@INPROCEEDINGS{11246394,
author={Miyake, Tamon and Saito, Namiko and Ogata, Tetsuya and Wang, Yushi and Sugano, Shigeki},
booktitle={2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
title={Deep Predictive Learning with Proprioceptive and Visual Attention for Humanoid Robot Repositioning Assistance},
year={2025},
volume={},
number={},
pages={8019-8026},
keywords={Hands;Visualization;Adaptation models;Attention mechanisms;Force;Humanoid robots;Propioception;Neck;Impedance;Robots},
doi={10.1109/IROS60139.2025.11246394}
}