This project was sponsored by the Amazon Robotics Greater Boston Tech Initiative.
This project develops and integrates key technologies to enable a remote supervisory control scheme for conducting contact-rich dexterous manipulation tasks (e.g., picking from clutter, packing/unpacking). Our human-robot interface comprises a bundled ultrasound wrist sensor and a soft haptic glove, which together accurately measure the human operator's finger forces and postures. The data collected via this interface is used in a Learning from Demonstration (LfD) framework for human-to-robot transfer of dexterous manipulation skills. These learned skills are executed autonomously by the robot until a failure case is detected. When a failure occurs, the system switches to skill refinement mode, and a remote user employs the same ultrasound-haptic glove interface to complete the task; this intervention serves as an additional or corrective demonstration for continuous skill improvement and learning. The system will be implemented for picking-from-clutter and packing/unpacking tasks and will be supported by a comprehensive human subject study along with several outreach activities for the local Massachusetts community.
Keywords: dexterous manipulation, supervisory control, learning from demonstration, ultrasound sensing, haptics
We aim to create new, generalizable capabilities beyond currently deployed warehouse solutions, so we use multifingered hands that are capable of reliably grasping a far greater range of items than current suction-cup grippers. For skill training, we propose a wearable, ultrasound-integrated soft haptic glove that can reconstruct the full hand/finger posture and record human finger forces during demonstrations. This data is used for human-to-robot skill transfer in the training phase. In the operation phase, we introduce an error detection and skill refinement process that allows a human operator to easily intervene when a failure occurs during autonomous execution and provide additional or corrective demonstrations, helping the autonomous skills improve over time.
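At a high level, the operation phase can be viewed as a simple supervisory loop around the learned skills. The sketch below is purely illustrative: the class and function names (Skill, detect_failure, record_corrective_demo) are hypothetical placeholders, not the project's actual software interfaces.

```python
"""Illustrative supervisory-control loop. All names below are hypothetical
placeholders for the project's components, not its real API."""

import random


class Skill:
    """Stand-in for a learned manipulation skill."""
    def __init__(self, name):
        self.name = name
        self.demos = []                      # demonstrations used to train/refine the skill

    def execute(self, task):
        # Placeholder autonomous execution: randomly succeed or fail for illustration.
        return {"task": task, "success": random.random() > 0.2}

    def update(self, demo):
        self.demos.append(demo)              # corrective demo feeds continuous learning


def detect_failure(outcome):
    return not outcome["success"]            # real system: force/contact anomaly detection


def record_corrective_demo(task):
    # Real system: finger postures and forces from the ultrasound + haptic-glove interface.
    return {"task": task, "glove_data": "..."}


def supervisory_loop(skill, tasks):
    for task in tasks:
        outcome = skill.execute(task)        # autonomous execution of the learned skill
        if detect_failure(outcome):          # switch to skill-refinement mode
            demo = record_corrective_demo(task)
            skill.update(demo)               # correction becomes an additional demonstration


supervisory_loop(Skill("pick_from_clutter"), ["item_1", "item_2", "item_3"])
```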
The PeARL team at UMass Lowell has taken several steps toward developing a novel Learning from Demonstration (LfD) approach for force-constrained tasks, as well as integrating it with the other modules of the project. To facilitate LfD, we developed a method for segmenting tasks using forces and other modalities. The method looks for changepoints in each modality, then probabilistically combines the changepoints with weighted importance to find an overall segmentation of the task. Next, we developed a method for reproducing force-constrained tasks based on Elastic Maps, which model robot trajectories as springs; minimizing the energies associated with these springs yields an optimized trajectory. Contact forces observed in the demonstrations can be modeled as spring forces and included in the map for optimization. These methods are currently under review for publication. Additionally, we have worked with the other teams to better integrate the system: we coordinated with the HiRo lab at WPI to create a setup for recording demonstrations using VR, and we visited WPI several times and implemented our LfD method on their robots with great success. Finally, work has begun on a paper that will detail the integration of all modules.
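The sketch below illustrates the spring-energy idea behind the Elastic Maps reproduction step, assuming quadratic attachment, stretching, and bending terms with a closed-form least-squares minimizer; the weights and the contact-force term are illustrative choices, not the exact published formulation.

```python
"""Minimal elastic-map sketch: a demonstrated trajectory is treated as a chain of
springs and its node positions are re-optimized by minimizing quadratic spring
energies. Weights and the force term are illustrative assumptions."""

import numpy as np


def elastic_map(demo, forces=None, w_attach=1.0, w_stretch=5.0, w_bend=20.0):
    """demo: (N, D) demonstrated waypoints; forces: optional (N, D) contact-force pulls."""
    n, _ = demo.shape
    d1 = np.diff(np.eye(n), 1, axis=0)            # first-difference operator (stretching springs)
    d2 = np.diff(np.eye(n), 2, axis=0)            # second-difference operator (bending springs)

    # Quadratic energy  E = w_a||X - demo||^2 + w_s||D1 X||^2 + w_b||D2 X||^2 - <F, X>
    # has a closed-form minimizer obtained by setting its gradient to zero:
    A = w_attach * np.eye(n) + w_stretch * d1.T @ d1 + w_bend * d2.T @ d2
    b = w_attach * demo
    if forces is not None:
        b = b + 0.5 * forces                      # constant pulls modeling contact forces
    return np.linalg.solve(A, b)                  # optimized (N, D) trajectory


# Example: smooth a noisy 1-D demonstration while staying close to the recorded data.
t = np.linspace(0, 1, 50)
noisy = np.sin(2 * np.pi * t)[:, None] + 0.05 * np.random.randn(50, 1)
smoothed = elastic_map(noisy)
```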
The effort led by Dr. Zhang’s group focused on developing a user interface that records the hand configuration and the forces applied by the user, which will be used to train the Learning from Demonstration framework. Ultrasound is used to capture muscle states, offering advantages over methods such as EMG and camera-based systems, which face challenges with data variability, occlusions, shadows, and noise sensitivity. Prior work has shown that forearm ultrasound combined with deep learning can deliver highly accurate measurements and predictions of hand gestures, finger angles, and finger forces. The latest accomplishment is the simultaneous estimation of the haptic forces experienced by users while performing five manipulation skills: Push to horizontal, Push to vertical, Slide to edge, Flip, and Simple pick. The experimental setup paired haptic gloves with force sensors to provide ground-truth validation. Using over 2,000 frames of collected data processed through a previously proposed CNN architecture, the system recognized the five manipulation actions with 100% accuracy and predicted the applied forces with an RMSE of 1.7 PSI over the sensor's 7 PSI range. Live experiments showed that the predicted pressures consistently matched the actual haptic feedback, reinforcing the system's robustness. Other major accomplishments include (1) developing a novel approach to the challenge of unseen cross-user reproducibility through deep metric learning with a triplet network, and (2) implementing a miniaturized wireless ultrasound sensor and electronics to maximize wearability, in collaboration with the group at ETH Zurich.
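For concreteness, the sketch below shows a generic CNN regressing a single pressure value from a forearm-ultrasound frame. It is not the group's published architecture; the layer sizes, the 128x128 input, and the single-output head are assumptions made only to illustrate the image-to-force regression setup.

```python
"""Generic sketch of CNN-based pressure regression from forearm-ultrasound frames.
NOT the group's actual architecture; dimensions are illustrative assumptions."""

import torch
import torch.nn as nn


class UltrasoundForceNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(                 # small convolutional encoder
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(64, 1)                   # regress one pressure value (PSI)

    def forward(self, x):                              # x: (batch, 1, H, W) ultrasound frames
        return self.head(self.features(x).flatten(1))


model = UltrasoundForceNet()
frames = torch.randn(8, 1, 128, 128)                   # dummy batch of ultrasound frames
target = torch.rand(8, 1) * 7.0                        # dummy pressures within a 0-7 PSI range
loss = nn.functional.mse_loss(model(frames), target)   # the reported RMSE would be loss.sqrt()
loss.backward()
```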
Dr. Howe’s group at Harvard introduced a novel data-driven approach for estimating the location and force vector of collisions between grasped objects and the environment (A). A machine learning architecture takes as input signals from contact and joint sensors in the hand, along with simple characterizations of the object's shape and size that could readily be estimated from vision, and outputs the estimated contact location and force vector (B). The system is trained on regular geometric objects and tested on unseen examples of these lab objects, as well as everyday real objects (C). The method achieved a 4 mm mean localization error and a 0.5 N force-magnitude error for shapes within the training distribution, and a 16 mm localization error and a 1.2 N force error for out-of-distribution, novel objects (D).
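The sketch below illustrates the input/output structure of such a contact estimator: sensor signals plus a coarse object-shape descriptor go in, and a contact location (x, y, z) and force vector (fx, fy, fz) come out. The feature dimensions and the plain MLP are illustrative assumptions, not the group's actual architecture.

```python
"""Generic sketch of a data-driven contact estimator. Dimensions and the MLP
structure are assumptions for illustration only."""

import torch
import torch.nn as nn

N_SENSOR = 24   # assumed number of contact/joint sensor channels in the hand
N_SHAPE = 4     # assumed coarse shape descriptor, e.g., primitive type plus dimensions


class ContactEstimator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(N_SENSOR + N_SHAPE, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, 6),                 # [x, y, z, fx, fy, fz]
        )

    def forward(self, sensors, shape):
        out = self.net(torch.cat([sensors, shape], dim=-1))
        return out[..., :3], out[..., 3:]      # contact location, contact force vector


estimator = ContactEstimator()
location, force = estimator(torch.randn(1, N_SENSOR), torch.randn(1, N_SHAPE))
```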
This work focuses on the problem of robotic picking in challenging multi-object scenarios, including difficult-to-pick objects (e.g., objects that are too small or too flat) and challenging conditions (e.g., objects obstructed by other objects and/or the environment). To address these challenges, we leverage four dexterous picking skills inspired by human manipulation techniques and propose deep-neural-network-based methods that predict when and how to apply the skills based on the shapes of the objects, their locations relative to each other, and environmental factors. We utilize a compliant, under-actuated hand to reliably apply the identified skills in an open-loop manner. The capabilities of the proposed system are evaluated through a series of real-world experiments, comprising 45 trials with more than 150 grasps, to assess its reliability and robustness, particularly in cluttered settings. Videos of all experiments are available at https://dexterouspicking.wpi.edu/. This research helps bridge the gap between human and robotic grasping, showcasing promising results in various practical scenarios.
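A minimal sketch of the "when and how" prediction is shown below: from an encoding of the object, its neighbors, and the environment, one head selects which of the four picking skills to apply and another regresses grasp parameters. The feature and output dimensions are illustrative assumptions, not the paper's actual networks.

```python
"""Illustrative skill-selection policy: choose one of four picking skills and a grasp
pose from scene features. Dimensions and heads are assumptions for illustration."""

import torch
import torch.nn as nn

N_FEATURES = 32    # assumed encoding of object shape, neighboring objects, environment
N_SKILLS = 4       # index over the four dexterous picking skills


class SkillPolicy(nn.Module):
    def __init__(self):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(N_FEATURES, 64), nn.ReLU())
        self.skill_head = nn.Linear(64, N_SKILLS)   # "when/which": skill classification
        self.pose_head = nn.Linear(64, 4)           # "how": e.g., (x, y, z, yaw) grasp pose

    def forward(self, features):
        h = self.trunk(features)
        return self.skill_head(h), self.pose_head(h)


policy = SkillPolicy()
logits, pose = policy(torch.randn(1, N_FEATURES))
skill_id = logits.argmax(dim=-1)                    # chosen picking skill for this object
```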
Students:
Mihir Pradeep Deshmukh, Worcester Polytechnic Institute
Anagha Dangle, Worcester Polytechnic Institute
Nikita Boguslavskii, Worcester Polytechnic Institute
Shilpa Thakur, Worcester Polytechnic Institute
Zeo Liu, Harvard University
Brandan A. Hertel, University of Massachusetts Lowell
PIs:
Berk Calli, Robotics Engineering Department, Worcester Polytechnic Institute
Jane Li, Robotics Engineering Department, Worcester Polytechnic Institute
Cagdas Onal, Robotics Engineering Department, Worcester Polytechnic Institute
Kai Zhang, Robotics Engineering Department, Worcester Polytechnic Institute
Robert Howe, Harvard Paulson School of Engineering and Applied Sciences, Harvard University
Reza Azadeh, Miner School of Computer and Information Sciences, University of Massachusetts Lowell