ROIAL: Region of Interest Active Learning for Characterizing Exoskeleton Gait Preference Landscapes

Kejun Li, Maegan Tucker, Erdem Bıyık, Ellen Novoseller, Joel W. Burdick, Yanan Sui, Dorsa Sadigh, Yisong Yue, and Aaron D. Ames

Paper Code

ICRA presentation (12 minutes)

Supplementary Video (3 minutes)

Abstract

Characterizing what types of exoskeleton gaits are comfortable for users, and understanding the science of walking more generally, require recovering a user's utility landscape. Learning these landscapes is challenging, as walking trajectories are defined by numerous gait parameters, data collection from human trials is expensive, and user safety and comfort must be ensured. This work proposes the Region of Interest Active Learning (ROIAL) framework, which actively learns each user's underlying utility function over a region of interest that ensures safety and comfort. ROIAL learns from ordinal and preference feedback, which are more reliable feedback mechanisms than absolute numerical scores. The algorithm's performance is evaluated both in simulation and experimentally for three non-disabled subjects walking in a lower-body exoskeleton. ROIAL learns Bayesian posteriors that predict each exoskeleton user's utility landscape across four exoskeleton gait parameters. The algorithm discovers both commonalities and discrepancies across users' gait preferences and identifies the gait parameters that most influenced user feedback. These results demonstrate the feasibility of recovering gait utility landscapes from limited human trials.

Problem Statement

The presented algorithm, termed Region of Interest Active Learning (ROIAL), aims to characterize exoskeleton users' preferences over walking trajectories by learning each user's underlying gait preference landscape. However, ROIAL should avoid presenting the user with many gaits that feel unsafe or uncomfortable. To this end, ROIAL learns to avoid sampling actions that lie within a "Region of Avoidance" (ROA). This narrows the algorithm's goal to learning the underlying preference landscape only across the complement of the ROA, termed the "Region of Interest" (ROI). More formally, the goal is to minimize the error between the true underlying preference landscape and the learned posterior, evaluated only over actions within the true ROI. ROIAL pursues this goal using two forms of online user feedback: pairwise preferences and ordinal labels.
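One way to write this objective, as a sketch: let A be the (discretized) action space, f the user's true utility, f̂ the learned posterior mean, and δ a utility threshold that defines the true ROA. The threshold form of the ROA and the generic per-action loss ℓ (for example, squared error) are our assumptions, not necessarily the exact definitions used by ROIAL.

```latex
\begin{aligned}
\mathrm{ROA} &= \{\, a \in \mathcal{A} : f(a) < \delta \,\}, \qquad
\mathrm{ROI} = \mathcal{A} \setminus \mathrm{ROA}, \\
\mathcal{E}(\hat f) &= \frac{1}{|\mathrm{ROI}|} \sum_{a \in \mathrm{ROI}} \ell\big(f(a), \hat f(a)\big).
\end{aligned}
```

Under this reading, actions inside the ROA contribute nothing to the error, which is why ROIAL can afford to stop sampling there.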

During the first few iterations, ROIAL samples widely across the action space.

ROIAL avoids sampling in the ROA once the upper confidence bound there falls below the ROA threshold.

Simulation Results

We validated ROIAL's performance on the Hartmann3 (H3) function and on randomly generated synthetic 3D functions. These simulations evaluated three aspects of the learning process: the relationship between prediction error and the number of points included in the posterior update during each iteration (the subset size), ROIAL's ability to avoid actions in the ROA, and the effect of noisy feedback on prediction error.
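For readers who want to reproduce a similar test bed, the sketch below implements the standard Hartmann3 benchmark on [0, 1]^3 with its usual constants. Hartmann3 is conventionally minimized, so the `utility` helper negates it to obtain a "higher is better" landscape; that negation (and the helper name) is our assumption about how H3 would serve as a utility, not a detail stated here.

```python
import math

# Standard Hartmann3 constants (the common benchmark parameterization).
ALPHA = [1.0, 1.2, 3.0, 3.2]
A = [
    [3.0, 10.0, 30.0],
    [0.1, 10.0, 35.0],
    [3.0, 10.0, 30.0],
    [0.1, 10.0, 35.0],
]
P = [
    [0.3689, 0.1170, 0.2673],
    [0.4699, 0.4387, 0.7470],
    [0.1091, 0.8732, 0.5547],
    [0.0381, 0.5743, 0.8828],
]


def hartmann3(x):
    """Standard Hartmann3 on [0, 1]^3; global minimum is about -3.86278."""
    if len(x) != 3:
        raise ValueError("Hartmann3 expects a 3D input")
    total = 0.0
    for alpha_i, a_row, p_row in zip(ALPHA, A, P):
        inner = sum(a * (xj - p) ** 2 for a, xj, p in zip(a_row, x, p_row))
        total += alpha_i * math.exp(-inner)
    return -total


def utility(x):
    """Hypothetical utility: negated Hartmann3, so larger values are preferred."""
    return -hartmann3(x)


print(round(hartmann3((0.114614, 0.555649, 0.852547)), 4))
```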

Simulation Results: Effect of Subset Size

In each learning iteration, ROIAL updates the posterior over a random subset of actions. In the simulations, we evaluated the impact of the size of this set of actions, which we call the "subset size". The results are shown in the figure below. Each plot shows the algorithm's error in predicting preferences and ordinal labels (mean ± standard deviation). Each simulation evaluated learning performance across 1000 randomly selected actions.
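A minimal sketch of how a per-iteration update subset could be drawn, assuming a uniform grid over the normalized action space. Keeping every previously sampled action in the subset, so that past feedback always remains in the model, is our assumption; `make_grid` and `select_update_subset` are hypothetical helper names.

```python
import itertools
import random


def make_grid(points_per_dim, dims):
    """Discretize [0, 1]^dims into a uniform grid of candidate actions."""
    axis = [i / (points_per_dim - 1) for i in range(points_per_dim)]
    return list(itertools.product(axis, repeat=dims))


def select_update_subset(actions, visited, subset_size, rng):
    """Return the actions whose posterior is updated in this iteration.

    Assumption: all previously sampled actions, plus `subset_size` actions
    drawn uniformly without replacement from the remaining candidates.
    """
    visited_set = set(visited)
    unvisited = [a for a in actions if a not in visited_set]
    k = min(subset_size, len(unvisited))
    return list(visited) + rng.sample(unvisited, k)


grid = make_grid(points_per_dim=10, dims=3)  # 10^3 = 1000 candidate actions
rng = random.Random(0)
subset = select_update_subset(grid, visited=[grid[0], grid[5]], subset_size=50, rng=rng)
print(len(subset))
```

A larger `subset_size` gives the posterior more support points per iteration at a higher computational cost, which is the trade-off the subset-size simulations probe.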

Hartmann3 function prediction error (leftmost two figures) and synthetic function prediction error (rightmost two figures)

Simulation Results: Effect of ROA Threshold on Identifying the ROA/ROI

Next, we evaluated in simulation how well ROIAL identifies the ROA/ROI. In each iteration, ROIAL estimates the ROA via an upper confidence bound criterion governed by a hyperparameter λ. These simulations showed that estimating the ROA more conservatively (i.e., with a smaller λ) resulted in fewer samples within the ROA, while all λ values yielded similar prediction errors for both preferences and ordinal labels.
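The role of λ can be illustrated with a small sketch. We assume the criterion flags an action as ROA when its upper confidence bound, posterior mean + λ · posterior standard deviation, is below the ROA threshold; the function names and example numbers are hypothetical.

```python
def estimate_roa(mean, std, lam, roa_threshold):
    """Indices of actions flagged as belonging to the estimated ROA.

    Assumed criterion: flag an action when mean + lam * std < roa_threshold.
    A smaller lam lowers every bound, so more actions are flagged and the
    estimate is more conservative.
    """
    return {
        i
        for i, (m, s) in enumerate(zip(mean, std))
        if m + lam * s < roa_threshold
    }


def sampleable_actions(mean, std, lam, roa_threshold):
    """Indices of actions outside the estimated ROA (the estimated ROI)."""
    roa = estimate_roa(mean, std, lam, roa_threshold)
    return [i for i in range(len(mean)) if i not in roa]


mean = [-1.0, -0.2, 0.5]
std = [0.3, 0.5, 0.1]
print(estimate_roa(mean, std, lam=1.0, roa_threshold=0.0))  # {0}
print(estimate_roa(mean, std, lam=0.2, roa_threshold=0.0))  # {0, 1}
```

Action 1 has a low mean but high uncertainty: with λ = 1.0 its optimistic bound stays above the threshold, while with λ = 0.2 it is excluded, which matches the observation that smaller λ leads to fewer samples inside the ROA.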

The confusion matrices below show the learned posterior's prediction accuracy across five ordinal categories.
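As a sketch of how such a matrix is tallied, the helper below counts (true, predicted) ordinal labels for integer categories 1 to 5 and row-normalizes them. How the predicted labels are obtained from the posterior is left abstract; the helper names and example labels are hypothetical.

```python
def confusion_matrix(true_labels, predicted_labels, num_categories=5):
    """Count (true, predicted) pairs for integer labels 1..num_categories.

    Row i counts actions whose true label is i + 1; column j counts actions
    the learned posterior assigned label j + 1.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have equal length")
    matrix = [[0] * num_categories for _ in range(num_categories)]
    for t, p in zip(true_labels, predicted_labels):
        if not (1 <= t <= num_categories and 1 <= p <= num_categories):
            raise ValueError(f"label out of range: true={t}, predicted={p}")
        matrix[t - 1][p - 1] += 1
    return matrix


def row_normalize(matrix):
    """Convert counts to per-true-category fractions (empty rows stay zero)."""
    out = []
    for row in matrix:
        total = sum(row)
        out.append([c / total if total else 0.0 for c in row])
    return out


def accuracy(matrix):
    """Fraction of actions on the diagonal (label predicted exactly)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


m = confusion_matrix([1, 1, 2, 3, 5, 5], [1, 2, 2, 3, 5, 4])
print(accuracy(m))
```

Row normalization makes categories with few actions directly comparable to common ones; off-diagonal mass next to the diagonal indicates near-misses between adjacent categories.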

Number of samples in the ROA and prediction error in the ROI


Confusion matrices for simulations

Simulation Results: Effect of Feedback Noise on Prediction Error

Lastly, we used the simulations to evaluate the effect of noisy feedback on learning. The results, shown in the figure below, demonstrate that noisier feedback slowed learning.
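One common way to simulate noisy pairwise feedback, shown here as a sketch, is a probit model in which the probability of preferring action 1 is Φ((u1 − u2) / noise). Whether ROIAL's simulations used exactly this model is our assumption; the helper names are hypothetical.

```python
import math
import random


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def simulate_preference(u1, u2, noise, rng):
    """Simulated pairwise preference: 1 if action 1 is preferred, else 2.

    Assumed probit model: P(prefer action 1) = Phi((u1 - u2) / noise).
    As noise grows this probability approaches 0.5, so each query carries
    less information about which action has higher utility.
    """
    if noise <= 0:
        return 1 if u1 >= u2 else 2  # noiseless feedback; ties go to action 1
    p_prefer_1 = normal_cdf((u1 - u2) / noise)
    return 1 if rng.random() < p_prefer_1 else 2


def agreement_rate(u1, u2, noise, trials, seed=0):
    """Fraction of simulated answers agreeing with the true ordering (assumes u1 > u2)."""
    rng = random.Random(seed)
    return sum(simulate_preference(u1, u2, noise, rng) == 1 for _ in range(trials)) / trials


print(agreement_rate(1.0, 0.0, noise=0.1, trials=2000))
print(agreement_rate(1.0, 0.0, noise=100.0, trials=2000))
```

With low noise nearly every answer agrees with the true ordering; with very high noise the answers are close to coin flips, so more queries are needed to reach the same posterior accuracy.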

Prediction error for varying feedback noise thresholds, evaluated on synthetic 3D functions.

Exoskeleton Experimental Setup

The experiments were conducted on Atalante, a lower-body exoskeleton developed by Wandercraft. Atalante is an 18-degree-of-freedom robot designed to restore assisted mobility to patients with motor-complete paraplegia through the control of 12 actuated joints: 3 at each hip, 1 at each knee, and 2 at each ankle.

The goal of the experiments was to learn the preference landscapes of 3 non-disabled subjects across 4 exoskeleton gait parameters: step length (SL), step duration (SD), pelvis roll (PR), and pelvis pitch (PP). The reasoning behind selecting these four parameters is outlined in the next section. The experiments consisted of 40 exoskeleton trials divided into a training phase (30 trials) and a validation phase (10 trials). Subjects were not told when the validation phase began. Subjects provided ordinal labels using four ordinal categories:

  1. Very Bad: User feels unsafe or uncomfortable to the point of never wanting to repeat the gait.

  2. Bad: User dislikes the gait but does not feel unsafe or uncomfortable.

  3. Neutral: User neither dislikes nor likes the gait and would be willing to try the gait again.

  4. Good: User likes the gait and would be willing to continue walking with it for a long period of time.
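To connect these labels to a latent utility, a threshold-based mapping like the sketch below is a common modeling choice: three increasing cut points split the utility axis into the four categories. The cut-point values, and the reading of "Very Bad" as the category corresponding to the ROA, are our assumptions.

```python
import bisect

# The four ordinal categories used in the exoskeleton experiments, worst to best.
CATEGORIES = ("Very Bad", "Bad", "Neutral", "Good")


def ordinal_label(utility, thresholds):
    """Map a latent utility value to one of the four ordinal categories.

    Assumption: three strictly increasing cut points b1 < b2 < b3, as in
    threshold-based ordinal regression. Utilities below b1 map to "Very Bad";
    utilities at or above b3 map to "Good".
    """
    if len(thresholds) != len(CATEGORIES) - 1:
        raise ValueError("need exactly three thresholds")
    if any(lo >= hi for lo, hi in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    return CATEGORIES[bisect.bisect_right(thresholds, utility)]


cuts = (-1.0, 0.0, 1.0)  # hypothetical cut points
print([ordinal_label(u, cuts) for u in (-2.0, -0.5, 0.5, 2.0)])
```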

Motivation for Exoskeleton Gait Parameter Selection

In the exoskeleton experiments, ROIAL learned over the following four exoskeleton gait parameters: step length (SL), step duration (SD), pelvis roll (PR), and pelvis pitch (PP). We selected these parameters because we believe they strongly influence user comfort. This intuition came from our previous experiments (paper), which collected exoskeleton user feedback from 6 non-disabled subjects across 6 exoskeleton gait parameters: step length (SL), step duration (SD), step width (SW), step height (SH), pelvis roll (PR), and pelvis pitch (PP). We believe that SL, SD, and PR are highly influential for user comfort because these parameters appeared most often in user suggestions, as shown in the figure below. Lastly, we included PP in the ROIAL experiments to further study its relationship with PR, as we expect the two to be closely related.

Experimental Results

The figure below illustrates the learned 4D posterior mean utility across the exoskeleton gait parameter space; each column plots the posterior over a different pair of parameters, averaged over the remaining two parameters. For videos of the experimental results, please refer to our supplementary video at the top of the page.
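The averaging described above can be sketched with NumPy, assuming the posterior mean is stored on a 4D grid with axes ordered (SL, SD, PR, PP). That array layout and the `pairwise_slices` helper are hypothetical; each of the six parameter pairs yields one 2D map.

```python
import itertools

import numpy as np

PARAMS = ("SL", "SD", "PR", "PP")


def pairwise_slices(posterior_mean):
    """Average a 4D posterior-mean grid over the two parameters not plotted.

    posterior_mean: array of shape (n_SL, n_SD, n_PR, n_PP) (assumed layout).
    Returns {(p, q): 2D array indexed by (p value, q value)} for all 6 pairs.
    """
    if posterior_mean.ndim != 4:
        raise ValueError("expected a 4D grid")
    slices = {}
    for i, j in itertools.combinations(range(4), 2):
        others = tuple(k for k in range(4) if k not in (i, j))
        # Remaining axes keep their original order, so the result is (i, j).
        slices[(PARAMS[i], PARAMS[j])] = posterior_mean.mean(axis=others)
    return slices


grid = np.arange(2 * 3 * 4 * 5, dtype=float).reshape(2, 3, 4, 5)
maps = pairwise_slices(grid)
print(sorted(maps))
```

Averaging hides interactions with the marginalized parameters, so a flat 2D map does not necessarily mean those two parameters are unimportant.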