SACSoN: Scalable Autonomous Control for Social Navigation
Noriaki Hirose1,2, Dhruv Shah1, Ajay Sridhar1, and Sergey Levine1
IEEE Robotics and Automation Letters (RA-Letters)
Live demo at Conference on Robot Learning (CoRL) 2023
Presented at the International Conference on Robotics and Automation (ICRA) 2024
Presented at the 2nd workshop "Social Robot Navigation: Advances and Evaluation" at IROS 2023
1: University of California, Berkeley, 2: Toyota Motor North America
HuRoN dataset is available here
[RA-Letters][arXiv] [poster] [code]
Abstract
Machine learning provides a powerful tool for building socially compliant robotic systems that go beyond simple predictive models of human behavior. By observing and understanding human interactions from past experiences, learning can enable effective social navigation behaviors directly from data.
In this paper, our goal is to develop methods for training policies for socially unobtrusive navigation, such that robots can navigate among humans in ways that don't disturb human behavior. We introduce a definition for such behavior based on the counterfactual perturbation of the human: if the robot had not intruded into the space, would the human have acted in the same way? By minimizing this counterfactual perturbation, we can induce robots to behave in ways that do not alter the natural behavior of humans in the shared space. Instantiating this principle requires training policies to minimize their effect on human behavior, and this in turn requires data that allows us to model the behavior of humans in the presence of robots. Therefore, our approach is based on two key contributions. First, we collect a large dataset where an indoor mobile robot interacts with human bystanders. Second, we utilize this dataset to train policies that minimize counterfactual perturbation.
SACSoN policy
To be socially compliant, robots must avoid disrupting the intended behavior of pedestrians within their environment. In training our SACSoN policy, we penalize the counterfactual perturbation from the intended trajectories of the pedestrians (left). We define the intended trajectory of a pedestrian as the trajectory predicted by our predictive model conditioned on the robot being stationary and non-intrusive. Our method aims to control the robot so that the humans in the environment do not act differently than they would have if the robot had been stationary. This principle could be further generalized to minimize the difference to other counterfactual situations, such as ones where the robot is absent altogether, but we focus on the stationary-robot counterfactual as a simple instantiation of the principle. For safety, our full objective function also includes a term that penalizes small predicted distances between the human and the robot, to encourage the robot to maintain clearance (right), as well as standard navigation terms.
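The two social terms described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the squared-displacement form of the perturbation, the hinge form of the clearance term, the `margin` value, and the weights are all assumptions chosen for illustration. The trajectories are assumed to be 2D positions predicted over a short horizon.

```python
import numpy as np

def counterfactual_perturbation(pred_with_robot, pred_stationary):
    """Mean squared displacement between the pedestrian trajectory predicted
    under the robot's planned motion and the one predicted for a stationary
    robot (the "intended" trajectory). Inputs have shape (T, 2)."""
    diff = np.asarray(pred_with_robot, dtype=float) - np.asarray(pred_stationary, dtype=float)
    return float(np.mean(np.sum(diff ** 2, axis=-1)))

def clearance_penalty(pred_human, pred_robot, margin=1.0):
    """Hinge penalty (hypothetical form) that is positive only when the
    predicted human-robot distance drops below `margin` meters."""
    dist = np.linalg.norm(
        np.asarray(pred_human, dtype=float) - np.asarray(pred_robot, dtype=float), axis=-1
    )
    return float(np.mean(np.maximum(0.0, margin - dist)))

def social_cost(pred_with_robot, pred_stationary, pred_robot,
                w_cf=1.0, w_clear=1.0, margin=1.0):
    """Weighted sum of the two social terms; navigation terms are omitted."""
    return (w_cf * counterfactual_perturbation(pred_with_robot, pred_stationary)
            + w_clear * clearance_penalty(pred_with_robot, pred_robot, margin))

# Toy example: the pedestrian would walk straight along x if the robot stayed
# still, but is predicted to drift sideways when the robot moves.
intended = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
perturbed = np.array([[0.0, 0.0], [1.0, 0.5], [2.0, 1.0]])
robot = np.array([[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]])

cost = social_cost(perturbed, intended, robot)
print(f"social cost: {cost:.4f}")
```

In this toy case the clearance term is zero because the robot stays more than one meter from the pedestrian, so the cost comes entirely from the predicted sideways drift. If the two predictions were identical, the perturbation term would vanish.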
HuRoN System
To train the SACSoN policy, we need an accurate predictive model of the pedestrians. To train a better predictive model, we devise a learning-based system, the HuRoN system, that can autonomously collect enriched human-robot interaction data with little-to-no human intervention, and can improve its data collection policy over time as the ever-growing dataset is reused to further train the collection policy. The core of the HuRoN system is vision-based navigation, with the control policy minimizing a novel interaction loss to encourage interaction with the pedestrians. In addition, we develop a help-and-rescue system to remotely teleoperate the robot when it is stuck. We use continual learning to repeatedly collect data and fine-tune our control policy, which reduces the number of human interventions and scales our data collection system.
We visualize the behaviors of our control policy with and without our interaction loss. The control policy without our interaction loss ignores the pedestrian and tries to go directly to the subgoal position (right side videos). In contrast, our control policy with the interaction loss deviates from the original path and tries to interact with the pedestrian to enrich the dataset with human-robot interactions (left side videos). In this experiment, we asked the pedestrian to follow the same predefined walking path whether or not the robot was interacting with them.
With interaction term (Our interaction term enables us to collect the dataset with enriched human-robot interaction.)
Without interaction term (The control policy without the interaction term moves toward the goal position without interacting.)
HuRoN Dataset
We collected the HuRoN dataset with the HuRoN system over the course of 24 days in 5 diverse environments spread across 3 university buildings. The dataset spans 75 hours and 58 kilometers of autonomous robot navigation trajectories, containing over 4000 interactions with humans. Our HuRoN dataset is available here.
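As a rough back-of-the-envelope reading of these figures, the short script below derives a few average rates. The numbers are taken from the description above; the variable names are illustrative, and because the interaction count is stated as "over 4000", the interaction rates are lower bounds.

```python
# Dataset figures as stated in the text above.
days = 24
hours = 75.0
kilometers = 58.0
interactions = 4000  # stated as "over 4000", so treat this as a lower bound

# Derived averages.
avg_speed_m_per_s = kilometers * 1000.0 / (hours * 3600.0)
hours_per_day = hours / days
interactions_per_hour = interactions / hours
interactions_per_km = interactions / kilometers

print(f"average speed: {avg_speed_m_per_s:.2f} m/s")                  # 0.21 m/s
print(f"collection time per day: {hours_per_day:.3f} h")              # 3.125 h
print(f"interactions per hour (at least): {interactions_per_hour:.1f}")  # 53.3
print(f"interactions per km (at least): {interactions_per_km:.1f}")      # 69.0
```

The low average speed reflects total elapsed time over distance covered; it does not say how fast the robot moves while driving.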
Evaluation
We design our experiments to evaluate the socially compliant control policy with our proposed objectives, as well as the proposed interaction-enriched dataset collected by our autonomous data collection system. Specifically, we study the following questions:
Does our proposed objective lead to better socially unobtrusive behavior?
Does our proposed data collection system lead to more interactions, and does this in turn lead to better predictive models of pedestrians?
How do the navigation capabilities of our policy improve over the course of collecting our dataset?
Learning socially compliant navigation
To answer the first question, we train two policies, with and without our proposed objectives. Here, the control policy without our objectives corresponds to the most relevant baseline method, ExAug. In addition, we train a separate social navigation policy on the naive dataset collected without the proposed interaction objective. We qualitatively observe the robot's behavior to be significantly more "compliant" when trained with our proposed objectives on the interaction-enriched dataset (leftmost). More results are shown here.
SACSoN policy trained on our dataset
ExAug policy trained on our dataset
SACSoN policy trained on naive dataset
The Value of Interaction-Rich Data
Modeling Pedestrian Dynamics
While the previous evaluation studies the end-to-end performance of our system, in the next experiment we specifically examine the pedestrian prediction model at the core of our method, and how its predictive accuracy changes based on the composition of the training dataset. Our aim is to understand whether our proposed interaction-seeking data collection scheme actually leads to more accurate pedestrian prediction models.
Continual Learning with the HuRoN System
Lastly, we evaluate how the navigation capabilities of the robotic policy improve over the course of collecting our dataset. While this experiment does not directly evaluate the robot's ability to interact with humans, it does show how our data collection system can enable autonomous improvement, which validates the scalability of our data gathering approach and addresses the third question.
Video (Overview)
Acknowledgments
This research was supported by Berkeley DeepDrive at the University of California, Berkeley, and Toyota Motor North America. Additionally, partial support for this research was provided by ARL DCIST CRA W911NF-17-2-0181. The authors would like to express their gratitude to Marwa Abdulhai, Qiyang Li, Manan Tomar, Mitsuhiko Nakamoto, Roxana Infante, Ami Katagiri, Katie Kang, Zheyuan Hu, Oier Mees, Jakub Grudzien Kuba, Pranav Atreya, Isadora White, Zhiyuan Zhou, Anjali Thakrar, Niclas Joswig, Kyle Stachowicz, and Catherine Glossop for their valuable assistance in evaluating SACSoN.
BibTeX
@article{hirose2023sacson,
author={Hirose, Noriaki and Shah, Dhruv and Sridhar, Ajay and Levine, Sergey},
journal={IEEE Robotics and Automation Letters},
title={SACSoN: Scalable Autonomous Control for Social Navigation},
year={2024},
volume={9},
number={1},
pages={49-56},
doi={10.1109/LRA.2023.3329626}
}