CoRL 2020 Presentation
Abstract
To achieve high levels of autonomy, modern robots require the ability to detect and recover from anomalies and failures with minimal human supervision. Multi-modal sensor signals can provide more information for such anomaly detection tasks; however, fusing high-dimensional and heterogeneous sensor modalities remains a challenging problem. We propose a deep neural network, the supervised variational autoencoder (SVAE), for failure identification in unstructured and uncertain environments. Our model leverages the representational power of the VAE to extract robust features from high-dimensional inputs for supervised learning tasks. The training objective unifies the generative and discriminative models, making learning a one-stage procedure. Our experiments on real field robot data demonstrate superior failure identification performance compared to baseline methods and show that our model learns interpretable representations.
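One plausible way to write such a unified objective (the weighting factor $\alpha$ and the exact conditioning of the classifier are assumptions for illustration, not necessarily the paper's formulation) is to add a classification term to the usual evidence lower bound:

\[
\mathcal{L}(x, u, y) \;=\; \underbrace{\mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right] - D_{\mathrm{KL}}\!\left(q_\phi(z \mid x)\,\|\,p(z)\right)}_{\text{ELBO (generative)}} \;+\; \alpha\,\underbrace{\mathbb{E}_{q_\phi(z \mid x)}\!\left[\log q_\psi(y \mid z, u)\right]}_{\text{classifier (discriminative)}}
\]

where $x$ is the high-dimensional input, $u$ the low-dimensional input, $y$ the failure label, and $z$ the latent code; maximizing $\mathcal{L}$ trains the encoder, decoder, and classifier together in one stage.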
Motivation
How do we enable field robots to identify failures automatically in highly uncertain field environments?
What is an efficient way of fusing high-dimensional and heterogeneous sensor modalities in robotic applications?
TerraSentia Robot
Row Collision
Untraversable Obstacle
Traversable Obstacle
Supervised Variational Autoencoder (SVAE)
Our proposed model for the failure detection task, which in our case is a multi-class classification task. Left: The high-dimensional inputs are projected onto a latent space to extract features. The classifier makes an inference based on this compressed representation of the high-dimensional data together with other low-dimensional data. Right: The VAE is combined with a classifier at training time. The joint training of the generative and discriminative models guides the SVAE to learn robust and representative features of the high-dimensional inputs, leading to improved classification performance compared to the baselines.
SVAEs are not specific to anomaly detection tasks; they can be applied to more general classification problems in which the input consists of a high-dimensional modality and a low-dimensional modality. SVAEs can also handle unimodal inputs by removing the low-dimensional inputs from the classifier's input layer.
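To make the joint training concrete, below is a minimal sketch in PyTorch. The layer sizes, latent dimension, Gaussian (mean-squared-error) reconstruction term, and loss weight `alpha` are illustrative assumptions, not the exact architecture or hyperparameters used in the paper.

```python
# Minimal SVAE sketch (illustrative; sizes and loss weighting are assumptions).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SVAE(nn.Module):
    def __init__(self, high_dim=1080, low_dim=4, latent_dim=2, n_classes=4):
        super().__init__()
        # Encoder: high-dimensional input (e.g. a LiDAR scan) -> latent Gaussian
        self.encoder = nn.Sequential(nn.Linear(high_dim, 256), nn.ReLU(),
                                     nn.Linear(256, 64), nn.ReLU())
        self.fc_mu = nn.Linear(64, latent_dim)
        self.fc_logvar = nn.Linear(64, latent_dim)
        # Decoder: latent code -> reconstruction of the high-dimensional input
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(),
                                     nn.Linear(64, 256), nn.ReLU(),
                                     nn.Linear(256, high_dim))
        # Classifier: [latent code, low-dimensional modality] -> class logits
        self.classifier = nn.Sequential(nn.Linear(latent_dim + low_dim, 32),
                                        nn.ReLU(), nn.Linear(32, n_classes))

    def forward(self, x_high, x_low):
        h = self.encoder(x_high)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        x_rec = self.decoder(z)
        logits = self.classifier(torch.cat([z, x_low], dim=-1))
        return x_rec, logits, mu, logvar

def svae_loss(x_high, y, x_rec, logits, mu, logvar, alpha=1.0):
    # Negative ELBO (reconstruction + KL) plus a weighted classification term:
    # a single objective, so the generative and discriminative parts train jointly.
    rec = F.mse_loss(x_rec, x_high, reduction="sum") / x_high.size(0)
    kl = -0.5 * torch.mean(torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=-1))
    ce = F.cross_entropy(logits, y)
    return rec + kl + alpha * ce
```

A training step would compute `svae_loss` on a batch and back-propagate through the encoder, decoder, and classifier at once, which is the one-stage procedure described above.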
Demo on Real Robot Data
Failure modes explanation:
untvbl: untraversable obstacles
tvbl: traversable obstacles
row_coll: row collision
Experiments: Quantitative Results
Classification results for robot failures on real robot data with different models. The average accuracy is reported over all types of cases the robot encountered during the experiments. All four baselines, along with the SVAE, are evaluated over 10 runs with randomly initialized weights.
Experiments: Qualitative Results
After training, the decoder can still reconstruct reasonable results from only two numbers representing the original 1080-dimensional point cloud.
Furthermore, the SVAE learns an interpretable latent space (see the decoding sketch after the list below):
z1: how wide the robot's front view is
z2: the orientation of the crop rows
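One way to probe such a 2-D latent space is to decode a grid of (z1, z2) values and inspect how the reconstructed scans change along each axis. The snippet below is a hedged sketch that reuses the `SVAE` class from the earlier example; in practice the model would be a trained instance rather than the freshly constructed one used here so the code runs.

```python
import torch

# Decode a grid over the 2-D latent space to see what each axis encodes.
# `model` stands in for a trained SVAE; we instantiate the sketch class here
# only so the snippet is runnable.
model = SVAE(high_dim=1080, low_dim=4, latent_dim=2, n_classes=4)
model.eval()

z1_values = torch.linspace(-2.0, 2.0, steps=5)
z2_values = torch.linspace(-2.0, 2.0, steps=5)
with torch.no_grad():
    for z1 in z1_values:
        for z2 in z2_values:
            z = torch.tensor([[z1.item(), z2.item()]])
            scan = model.decoder(z)  # 1080-dim reconstructed scan
            # Plot or compare `scan` here; for a trained model, sweeping z1
            # should change the apparent width of the robot's front view and
            # z2 the orientation of the crop rows, per the interpretation above.
            print(z1.item(), z2.item(), scan.shape)
```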