UniFField: A Generalizable Unified Neural Feature Field for
Visual, Semantic, and Spatial Uncertainties in Any Scene
Christian Maurer*, Snehal Jauhri*, Sophie Lueth, and Georgia Chalvatzaki
PEARL Lab
TU Darmstadt, Germany
* Equal contribution
Example object search task using UniFField. The robot explores the scene and incrementally builds the volumetric UniFField representation. UniFField provides uncertainty-aware feature predictions at the visual (top-right), semantic (bottom-left), and spatial (bottom-right) levels, enabling an uncertainty-weighted similarity search for an object from a language query: "find the bottle on the shelf".
Abstract:
Comprehensive visual, geometric, and semantic understanding of a 3D scene is crucial for the successful execution of robotic tasks, especially in unstructured and complex environments. Moreover, to make robust decisions, a robot must also assess the reliability of the perceived information. While recent advances in 3D neural feature fields have enabled robots to leverage features from pretrained foundation models for tasks such as language-guided manipulation and navigation, existing methods suffer from two critical limitations: (i) they are typically scene-specific, and (ii) they lack the ability to model uncertainty in their predictions. We present UniFField, a unified uncertainty-aware neural feature field that combines visual, semantic, and geometric features in a single generalizable representation while also predicting uncertainty in each modality. Our approach, which can be applied zero-shot to any new environment, incrementally integrates RGB-D images into our voxel-based feature representation as the robot explores the scene, while simultaneously updating its uncertainty estimates. We show that our uncertainty estimates accurately describe the model's prediction errors in scene reconstruction and semantic feature prediction. Furthermore, we leverage the feature predictions and their associated uncertainties for an active object search task with a mobile manipulator robot, demonstrating the capability for robust decision-making.
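As a rough illustration of the incremental integration described above, the sketch below fuses back-projected per-frame features into a voxel grid with a running mean and tracks a simple disagreement-based uncertainty proxy. All names (IncrementalFeatureVolume, integrate_frame, etc.) are hypothetical and not part of the released code; the actual model learns its fusion and uncertainty prediction rather than using this hand-crafted update.

```python
import numpy as np

class IncrementalFeatureVolume:
    """Hypothetical sketch: fuse per-frame voxel features with a running mean
    and track a disagreement-based uncertainty proxy per voxel."""

    def __init__(self, grid_shape, feat_dim):
        self.mean = np.zeros((*grid_shape, feat_dim))   # running mean of fused features
        self.m2 = np.zeros(grid_shape)                  # accumulated squared deviation
        self.count = np.zeros(grid_shape)               # number of observations per voxel

    def integrate_frame(self, voxel_idx, feats):
        """voxel_idx: (N, 3) voxel coordinates observed in this frame.
        feats: (N, feat_dim) back-projected image features for those voxels."""
        i, j, k = voxel_idx.T
        self.count[i, j, k] += 1
        delta = feats - self.mean[i, j, k]
        self.mean[i, j, k] += delta / self.count[i, j, k][..., None]
        # Welford-style update of the squared deviation, summed over feature dims
        delta2 = feats - self.mean[i, j, k]
        self.m2[i, j, k] += np.sum(delta * delta2, axis=-1)

    def uncertainty(self):
        """Higher where observations disagree or are missing; 1.0 for unseen voxels."""
        seen = self.count > 1
        var = np.zeros_like(self.m2)
        var[seen] = self.m2[seen] / (self.count[seen] - 1)
        unc = 1.0 - np.exp(-var)          # squash variance into [0, 1)
        unc[self.count == 0] = 1.0        # unobserved voxels are maximally uncertain
        return unc
```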
UniFField
Given a sequence of RGB-D reference frames of a scene, we combine image features, an initial TSDF volume, and uncertainty indicators into a unified feature volume. We use knowledge distillation from a teacher model, novel-view synthesis, and geometric reconstruction as pretraining objectives to build the generalizable model. At test time, the model predicts visual, spatial, and semantic scene properties, along with their associated uncertainties.
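A minimal sketch of how the three pretraining objectives might be combined into a single training loss. The loss terms, weights, and function names below are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def pretraining_loss(pred, target, w_rgb=1.0, w_distill=1.0, w_geo=1.0):
    """Hypothetical combination of the three pretraining objectives:
    novel-view synthesis, feature distillation, and geometric reconstruction.

    pred / target are dicts with:
      'rgb'  : rendered vs. ground-truth colors for held-out views, (N, 3)
      'feat' : rendered vs. teacher (e.g., CLIP) features, (N, D)
      'tsdf' : predicted vs. fused TSDF values at sampled voxels, (M,)
    """
    # Novel-view synthesis: photometric error on held-out reference views
    loss_rgb = F.mse_loss(pred['rgb'], target['rgb'])

    # Distillation: align rendered features with the teacher's features
    loss_distill = 1.0 - F.cosine_similarity(pred['feat'], target['feat'], dim=-1).mean()

    # Geometric reconstruction: regress the truncated signed distance values
    loss_geo = F.l1_loss(pred['tsdf'], target['tsdf'])

    return w_rgb * loss_rgb + w_distill * loss_distill + w_geo * loss_geo
```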
Results on unseen ScanNet scenes
Novel-view synthesis: UniFField successfully recovers a novel scene's appearance and outperforms a per-scene trained NeRF [46] under sparse-data conditions (fewer than 50 reference frames).
Semantic similarity search: Through distillation, UniFField generates CLIP feature maps in unseen scenes that are spatially consistent and can be rendered at higher resolution than MaskCLIP's [49] per-patch features, enabling language-based similarity search (see the sketch after this list).
3D geometric reconstruction: UniFField aligns well with dedicated 3D reconstruction methods and captures finer details than Atlas [44].
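As a rough illustration of how a language query could be scored against the distilled feature field while down-weighting uncertain regions, here is a minimal sketch. The uncertainty weighting, function names, and random inputs are assumptions for illustration only; a real query would embed the text with the CLIP teacher model.

```python
import numpy as np

def uncertainty_weighted_similarity(voxel_feats, text_feat, uncertainty):
    """Hypothetical scoring for language-guided object search.

    voxel_feats : (V, D) distilled CLIP-space features per voxel
    text_feat   : (D,)   CLIP text embedding of the query,
                         e.g., "find the bottle on the shelf"
    uncertainty : (V,)   per-voxel semantic uncertainty in [0, 1]
    Returns (V,) scores: cosine similarity, down-weighted where uncertain.
    """
    v = voxel_feats / (np.linalg.norm(voxel_feats, axis=-1, keepdims=True) + 1e-8)
    t = text_feat / (np.linalg.norm(text_feat) + 1e-8)
    cos_sim = v @ t                       # cosine similarity per voxel
    return (1.0 - uncertainty) * cos_sim  # trust confident voxels more

# Illustrative usage with random data in place of real field predictions
rng = np.random.default_rng(0)
scores = uncertainty_weighted_similarity(
    voxel_feats=rng.normal(size=(1000, 512)),
    text_feat=rng.normal(size=512),
    uncertainty=rng.uniform(size=1000),
)
best_voxel = int(np.argmax(scores))  # candidate voxel for the robot to approach next
```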
Video presentation:
BibTeX:
@misc{maurer2025uniffieldgeneralizableunifiedneural,
title={UniFField: A Generalizable Unified Neural Feature Field for Visual, Semantic, and Spatial Uncertainties in Any Scene},
author={Christian Maurer and Snehal Jauhri and Sophie Lueth and Georgia Chalvatzaki},
year={2025},
eprint={2510.06754},
archivePrefix={arXiv},
url={https://arxiv.org/abs/2510.06754},
}