Monocular navigation in cluttered environments using ROSNAV (top) vs. our approach (bottom). ROSNAV constructs cost maps directly from the estimated point cloud (green) generated by DepthAnything, which deviates significantly from the ground truth (red), leading to incorrect free-space detection (e.g., top row, panel 3) and collisions. In contrast, our method treats the estimated point cloud as a conditioning input to a learned probabilistic collision model, integrated with a risk-aware MPC framework. Snapshots across time steps are shown for both methods (corresponding time indices are labeled).
Abstract
Navigating unknown environments with a single RGB camera is challenging, as the lack of depth information prevents reliable collision-checking. While some methods use estimated depth to build collision maps, we found that depth estimates from vision foundation models are too noisy for zero-shot navigation in cluttered environments. We propose an alternative: instead of using noisy estimated depth for direct collision-checking, we use it as a rich context input to a learned collision model. This model predicts the distribution of minimum obstacle clearance that the robot can expect for a given control sequence. At inference, these predictions inform a risk-aware MPC planner that minimizes estimated collision risk. We propose a joint learning pipeline that co-trains the collision model and risk metric using both safe and unsafe trajectories. Crucially, this joint training yields well-calibrated uncertainty in the collision model, which improves navigation in highly cluttered environments. Consequently, real-world experiments show reductions in collision rate and improvements in goal-reaching and speed over several strong baselines.
Motivation
Monocular RGB Navigation is Challenging: A single RGB camera lacks depth information, making reliable collision-checking difficult.
Vision-Based Depth is Too Noisy: Depth estimates from vision foundation models are unreliable for zero-shot navigation in cluttered scenes.
Conventional Collision Maps are Inaccurate: Using noisy depth for costmap-based planning leads to unsafe or overly conservative behavior.
Our Approach
Key Insight: Use estimated depth not for direct collision-checking but as input to a learned collision model.
Predictive Collision Modeling: Model predicts a distribution over minimum obstacle clearance for a given control sequence.
Joint Learning Framework: Collision model and risk metric are trained together using a dataset of safe and unsafe trajectories.
Risk-Aware Planning: At test time, this distribution is used to compute collision risk, which is minimized via model predictive control (MPC).
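To make the planning step above concrete, here is a minimal sketch of sampling-based risk-aware MPC. The function `clearance_distribution` is a hypothetical stand-in for the paper's learned collision model (in reality a network conditioned on estimated depth), and the Gaussian tail-probability risk, cost weights, and control parameterization are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np
from math import erf, sqrt

def clearance_distribution(context, controls):
    # HYPOTHETICAL stub for the learned collision model: returns the mean and
    # std of the predicted minimum obstacle clearance along the rollout.
    # Toy behavior: more aggressive controls shrink the expected clearance.
    effort = float(np.abs(controls).sum())
    return 1.0 - 0.05 * effort, 0.1 + 0.02 * effort

def collision_risk(mean, std, margin=0.2):
    # Gaussian tail probability that clearance falls below the safety margin.
    z = (margin - mean) / (std + 1e-9)
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def sample_based_mpc(context, goal_cost_fn, n_samples=64, horizon=10,
                     risk_weight=5.0, rng=None):
    # Score sampled control sequences by goal cost plus weighted collision
    # risk, and return the lowest-scoring sequence.
    rng = np.random.default_rng(0) if rng is None else rng
    best_u, best_score = None, np.inf
    for _ in range(n_samples):
        u = rng.uniform(-1.0, 1.0, size=(horizon, 2))  # e.g. (v, omega) per step
        mean, std = clearance_distribution(context, u)
        score = goal_cost_fn(u) + risk_weight * collision_risk(mean, std)
        if score < best_score:
            best_u, best_score = u, score
    return best_u, best_score
```

In this toy setup the risk term penalizes control sequences whose predicted clearance distribution places probability mass below the safety margin, so uncertain (high-variance) predictions are treated more cautiously than confident ones.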
Contribution
Novel Perspective on Monocular Navigation: Reformulate monocular depth as an imperfect input to a learned stochastic collision model, rather than a proxy for direct collision-checking.
Stochastic Collision Model Design: Predicts worst-case obstacle clearance from RGB input and proposed control sequences, enabling risk-aware trajectory evaluation.
Joint Learning of Collision Model and Risk Metric: Co-train the collision model and collision-risk hyperparameters using both safe and unsafe trajectory data, ensuring task-aligned uncertainty estimation.
Variance Regularization via Downstream Feedback: Leverage downstream task supervision to regulate predicted uncertainty, avoiding both overconfidence and overly conservative behavior.
Risk-Aware MPC Framework: Integrates learned clearance distributions into a model predictive control pipeline to minimize predicted collision risk during planning. We leverage Maximum Mean Discrepancy and kernel learning to improve risk estimation and efficacy of our MPC.
Real-World Validation: Demonstrate significant improvements in navigation safety and success rates over baselines like NoMaD and ROS Navigation Stack on hardware.
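The MPC contribution above mentions Maximum Mean Discrepancy (MMD) for risk estimation. The paper's learned kernel and exact risk metric are not reproduced here; the sketch below only illustrates the underlying quantity, an empirical (biased, V-statistic) estimate of squared MMD with a fixed RBF kernel, which measures how far a set of sampled clearances sits from a reference set of safe clearances.

```python
import numpy as np

def rbf_kernel(x, y, bandwidth=0.5):
    # Gaussian (RBF) kernel matrix between two 1-D sample vectors.
    d = x[:, None] - y[None, :]
    return np.exp(-(d ** 2) / (2.0 * bandwidth ** 2))

def mmd_squared(x, y, bandwidth=0.5):
    # Biased empirical estimate of squared Maximum Mean Discrepancy:
    # MMD^2 = E[k(x,x')] + E[k(y,y')] - 2 E[k(x,y)].
    kxx = rbf_kernel(x, x, bandwidth).mean()
    kyy = rbf_kernel(y, y, bandwidth).mean()
    kxy = rbf_kernel(x, y, bandwidth).mean()
    return kxx + kyy - 2.0 * kxy
```

Intuitively, clearance samples drawn near zero (likely collision) produce a large MMD against a safe reference set, while samples resembling the reference produce an MMD near zero, giving a smooth risk signal for the planner.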
BibTex
@ARTICLE{11283048,
author={Sharma, Basant and Jadhav, Prajyot and Paul, Pranjal and Krishna, K. Madhava and Singh, Arun Kumar},
journal={IEEE Robotics and Automation Letters},
title={MonoMPC: Monocular Vision Based Navigation With Learned Collision Model and Risk-Aware Model Predictive Control},
year={2026},
volume={11},
number={2},
pages={1330-1337},
keywords={Navigation;Collision avoidance;Robots;Predictive models;Trajectory;Visualization;Uncertainty;Point cloud compression;Vectors;Pipelines;Vision-based navigation;planning under uncertainty;motion and path planning;collision avoidance},
doi={10.1109/LRA.2025.3641112}}