Navigating unknown environments with a single RGB camera is challenging, as the lack of depth information prevents reliable collision-checking. While some methods use estimated depth to build collision maps, we found that depth estimates from vision foundation models are too noisy for zero-shot navigation in cluttered environments.
We propose an alternative approach: instead of using noisy estimated depth for direct collision-checking, we use it as a rich context input to a learned collision model. This model predicts the distribution of minimum obstacle clearance the robot can expect for a given control sequence. At inference, these predictions inform a risk-aware MPC planner that minimizes estimated collision risk. We introduce a joint learning pipeline that co-trains the collision model and the risk metric on both safe and unsafe trajectories. Crucially, this joint training yields well-calibrated uncertainty in the collision model, which improves navigation in highly cluttered environments. Real-world experiments show reductions in collision rate and improvements in goal-reaching and speed over several strong baselines.
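As a concrete illustration, below is a minimal PyTorch sketch of what this inference-time pipeline could look like. All names here (`CollisionModel`, `collision_risk`, `risk_aware_mpc`), the MLP architecture, the Gaussian form of the clearance distribution, and the random-sampling planner are illustrative assumptions, not the paper's actual implementation:

```python
# Hypothetical sketch of a learned collision model plus risk-aware
# sampling MPC. Architecture, cost weights, and all API names are
# illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn

class CollisionModel(nn.Module):
    """Predicts a Gaussian over the minimum obstacle clearance a
    candidate control sequence would achieve, conditioned on a visual
    context embedding (e.g., features from the estimated depth)."""
    def __init__(self, ctx_dim=256, horizon=20, ctrl_dim=2, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(ctx_dim + horizon * ctrl_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 2),  # (mean, log-variance) of min clearance
        )

    def forward(self, ctx, controls):
        # ctx: (B, ctx_dim); controls: (B, horizon, ctrl_dim)
        x = torch.cat([ctx, controls.flatten(1)], dim=-1)
        mu, log_var = self.net(x).chunk(2, dim=-1)
        return mu.squeeze(-1), log_var.squeeze(-1)

def collision_risk(mu, log_var, safety_margin=0.3):
    """P(clearance < margin) under the predicted Gaussian -- a simple
    stand-in for the learned risk metric the planner penalizes."""
    std = torch.exp(0.5 * log_var)
    return torch.distributions.Normal(mu, std).cdf(
        torch.full_like(mu, safety_margin))

def risk_aware_mpc(model, ctx, goal_cost_fn, n_samples=512,
                   horizon=20, ctrl_dim=2, risk_weight=10.0):
    """Sampling-based MPC: score random control sequences by goal cost
    plus weighted collision risk, and return the lowest-cost one."""
    controls = torch.randn(n_samples, horizon, ctrl_dim) * 0.5
    mu, log_var = model(ctx.expand(n_samples, -1), controls)
    cost = goal_cost_fn(controls) + risk_weight * collision_risk(mu, log_var)
    return controls[cost.argmin()]

# Toy usage: pooled depth features as context, goal 2 m ahead,
# goal cost = distance after integrating velocity commands at dt = 0.1 s.
model = CollisionModel()
ctx = torch.randn(1, 256)
goal = torch.tensor([2.0, 0.0])
goal_cost = lambda u: (u.sum(dim=1) * 0.1 - goal).norm(dim=-1)
best_controls = risk_aware_mpc(model, ctx, goal_cost)
```

Note that in the paper the risk metric itself is co-trained with the collision model on safe and unsafe trajectories; the fixed CDF threshold in `collision_risk` above merely stands in for that learned component.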
@misc{sharma2025monompc,
  title         = {MonoMPC: Monocular Vision-Based Navigation with Learned Collision Model and Risk-Aware Model Predictive Control},
  author        = {Sharma, Basant and Jadhav, Prajyot and Paul, Pranjal and Krishna, K. Madhava and Singh, Arun Kumar},
  year          = {2025},
  eprint        = {2508.07387},
  archivePrefix = {arXiv},
  primaryClass  = {cs.RO},
  url           = {https://arxiv.org/abs/2508.07387}
}