Robot Vision


Robot vision enables robots to perceive scene structure, depth, objects, and traversable space from visual input, providing essential information for decision-making and control. Our lab conducts research on robot vision with a focus on spatial perception and simulator construction for embodied and autonomous systems. In particular, SPACE-CLIP is a lightweight monocular depth estimation framework that recovers geometric cues directly from a frozen CLIP vision encoder, without requiring a separate, heavy depth-specific backbone. This makes it possible to build a modular spatial perception block that can be integrated more easily into vision-language-action (VLA) models and robotic control pipelines.
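The core idea, keeping the vision encoder frozen and training only a small depth head on its patch features, can be illustrated with a minimal sketch. This is a hypothetical illustration, not SPACE-CLIP's actual code: the encoder is replaced by a stand-in that emits random patch features, and the head is a single linear map.

```python
import numpy as np

# Hypothetical sketch (not the actual SPACE-CLIP implementation):
# a frozen vision encoder yields one feature per image patch, and
# only a lightweight depth head on top of it is trained.

rng = np.random.default_rng(0)

def frozen_clip_features(image, patch=16, dim=512):
    """Stand-in for a frozen CLIP vision encoder: one feature per patch."""
    h, w = image.shape[:2]
    gh, gw = h // patch, w // patch
    return rng.standard_normal((gh, gw, dim))  # placeholder features

def depth_head(features, weights, bias):
    """Trainable head: linear map from patch features to per-patch depth."""
    depth = features @ weights + bias          # (gh, gw, 1)
    return np.maximum(depth[..., 0], 0.0)      # depth is non-negative

image = np.zeros((224, 224, 3))
feats = frozen_clip_features(image)            # (14, 14, 512), frozen
w = rng.standard_normal((512, 1)) * 0.01      # only the head is trained
b = np.zeros(1)
coarse_depth = depth_head(feats, w, b)         # (14, 14) per-patch depth
print(coarse_depth.shape)
```

Because the encoder's weights never change, the trainable parameter count stays small, which is what makes such a block easy to drop into a larger VLA or control pipeline.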

We also study NVSim, a framework that automatically constructs large-scale indoor simulators and navigation graphs from ordinary traversal image sequences, without expensive 3D scanning. By combining floor-aware Gaussian Splatting with mesh-free traversability checking, NVSim produces cleaner floor representations and more reliable estimates of navigable space for real robotic navigation tasks.
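A mesh-free traversability check of the kind described above can be sketched as a clearance test performed directly on a 3D point set, with no mesh reconstruction step. This is a hypothetical toy version, not NVSim's actual algorithm; the cell size, robot height, and clearance radius are illustrative assumptions.

```python
import numpy as np

# Hypothetical sketch (not NVSim's actual algorithm): a mesh-free
# traversability test that checks a candidate floor cell for obstacle
# clearance directly against 3D points, skipping mesh reconstruction.

def traversable(cell_xy, points, radius=0.3, floor_z=0.0, robot_h=1.5):
    """A cell is traversable if no point lies inside the robot's
    clearance cylinder above the floor plane."""
    dxy = np.linalg.norm(points[:, :2] - cell_xy, axis=1)
    in_cylinder = (
        (dxy < radius)
        & (points[:, 2] > floor_z + 0.05)   # ignore the floor itself
        & (points[:, 2] < floor_z + robot_h)
    )
    return not in_cylinder.any()

# Toy scene: a single obstacle column standing at (1.0, 0.0).
obstacle = np.column_stack([
    np.full(50, 1.0), np.full(50, 0.0), np.linspace(0.1, 1.0, 50)])

free_cell = traversable(np.array([0.0, 0.0]), obstacle)     # True
blocked_cell = traversable(np.array([1.0, 0.0]), obstacle)  # False
print(free_cell, blocked_cell)
```

Running this check over a grid of candidate cells and connecting adjacent traversable cells yields a navigation graph, which is the kind of navigable-space representation a simulator can hand to a real robotic navigation stack.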

Overall, our lab develops robot vision methods that improve depth understanding, spatial reasoning, and environment generation for more capable robotic perception and navigation.