A real-time navigation pipeline for Gaussian Splatting Maps, a popular emerging 3D scene representation from computer vision.
We present Splat-Nav, a real-time navigation pipeline designed to work with environment representations generated by Gaussian Splatting (GSplat), a popular emerging 3D scene representation from computer vision. Splat-Nav consists of two components: 1) Splat-Plan, a safe planning module, and 2) Splat-Loc, a robust pose estimation module. Splat-Plan builds a safe-by-construction polytope corridor through the map based on mathematically rigorous collision constraints and then constructs a Bézier curve trajectory through this corridor. Splat-Loc provides robust state estimation, leveraging the point-cloud representation inherent in GSplat scenes for global pose initialization in the absence of prior knowledge, and for recursive real-time pose localization given only RGB images. The most compute-intensive procedures in our pipeline, such as the computation of Bézier trajectories and the pose optimization, run primarily on the CPU, freeing up the GPU for intensive tasks such as online training of Gaussian Splats. We demonstrate the safety and robustness of our pipeline in both simulation and hardware experiments, showing online re-planning at 5 Hz and pose estimation at about 25 Hz, an order of magnitude faster than Neural Radiance Field (NeRF)-based navigation methods, thereby enabling real-time navigation.
We leverage the ellipsoidal primitives to create polytopes that define the safe regions of space via supporting hyperplanes. The ellipsoidal representation of the environment obtained from GSplat enables the direct computation of these convex obstacle-free regions without the need for a convex optimization procedure; a sketch of this computation follows below. Our proposed method can generate safe polytopes even from an unsafe initial path.
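To make the construction concrete, the following is a minimal sketch (in Python with NumPy) of how a supporting hyperplane of a single Gaussian's ellipsoid can be computed in closed form: map the ellipsoid to the unit sphere, take the tangent plane at the sphere point nearest the mapped query, and map that plane back. The function name and signature are illustrative assumptions, not the paper's exact interface.

```python
import numpy as np

def supporting_hyperplane(center, R, scales, query):
    """Supporting hyperplane of the ellipsoid {center + R @ diag(scales) @ u : |u| <= 1}.

    Returns (a, b) with a @ x <= b for all x in the ellipsoid, while
    a @ query >= b whenever the query point lies outside the ellipsoid.
    """
    # Affine map sending the ellipsoid to the unit sphere.
    S_inv = np.diag(1.0 / scales)
    q_sphere = S_inv @ R.T @ (query - center)

    # Tangent point on the unit sphere closest to the mapped query.
    u = q_sphere / np.linalg.norm(q_sphere)

    # Normals transform by the inverse-transpose of the affine map.
    a = R @ S_inv @ u                                # outward normal, world frame
    x_tangent = center + R @ (scales * u)            # tangent point on the ellipsoid
    a = a / np.linalg.norm(a)
    return a, a @ x_tangent
```

Stacking the (a, b) pairs from all nearby Gaussians, with signs flipped so the robot-side halfspaces a @ x >= b are kept, yields the halfspace representation of one free-space polytope.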
We first create a 3D binary occupancy grid from a point cloud of the Gaussians. To do this, we sample points on the unit sphere via a Fibonacci lattice and apply each Gaussian's rotation, scaling, and translation to these points. From this grid, we use A* to generate a candidate path. We take a model predictive control (MPC) perspective, planning a trajectory over a receding time horizon and re-planning before the robot reaches the end of this partial trajectory. Using the same polytopes as above (which are quick to compute), our MPC problem considers only the next N polytopes and plans a path to the A* waypoint in the N-th polytope. This greatly reduces the number of constraints in the trajectory optimization problem, allowing us to achieve real-time performance. A sketch of the occupancy-grid construction appears below.
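The sketch below illustrates the Fibonacci-lattice sampling and per-Gaussian transform; the function signature, the number of sphere samples, and the use of the raw scales as ellipsoid extents are assumptions for illustration, not the paper's exact parameters.

```python
import numpy as np

def fibonacci_sphere(n):
    """n roughly uniform points on the unit sphere via a Fibonacci lattice."""
    i = np.arange(n)
    golden_angle = np.pi * (3.0 - np.sqrt(5.0))
    z = 1.0 - 2.0 * (i + 0.5) / n
    r = np.sqrt(1.0 - z * z)
    theta = golden_angle * i
    return np.stack([r * np.cos(theta), r * np.sin(theta), z], axis=1)

def occupancy_from_gaussians(means, rotations, scales, origin, voxel, dims, n_pts=32):
    """Binary occupancy grid from Gaussian ellipsoid surfaces (hypothetical signature).

    means: (G, 3) centers; rotations: (G, 3, 3); scales: (G, 3) per-axis extents;
    origin: grid corner; voxel: cell size; dims: (nx, ny, nz).
    """
    grid = np.zeros(dims, dtype=bool)
    sphere = fibonacci_sphere(n_pts)                       # (n_pts, 3) unit sphere
    for mu, R, s in zip(means, rotations, scales):
        pts = mu + (R @ (sphere * s).T).T                  # scale, rotate, translate
        idx = np.floor((pts - origin) / voxel).astype(int)
        idx = idx[np.all((idx >= 0) & (idx < np.array(dims)), axis=1)]
        grid[idx[:, 0], idx[:, 1], idx[:, 2]] = True
    return grid
```

Marking only the sampled surface points keeps the grid construction cheap; A* then runs on the free cells of this grid to produce the candidate path.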
We formulate a constrained maximum a posteriori (MAP) optimization problem to compute an estimate of the robot's pose that satisfies the safety constraints (defined by the polytope) at each timestep. For robots with arbitrary nonlinear dynamics models, solving the MAP optimization problem is often challenging. To circumvent this challenge, we present a two-phase pose estimation procedure. In the first phase, we utilize an Unscented Kalman Filter (UKF) to compute an estimate of the robot's pose, consisting of a predict procedure, dependent on the robot's dynamics model, followed by an update procedure, which corrects the estimated pose given the robot's observations (i.e., camera images). In an optional second phase, we refine this estimate by solving a constrained optimization problem that approximates the MAP problem.
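The first phase maps directly onto a standard UKF loop. Below is a minimal sketch using the filterpy library with a stand-in constant-velocity model and a position-only measurement; the actual pipeline uses the robot's full dynamics and the image-derived pose estimate, and all noise values here are placeholders.

```python
import numpy as np
from filterpy.kalman import UnscentedKalmanFilter, MerweScaledSigmaPoints

def fx(x, dt):
    """Stand-in dynamics: constant velocity, state x = [p (3), v (3)]."""
    F = np.eye(6)
    F[:3, 3:] = dt * np.eye(3)
    return F @ x

def hx(x):
    """Stand-in measurement model: the localization module observes position."""
    return x[:3]

points = MerweScaledSigmaPoints(n=6, alpha=0.1, beta=2.0, kappa=0.0)
ukf = UnscentedKalmanFilter(dim_x=6, dim_z=3, dt=0.05, fx=fx, hx=hx, points=points)
ukf.x = np.zeros(6)
ukf.P = np.eye(6) * 0.1
ukf.R = np.eye(3) * 0.01     # placeholder measurement noise
ukf.Q = np.eye(6) * 1e-3     # placeholder process noise

# One filter cycle per camera frame:
ukf.predict()                    # phase 1a: propagate through the dynamics model
z = np.array([1.0, 0.2, 0.5])    # hypothetical pose measurement from the image pipeline
ukf.update(z)                    # phase 1b: correct with the observation
# Optional phase 2: project ukf.x onto the safe polytope {p : A p <= b} via a QP.
```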
For online pose estimation, given an estimate of the robot's pose, we first render a set of 2D images of the GSplat map within a local neighborhood. We then identify keypoints and the associated descriptors in these map images, as well as in the robot's RGB image, using the encoder-decoder-based method SuperPoint. Next, we match features between both images using the transformer-based method LightGlue. We then utilize RANSAC to remove outliers from the set of matched features, yielding the final correspondence set C. Finally, we back-project the matched map keypoints into the 3D scene using the perspective projection model, the camera intrinsics, and the depth rendered from GSplat, and estimate the robot's pose by solving the Perspective-n-Point (PnP) problem.
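As an illustration, the final back-projection and PnP step can be sketched with OpenCV as follows. The helper names, the rendered-view pose T_world_render, and the assumption of already-matched, outlier-free keypoints are hypothetical simplifications; the SuperPoint/LightGlue matching itself is omitted.

```python
import cv2
import numpy as np

def backproject(kpts, depth, K, T_world_cam):
    """Lift 2D keypoints of a rendered map view to 3D world points via rendered depth."""
    u, v = kpts[:, 0], kpts[:, 1]
    z = depth[v.astype(int), u.astype(int)]
    x = (u - K[0, 2]) * z / K[0, 0]          # perspective projection, inverted
    y = (v - K[1, 2]) * z / K[1, 1]
    pts_cam = np.stack([x, y, z], axis=1)
    R, t = T_world_cam[:3, :3], T_world_cam[:3, 3]
    return (R @ pts_cam.T).T + t             # camera frame -> world frame

def estimate_pose(kpts_map, kpts_robot, depth_map, K, T_world_render):
    """Solve PnP from matched (map, robot) keypoints; returns rvec/tvec and inliers."""
    obj_pts = backproject(kpts_map, depth_map, K, T_world_render)
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        obj_pts.astype(np.float64),
        kpts_robot.astype(np.float64),
        K, distCoeffs=None)
    return (rvec, tvec, inliers) if ok else (None, None, None)
```

The recovered rvec/tvec give the camera pose relative to the world frame of the GSplat map, which serves as the measurement z in the UKF update above.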