This project focuses on building a robust monocular Structure-from-Motion (SfM) pipeline that reconstructs consistent camera trajectories and sparse 3D geometry in GPS-denied environments. The goal is not just to obtain a point cloud but to ensure geometric stability and well-behaved optimization under real-world noise and outliers.
The main challenge was that raw feature tracks extracted from monocular image sequences often produced unstable reconstructions because of poorly conditioned landmarks and noisy correspondences. To address this, I designed a preprocessing stage that filtered out these weak observations before optimization. The surviving tracks were then modeled as a factor graph, with camera poses and 3D landmarks as variables and each image projection as a reprojection factor linking one pose to one landmark, so that bundle adjustment could optimize poses and landmarks jointly.
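The write-up does not state the exact filtering criteria. As a minimal sketch, assuming two criteria that are common in monocular SfM (a minimum number of observing views and a minimum triangulation, or parallax, angle), a track filter might look like the following. The `Track` container, `filter_tracks`, and both default thresholds are hypothetical illustrations, not the project's actual code.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class Track:
    # Hypothetical container: one triangulated landmark plus the centers
    # (world frame) of every camera that observes it.
    point: tuple[float, float, float]
    camera_centers: list[tuple[float, float, float]] = field(default_factory=list)


def _unit_ray(center, point):
    """Unit vector from a camera center toward the landmark, or None if degenerate."""
    d = [p - c for p, c in zip(point, center)]
    n = math.sqrt(sum(x * x for x in d))
    return None if n < 1e-12 else [x / n for x in d]


def max_parallax_deg(track: Track) -> float:
    """Largest angle (degrees) between any two viewing rays of the landmark."""
    rays = [_unit_ray(c, track.point) for c in track.camera_centers]
    if any(r is None for r in rays):
        return 0.0  # a camera sits on the landmark: treat as unusable
    best = 0.0
    for i in range(len(rays)):
        for j in range(i + 1, len(rays)):
            dot = sum(a * b for a, b in zip(rays[i], rays[j]))
            dot = max(-1.0, min(1.0, dot))  # guard acos against rounding
            best = max(best, math.degrees(math.acos(dot)))
    return best


def filter_tracks(tracks, min_views=3, min_parallax_deg=2.0):
    """Keep tracks with enough views and enough baseline to constrain depth.

    Both thresholds are assumed placeholder values, not tuned settings.
    """
    kept = []
    for t in tracks:
        if len(t.camera_centers) < min_views:
            continue  # too few observations to triangulate reliably
        if max_parallax_deg(t) < min_parallax_deg:
            continue  # nearly parallel rays: poorly conditioned depth
        kept.append(t)
    return kept
```

Low parallax is one concrete form of a poorly conditioned landmark: when all viewing rays are nearly parallel, depth is weakly constrained, so small pixel noise turns into large depth error and can destabilize bundle adjustment.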
I implemented the optimization in GTSAM, and the final problem contained 1,477 landmarks and 5,306 image projections. Filtering weak observations and tuning the optimization setup reduced the reprojection error by 26%, with bundle adjustment converging to a final RMSE of 4.03 pixels. The resulting camera trajectories were smooth and consistent across 24 monocular image sequences, producing a stable sparse reconstruction suitable for downstream mapping and localization tasks.
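The write-up does not define how the RMSE was computed. The sketch below assumes a pinhole camera and takes the RMSE over the Euclidean pixel error of each image projection; `project`, `reprojection_rmse`, and `relative_reduction` are hypothetical helpers, not part of GTSAM or the project. Evaluating the predicted pixels at the initial estimate and again at the optimized estimate gives the before and after values.

```python
import math


def project(fx, fy, cx, cy, point_cam):
    """Pinhole projection of a point already expressed in the camera frame (z forward)."""
    x, y, z = point_cam
    if z <= 0:
        raise ValueError("point is behind the camera")
    return (fx * x / z + cx, fy * y / z + cy)


def reprojection_rmse(observations):
    """RMSE in pixels over per-projection Euclidean errors.

    observations: iterable of (predicted_uv, measured_uv) pixel pairs,
    one pair per image projection (assumed convention: divide by n, not 2n).
    """
    sq_sum, n = 0.0, 0
    for (pu, pv), (mu, mv) in observations:
        sq_sum += (pu - mu) ** 2 + (pv - mv) ** 2
        n += 1
    if n == 0:
        raise ValueError("no observations")
    return math.sqrt(sq_sum / n)


def relative_reduction(before, after):
    """Fractional improvement, e.g. 0.26 for a 26% reduction."""
    return (before - after) / before
```

Dividing by 2n instead of n gives a per-coordinate RMSE that is smaller by a factor of sqrt(2), so the convention is worth stating alongside any reported figure. GTSAM's graph error is a noise-weighted objective rather than a pixel RMSE, which is why a separate computation like this is one way to report results in pixels.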