The structure from motion (SfM) problem consists of determining the three-dimensional structure of a scene from the measurements provided by one or more sensors over time (e.g., vision sensors, ego-motion sensors, range sensors). Solving this problem requires performing self-motion perception (sMP) and depth perception (DP) simultaneously. When only visual measurements are available, SfM has been solved in closed form up to a scale factor (e.g., by the eight-point algorithm or by the five-point algorithm published here). The case of combined inertial and visual measurements is of particular interest and has been investigated in several disciplines, in both computer science and neuroscience (visual-vestibular integration for sMP and DP).
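To make the vision-only case concrete, the following is a minimal sketch of the linear eight-point estimate of the essential matrix on synthetic, noise-free data. The function names (`skew`, `eight_point`, `simulate_views`), the chosen camera motion, and the use of NumPy are assumptions made for illustration only. The simulation also shows why the solution is defined only up to scale: scaling the scene and the translation by the same factor leaves the image coordinates unchanged.

```python
import numpy as np


def skew(v):
    """Return the matrix [v]x such that skew(v) @ w equals np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])


def eight_point(x1, x2):
    """Estimate the essential matrix E (up to scale and sign) from N >= 8 matches.

    x1, x2: (N, 3) homogeneous normalized image coordinates in views 1 and 2,
    related by the epipolar constraint x2[i] @ E @ x1[i] == 0.
    """
    # Each match gives one linear equation in the 9 entries of E (row-major).
    A = np.array([np.kron(b, a) for a, b in zip(x1, x2)])
    # The solution is the right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    E = vt[-1].reshape(3, 3)
    # Enforce the essential-matrix structure: two equal singular values, one zero.
    u, _, vt = np.linalg.svd(E)
    return u @ np.diag([1.0, 1.0, 0.0]) @ vt


def simulate_views(scale, seed=0, n_points=12):
    """Synthetic two-view data; scene and translation are both multiplied by `scale`."""
    rng = np.random.default_rng(seed)
    pts = rng.uniform([-1.0, -1.0, 4.0], [1.0, 1.0, 8.0], size=(n_points, 3)) * scale
    a = 0.1  # hypothetical rotation angle about the y axis, in radians
    R = np.array([[np.cos(a), 0.0, np.sin(a)],
                  [0.0, 1.0, 0.0],
                  [-np.sin(a), 0.0, np.cos(a)]])
    t = scale * np.array([1.0, 0.2, 0.1])
    pts2 = pts @ R.T + t            # the same points expressed in the second view
    x1 = pts / pts[:, 2:3]          # perspective projection, view 1
    x2 = pts2 / pts2[:, 2:3]        # perspective projection, view 2
    return x1, x2, R, t


x1, x2, R, t = simulate_views(scale=1.0)
E = eight_point(x1, x2)
# Largest epipolar residual over all matches (close to zero for noise-free data).
print(np.abs(np.sum(x2 * (x1 @ E.T), axis=1)).max())
```

Calling `simulate_views(scale=10.0)` returns the same image coordinates as `simulate_views(scale=1.0)`, so no algorithm that uses these coordinates alone can tell the two scenes apart.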
By applying basic results on nonlinear observability, introduced in the 1970s by the automatic control community, together with new concepts that I developed (e.g., the concept of continuous symmetry introduced here), I derived all the observability properties of this sensor fusion problem. The results are published here. In short, I proved that, in addition to what can be determined with a single camera (i.e., the structure and the motion up to a scale factor), it is also possible to determine the absolute scale and the absolute roll and pitch angles (these last two observable modes are a consequence of the presence of gravity). Moreover, whereas the single-camera solution requires at least five point features, here a single point feature suffices. Finally, the determination remains possible even in the presence of biased measurements, unknown camera extrinsic calibration, and unknown magnitude of gravity.
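The role of gravity in making roll and pitch observable can be illustrated in the simplest static case. The sketch below is not the observability analysis itself. It assumes a sensor at rest, a world frame with the z axis pointing up, an ideal unbiased accelerometer, and the ZYX (yaw, pitch, roll) Euler convention; all function names are hypothetical. Under these assumptions the accelerometer reading fixes roll and pitch, does not depend on yaw, and its magnitude is not needed.

```python
import numpy as np


def rotation_zyx(roll, pitch, yaw):
    """Body-to-world rotation R = Rz(yaw) @ Ry(pitch) @ Rx(roll)."""
    cr, sr = np.cos(roll), np.sin(roll)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    rx = np.array([[1.0, 0.0, 0.0], [0.0, cr, -sr], [0.0, sr, cr]])
    ry = np.array([[cp, 0.0, sp], [0.0, 1.0, 0.0], [-sp, 0.0, cp]])
    rz = np.array([[cy, -sy, 0.0], [sy, cy, 0.0], [0.0, 0.0, 1.0]])
    return rz @ ry @ rx


def static_accelerometer(roll, pitch, yaw, g=9.81):
    """Specific force sensed at rest (the reaction to gravity), in the body frame."""
    return rotation_zyx(roll, pitch, yaw).T @ np.array([0.0, 0.0, g])


def roll_pitch_from_accel(f):
    """Recover roll and pitch from a static reading; valid for |pitch| < pi/2."""
    fx, fy, fz = f
    roll = np.arctan2(fy, fz)
    pitch = np.arctan2(-fx, np.hypot(fy, fz))
    return roll, pitch


# Same roll and pitch, different yaw and different gravity magnitude.
f_a = static_accelerometer(0.3, -0.2, yaw=0.0, g=9.81)
f_b = static_accelerometer(0.3, -0.2, yaw=1.5, g=3.7)
print(roll_pitch_from_accel(f_a))
print(roll_pitch_from_accel(f_b))
```

Both calls give the same roll and pitch because only the direction of the sensed gravity enters the two `arctan2` expressions, while a rotation about the vertical axis leaves that direction unchanged.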
More importantly, I found the analytic expression of the observable modes in terms of the visual and inertial measurements delivered during a short time interval. In other words, I solved the problem in closed form. I obtained this result by first proving that the visual-inertial sensor fusion problem is equivalent to a simple polynomial equation system (PES) and then finding its analytic solutions. Starting from this equivalence, I could also investigate all the singularities of the problem analytically. In particular, I derived the singularities in terms of the layout of the feature points, the motion performed, the number of features, and the number of camera images. All these results are published here.
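The flavor of such a closed-form solution can be conveyed by a deliberately simplified linear sketch. It is not the polynomial equation system of the publications. It assumes that the orientation is already known (so every vector is expressed in the initial camera frame), that the inertial measurements are unbiased, that camera and inertial sensor coincide, and that the motion is sufficiently exciting. The symbols `p_f` (feature position), `v0` (initial velocity), `g` (gravity vector) and `s_terms` (the accelerometer readings integrated twice) are names chosen here. Each image of a single feature constrains `p_f - v0*t - 0.5*g*t**2 - S(t)` to be parallel to the observed bearing, which gives a linear system in nine unknowns.

```python
import numpy as np


def skew(v):
    """Return the matrix [v]x such that skew(v) @ w equals np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])


def solve_single_feature(times, bearings, s_terms):
    """Least-squares estimate of (p_f, v0, g) from one tracked point feature.

    times:    (N,) image timestamps, N >= 5
    bearings: (N, 3) unit vectors from the camera to the feature
    s_terms:  (N, 3) doubly integrated accelerometer readings S(t)
    """
    rows, rhs = [], []
    for t, mu, s in zip(times, bearings, s_terms):
        m = skew(mu)  # the cross product with the bearing removes the unknown depth
        rows.append(m @ np.hstack([np.eye(3), -t * np.eye(3), -0.5 * t**2 * np.eye(3)]))
        rhs.append(m @ s)
    x, *_ = np.linalg.lstsq(np.vstack(rows), np.concatenate(rhs), rcond=None)
    return x[:3], x[3:6], x[6:9]


def simulate(times, p_f, v0, g):
    """Synthetic noise-free measurements for a hypothetical camera trajectory."""
    t = np.asarray(times)[:, None]
    # Hypothetical non-polynomial motion with zero position and velocity at t = 0.
    wiggle = np.hstack([np.sin(t) - t, 0.2 * t**3, np.cos(2.0 * t) - 1.0])
    cam = v0 * t + wiggle
    # An ideal accelerometer senses acceleration minus gravity; integrating twice
    # gives S(t) = cam(t) - v0*t - 0.5*g*t**2.
    s_terms = wiggle - 0.5 * g * t**2
    rel = p_f - cam
    bearings = rel / np.linalg.norm(rel, axis=1, keepdims=True)
    return bearings, s_terms


times = np.linspace(0.0, 2.0, 10)
p_f = np.array([2.0, -1.0, 6.0])
v0 = np.array([0.3, 0.1, -0.2])
g = np.array([0.5, -9.7, 1.0])  # gravity in the initial frame, magnitude not assumed
bearings, s_terms = simulate(times, p_f, v0, g)
p_est, v_est, g_est = solve_single_feature(times, bearings, s_terms)
print(p_est, v_est, g_est)
```

In this idealized setting the estimate carries the absolute scale (through `p_est` and `v_est`) and the full gravity vector, from which roll and pitch follow. If the doubly integrated term is a quadratic function of time, as under constant acceleration, the system loses rank, which is one example of the kind of singularity mentioned above.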