TCB-VIO: Tightly-Coupled Focal-plane
Binary-enhanced Visual Inertial Odometry
Matthew Lisondra1^, Junseo Kim2^, Glenn Takashi Shimoda1, Kourosh Zareinia3, and Sajad Saeedi4
1University of Toronto, 2Delft University of Technology, 3Toronto Metropolitan University, 4University College London
^ Both authors contributed equally to this research
*** Accepted for Publication in IEEE Robotics and Automation Letters (RA-L) ***
*** To be Presented at IEEE ICRA 2026 in Vienna, Austria ***
Paper (on arXiv) / Paper (on IEEE)
Abstract
Vision algorithms can be executed directly on the image sensor when implemented on next-generation sensors known as focal-plane sensor-processor arrays (FPSPs), where every pixel has a processor. FPSPs greatly improve latency, reducing the problems associated with the bottleneck of data transfer from a vision sensor to a processor. FPSPs accelerate vision-based algorithms such as visual-inertial odometry (VIO). However, VIO frameworks suffer from spatial drift due to vision-based pose estimation, whilst temporal drift arises from the inertial measurements. FPSPs circumvent the spatial drift by operating at a high frame rate to match the high-frequency output of the inertial measurements. In this paper, we present TCB-VIO, a tightly coupled 6 degrees-of-freedom (DoF) VIO based on a Multi-State Constraint Kalman Filter (MSCKF), operating at a high frame rate of 250 FPS with IMU measurements obtained at 400 Hz. TCB-VIO outperforms the state-of-the-art VIO methods ROVIO, VINS-Mono, and ORB-SLAM3.
VIDEO: TCB-VIO for IEEE Robotics and Automation Letters (RA-L).
Agile and mobile robotic systems often operate under strict power and processing constraints. As a result, there is an increasing demand for low-latency and low-power camera technologies. State-of-the-art algorithms utilizing conventional cameras typically achieve frame rates of 40–80 frames per second (FPS). These cameras rely on an architectural design where data is captured, digitized, and transferred to separate digital processing hardware for further computation. This traditional architecture, characterized by the modular separation of sensors and processors, can introduce significant latency and increase power consumption.
On-sensor vision processing presents a novel paradigm that addresses these challenges by co-locating sensors and processors on the same chip. Focal-plane sensor processor arrays (FPSPs) exemplify this approach, with each camera pixel equipped with a small processor capable of processing and sharing data with adjacent pixels (See Fig. 1). This tight integration of sensing and processing reduces power consumption and minimizes delays caused by communication overhead, making FPSPs highly suitable for real-world robotic applications.
Due to the limited die space on FPSPs, per-pixel memory is highly constrained, and processing is typically performed in analog. Analog computation has inherent limitations, including reduced numerical precision, circuit inaccuracies, thermal effects, and noise leakage. These constraints make implementing computer vision algorithms challenging and can lead to noisy results. Integrating additional sensing modalities, such as inertial measurement units (IMUs), using visual-inertial odometry (VIO) algorithms can reduce the impact of the noise. However, adapting VIO algorithms to FPSPs requires a redesign of the processing pipeline to preserve the high-speed and low-power advantages of FPSPs.
Early VIO algorithms were filtering-based, using the Extended Kalman Filter (EKF), Unscented Kalman Filter (UKF), and the popular MSCKF, where landmark positions are marginalized from the state vector to reduce the computational complexity. Later, smoothing approaches emerged, optimizing states over a window with tools like GTSAM, leading to variants such as VINS-Mono, SVO 2.0, Kimera VIO, and ORB-SLAM3. Despite improved accuracy, they can suffer from inconsistencies and linearization errors. VIO methods can also be categorized as loosely coupled or tightly coupled. Loosely coupled methods process the two modalities independently and fuse them later. In contrast, tightly coupled algorithms jointly optimize visual and inertial data to achieve higher accuracy and robustness.
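To make the filtering machinery concrete, the following is the textbook MSCKF formulation that tightly coupled filters of this kind build on (a standard sketch, not the exact equations of any one system). The filter state stacks the current IMU state with a sliding window of N cloned camera poses, and landmark errors are marginalized by projecting the visual residual onto the left nullspace of the feature Jacobian:

$$
\mathbf{x}_k = \begin{bmatrix} \mathbf{x}_{\mathrm{IMU}}^\top & {}^{C_1}\bar{q}^{\,\top}\;\mathbf{p}_{C_1}^\top & \cdots & {}^{C_N}\bar{q}^{\,\top}\;\mathbf{p}_{C_N}^\top \end{bmatrix}^\top,
\qquad
\mathbf{r} \approx \mathbf{H}_x\tilde{\mathbf{x}} + \mathbf{H}_f\tilde{\mathbf{p}}_f + \mathbf{n}
\;\;\Rightarrow\;\;
\mathbf{r}_o = \mathbf{A}^\top\mathbf{H}_x\tilde{\mathbf{x}} + \mathbf{A}^\top\mathbf{n},
\quad \mathbf{A}^\top\mathbf{H}_f = \mathbf{0}.
$$

The projected residual depends only on the pose window, so landmarks never enter the state vector; this is what keeps the update cost low enough for high frame-rate operation.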
For conventional cameras, however, the additional computation required by tightly coupled VIO often reduces update rates and increases the latency between image capture and state update. FPSPs can mitigate this trade-off: their in-pixel parallel processing enables fast on-sensor computations, while their ultra-high frame rate shortens the delay between image captures. For FPSPs, several visual odometry algorithms have been proposed. To date, the only VIO algorithm is BIT-VIO, which is a loosely coupled method; no tightly coupled VIO approaches exist. This gap is addressed in this paper.
This paper presents the first tightly-coupled 6 degrees-of-freedom (DoF) VIO algorithm designed for FPSPs. The algorithm, coined TCB-VIO, is a Tightly-Coupled VIO utilizing BIT-enhanced features. TCB-VIO processes high-speed, on-sensor binary edge images and feature maps using a novel binary-enhanced Kanade-Lucas-Tomasi (KLT) tracker. It is an extended and modified adaptation of the tightly coupled framework, OpenVINS.
The contributions of this paper are:
(I) The first tightly-coupled VIO for FPSPs, operating at a high frame rate of 250 FPS.
(II) The framework processes binary edge data and feature coordinates directly on-sensor, reducing computational cost. These outputs are then processed by the novel binary-enhanced KLT tracker at high frame rates.
(III) Real-world evaluations comparing against ROVIO, VINS-Mono, and ORB-SLAM3.
In a focal-plane sensor-processor array (FPSP), each pixel, referred to as a processing element, combines a photosensor circuit (PIX) with on-pixel compute resources, including an arithmetic-logic unit (ALU), input/output (I/O) circuits, local communication links (NEWS), local memory (Registers), and activity control (FLAG). This architecture enables image processing to be performed directly on the sensor.
Overview of the proposed Binary-enhanced KLT Tracking for TCB-VIO. The SCAMP-5 FPSP generates binary corners and edges at 250 FPS. On the host, binary edges are feathered, and the KLT tracker operates in spatial windows centered on each corner feature to achieve robust displacement estimation.
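As a rough illustration of this host-side step, here is a minimal Python/OpenCV sketch. It is illustrative only: the Gaussian feathering kernel, the window size, and the function names are our assumptions, and cv2.calcOpticalFlowPyrLK stands in for the paper's custom binary-enhanced KLT tracker.

```python
import cv2
import numpy as np

def feather(binary_edges, ksize=5, sigma=1.5):
    # Blur the 0/1 edge map so it gains the smooth intensity gradients
    # that the KLT least-squares step needs to converge.
    # (Kernel size and sigma are illustrative assumptions.)
    img = binary_edges.astype(np.float32) * 255.0
    return cv2.GaussianBlur(img, (ksize, ksize), sigma).astype(np.uint8)

def track_corners(prev_edges, next_edges, prev_corners, win=15):
    # Pyramidal KLT evaluated in windows centred on each binary corner
    # reported by the FPSP.
    p0 = prev_corners.astype(np.float32).reshape(-1, 1, 2)
    p1, status, _ = cv2.calcOpticalFlowPyrLK(
        feather(prev_edges), feather(next_edges), p0, None,
        winSize=(win, win), maxLevel=3,
        criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 30, 0.01))
    good = status.ravel() == 1
    return p0[good].reshape(-1, 2), p1[good].reshape(-1, 2)
```

The key design point is that KLT needs smooth gradients to solve its local least-squares problem, which raw binary edges do not provide; feathering restores them cheaply on the host while the FPSP keeps doing the expensive per-pixel work on-sensor.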
Experimental setup for SCAMP-5 with two Intel D435i RealSense cameras.
TCB-VIO operates using the SCAMP-5 FPSP, with a processing speed of 250 FPS. IMU measurements (at 400 Hz) and grayscale images for benchmarking are sourced from an Intel D435i RealSense camera. Both sensors are rigidly attached to a single fixture.
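At these rates, each 4 ms frame interval contains at most one or two 2.5 ms IMU samples, so state propagation between updates stays short. Below is a toy sketch of the timestamp bookkeeping; the function and argument names are hypothetical, not from the paper's code.

```python
from bisect import bisect_right

def imu_between(imu_stamps, imu_samples, t_prev, t_curr):
    # Select IMU samples with timestamps in (t_prev, t_curr], i.e. the
    # measurements used to propagate the filter between two frames.
    # At 250 FPS (4 ms frames) and a 400 Hz IMU (2.5 ms samples), this
    # is usually only one or two samples per frame interval.
    lo = bisect_right(imu_stamps, t_prev)
    hi = bisect_right(imu_stamps, t_curr)
    return imu_samples[lo:hi]
```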
The silver spherical markers visible in the photo are used to establish the ground-truth pose within the Vicon motion capture system. The SCAMP-5 FPSP is positioned beneath these markers. The forward-facing Intel camera is used to conduct benchmarking of TCB-VIO against other algorithms.
Parameter optimization for TCB-VIO. Each plot shows the effect of varying one parameter on median ATE and RTE. Dashed vertical lines denote the optimal setting for each parameter.
To determine a robust configuration of TCB-VIO, we conducted a systematic parameter sweep over the core state and feature settings. The analysis measured the median ATE and RTE across multiple high-dynamic trajectories. This exhaustive search revealed consistent trends in how each parameter influenced accuracy, robustness, and computational load.
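For reference, these metrics follow the standard definitions (up to the usual alignment conventions, which the paper may refine): the Absolute Trajectory Error (ATE) is the RMSE of position error after aligning the estimate to ground truth with a single rigid transform S, and the Relative Trajectory Error (RTE) measures local drift over a fixed interval Δ:

$$
\mathrm{ATE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left\lVert \mathrm{trans}\!\left(\mathbf{T}_{\mathrm{gt},i}^{-1}\,\mathbf{S}\,\mathbf{T}_{\mathrm{est},i}\right)\right\rVert^{2}},
\qquad
\mathrm{RTE}_i = \mathrm{trans}\!\left(\left(\mathbf{T}_{\mathrm{gt},i}^{-1}\mathbf{T}_{\mathrm{gt},i+\Delta}\right)^{-1}\left(\mathbf{T}_{\mathrm{est},i}^{-1}\mathbf{T}_{\mathrm{est},i+\Delta}\right)\right).
$$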
The sliding window size (N_clones) controls how many IMU–camera states are kept in the filter. Smaller windows (such as N_clones = 9) reduced memory and runtime but limited temporal constraints, leading to higher drift. Larger windows (such as 13–15 clones) increased inter-frame constraints and improved ATE, though excessively large windows introduced filter inconsistency and additional computation. The optimal trade-off was N_clones = 15, which minimized drift without saturating runtime.
The number of SLAM features per update (N_SLAM,update) determines how many persistent landmarks are processed in one update. Lower values (such as 20) yielded lighter updates and faster processing but fewer long-term constraints, degrading robustness. Higher values (such as 30) improved accuracy by providing stronger geometric anchoring, though too many SLAM features risked overloading the update step. We found N_SLAM,update = 30 consistently minimized both ATE and RTE.
The MSCKF features per update (N_MSCKF,update) are the transient features that contribute to short-term drift suppression. Fewer features (such as 40) reduced computational cost but weakened constraint information, while higher values (such as 60) improved stabilization, reducing short-term drift at modest computational expense. The results converged at N_MSCKF,update = 60, beyond which no gains were observed.
Lastly, for the total number of tracked features (N_points), increasing the number of KLT features per frame enriched the constraint pool, improving robustness in dynamic scenes. However, fewer than 600 points led to significant drift and high ATE, while values above 900 offered diminishing returns at increased tracking cost. Performance stabilized around N_points = 800, which we adopt as the default.
In summary, this systematic exploration confirmed clear trends across parameters: larger windows and more features improve accuracy up to a point, after which computation and filter stability dominate. The final configuration reflects the optimal trade-off: N_clones = 15, N_SLAM,update = 30, N_MSCKF,update = 60, and N_points = 800. Collectively, this tuning achieves a favorable balance between estimation accuracy, robustness, and efficiency, ensuring that TCB-VIO sustains high frame-rate operation on FPSP data.
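A minimal Python sketch of such a sweep is shown below, assuming a hypothetical evaluate(cfg, traj) that runs TCB-VIO with a given configuration and returns (ATE, RTE) against Vicon ground truth. The grid values mirror those discussed above; whether the actual search was a full grid or varied one parameter at a time is not something this sketch commits to.

```python
import itertools
from statistics import median

# Candidate values mirroring the ranges discussed above (assumed grid).
GRID = {
    "n_clones":       [9, 11, 13, 15],
    "n_slam_update":  [20, 25, 30],
    "n_msckf_update": [40, 50, 60],
    "n_points":       [600, 700, 800, 900],
}

def sweep(trajectories, evaluate):
    # Score every configuration by median ATE across all trajectories,
    # breaking ties by median RTE, and keep the best one.
    best_cfg, best_score = None, None
    for values in itertools.product(*GRID.values()):
        cfg = dict(zip(GRID, values))
        errors = [evaluate(cfg, traj) for traj in trajectories]
        score = (median(a for a, _ in errors),
                 median(r for _, r in errors))
        if best_score is None or score < best_score:
            best_cfg, best_score = cfg, score
    return best_cfg
    # Per the text above, the selected configuration was:
    # {"n_clones": 15, "n_slam_update": 30,
    #  "n_msckf_update": 60, "n_points": 800}
```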
Table I lists the key parameters used in the experiments, tuned by an exhaustive search over a wide range. The optimal parameters are as follows:
TABLE I. Key parameters for TCB-VIO configuration.
Parameter      | Optimal value | Role
N_clones       | 15            | Sliding-window size (cloned IMU-camera states)
N_SLAM,update  | 30            | Persistent SLAM features per filter update
N_MSCKF,update | 60            | Transient MSCKF features per filter update
N_points       | 800           | Total KLT features tracked per frame
Overview of representative testing trajectories used in our evaluation, aligned with the performance metrics reported in Table II. All trajectories were executed under fast and hostile motions, as reflected by the high angular velocities (10–31 rad/s) reported in Table II. Ground-truth is shown in gray, with error mapping applied to the TCB-VIO estimate.
Comparison of TCB-VIO against the baselines ROVIO and VINS-Mono on trajectory #10, as reported in Table II. The grey lines are the ground-truth, while the coloured lines are the estimated trajectories. Note that the error scale is about twice as large for ROVIO and nearly two orders of magnitude larger for VINS-Mono. As indicated in Table II, ORB-SLAM3 failed to produce an estimate of the trajectory.
TABLE II. Results for TCB-VIO (as well as two ablations) against the baseline systems (ROVIO, VINS-Mono, ORB-SLAM3) in indoor environments, with ground-truth provided by a Vicon motion capture system. TCB-VIO shows superior ATE in most of the trajectories, and RTE in all of the trajectories, demonstrating its effectiveness in precise frame-to-frame tracking even in hostile motions.
TABLE V. Start-to-end errors for TCB-VIO compared to baselines in an outdoor environment.
VIDEO: TCB-VIO Outdoor Experiments.
VIDEO: TCB-VIO Varying Sub-Location Experiments.
VIDEO: TCB-VIO Room-Scale Experiments.
This research is supported by the Natural Sciences and Engineering Research Council of Canada (NSERC).
We would like to thank Piotr Dudek, Stephen J. Carey, and Jianing Chen at the University of Manchester for kindly providing access to SCAMP-5.
If you have any questions, feel free to reach out to us at:
lisondra@mie.utoronto.ca (contact at www.mattlisondra.com)
M. Lisondra, J. Kim, G. T. Shimoda, K. Zareinia and S. Saeedi, "TCB-VIO: Tightly-Coupled Focal-Plane Binary-Enhanced Visual Inertial Odometry," IEEE Robotics and Automation Letters, 2025, pp. 1-8, doi: 10.1109/LRA.2025.3619774.
@ARTICLE{11197943,
author={Lisondra, Matthew and Kim, Junseo and Shimoda, Glenn Takashi and Zareinia, Kourosh and Saeedi, Sajad},
journal={IEEE Robotics and Automation Letters},
title={TCB-VIO: Tightly-Coupled Focal-Plane Binary-Enhanced Visual Inertial Odometry},
year={2025},
volume={},
number={},
pages={1-8},
keywords={Image edge detection;Feature extraction;Visualization;Cameras;Sensors;Robot vision systems;Vectors;Registers;Computer architecture;Robustness;Visual-Inertial SLAM;Sensor Fusion},
doi={10.1109/LRA.2025.3619774}}