Preprint: Arxiv
Publication Date: May 2026
Teleoperation of connected and autonomous vehicles is highly sensitive to communication latency, which can degrade operator situational awareness and control reliability. Existing predictive approaches often focus exclusively on either future sensory prediction or future action estimation, while recent large-scale world models remain computationally expensive for real-time deployment in latency-sensitive systems. This work introduces TeleopWM, a lightweight action-conditioned latent world model for latency-resilient vision-based teleoperation. TeleopWM jointly predicts future visual observations and future driving actions within a unified predictive framework designed for real-time operation under constrained computational budgets. The proposed architecture combines efficient latent video prediction with action-conditioned latent dynamics and a motion-aware future action decoder operating on latent motion transitions rather than static latent appearance representations. By performing predictive rollout estimation entirely in latent space, TeleopWM maintains temporally coherent future visual evolution while simultaneously forecasting future longitudinal and steering behavior. Experimental evaluation in held-out CARLA Town05 driving scenarios demonstrates stable multi-step predictive rollouts, strong future maneuver prediction consistency, and real-time deployment-oriented inference characteristics. TeleopWM achieves Pearson correlations of 0.890 and 0.932 for future longitudinal and steering prediction, respectively, while operating with approximately 38.9 ms inference latency, over 200 predicted future frames per second, and only 1.24 GB of VRAM usage at 512 × 320 resolution. Additional evaluation under motion-filtered urban driving scenarios demonstrates stable predictive behavior under turning maneuvers, acceleration transitions, and intersection dynamics. The presented results suggest that lightweight predictive latent modeling provides a practical direction for jointly supporting predictive display and future action forecasting in latency-sensitive teleoperation systems. Project Page: https://bimilab.github.io/paper-TeleopWM/
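For readers unfamiliar with the metric quoted above, the Pearson correlation between predicted and ground-truth control signals can be computed as in the minimal sketch below. The function name `pearson` and the sample values are illustrative assumptions; they are not taken from the TeleopWM code or data.

```python
import math


def pearson(predicted, target):
    """Sample Pearson correlation between two equal-length sequences."""
    if len(predicted) != len(target) or len(predicted) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_p = sum(predicted) / len(predicted)
    mean_t = sum(target) / len(target)
    # Covariance and variances (the 1/n factors cancel in the ratio).
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(predicted, target))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_t = sum((t - mean_t) ** 2 for t in target)
    if var_p == 0 or var_t == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_p * var_t)


# Hypothetical steering values, for illustration only.
steer_true = [0.00, 0.10, 0.25, 0.20, -0.05]
steer_pred = [0.02, 0.08, 0.22, 0.21, -0.02]
r = pearson(steer_pred, steer_true)
```

A value close to 1 means the predicted signal rises and falls with the ground truth, even if its scale or offset differs slightly.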
Preprint: Arxiv
Publication Date: March 2026
Teleoperation is increasingly being adopted as a critical fallback for autonomous vehicles. However, the impact of network latency on vision-based, perception-driven control remains insufficiently studied. The present work investigates the nonlinear degradation of closed-loop stability in camera-based lane keeping under varying network delays. To conduct this study, we developed the Latency-Aware Vision Teleoperation testbed (LAVT), a research-oriented ROS 2 framework that enables precise, distributed one-way latency measurement and reproducible delay injection. Using LAVT, we performed 180 closed-loop experiments in simulation across diverse road geometries. Our findings reveal a sharp collapse in stability between 150 ms and 225 ms of one-way perception latency, where route completion rates drop from 100% to below 50% as oscillatory instability and phase-lag effects emerge. We further demonstrate that additional control-channel delay compounds these effects, significantly accelerating system failure even under constant visual latency. By combining this systematic empirical characterization with the LAVT testbed, this work provides quantitative insights into perception-driven instability and establishes a reproducible baseline for future latency-compensation and predictive control strategies. Project page, supplementary video, and code are available at https://bimilab.github.io/paper-LAVT/
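The idea of reproducible delay injection can be illustrated with a small buffer that holds each message for a fixed one-way delay before releasing it. This is a sketch under stated assumptions, not the LAVT implementation: the class `DelayInjector` and its methods are hypothetical, timestamps are integer milliseconds, and messages are assumed to arrive in non-decreasing timestamp order.

```python
from collections import deque


class DelayInjector:
    """Hold messages for a fixed delay before releasing them (hypothetical helper)."""

    def __init__(self, delay_ms):
        if delay_ms < 0:
            raise ValueError("delay must be non-negative")
        self.delay_ms = delay_ms
        self._queue = deque()  # entries are (release_time_ms, message)

    def push(self, message, stamp_ms):
        """Queue a message captured at stamp_ms."""
        self._queue.append((stamp_ms + self.delay_ms, message))

    def pop_ready(self, now_ms):
        """Return, in arrival order, every message whose delay has elapsed."""
        ready = []
        while self._queue and self._queue[0][0] <= now_ms:
            ready.append(self._queue.popleft()[1])
        return ready


# Inject 150 ms of one-way perception latency into a stream of camera frames.
injector = DelayInjector(delay_ms=150)
injector.push("frame-0", stamp_ms=0)
injector.push("frame-1", stamp_ms=50)
early = injector.pop_ready(now_ms=100)   # nothing has aged 150 ms yet
on_time = injector.pop_ready(now_ms=150)  # frame-0 is released
```

Because the delay is applied from the capture timestamp, the same run can be repeated with identical injected latency.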
Published in: MDPI Sensors
Publication Date: Mar 2026
This study introduces the Perception Latency Mitigation Network (PLM-Net), a novel deep learning approach for addressing perception latency in vision-based Autonomous Vehicle (AV) lateral control systems. Perception latency is the delay between capturing the environment through vision sensors (e.g., cameras) and applying an action (e.g., steering). This issue is understudied in both classical and neural-network-based control methods. Reducing this latency with powerful GPUs and FPGAs is possible but impractical for automotive platforms. PLM-Net comprises the Base Model (BM) and the Timed Action Prediction Model (TAPM). BM represents the original Lane Keeping Assist (LKA) system, while TAPM predicts future actions for different latency values. By integrating these models, PLM-Net mitigates perception latency. The final output is determined through linear interpolation of BM and TAPM outputs based on real-time latency. This design addresses both constant and varying latency, improving driving trajectories and steering control. Experimental results validate the efficacy of PLM-Net across various latency conditions. Source code: this https URL.
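The interpolation step described above can be sketched as follows. This is one possible reading, not the PLM-Net code: it assumes the BM output corresponds to zero latency, that TAPM supplies one predicted action per discrete positive latency value, and that latencies outside the covered range are clamped. The function name and the numbers are hypothetical.

```python
def interpolate_action(latency_ms, base_action, timed_actions):
    """Linearly interpolate between BM and TAPM outputs for a measured latency.

    base_action   -- BM output, treated here as the action for 0 ms latency (assumption)
    timed_actions -- dict mapping a positive latency in ms to the TAPM action for it
    """
    knots = [(0.0, base_action)] + sorted(timed_actions.items())
    # Clamp to the covered latency range.
    if latency_ms <= knots[0][0]:
        return knots[0][1]
    if latency_ms >= knots[-1][0]:
        return knots[-1][1]
    # Find the bracketing pair of latency values and blend their actions.
    for (t0, a0), (t1, a1) in zip(knots, knots[1:]):
        if t0 <= latency_ms <= t1:
            w = (latency_ms - t0) / (t1 - t0)
            return (1.0 - w) * a0 + w * a1


# Hypothetical steering outputs: BM says 0.0, TAPM says 0.2 at 100 ms and 0.4 at 200 ms.
steer = interpolate_action(150, base_action=0.0, timed_actions={100: 0.2, 200: 0.4})
```

With a measured latency of 150 ms, the result lies halfway between the 100 ms and 200 ms predictions.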
Preprint: Arxiv
Publication Date: March 2025
End-to-end vision-based imitation learning has demonstrated promising results in autonomous driving by learning control commands directly from expert demonstrations. However, traditional approaches rely on either regression-based models, which provide precise control but lack confidence estimation, or classification-based models, which offer confidence scores but suffer from reduced precision due to discretization. This limitation makes it challenging to quantify the reliability of predicted actions and apply corrections when necessary. In this work, we introduce a dual-head neural network architecture that integrates both regression and classification heads to improve decision reliability in imitation learning. The regression head predicts continuous driving actions, while the classification head estimates confidence, enabling a correction mechanism that adjusts actions in low-confidence scenarios, enhancing driving stability. We evaluate our approach in a closed-loop setting within the CARLA simulator, demonstrating its ability to detect uncertain actions, estimate confidence, and apply real-time corrections. Experimental results show that our method reduces lane deviation and improves trajectory accuracy by up to 50%, outperforming conventional regression-only models. These findings highlight the potential of classification-guided confidence estimation in enhancing the robustness of vision-based imitation learning for autonomous driving. The source code is available at this https URL.
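The abstract does not specify how the correction is applied, so the sketch below shows only one plausible scheme and should not be read as the authors' method. It assumes the classification head outputs a probability per discretized steering bin, takes the probability of the bin nearest to the regression output as the confidence, and falls back to the most probable bin's center when that confidence is below a threshold. All names, bin centers, and the threshold are hypothetical.

```python
def corrected_steering(reg_steer, class_probs, bin_centers, threshold=0.5):
    """Return (steering, confidence) after an assumed confidence-based correction."""
    if not class_probs or len(class_probs) != len(bin_centers):
        raise ValueError("class_probs and bin_centers must be non-empty and aligned")
    # Confidence: probability the classifier gives to the bin the regression value falls nearest to.
    nearest = min(range(len(bin_centers)), key=lambda i: abs(bin_centers[i] - reg_steer))
    confidence = class_probs[nearest]
    if confidence >= threshold:
        return reg_steer, confidence  # heads agree: keep the precise regression output
    # Heads disagree: fall back to the center of the most probable bin.
    best = max(range(len(class_probs)), key=lambda i: class_probs[i])
    return bin_centers[best], confidence


bins = [-0.5, 0.0, 0.5]    # hypothetical steering bin centers
probs = [0.1, 0.8, 0.1]    # hypothetical classification-head output
kept = corrected_steering(0.05, probs, bins)      # regression agrees with the classifier
adjusted = corrected_steering(0.45, probs, bins)  # regression disagrees, so it is corrected
```

The design keeps the regression head's precision whenever the two heads agree and uses the coarser classification output only as a fallback.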
Preprint: Arxiv
Publication Date: Jul 2024
Autonomous Vehicles (AV) and Advanced Driver Assistance Systems (ADAS) prioritize safety over comfort. The intertwining factors of safety and comfort emerge as pivotal elements in ensuring the effectiveness of Autonomous Driving (AD). Users often experience discomfort when AV or ADAS drive the vehicle on their behalf. Providing a personalized human-like AD experience, tailored to match users' unique driving styles while adhering to safety prerequisites, presents a significant opportunity to boost the acceptance of AVs. This paper proposes a novel approach, Neural Driving Style Transfer (NDST), inspired by Neural Style Transfer (NST), to address this issue. NDST integrates a Personalized Block (PB) into the conventional Baseline Driving Model (BDM), allowing for the transfer of a user's unique driving style while adhering to safety parameters. The PB serves as a self-configuring system, learning and adapting to an individual's driving behavior without requiring modifications to the BDM. This approach enables the personalization of AV models, aligning the driving style more closely with user preferences while ensuring baseline safety-critical actuation. Two contrasting driving styles (Style A and Style B) were used to validate the proposed NDST methodology, demonstrating its efficacy in transferring personal driving styles to the AV system. Our work highlights the potential of NDST to enhance user comfort in AVs by providing a personalized and familiar driving experience. The findings affirm the feasibility of integrating NDST into existing AV frameworks to bridge the gap between safety and individualized driving styles, promoting wider acceptance and improved user experiences.
Published in: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2023)
Publication Date: Dec 2023 / Presented in Oct 2023
Humans have latency in their visual perception system between observation and action. Any action we take is based on an earlier observation since, by the time we act, the state has already changed, and we have a new observation. In autonomous driving, this latency is also present, determined by the amount of time the control algorithm needs to process information before acting. This algorithmic perception latency can be reduced by massive computing power via GPUs and FPGAs, which is impractical in automobile platforms. Thus, it is a reasonable assumption that the algorithmic perception latency is inevitable. Many researchers have developed different neural network driving models without consideration of the algorithmic perception latency. This paper studies the latency effect on vision-based neural network autonomous driving in the lane-keeping task and proposes a novel vision-based neural network controller, the Adaptive Neural Ensemble Controller (ANEC), which is inspired by the near/far gaze distribution of human drivers during lane-keeping. ANEC was tested in the Gazebo 3D simulation environment with the Robot Operating System (ROS), and the results showed its effectiveness in dealing with algorithmic latency.
Published in: IBRO Neuroscience Reports
Publication Date: Nov 2023
Most controllers based on Deep Neural Networks (DNNs) are more of a black box model. The outputs of the controllers are assumed to be accurate because the DNNs have been trained to have small prediction errors. However, it is virtually impossible to include all edge cases in the training process so the outputs of DNNs cannot be close to perfection. This raises the question of how much we can trust the output of the controllers. In safety-critical systems such as highly automated mobility, including air and ground vehicles, this question is particularly important. Having a certain level of transparency in how and why the controllers predict actuation signals will significantly improve the reliability of the system. To mitigate the above problems and provide a new learning method, we propose a novel neural network architecture that utilizes the simulation theory (simulation of actions, simulation of perceptions, and anticipations) of cognitive brain function. The simulation theory is largely based on the Sensory Motor Contingency (SMC) theory, which considers perception a form of embodied know-how constituted by lawful regularities in the sensorimotor flow in an active and situated agent. The proposed neural network architecture inspired by forward and inverse models of the cerebellum generates an appropriate sequence of motor actions to achieve a desired state through a pseudo-inverse model. A forward model, trained in the form of the Variational Auto-Encoder (VAE), infers future states caused by the motor actions. The proposed neural network architecture is capable of showing how and why a certain sequence of actions must be applied to a certain task, which means that the decision-making process is transparent as it retains highly adaptive and robust DNN-based methods. The proposed architecture has been tested and validated in a realistic simulated environment with an automated vehicle.
Published in: IEEE Access
Publication Date: Feb 2023
Vision-based autonomous driving is rapidly growing. There are, however, presently no agreed-upon metrics for assessing how well deep neural network (DNN) models perform in driving. To compare novel approaches and architectures to existing ones, some researchers employed a mean error between labeled and predicted values in a test dataset, and others presented a new metric designed to match their requirements. The discrepancy in the usage of various performance metrics and the lack of objective metrics to judge driving performance were our primary motives for developing a feasible solution. In this study, we propose the online performance evaluation metrics index (OPEMI), an integrated metric that can evaluate the driving capabilities of autonomous driving models in various driving scenarios. To evaluate driving performance precisely and objectively, OPEMI incorporates several variables, including driving control stability, driving trajectory stability, journey duration, travel distance, success rate, and speed. To demonstrate the validity of OPEMI, we first confirmed that the prediction accuracy has a weak correlation with driving performance. Then, we discussed the constraints of the existing driving performance metrics in certain circumstances and their failure to assess the driving models. Finally, we conducted experiments with four popular DNN models and two in-house models under three different driving scenarios (generic, urban, and racing). The results show that the proposed evaluation metric, OPEMI, realistically reflects driving performance and demonstrates its validity in various driving scenarios.
Published in: IEEE Access
Publication Date: Mar 2022
Lateral control through Behavioral Cloning (BC) in an End-to-End (E2E) learning system requires high-quality data. The majority of E2E learning systems gather this high-quality data all at once before beginning the training phase (i.e., the training process does not start until the end of the data collection process). The demand for high-quality data necessitates a large amount of human effort and substantial time and money spent waiting for data collection to be completed. As a result, it is critical to find a viable option to reduce both the time and cost of data collection while also maintaining the performance of a trained vehicle controller. This paper offers a novel behavioral cloning approach for lateral vehicle control to address the aforementioned problems. The proposed technique begins by collecting the least amount of human driving data possible. The data from human drivers are utilized for training a convolutional neural network for lateral control. The trained neural network is subsequently deployed to the vehicle’s automated driving controller, replacing a human driver. At this point, a human driver is out of the loop, and an automated driving controller, trained by the initial data from a human driver, drives the vehicle to collect further training data. The driving data obtained are sent into a convolutional neural network training module, then the newly trained neural network is deployed to the automated driving controller that will drive the vehicle further. Data collection alternates with neural network training on the collected data until the neural network learns to correctly associate an image input with a steering angle. The proposed incremental approach was extensively tested in simulated environments, and the results are promising: only 3.81% (1,061 out of 27,884) of the total data came from a human driver. The incrementally trained neural networks using data collected by automated controllers were able to drive the vehicle in two different tracks successfully. The AI chauffeur was able to drive the vehicle on Track B for more than 70% of the track even though it had not seen the track before.
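The alternating collect-and-train procedure described above can be outlined as a short loop. This is a schematic sketch, not the paper's code: `train`, `drive_and_collect`, and `is_converged` are hypothetical callables supplied by the caller, and the `max_rounds` cap is an added safeguard.

```python
def incremental_bc(human_data, train, drive_and_collect, is_converged, max_rounds=10):
    """Alternate automated data collection and retraining, starting from a small human dataset."""
    if not human_data:
        raise ValueError("an initial human-driven dataset is required")
    dataset = list(human_data)
    model = train(dataset)  # initial controller trained on human data only
    rounds = 0
    while not is_converged(model) and rounds < max_rounds:
        dataset.extend(drive_and_collect(model))  # the controller drives; no human in the loop
        model = train(dataset)                    # retrain on everything collected so far
        rounds += 1
    human_share = len(human_data) / len(dataset)
    return model, dataset, human_share


# Toy stand-ins so the loop can run: the "model" is just the dataset size.
model, dataset, human_share = incremental_bc(
    human_data=[("image", 0.0)] * 5,
    train=lambda data: len(data),
    drive_and_collect=lambda m: [("image", 0.1)] * 10,
    is_converged=lambda m: m >= 25,
)
```

With the counts reported above, the human share works out to 1,061 / 27,884 ≈ 3.81%.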
Published in: MDPI Energies
Publication Date: Dec 2021
For autonomous driving research, a scaled vehicle platform is a viable alternative to a full-scale vehicle. However, embedded solutions such as small robotic platforms with differential driving or radio-controlled (RC) car-based platforms can be limiting because of, for example, sensor package restrictions or computing constraints. Furthermore, for a given controller, specialized expertise and abilities are necessary. To address such problems, this paper proposes a feasible solution, the Ridon vehicle, which is a spacious ride-on automobile with high-driving electric power and a custom-designed drive-by-wire system powered by a full-scale machine-learning-ready computer. The major objective of this paper is to provide a thorough and appropriate method for constructing a cost-effective platform with a drive-by-wire system and sensor packages so that machine-learning-based algorithms can be tested and deployed on a scaled vehicle. The proposed platform employs a modular and hierarchical software architecture, with microcontroller programs handling the low-level motor controls and a graphics processing unit (GPU)-powered laptop computer processing the higher-level and more sophisticated algorithms. The Ridon vehicle platform is validated by employing it in a deep-learning-based behavioral cloning study. The suggested platform’s affordability and adaptability would benefit the broader research and education community.
Published in: MDPI Energies
Publication Date: Sep 2021
We demonstrate a working functional prototype of a cooperative perception system that maintains a real-time digital twin of the traffic environment, providing a more accurate and more reliable model than any of the participant subsystems (in this case, smart vehicles and infrastructure stations) would manage individually. The importance of such technology is that it can facilitate a spectrum of new derivative services, including cloud-assisted and cloud-controlled ADAS functions, dynamic map generation with analytics for traffic control and road infrastructure monitoring, a digital framework for operating vehicle testing grounds, logistics facilities, etc. In this paper, we restrict our discussion to the viability of the core concept and implement a system that provides a single service: the live visualization of our digital twin in a 3D simulation, which instantly and reliably matches the state of the real-world environment and showcases the advantages of real-time fusion of sensory data from various traffic participants. We envision this prototype system as part of a larger network of local information processing and integration nodes, i.e., the logically centralized digital twin is maintained in a physically distributed edge cloud.