The key contribution of this work is the design of a device with a system architecture tailored to detecting trapped victims in real-world scenarios, where trapped people may be fully conscious, totally unconscious, or visible only through parts of their body. We propose to develop a drone system that is robust to these conditions by utilizing dual optical sensors and gas sensors.
In this work, we integrated a deep learning-based object detection model with a VR model of a car cockpit. This is part of a project whose main objective is to explore and analyze the interaction between autonomous vehicles and humans in virtual reality.
With semi-automated vehicles, drivers are still required to be ready to intervene upon a takeover request (TOR) and face the difficulty of reaching their optimal performance level directly after a passive phase. In this work, we examine the effects of using an Extended Reality (XR) interface to assist drivers during the takeover and in the first seconds of controlling the vehicle. We focus on developing a lane detection algorithm that keeps the vehicle following its ego lane. We present a prototype of an Augmented Reality (AR) and Mixed Reality (MR) assistance system realized in a simulated environment. In a user study, we compare the mixed reality display with the augmented reality display, present results on response time to the takeover request, and find that the AR interface is significantly better than the MR interface. [Paper link: 10.1145/3544999.3552527]
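The paper's lane detection algorithm is not reproduced here; as a minimal classical sketch (not the authors' model), assuming BGR frames from the simulator, a Canny-plus-Hough detector over a road region of interest illustrates the idea:

```python
import cv2
import numpy as np

def detect_ego_lane(frame):
    """Minimal classical lane detector: edge map + Hough lines in a road ROI."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(cv2.GaussianBlur(gray, (5, 5), 0), 50, 150)

    # Keep only the lower half of the image, where the ego lane appears.
    h, w = edges.shape
    mask = np.zeros_like(edges)
    mask[h // 2:, :] = 255
    roi = cv2.bitwise_and(edges, mask)

    # The probabilistic Hough transform returns candidate lane segments.
    lines = cv2.HoughLinesP(roi, rho=1, theta=np.pi / 180, threshold=40,
                            minLineLength=40, maxLineGap=100)
    return [] if lines is None else [line[0] for line in lines]
```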
With the increasing prevalence of video recordings, there is a growing need for tools that can maintain the privacy of those recorded. In this paper, we define an approach for redacting personally identifiable text from videos using a combination of optical character recognition (OCR) and natural language processing (NLP) techniques. We examine the relative performance of this approach when used with different OCR models, specifically Tesseract and the OCR system from Google Cloud Vision (GCV). For the proposed approach, the performance of GCV, in both accuracy and speed, is significantly higher than that of Tesseract. Finally, we explore the advantages and disadvantages of both models in real-world applications. [Paper link: 10.1145/3544999.3552529]
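To illustrate the OCR-plus-NLP redaction idea, here is a hedged sketch using pytesseract (as a stand-in for the paper's Tesseract/GCV back ends) and spaCy NER for the NLP step; the choice of PII entity labels is an assumption:

```python
import cv2
import pytesseract
import spacy

nlp = spacy.load("en_core_web_sm")        # small English NER model
PII_LABELS = {"PERSON", "GPE", "ORG"}     # illustrative set of sensitive entity types

def redact_frame(frame):
    """Black out word boxes whose OCR'd text belongs to a flagged named entity."""
    data = pytesseract.image_to_data(frame, output_type=pytesseract.Output.DICT)
    words = data["text"]
    doc = nlp(" ".join(w for w in words if w.strip()))
    pii_tokens = {t.text for ent in doc.ents if ent.label_ in PII_LABELS for t in ent}
    for i, word in enumerate(words):
        if word.strip() and word in pii_tokens:
            x, y, w, h = data["left"][i], data["top"][i], data["width"][i], data["height"][i]
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 0, 0), -1)
    return frame
```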
In the past years, energy consumption has increased rapidly due to many factors, including the rise in technology adoption. This has many drawbacks, from higher costs to CO2 emissions. Human activities in offices and houses account for a considerable amount of energy usage. We built a Digital Twin (DT) of an open-plan common space that can be used for remote room occupancy monitoring and automatic energy consumption detection. We proposed a linear regression-based mapping technique to mimic the office users' movement, along with an intelligent machine vision model to detect the ON/OFF status of electrical appliances in the physical space. Our implementation reported a significant correlation in the human mapping (R² = 0.85), and our energy consumption detection algorithm achieved a true positive rate of 91.58% and an F1 score of 81.96%. Finally, all this information is transmitted to and visualized in the 3D digital twin for remote monitoring and simulation. [Paper link: 10.1007/s44257-024-00008-z]
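A minimal sketch of the linear regression-based mapping, assuming a few hypothetical calibration pairs between camera pixels and digital-twin floor coordinates (the actual calibration is described in the paper):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical calibration pairs: camera pixel (u, v) -> twin floor plan (x, y) in metres.
pixels = np.array([[120, 400], [640, 420], [1100, 380], [600, 700]])
floor = np.array([[0.5, 6.0], [4.0, 6.2], [7.5, 5.8], [4.0, 2.0]])

mapper = LinearRegression().fit(pixels, floor)       # multi-output linear regression
print("R^2 on calibration set:", mapper.score(pixels, floor))

# Project a newly detected occupant's pixel position into the twin.
print(mapper.predict([[800, 500]]))
```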
In this work, we interact with multiple robots through different interaction modalities. The objective is to pick and drop different objects into autonomous pick-up cars in warehouses. A robotic arm follows a holographic line to pick an object and drop it into the pick-up car, while another robot picks an object using eye gaze before dropping it into the autonomous pick-up car. One pick-up car then follows a lane navigation algorithm to reach the destination; to return, it is turned around manually and follows the same lane back to its source point in autonomous mode. The other pick-up car follows point coordinates to reach the destination and has a built-in collision avoidance algorithm to avoid colliding with the first car. Both cars reach their destinations successfully without any collision. [Paper link: 10.1007/s12193-023-00421-w]
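The collision avoidance algorithm itself is not spelled out in this summary; a minimal stop-and-wait rule on point coordinates, with a hypothetical safety radius, conveys the flavour:

```python
import math

SAFE_RADIUS = 0.5  # metres; illustrative clearance threshold

def next_command(own_pos, waypoint, other_pos):
    """Stop-and-wait avoidance: pause whenever the other car enters the safe radius."""
    if math.dist(own_pos, other_pos) < SAFE_RADIUS:
        return "STOP"
    heading = math.atan2(waypoint[1] - own_pos[1], waypoint[0] - own_pos[0])
    return f"DRIVE heading={math.degrees(heading):.1f}"

print(next_command((0.0, 0.0), (2.0, 1.0), (0.3, 0.2)))   # -> STOP
print(next_command((0.0, 0.0), (2.0, 1.0), (1.5, 1.5)))   # -> DRIVE heading=26.6
```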
This work is part of a larger effort to develop an end-to-end system for the autonomous taxiing of aircraft. We have fused a novel lane detection model with an autonomous lane-following algorithm, and we have created a virtual airport scenario to provide better visualization.
This work presents a vision-guided, novel, and robust system for the autonomous taxiing of an aircraft in the real world. The system is an ensemble of autonomous navigation and collision avoidance modules. The navigation module detects the lane and sends the control signal to the steer-control algorithm, which uses a controller to help the aircraft follow the central line with a resolution of 0.013 cm. The object detection module in the collision avoidance algorithm was compared with state-of-the-art models on a road object dataset and proved its superiority. In parallel, an airport dataset is proposed, and the object detection model is fine-tuned with it to avoid collisions with any ground vehicle. A detailed study conducted under different lighting conditions proves the efficacy of the proposed system: the lane detection and collision avoidance modules work with true positive rates of 92.59% and 85.19%, respectively. [Paper link: 10.3846/aviation.2023.20588]
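The exact controller inside the steer-control algorithm is not reproduced here; a textbook PID on the cross-track error from the central line, with illustrative gains, sketches the tracking step:

```python
class PID:
    """Textbook PID controller on cross-track error (distance from the centre line)."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = 0.0

    def steer(self, error, dt):
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

controller = PID(kp=0.8, ki=0.01, kd=0.2)                 # illustrative gains
steering_angle = controller.steer(error=0.05, dt=0.033)   # 5 cm off-centre at ~30 fps
```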
A digital twin of a working space has a profound and transformative impact. It aids in space planning, visualization, and simulation of physical office environments. Additionally, it plays a crucial role in promoting energy efficiency and sustainability within the workspace. However, several challenges are associated with effectively placing assets when reconstructing a real-world space in virtual reality. This work presents a novel machine learning-based development of digital twins, an alternative to the conventional reconstruction process. It passes 2D frames from the real world through an object detection model to detect objects of interest with a mAP of 0.74. An image processing algorithm is then presented to calculate the orientation of the movable objects in the space. Finally, this information is passed through a novel neural network structure that maps 2D coordinates to 3D locations in the virtual space. A detailed comparison study between multiple machine learning models was conducted to select the neural network as the mapping algorithm. [Paper link: https://ceur-ws.org/Vol-3660/paper27.pdf]
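A minimal PyTorch sketch of the 2D-to-3D mapping network; the layer sizes and training step are assumptions, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

# Assumed architecture: a small MLP from pixel (u, v) to virtual-space (x, y, z).
mapper = nn.Sequential(
    nn.Linear(2, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 3),
)
optimizer = torch.optim.Adam(mapper.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

def train_step(pixel_uv, twin_xyz):
    """One supervised step on (2D detection, 3D twin location) correspondences."""
    optimizer.zero_grad()
    loss = loss_fn(mapper(pixel_uv), twin_xyz)
    loss.backward()
    optimizer.step()
    return loss.item()
```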
Distraction detection systems in automotive settings are of great importance to passenger safety. Earlier approaches were confined to indirect measures of driving performance to detect visual distraction, while more recent methods developed dedicated classification models for gaze zone estimation whose cross-domain performance was not investigated. We adopt a more generic appearance-based gaze estimation approach that makes no assumption about the setting or participant. We proposed MAGE-Net, which has fewer parameters while achieving on-par performance with state-of-the-art techniques on the MPIIGaze dataset. Using MAGE-Net, we performed a cross-domain evaluation in an automotive setting with 10 participants and observed gaze region errors for the interior regions of the car of 15.61 cm and 15.13 cm in the x and y directions, respectively. Building on these results, we demonstrated the capability of the proposed system to detect visual distraction using a driving simulator. [Paper link: 10.1145/3490100.3516463; 10.1145/3490100.3516467]
[Dataset link: https://github.com/lrdmurthy/PARKS-Gaze]
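The distraction detection step above can be illustrated by assigning each predicted gaze point to the nearest interior zone and flagging prolonged off-road dwell; the zone centres and dwell budget below are hypothetical, not taken from the paper:

```python
import numpy as np

# Hypothetical zone centres (cm) on the car-interior gaze plane.
ZONES = {"road": (0, 0), "instrument_cluster": (0, -25),
         "centre_stack": (30, -20), "left_mirror": (-55, 5)}

def gaze_zone(gaze_xy):
    """Assign a predicted gaze point to the nearest zone centre."""
    return min(ZONES, key=lambda z: np.hypot(gaze_xy[0] - ZONES[z][0],
                                             gaze_xy[1] - ZONES[z][1]))

def is_distracted(zone_history, off_road_budget=45):
    """Flag distraction when off-road samples in the recent window exceed the budget."""
    return sum(z != "road" for z in zone_history) > off_road_budget
```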
In automotive settings, the use of electronic devices has increased drivers' visual inattention while driving and can lead to accidents. It is often challenging to detect whether a driver has experienced a change in cognitive state, requiring new technology that can best estimate the driver's cognitive load. In this paper, we investigated the efficacy of various ocular parameters for estimating cognitive load and detecting the cognitive state of the driver. We derived gaze- and pupil-based metrics and evaluated their efficacy in classifying different levels of cognitive state while participants performed psychometric tests under varying light conditions. We validated the performance of our metrics in simulation as well as in-car environments, and compared the accuracy (from the confusion matrix) of detecting cognitive state during a secondary task using our proposed metrics and machine learning models. A neural network model combining multiple ocular metrics showed better accuracy (75%) than any individual ocular metric. Finally, we demonstrated the potential of our system to alert drivers in real time under critical distractions. [Paper link: 10.1016/j.treng.2020.100008]
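A hedged sketch of the metric-combining classifier using scikit-learn, with random stand-in data in place of real eye-tracking windows; the feature set is an assumption based on the gaze- and pupil-based metrics named above:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in per-window features: [fixation rate, saccade rate, mean pupil diameter,
# pupil diameter std]; labels 0 = low cognitive load, 1 = high cognitive load.
X = np.random.rand(200, 4)
y = np.random.randint(0, 2, 200)

clf = make_pipeline(StandardScaler(),
                    MLPClassifier(hidden_layer_sizes=(16, 8), max_iter=1000))
clf.fit(X, y)
print("training accuracy:", clf.score(X, y))
```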
This work presents a detailed comparison study of Generative Adversarial Network (GAN) models to assess their efficacy in generating realistic datasets. Focusing on image-to-image translation, the study explores the use of synthetic data to enhance deep learning model performance. Acknowledging the difficulty associated with preparing real image datasets, the research emphasizes the importance of transforming synthetic data into realistic representations. After a thorough comparison between three GAN models, CycleGAN is found to be the best (Fréchet Inception Distance (FID) scores of 0.001, 50.574, and 63.000, respectively, for three distinct datasets), demonstrating its ability to produce realistic images across three distinct case studies. The study shows the significance of synthetic data in mitigating data scarcity and privacy constraints in object detection tasks: by integrating GAN-based image translation into YOLOv8 training, improvements of 56.8% in mAP@50 and 14.4% in F1 score are observed across three different cases. [Paper link: 10.1109/ICVTTS62812.2024.10763945]
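For reference, the FID used in the comparison has a standard closed form over Inception features of the real and generated image sets; this sketch assumes the feature matrices have already been extracted:

```python
import numpy as np
from scipy.linalg import sqrtm

def fid(feats_real, feats_fake):
    """Fréchet Inception Distance between two sets of Inception features."""
    mu1, mu2 = feats_real.mean(axis=0), feats_fake.mean(axis=0)
    s1 = np.cov(feats_real, rowvar=False)
    s2 = np.cov(feats_fake, rowvar=False)
    covmean = sqrtm(s1 @ s2)
    if np.iscomplexobj(covmean):   # numerical noise can leave tiny imaginary parts
        covmean = covmean.real
    return float(np.sum((mu1 - mu2) ** 2) + np.trace(s1 + s2 - 2 * covmean))
```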
Mixed reality applications in industrial contexts necessitate extensive and varied datasets for training object detection models, yet actual data gathering may be obstructed by logistical or cost issues. This study investigates the use of generative AI methods to address this issue for mixed reality applications, with an emphasis on assembly and disassembly tasks. The novel objects found in industrial settings are difficult to describe in words, making text-based models less effective. In this study, a diffusion model is used to generate images by combining novel objects with various backgrounds, selected from scenes where object detection in the target applications has been ineffective. This approach efficiently produces a diverse range of training samples. We compare three approaches: traditional augmentation methods, GAN-based augmentation, and diffusion-based augmentation. Results show that the diffusion model significantly improved detection metrics. For instance, applying diffusion models to the dataset containing mechanical components of a pneumatic cylinder raised the F1 score from 69.77 to 84.21 and the mAP@50 from 76.48 to 88.77, an 11% increase in object detection performance with a 67% smaller dataset compared to the traditionally augmented dataset. The proposed image composition diffusion model and user-friendly interface further simplify dataset enrichment, proving effective for augmenting data and improving the robustness of detection models. [Paper link: https://doi.org/10.1145/3708359.3712163]
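As one concrete way to realize exemplar-guided image composition with a diffusion model, the Paint-by-Example pipeline from the diffusers library pastes a novel object into a masked background region; this is a stand-in for the paper's own composition model, and the file names are hypothetical:

```python
import torch
from diffusers import PaintByExamplePipeline
from PIL import Image

# Exemplar-guided composition: blend a novel object into a chosen background region.
pipe = PaintByExamplePipeline.from_pretrained(
    "Fantasy-Studio/Paint-by-Example", torch_dtype=torch.float16).to("cuda")

background = Image.open("workbench.png").convert("RGB").resize((512, 512))
mask = Image.open("placement_mask.png").convert("RGB").resize((512, 512))  # white = paste region
cylinder = Image.open("pneumatic_cylinder.png").convert("RGB")             # the novel object

sample = pipe(image=background, mask_image=mask, example_image=cylinder).images[0]
sample.save("augmented_sample.png")
```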
Assembling complex components often requires expert guidance. Augmented Reality (AR) offers intuitive visual assistance that can enhance user performance, yet existing AR-based systems largely focus on simple tasks, limiting their application to intricate assembly scenarios. This paper addresses the gap by developing a comprehensive pipeline for AI- and AR-based assembly guidance. We analyzed various software architectures, selecting the optimal setup for advanced manufacturing based on latency, accuracy, and human factors. A novel AI-based hologram registration technique was implemented to provide real-time assistance in dynamic environments. Additionally, a multimodal user interface was designed to facilitate seamless interaction with the Mixed Reality (MR) system, allowing users to access instructions efficiently. Results demonstrate that users completed tasks faster with the MR system than with traditional video-based methods. The proposed system also significantly reduces cognitive load and improves usability, positioning it as an effective tool for modern manufacturing. [Paper link: https://doi.org/10.1007/s44430-025-00005-1]