The key contribution of this work is the design of a device whose system architecture is tailored to detect trapped victims in real-world scenarios, where trapped people may be fully conscious, completely unconscious, or visible only through parts of their body. We propose to develop a drone system that is robust to these conditions by utilizing dual optical sensors and gas sensors.
In this work, we integrated a deep learning-based object detection model with a VR model of a car cockpit. This is part of a project whose main objective is to explore and analyze the interaction between autonomous vehicles and humans in virtual reality.
With semi-automated vehicles, drivers are still required to be ready to intervene upon a takeover request (TOR) and face the difficulty of reaching their optimal performance level immediately after a passive phase. In this work, we examine the effects of using an Extended Reality (XR) interface to assist drivers during takeover and in the first seconds of controlling the vehicle. We focus on developing a lane detection algorithm to keep the vehicle following its ego lane. We present a prototype of an Augmented Reality (AR) and Mixed Reality (MR) assistance system realized in a simulated environment. In a user study, we compare the MR and AR displays, report results on response time to the takeover request, and find that the AR interface is significantly better than the MR interface. [Paper link: 10.1145/3544999.3552527]
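The lane detection method itself is not detailed in this abstract; as a rough illustration only, the sketch below shows a classical Canny-edge plus probabilistic Hough approach to finding ego-lane line segments with OpenCV. The function name, parameter values, and image path are assumptions for the example, not the paper's implementation.

```python
import cv2
import numpy as np

def detect_lane_lines(frame_bgr):
    """Return candidate lane-line segments in a road image (illustrative only)."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)
    edges = cv2.Canny(blurred, 50, 150)

    # Keep only the lower half of the image, where the ego lane usually appears.
    h, w = edges.shape
    mask = np.zeros_like(edges)
    mask[h // 2:, :] = 255
    roi = cv2.bitwise_and(edges, mask)

    # Probabilistic Hough transform returns line segments as (x1, y1, x2, y2).
    lines = cv2.HoughLinesP(roi, rho=1, theta=np.pi / 180, threshold=50,
                            minLineLength=40, maxLineGap=20)
    return [] if lines is None else [tuple(l[0]) for l in lines]

# Hypothetical usage: overlay detected segments on a sample frame.
frame = cv2.imread("sample_road_frame.jpg")   # placeholder path
if frame is not None:
    for x1, y1, x2, y2 in detect_lane_lines(frame):
        cv2.line(frame, (int(x1), int(y1)), (int(x2), int(y2)), (0, 255, 0), 2)
    cv2.imwrite("lane_overlay.jpg", frame)
```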
With the increasing prevalence of video recordings, there is a growing need for tools that can maintain the privacy of those recorded. In this paper, we define an approach for redacting personally identifiable text from videos using a combination of optical character recognition (OCR) and natural language processing (NLP) techniques. We examine the relative performance of this approach when used with different OCR models, specifically Tesseract and the OCR system from Google Cloud Vision (GCV). For the proposed approach, the performance of GCV, in both accuracy and speed, is significantly higher than that of Tesseract. Finally, we explore the advantages and disadvantages of both models in real-world applications. [Paper link: 10.1145/3544999.3552529]
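As an illustration of the OCR-plus-NLP redaction idea (here only the Tesseract path, since the GCV client setup is omitted), the sketch below OCRs a frame with pytesseract, runs spaCy named-entity recognition over the recognized text, and blacks out word boxes that fall inside entities. The entity labels treated as personally identifiable are an assumption for the example, not the paper's choice.

```python
import cv2
import pytesseract
import spacy

nlp = spacy.load("en_core_web_sm")              # small English NER model
PII_LABELS = {"PERSON", "GPE", "ORG", "DATE"}   # illustrative label set

def redact_frame(frame_bgr):
    """Black out OCR'd words that fall inside named entities (illustrative only)."""
    data = pytesseract.image_to_data(frame_bgr, output_type=pytesseract.Output.DICT)
    words = data["text"]
    doc = nlp(" ".join(w for w in words if w.strip()))
    pii_tokens = {t.text for ent in doc.ents if ent.label_ in PII_LABELS for t in ent}

    for i, word in enumerate(words):
        if word.strip() and word in pii_tokens:
            x, y = data["left"][i], data["top"][i]
            w, h = data["width"][i], data["height"][i]
            cv2.rectangle(frame_bgr, (x, y), (x + w, y + h), (0, 0, 0), -1)
    return frame_bgr
```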
In the past years, energy consumption has increased rapidly due to many factors, including the rise in technology adoption. This has many drawbacks, from higher costs to CO2 emissions. Human activities in offices and houses account for a considerable share of energy usage. We developed a Digital Twin (DT) of an open-plan common space that can be used for remote room occupancy monitoring and automatic energy consumption detection. We proposed a linear regression-based mapping technique to mirror office users' movement, and an intelligent machine vision model to detect the ON/OFF status of electrical appliances in the physical space. Our implementation reported a significant correlation in the human mapping (R² = 0.85), and our energy consumption detection algorithm achieved a true positive rate of 91.58% and an F1 score of 81.96%. Finally, all this information is transmitted to and visualized in the 3D digital twin for remote monitoring and simulation. [Paper link: 10.1007/s44257-024-00008-z]
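The linear regression-based mapping can be sketched as a least-squares fit from camera-pixel centroids to floor coordinates in the twin; the calibration pairs below are placeholders, and the paper's exact formulation may differ.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical calibration pairs: person centroids in camera pixels -> twin floor metres.
pixels = np.array([[120, 400], [480, 390], [300, 220], [610, 210]], dtype=float)
floor  = np.array([[1.0, 2.0], [4.5, 2.1], [2.6, 6.0], [5.8, 6.2]], dtype=float)

mapper = LinearRegression().fit(pixels, floor)   # one linear map per floor axis
print("R^2 on calibration data:", mapper.score(pixels, floor))

# Map a newly detected person centroid into the digital twin.
new_centroid = np.array([[350.0, 300.0]])
print("Twin position (m):", mapper.predict(new_centroid)[0])
```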
In this work, we interact with multiple robots through different modalities of interaction. The objective is to pick and drop different objects into autonomous pick-up cars in a warehouse setting. One robotic arm follows a holographic line to pick an object and drop it into a pick-up car, while another robot picks an object using eye gaze before dropping it into the autonomous pick-up car. One pick-up car then follows a lane navigation algorithm to reach its destination; to return, it can be turned around manually, after which it follows the same lane back to its source point in autonomous mode. The other pick-up car follows point coordinates to reach its destination and has a built-in collision avoidance algorithm to avoid colliding with the first car. Both cars reach their destinations successfully without any collision. [Paper link: 10.1007/s12193-023-00421-w]
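As a rough, illustrative sketch of the point-coordinate navigation with a stop-if-too-close collision check (not the paper's actual algorithm; the safe distance and step size below are assumptions):

```python
import math

SAFE_DISTANCE = 0.5   # metres; assumed threshold, not from the paper

def step_towards(pos, waypoint, speed, other_pos):
    """Move one control step towards the waypoint, pausing if the other car is too close."""
    if math.dist(pos, other_pos) < SAFE_DISTANCE:
        return pos                      # hold position until the path is clear
    dx, dy = waypoint[0] - pos[0], waypoint[1] - pos[1]
    dist = math.hypot(dx, dy)
    if dist < 1e-6:
        return waypoint                 # waypoint reached
    scale = min(speed, dist) / dist
    return (pos[0] + dx * scale, pos[1] + dy * scale)

# Hypothetical usage: one car heads to (5, 5) while the other sits near the path.
car, other = (0.0, 0.0), (2.0, 2.3)
for _ in range(20):
    car = step_towards(car, (5.0, 5.0), speed=0.5, other_pos=other)
print("Final position:", car)
```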
This work is part of a larger effort to develop an end-to-end system for the autonomous taxiing of aircraft. We have fused a novel lane detection model with an autonomous lane following algorithm. In addition, we have created a virtual airport scenario to provide better visualization.
This work presents a novel, robust, vision-guided system for the autonomous taxiing of an aircraft in the real world. The system is an ensemble of autonomous navigation and collision avoidance modules. The navigation module detects the lane and sends the control signal to the steer-control algorithm, which uses a controller to help the aircraft follow the central line with a resolution of 0.013 cm. The object detection module in the collision avoidance algorithm was compared with state-of-the-art models on a road object dataset and showed superior performance. In parallel, an airport dataset is proposed, and the object detection model is fine-tuned with it to avoid collisions with ground vehicles. A detailed study conducted in different lighting conditions demonstrates the efficacy of the proposed system: the lane detection and collision avoidance modules work with true positive rates of 92.59% and 85.19%, respectively. [Paper link: 10.3846/aviation.2023.20588]
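The steer-control algorithm's internals are not reproduced here; the sketch below is only a minimal proportional-derivative controller acting on the lateral offset reported by the lane detection module, with placeholder gains and limits.

```python
class CenterlineController:
    """Minimal PD controller on lateral offset from the taxiway centre line (illustrative)."""

    def __init__(self, kp=0.8, kd=0.2, max_steer=0.35):
        self.kp, self.kd, self.max_steer = kp, kd, max_steer   # gains are placeholders
        self.prev_error = 0.0

    def steer(self, lateral_offset_m, dt=0.05):
        # Positive offset = aircraft right of the centre line -> steer left (negative command).
        derivative = (lateral_offset_m - self.prev_error) / dt
        self.prev_error = lateral_offset_m
        command = -(self.kp * lateral_offset_m + self.kd * derivative)
        return max(-self.max_steer, min(self.max_steer, command))

# Hypothetical usage with offsets reported by the lane detection module (metres).
controller = CenterlineController()
for offset in [0.40, 0.31, 0.22, 0.12, 0.05]:
    print(f"offset={offset:+.2f} m -> steer={controller.steer(offset):+.3f} rad")
```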
A digital twin of a workspace has a profound and transformative impact. It aids in space planning, visualization, and simulation of physical office environments, and it plays a crucial role in promoting energy efficiency and sustainability within the workspace. However, several challenges are associated with placing assets correctly when reconstructing a real-world space in virtual reality. This work presents a novel machine learning-based approach to building digital twins, an alternative to the conventional reconstruction process. 2D frames from the real world are passed through an object detection model to detect objects of interest with an mAP of 0.74. An image processing algorithm then calculates the orientation of the movable objects in the space. Finally, this information is passed through a novel neural network that maps 2D coordinates to 3D locations in virtual space. A detailed comparison study between multiple machine learning models supported the choice of a neural network as the mapping algorithm. [Paper link: https://ceur-ws.org/Vol-3660/paper27.pdf]
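The final mapping step can be illustrated as a small regression network from 2D image coordinates to 3D virtual-space coordinates; the PyTorch sketch below uses random placeholder correspondences and assumed layer sizes rather than the paper's architecture.

```python
import torch
import torch.nn as nn

# Placeholder correspondences: (u, v) image coordinates -> (x, y, z) in the virtual scene.
uv  = torch.rand(256, 2)
xyz = torch.rand(256, 3)

model = nn.Sequential(nn.Linear(2, 64), nn.ReLU(),
                      nn.Linear(64, 64), nn.ReLU(),
                      nn.Linear(64, 3))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for epoch in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(uv), xyz)
    loss.backward()
    optimizer.step()

# Place a newly detected object: predict its 3D location from a 2D detection centre.
with torch.no_grad():
    print(model(torch.tensor([[0.4, 0.7]])))
```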
Distraction detection systems are of great importance in the automotive domain because passenger safety is paramount. Earlier approaches were confined to indirect driving performance metrics for detecting visual distraction. Recent methods attempted to develop dedicated classification models for gaze zone estimation, but their cross-domain performance was not investigated. We adopt a more generic appearance-based gaze estimation approach in which no assumption about the setting or participant is made. We propose MAGE-Net, which has fewer parameters while achieving on-par performance with state-of-the-art techniques on the MPIIGaze dataset. Using MAGE-Net, we performed a cross-domain evaluation in an automotive setting with 10 participants and observed gaze region errors of 15.61 cm and 15.13 cm in the x and y directions, respectively, for the interior regions of the car. We then used these results to demonstrate the capability of the proposed system to detect visual distraction using a driving simulator. [Paper link: 10.1145/3490100.3516463; 10.1145/3490100.3516467]
[Dataset link: https://github.com/lrdmurthy/PARKS-Gaze]
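One simple way to turn gaze-region estimates such as those from MAGE-Net into a visual distraction flag (not necessarily the rule used in the papers above) is to check how long gaze stays outside the road region; the threshold, frame rate, and road-region rectangle below are assumptions.

```python
def is_distracted(gaze_samples, on_road_fn, fps=30, max_off_road_s=2.0):
    """Flag distraction when gaze leaves the road region for longer than the threshold."""
    max_off_frames = int(max_off_road_s * fps)
    off_streak = 0
    for gx, gy in gaze_samples:                  # one gaze point per frame, e.g. cm on a plane
        off_streak = 0 if on_road_fn(gx, gy) else off_streak + 1
        if off_streak > max_off_frames:
            return True
    return False

# Hypothetical road region: a rectangle (in cm) around the windshield centre.
on_road = lambda x, y: -40 <= x <= 40 and -10 <= y <= 30
samples = [(0, 5)] * 30 + [(60, -20)] * 90       # 1 s on road, then 3 s on the centre console
print(is_distracted(samples, on_road))           # True under the assumed 2 s threshold
```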
In the automotive domain, the use of electronic devices increases drivers' visual inattention while driving and can lead to accidents. It is often challenging to detect whether a driver has experienced a change in cognitive state, which calls for new technology that can reliably estimate the driver's cognitive load. In this paper, we investigated the efficacy of various ocular parameters for estimating cognitive load and detecting the driver's cognitive state. We derived gaze- and pupil-based metrics and evaluated their efficacy in classifying different levels of cognitive state while participants performed psychometric tests in varying light conditions. We validated the performance of our metrics in simulator as well as in-car environments. We compared the accuracy (from the confusion matrix) of detecting cognitive state during a secondary task using our proposed metrics and machine learning models. A neural network model combining multiple ocular metrics showed better accuracy (75%) than any individual ocular metric. Finally, we demonstrated the potential of our system to alert drivers in real time under critical distractions. [Paper link: 10.1016/j.treng.2020.100008]
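A minimal sketch of a neural network combining several ocular metrics into a binary cognitive-load classifier is shown below; the features, labels, and network size are placeholders rather than the paper's configuration.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Placeholder ocular features per time window: [fixation_rate, saccade_rate, pupil_diameter_mm]
X = rng.normal(size=(400, 3))
# Synthetic labels: 0 = low cognitive load, 1 = high cognitive load.
y = (X @ np.array([0.8, -0.5, 1.2]) + rng.normal(scale=0.5, size=400) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(16, 8), max_iter=2000, random_state=0)
clf.fit(X_train, y_train)
print("Held-out accuracy:", clf.score(X_test, y_test))
```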
This work presents a detailed comparison study of Generative Adversarial Network (GAN) models to assess their efficacy in generating realistic datasets. Focusing on image-to-image translation, the study explores the use of synthetic data to enhance deep learning model performance. Acknowledging the difficulty of preparing real image datasets, the research emphasizes the importance of transforming synthetic data into realistic representations. After a thorough comparison of three GAN models, CycleGAN is found to be the best (Fréchet Inception Distance (FID) scores of 0.001, 50.574, and 63.000, respectively, on three distinct datasets), demonstrating its ability to produce realistic images across three distinct case studies. The study demonstrates the significance of synthetic data in mitigating data scarcity and privacy constraints in object detection tasks. By integrating GAN-based image translation into YOLOv8 training, improvements of 56.8% in mAP@50 and 14.4% in F1 score are observed across the three cases. [Paper link: 10.1109/ICVTTS62812.2024.10763945]
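For reference, the Fréchet Inception Distance between two sets of feature vectors follows the closed-form expression FID = ||mu_r - mu_f||^2 + Tr(C_r + C_f - 2 sqrt(C_r C_f)); the sketch below implements that formula directly, with feature extraction omitted and random placeholder features standing in for Inception embeddings.

```python
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(feats_real, feats_fake):
    """FID from feature statistics: ||mu_r - mu_f||^2 + Tr(C_r + C_f - 2 sqrt(C_r C_f))."""
    mu_r, mu_f = feats_real.mean(axis=0), feats_fake.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_f = np.cov(feats_fake, rowvar=False)
    covmean = sqrtm(cov_r @ cov_f)
    if np.iscomplexobj(covmean):            # discard tiny numerical imaginary parts
        covmean = covmean.real
    diff = mu_r - mu_f
    return float(diff @ diff + np.trace(cov_r + cov_f - 2.0 * covmean))

# Placeholder feature vectors (64-d here; 2048-d Inception pool features in practice).
rng = np.random.default_rng(0)
real = rng.normal(size=(256, 64))
fake = rng.normal(loc=0.1, size=(256, 64))
print("FID:", frechet_distance(real, fake))
```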
Mixed reality applications in industrial contexts necessitate extensive and varied datasets for training object detection models, yet real data gathering may be obstructed by logistical or cost issues. This study investigates the use of generative AI methods to address this issue for mixed reality applications, with an emphasis on assembly and disassembly tasks. The novel objects found in industrial settings are difficult to describe in words, making text-based models less effective. In this study, a diffusion model is used to generate images by combining novel objects with various backgrounds. The backgrounds are selected from settings where object detection has previously been ineffective. This approach efficiently produces a diverse range of training samples. We compare three approaches: traditional augmentation, GAN-based augmentation, and diffusion-based augmentation. Results show that the diffusion model significantly improved detection metrics. For instance, applying diffusion models to the dataset containing mechanical components of a pneumatic cylinder raised the F1 score from 69.77 to 84.21 and the mAP@50 from 76.48 to 88.77, an 11% increase in object detection performance with a 67% smaller dataset than the traditionally augmented one. The proposed image composition diffusion model and user-friendly interface further simplify dataset enrichment, proving effective for augmenting data and improving the robustness of detection models. [Paper link: https://doi.org/10.1145/3708359.3712163]
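The composition step can be loosely illustrated with an off-the-shelf diffusion inpainting pipeline that fills a masked background region conditioned on a prompt describing the novel object; the checkpoint, prompt, and file paths below are assumptions, and the paper's actual composition model and interface may differ.

```python
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

# Assumed checkpoint; the paper's composition model may differ.
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

background = Image.open("workbench_background.png").convert("RGB").resize((512, 512))
mask = Image.open("placement_mask.png").convert("L").resize((512, 512))  # white = region to synthesize

composed = pipe(
    prompt="a pneumatic cylinder component lying on the workbench",  # illustrative prompt
    image=background,
    mask_image=mask,
).images[0]
composed.save("augmented_sample.png")
```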
Assembling complex components often requires expert guidance. Augmented Reality (AR) offers intuitive visual assistance that can enhance user performance, yet existing AR-based systems largely focus on simple tasks, limiting their application to intricate assembly scenarios. This paper addresses the gap by developing a comprehensive pipeline for AI and AR-based assembly guidance. We analyzed various software architectures, selecting the optimal setup for advanced manufacturing based on latency, accuracy, and human factors. A novel AI-based hologram registration technique was implemented to provide real-time assistance in dynamic environments. Additionally, a multimodal user interface was designed to facilitate seamless interaction with the Mixed Reality (MR) system, allowing users to access instructions efficiently. Results demonstrate that users completed tasks faster with the MR system compared to traditional video-based methods. The proposed system also significantly reduces cognitive load and improves usability, positioning it as an effective tool for modern manufacturing. [Paper link: https://doi.org/10.1007/s44430-025-00005-1]
Tuberculosis (TB) remains a significant global health burden, especially in low-resource settings. This study introduces an AI-powered TB assistant using the fine-tuned LLaVA-v1.5 model to enhance early TB detection through multimodal inputs: chest X-ray images and clinical metadata (age, gender). The system, developed based on clinician feedback, addresses diagnostic delays and limited patient engagement by generating natural language diagnostic reports. Leveraging parameter-efficient LoRA fine-tuning and a balanced dataset of 704 samples, the model achieved a modest accuracy (BERT score of 0.83) with minimal overfitting. A responsive web-based interface ensures user-friendly deployment, facilitating real-time analysis and decision support. Validation by medical professionals confirms its potential for preliminary screening in underserved regions. The proposed approach complements clinical workflows, offering scalable support for TB triage, awareness, and treatment adherence. Future improvements will incorporate patient history and real-time feedback to enhance diagnostic personalization and clinical applicability. [Paper link: CVIP 2025]
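The parameter-efficient LoRA fine-tuning can be sketched with the peft library; the snippet below attaches LoRA adapters to the language backbone only (the full multimodal LLaVA-v1.5 setup also involves the vision tower and projector), and the checkpoint id, target modules, and ranks are assumptions rather than the paper's settings.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

# Assumed base checkpoint: the Vicuna-1.5 language backbone used by LLaVA-1.5.
base_id = "lmsys/vicuna-7b-v1.5"
model = AutoModelForCausalLM.from_pretrained(base_id)
tokenizer = AutoTokenizer.from_pretrained(base_id)

lora_cfg = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # typical attention projections; an assumption here
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()         # only the LoRA adapter weights are trainable
```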
Skin diseases affect over 1.8 billion people globally, disproportionately burdening underserved regions due to limited access to dermatological care. Addressing this challenge, we propose SkinGuard AI, an integrated AI-driven mobile application for early detection and assessment of skin conditions. The system combines visual analysis, using a novel object detection model (DYN-LO) that fuses the global context understanding of DINOv2 with the localization efficiency of YOLOv8, with contextual interpretation via a fine-tuned small language model (Phi-2). The dual-modality input enhances diagnostic accuracy, enabling holistic evaluation by considering both image features and user-reported symptoms. Evaluated on a dataset of eight prevalent South Asian skin conditions, DYN-LO achieved an F1-score of 0.308, outperforming six YOLO baselines. The fine-tuned Phi-2 generated medically coherent reports with high BERT (0.8857) and COMET (0.8855) scores and low perplexity (3.79). Together, the system offers real-time, interpretable feedback with guidance on care and over-the-counter remedies, democratizing dermatological support. Designed for mobile deployment, SkinGuard AI holds promise for rural and low-resource settings. Future work includes expanding disease coverage, enabling longitudinal tracking, and integrating multimodal and 3D inputs for richer diagnostics. This study underscores the transformative potential of lightweight AI systems in improving dermatological healthcare access and outcomes. [Paper link: CVIP 2025]
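The report-quality evaluation with BERT score can be illustrated with the bert-score package; the candidate and reference strings below are placeholders, not outputs from the actual system.

```python
from bert_score import score

# Placeholder model output and clinician-written reference report.
candidates = ["Findings suggest a fungal infection; keep the area dry and consult a dermatologist."]
references = ["The lesion is consistent with a fungal infection; keep the skin dry and see a dermatologist."]

precision, recall, f1 = score(candidates, references, lang="en", verbose=False)
print("BERTScore F1:", f1.mean().item())
```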
Timely diagnosis of plant diseases is vital for minimizing crop loss and improving productivity, especially for smallholder farmers in resource-constrained regions. This work introduces FarmaFriend, an end-to-end mobile-ready solution that combines an enhanced YOLOv8x-based object detection model with a LoRA-tuned Small Language Model (SLM) for real-time plant disease identification and contextual report generation. The vision module, FarmaYOLOx, integrates a Normalized Attention Module (NAM), Scale-Transfer Attention (STA), and DySnakeConv, a dynamic deformable convolution module, for robust lesion detection, achieving 78.6% precision and 66.2% mAP@0.5. The Phi-2 SLM generated structured reports with a BERT Score of 0.8934. A React-based UI enables seamless user interaction, even in offline environments. Unlike isolated visual or language models, FarmaFriend bridges both modalities, empowering farmers with accessible, interpretable disease diagnostics. Future work will focus on drone-based deployment, expanding disease classes, and integrating weather and soil data to deliver hyperlocal crop care solutions at scale. [Paper link: PReMI 2025]