Efficient task scheduling and execution in heterogeneous multi-robot systems remain challenging due to the complexity of interpreting high-level task instructions, coordinating diverse robot capabilities, and validating task outcomes. Traditional logic-based and learning-based approaches often fall short in dynamic and ambiguous environments. Large Language Models (LLMs) offer a promising solution by leveraging their advanced reasoning, contextual understanding, and adaptability to handle complex task dependencies and interpret multimodal inputs. This work introduces an LLM-driven framework for station identification, task scheduling, and object pick-up validation. The proposed method achieved an 80% success rate in task scheduling, while object pick-up validation using few-shot prompting demonstrated reliable performance. These findings highlight the potential of LLMs to improve coordination, adaptability, and reliability in multi-robot systems, paving the way for scalable and intelligent automation solutions.
[Paper link: https://doi.org/10.1145/3708557.3716333]
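To illustrate the pick-up validation step, the following is a minimal sketch of few-shot prompting for success/failure judgement. The example observations, prompt structure, and the `llm` callable are illustrative assumptions; the paper's actual prompts and model are not reproduced here.

```python
# Minimal sketch of few-shot pick-up validation with an LLM. The example
# cases and the `llm` callable are hypothetical stand-ins.

FEW_SHOT_EXAMPLES = [
    {"observation": "Gripper closed, force sensor reads 2.1 N, object visible in wrist camera.",
     "verdict": "SUCCESS"},
    {"observation": "Gripper closed, force sensor reads 0.0 N, no object in wrist camera.",
     "verdict": "FAILURE"},
]

def build_validation_prompt(observation: str) -> str:
    """Assemble a few-shot prompt asking the LLM to judge a pick-up attempt."""
    lines = ["You validate whether a robot successfully picked up an object.",
             "Answer with SUCCESS or FAILURE only.", ""]
    for ex in FEW_SHOT_EXAMPLES:
        lines.append(f"Observation: {ex['observation']}")
        lines.append(f"Verdict: {ex['verdict']}")
        lines.append("")
    lines.append(f"Observation: {observation}")
    lines.append("Verdict:")
    return "\n".join(lines)

def validate_pickup(observation: str, llm) -> bool:
    """`llm` is any callable that maps a prompt string to a text completion."""
    reply = llm(build_validation_prompt(observation))
    return "SUCCESS" in reply.upper()
```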
In shared autonomy, human-robot handover for object delivery is crucial. Accurate robot predictions of human hand motion and intentions enhance collaboration efficiency. However, low prediction accuracy increases mental and physical demands on the user. In this work, we propose a system for predicting hand motion and the intended target during human-robot handover using Inverse Reinforcement Learning (IRL). A set of feature functions was designed to explicitly capture users’ preferences during the task. The proposed approach was experimentally validated through user studies. Results indicate that the proposed method outperformed other state-of-the-art methods (PI-IRL, BP-HMT, RNNIK-MKF, and CMk=5), with users feeling comfortable reaching up to 60% of the total distance to the target for handover at 90% target prediction accuracy. The target prediction accuracy reaches 99.9% when less than 20% of the task remains.
[Paper link: https://doi.org/10.1109/ICRA57147.2024.10610595]
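As a sketch of how a learned IRL reward can be turned into a target prediction, the snippet below scores candidate handover targets with a MaxEnt-style posterior. The two feature functions and the weights are placeholders, not the preference features learned in the paper.

```python
# Minimal sketch of intent prediction from a learned IRL reward.
import numpy as np

def features(hand_pos: np.ndarray, hand_vel: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Simple hand-crafted features: distance to target and velocity alignment."""
    to_target = target - hand_pos
    dist = np.linalg.norm(to_target)
    align = float(hand_vel @ to_target) / (np.linalg.norm(hand_vel) * dist + 1e-9)
    return np.array([-dist, align])

def target_posterior(hand_pos, hand_vel, candidate_targets, weights):
    """MaxEnt-style posterior: P(target) proportional to exp(w . phi(state, target))."""
    scores = np.array([weights @ features(hand_pos, hand_vel, t) for t in candidate_targets])
    scores -= scores.max()                      # numerical stability
    probs = np.exp(scores)
    return probs / probs.sum()

# Example: two candidate targets, hand currently moving toward the first one.
targets = [np.array([0.5, 0.0, 0.3]), np.array([-0.5, 0.2, 0.3])]
print(target_posterior(np.zeros(3), np.array([1.0, 0.0, 0.1]), targets, np.array([1.0, 2.0])))
```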
Individuals with Severe Speech and Motor Impairment (SSMI) struggle to interact with their surroundings due to physical and communicative limitations. To address these challenges, this work presents a gaze-controlled robotic system that helps SSMI users perform stamp-printing tasks. The system includes gaze-controlled interfaces and a robotic arm with a gripper, designed specifically for SSMI users to enhance accessibility and interaction. User studies with gaze-controlled interfaces such as video see-through (VST), video pass-through (VPT), and optical see-through (OST) displays demonstrated the system’s effectiveness. Results showed that VST achieved an average stamping time of 28.45 s (SD = 15.44 s) and an average stamp count of 7.36 (SD = 3.83), outperforming VPT and OST. This project was funded by the EU-ITU and the Ministry of Telecommunications, Government of India.
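A common way to turn gaze into a discrete command is dwell-time selection; the sketch below shows that idea for triggering a stamping action. The dwell threshold, target radius, and `on_select` callback are illustrative assumptions, not the parameters used in the study.

```python
# Minimal sketch of dwell-time selection for a gaze-controlled interface.
import math

DWELL_TIME_S = 1.0      # how long gaze must rest on a target to select it (assumed)
TARGET_RADIUS_PX = 60   # screen-space radius of a selectable target (assumed)

class DwellSelector:
    def __init__(self, targets):
        self.targets = targets          # {name: (x, y)} screen positions
        self.current = None
        self.dwell_start = None

    def update(self, gaze_xy, t, on_select):
        """Feed one gaze sample; call `on_select(name)` when the dwell completes."""
        hit = None
        for name, (tx, ty) in self.targets.items():
            if math.hypot(gaze_xy[0] - tx, gaze_xy[1] - ty) <= TARGET_RADIUS_PX:
                hit = name
                break
        if hit != self.current:                 # gaze moved to a new region: restart timer
            self.current, self.dwell_start = hit, t
        elif hit is not None and t - self.dwell_start >= DWELL_TIME_S:
            on_select(hit)                      # e.g. command the arm to stamp here
            self.current, self.dwell_start = None, None
```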
In this study, we employ multiple robots utilizing various interaction modalities to facilitate the pick-up and delivery of diverse objects within warehouse settings. The primary objective is to efficiently transport objects to and from a pick-up autonomous ground vehicle (AGV). Robotic arms are deployed to trace holographic lines for object retrieval and subsequent placement into the AGV. Another robot employs eye gaze technology for object selection and placement into the AGV. Subsequently, the AGV utilizes a lane navigation algorithm to autonomously navigate towards its destination. For the return journey, manual intervention allows the AGV to reverse its direction and retrace the same lane back to its starting point in autonomous mode. Meanwhile, another AGV relies on point coordinates for navigation, equipped with an integrated collision avoidance system to prevent collisions with other AGVs. Ultimately, both vehicles successfully reach their destinations without encountering any collisions.
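For the point-coordinate AGV, the following is a minimal sketch of waypoint following with a stop-on-proximity collision check. The differential-drive command law, gains, and thresholds are illustrative assumptions rather than the algorithm used in the study.

```python
# Minimal sketch of point-coordinate navigation with a simple collision check.
import math

GOAL_TOLERANCE = 0.1   # m (assumed)
SAFETY_RADIUS = 1.0    # m: halt while another AGV is closer than this (assumed)

def navigation_step(pose, waypoint, other_agv_positions, k_lin=0.5, k_ang=1.5):
    """Return (linear_velocity, angular_velocity) toward the next waypoint."""
    x, y, theta = pose
    # Collision avoidance: stop while any other AGV is inside the safety radius.
    if any(math.hypot(ox - x, oy - y) < SAFETY_RADIUS for ox, oy in other_agv_positions):
        return 0.0, 0.0
    dx, dy = waypoint[0] - x, waypoint[1] - y
    dist = math.hypot(dx, dy)
    if dist < GOAL_TOLERANCE:
        return 0.0, 0.0                         # waypoint reached
    heading_error = math.atan2(dy, dx) - theta
    heading_error = math.atan2(math.sin(heading_error), math.cos(heading_error))
    return k_lin * dist, k_ang * heading_error  # proportional drive commands
```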
This work presents a novel, robust, vision-guided system for the autonomous taxiing of an aircraft in the real world. The system is an ensemble of autonomous navigation and collision avoidance modules. The navigation module detects the lane and sends the control signal to the steer control algorithm. This algorithm uses a controller to help the aircraft follow the central line with a resolution of 0.013 cm. The object detection module in the collision avoidance algorithm was compared with state-of-the-art models on the road object dataset and outperformed them. In parallel, an airport dataset is proposed, and the object detection model is fine-tuned on it to avoid collision with any ground vehicle. A detailed study is conducted under different lighting conditions to prove the efficacy of the proposed system. It is observed that the lane detection and collision avoidance modules work with true positive rates of 92.59% and 85.19%, respectively.
[Paper link: https://doi.org/10.3846/aviation.2023.20588]
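The steer control step can be illustrated with a feedback law that drives the lateral offset from the detected centre line to zero. A PID controller is assumed here for the sketch; the gains are placeholders, not the values used on the aircraft.

```python
# Minimal sketch of a centre-line-following steer controller (assumed PID law).
class CentreLinePID:
    def __init__(self, kp=0.8, ki=0.01, kd=0.2):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = 0.0

    def steering_command(self, lateral_offset_m: float, dt: float) -> float:
        """Positive offset = aircraft right of the centre line; returns a steering angle."""
        self.integral += lateral_offset_m * dt
        derivative = (lateral_offset_m - self.prev_error) / dt
        self.prev_error = lateral_offset_m
        return -(self.kp * lateral_offset_m + self.ki * self.integral + self.kd * derivative)
```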
Numerous studies have been undertaken to predict pointing targets in Graphical User Interfaces (GUI). This work extends target prediction to extended reality (XR) platforms through Sampling-based Maximum Entropy Inverse Reinforcement Learning (SMEIRL). The SMEIRL algorithm learns the underlying reward distribution for the pointing task. Results show that SMEIRL achieves better accuracy in both VR and MR (for example, 32.60% accuracy in VR and 34.48% accuracy in MR at 30% of the pointing task) compared to an Artificial Neural Network (ANN) and Quadratic Extrapolation (QE) during the early stage of the pointing task. In the later stage, QE performs better (for example, 93.51% accuracy in VR and 95.58% accuracy in MR at 70% of the pointing task) than SMEIRL and ANN. All three algorithms (SMEIRL, ANN, and QE) reported higher target prediction accuracy in MR than in VR.
[Paper link: https://dl.acm.org/doi/abs/10.1145/3581754.3584130]
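The core of a sampling-based MaxEnt IRL learner is a gradient step that matches expert and model feature expectations, with the model expectation estimated from sampled trajectories. The sketch below shows one such update; the trajectory features, proposal distribution, and learning rate are illustrative assumptions.

```python
# Minimal sketch of a sampling-based MaxEnt IRL weight update, in the spirit of SMEIRL.
import numpy as np

def smeirl_update(weights, expert_features, sampled_features, lr=0.05):
    """
    One gradient-ascent step on the MaxEnt IRL log-likelihood.
    expert_features:  (N, d) feature vectors of demonstrated pointing trajectories
    sampled_features: (M, d) feature vectors of trajectories drawn from a proposal
    """
    # Importance weights proportional to exp(reward) of each sampled trajectory.
    rewards = sampled_features @ weights
    iw = np.exp(rewards - rewards.max())
    iw /= iw.sum()
    # Gradient: expert feature expectation minus (importance-weighted) model expectation.
    grad = expert_features.mean(axis=0) - iw @ sampled_features
    return weights + lr * grad
```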
Intent prediction finds widespread applications in user interface (UI/UX) design to predict target icons, in the automotive industry to anticipate driver intent, and in understanding human motion during human-robot interactions (HRI). Predicting human intent involves analyzing factors such as hand motion, eye gaze movement, and gestures. This work introduces a multimodal intent prediction algorithm involving hand and eye gaze using Bayesian fusion. Inverse reinforcement learning was leveraged to learn human preferences for the human-robot handover task. Results demonstrate that the proposed approach achieves the highest prediction accuracy of 99.9% at 60% task completion as compared to state-of-the-art (SOTA) methods.
[Paper link: https://dl.acm.org/doi/abs/10.1145/3640544.3645229]
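The fusion step can be sketched as combining per-target distributions from the hand-motion model and the gaze model under a conditional-independence assumption. The input distributions below are illustrative, not outputs of the paper's models.

```python
# Minimal sketch of Bayesian fusion of hand- and gaze-based target posteriors.
import numpy as np

def fuse_intent(p_target_given_hand, p_target_given_gaze, prior=None):
    """Fuse two per-target distributions into one posterior over targets."""
    prior = np.ones_like(p_target_given_hand) if prior is None else prior
    fused = prior * p_target_given_hand * p_target_given_gaze
    return fused / fused.sum()

# Example: three candidate targets; both modalities favour target 0.
print(fuse_intent(np.array([0.6, 0.3, 0.1]), np.array([0.7, 0.2, 0.1])))
```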