The Evolving Landscape of LLM- and VLM-Integrated Reinforcement Learning
Sheila Schoepp, Masoud Jafaripour*, Yingyue Cao*, Tianpei Yang, Fatemeh Abdollahi, Shadan Golestan, Zahin Sufiyan, Osmar Zaiane, Matthew E. Taylor (*Equal contribution)
Abstract: Reinforcement learning (RL) has shown impressive results in sequential decision-making tasks. Meanwhile, Large Language Models (LLMs) and Vision-Language Models (VLMs) have emerged, exhibiting impressive capabilities in multimodal understanding and reasoning. These advances have led to a surge of research integrating LLMs and VLMs into RL. In this survey, we review representative works in which LLMs and VLMs are used to overcome key challenges in RL, such as lack of prior knowledge, long-horizon planning, and reward design. We present a taxonomy that categorizes these LLM/VLM-assisted RL approaches into three roles: agent, planner, and reward. We conclude by exploring open problems, including grounding, bias mitigation, improved representations, and action advice. By consolidating existing research and identifying future directions, this survey establishes a framework for integrating LLMs and VLMs into RL, advancing approaches that unify natural language and visual understanding with sequential decision-making.
Adaptive Iterative Feedback Prompting for Obstacle-Aware Path Planning via LLMs
Masoud Jafaripour, Shadan Golestan, Shotaro Miwa, Yoshihiro Mitsuka, Osmar R. Zaiane
Abstract: Planning is essential for agents operating in complex decision-making tasks, particularly in Human-Robot Interaction (HRI) scenarios, which often require adaptability and the ability to navigate dynamic environments. Large Language Models (LLMs), known for their exceptional natural language understanding capabilities, hold promise for enhancing planning in HRI by processing contextual and linguistic cues. However, their effectiveness is limited by inherent shortcomings in spatial reasoning. Existing LLM-based planning frameworks often depend on combining with classical planning methods or struggle to adapt to dynamic environments, limiting their practical applicability. This paper examines whether the incorporation of an environmental feedback mechanism and iterative planning can enhance the planning capabilities of LLMs. Specifically, we propose the "Adaptive Iterative Feedback Prompting" (AIFP) framework for path planning. In AIFP, an LLM generates partial trajectories iteratively, which are evaluated for potential collisions using environmental feedback. Based on the evaluation, AIFP executes the trajectory or re-plans. Our preliminary results show that AIFP increases the success rate of the baseline by 33.3% and generates efficient, appropriately complex paths, making it a promising approach for dynamic HRI scenarios.
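The propose, check, then execute-or-re-plan loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the grid world, the function names, and `stub_propose` (a greedy stand-in for the LLM call) are all assumptions made for illustration.

```python
from typing import Callable, List, Optional, Set, Tuple

Cell = Tuple[int, int]


def first_collision(segment: List[Cell], obstacles: Set[Cell]) -> Optional[Cell]:
    """Return the first cell of the segment that hits an obstacle, else None."""
    for cell in segment:
        if cell in obstacles:
            return cell
    return None


def stub_propose(current: Cell, goal: Cell, blocked: Set[Cell], horizon: int = 2) -> List[Cell]:
    """Hypothetical stand-in for the LLM: greedily propose a short partial
    trajectory toward the goal, avoiding cells reported as blocked."""
    segment: List[Cell] = []
    pos = current
    for _ in range(horizon):
        if pos == goal:
            break
        x, y = pos
        options = [(x + 1, y), (x, y + 1), (x - 1, y), (x, y - 1)]
        options = [c for c in options if c not in blocked]
        if not options:
            break
        # Pick the neighbour with the smallest Manhattan distance to the goal.
        pos = min(options, key=lambda c: abs(c[0] - goal[0]) + abs(c[1] - goal[1]))
        segment.append(pos)
    return segment


def iterative_feedback_plan(
    start: Cell,
    goal: Cell,
    obstacles: Set[Cell],
    propose: Callable[[Cell, Cell, Set[Cell]], List[Cell]],
    max_iters: int = 50,
) -> Optional[List[Cell]]:
    """Propose a partial trajectory, check it against the environment,
    then either execute it or re-plan with the collision as feedback."""
    path = [start]
    feedback: Set[Cell] = set()  # collisions reported back to the proposer
    for _ in range(max_iters):
        if path[-1] == goal:
            return path
        segment = propose(path[-1], goal, feedback)
        if not segment:
            return None  # proposer has no move left
        hit = first_collision(segment, obstacles)
        if hit is None:
            path.extend(segment)  # execute the collision-free segment
        else:
            feedback.add(hit)  # re-plan with environmental feedback
    return path if path[-1] == goal else None


path = iterative_feedback_plan((0, 0), (3, 0), {(1, 0)}, stub_propose)
print(path)
```

In a real system the `propose` callable would wrap a prompt that includes the current position, the goal, and the accumulated collision feedback; the greedy stub only keeps the example runnable.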
Optimal Integration of Hybrid FES-Exoskeleton for Precise Knee Trajectory Control
Masoud Jafaripour, Vivian Mushahwar, Mahdi Tavakoli
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Abu Dhabi, UAE, 2024
Abstract: This paper introduces a novel hybrid torque allocation method for improving wearability and mobility in integrated functional electrical stimulation (FES) of the quadriceps muscles and powered exoskeleton systems. Our proposed approach leverages a hierarchical closed-loop controller for knee joint position tracking while addressing limitations of powered exoskeletons and FES systems by reducing power consumption and battery size and by mitigating FES-induced muscle fatigue, respectively. The core component is a model-free optimization algorithm that dynamically distributes torque between FES and the exoskeleton by considering tracking error, effort, and the prediction of muscle fatigue in the cost function, computing the allocation gain online. The online optimization approach interactively changes the optimal allocation gain by taking into account the instantaneous value of error and effort and also penalizing FES-induced fatigue, a common challenge in long-duration experiments. The results demonstrate that this dynamic allocation significantly improves system wearability by reducing power consumption without increasing muscle fatigue during the extension phase of walking. This hybrid control approach contributes to improving exoskeleton wearability and rehabilitation outcomes for individuals with spinal cord injury (SCI) and mobility impairments, enhancing assistive technology and quality of life.
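The idea of choosing an allocation gain by minimizing a cost over error, effort, and fatigue can be sketched as follows. This is an illustrative assumption, not the paper's cost function or optimizer: the quadratic cost terms, the weights, the choice to let tracking error penalize the FES share, and the grid search are all hypothetical.

```python
def allocation_cost(alpha: float, torque: float, error: float, fatigue: float,
                    w_err: float = 1.0, w_effort: float = 1.0,
                    w_fatigue: float = 1.0) -> float:
    """Hypothetical cost for an allocation gain alpha in [0, 1].

    alpha is the fraction of the demanded torque assigned to FES;
    the exoskeleton supplies the remainder.
    """
    fes_torque = alpha * torque
    exo_torque = (1.0 - alpha) * torque
    # Assumption: tracking error and predicted fatigue both penalize the
    # FES share, while the effort term penalizes exoskeleton (motor) torque.
    return (w_err * abs(error) * fes_torque ** 2
            + w_effort * exo_torque ** 2
            + w_fatigue * fatigue * fes_torque ** 2)


def choose_gain(torque: float, error: float, fatigue: float, steps: int = 101) -> float:
    """Derivative-free search over candidate gains, re-run at each control step."""
    candidates = [i / (steps - 1) for i in range(steps)]
    return min(candidates, key=lambda a: allocation_cost(a, torque, error, fatigue))


fresh = choose_gain(torque=10.0, error=0.0, fatigue=0.5)
tired = choose_gain(torque=10.0, error=0.0, fatigue=5.0)
print(fresh, tired)
```

Under these assumptions the gain shifts torque away from FES as predicted fatigue grows, which is the qualitative behavior the abstract describes.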