Title: Learning-based Navigation Framework for Mobile Service Robots
Abstract
A navigation system is essential for a mobile robot to travel autonomously from one location to another. A service robot performs autonomous tasks in service environments such as houses, hospitals, restaurants, museums, and care homes. Service environments share several features: a large carpet area, static and dynamic obstacles, susceptibility to environmental changes over time, the impossibility of adding external sensors to the environment, and novice users or operators. The path planning subsystem plays a vital role in a mobile robot navigation system: it generates the shortest path from a source to a destination location. Most path planners in the literature assume the environment map to be either fully known or fully unknown. A fully known map cannot accommodate later environmental changes, while an unknown environment leads to inefficient paths and extra exploration time. A mobile service robot navigation system should be user-friendly, scalable, repeatable, and environment-independent, and it should be capable of lifelong learning. This thesis focuses on developing a general navigation framework for an indoor mobile service robot. The proposed navigation framework consists of three main systems: a Learning from Demonstration (LfD) system that converts simple user demonstrations into usable paths, a path mapping and planning system with dynamic obstacle avoidance capability, and a lifelong learning system that enables the navigation framework to adapt to environmental changes.
LfD is an end-user development technique for teaching a computer or a robot new behavior by demonstrating the task directly instead of programming it through machine commands. In contrast to traditional robot programming techniques, these methods do not require specialized technical or programming skills; they translate demonstrated behavior directly into executable code. This has obvious implications for the widespread use of service robots. In this thesis, three LfD approaches (counter-based, encoder-based, and enhanced encoder-based) are proposed for teaching a mobile service robot to navigate from one location to another. The robot is trained to navigate to each possible destination in sequence, and the trained paths are recorded in the form of a path matrix. In this way the robot learns only a partial set of the paths in the environment. The proposed LfD technique has a user-friendly training interface that enables even a novice user to perform the training. During training, the path matrix is generated using only the onboard sensors. Hence, the system is environment-independent and needs no environmental sensors for training.
The path mapping and planning system generates all the logical paths between the locations from the partial set of paths learned by the LfD system. A novel Tree-Based Path Planner (TBP) is proposed in this work; it generates a path tree containing all the logical connections between all the locations in the environment. Using the path tree, the navigation system can autonomously guide the robot from a source to a destination location. Because the environment is learned through the partial paths, two drawbacks are avoided: the inefficient paths and exploration-time overhead of path planning in an unknown environment, and the need to preload a full environment map (which is not possible in some cases) for path planning in a known environment. A state-based obstacle avoidance algorithm is proposed to avoid obstacles and converge back to the planned path.
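The idea of deriving routes between any two locations from a partial set of demonstrated paths can be illustrated with a minimal sketch. This is not the TBP algorithm itself, whose details are not given here; it assumes each learned path is a sequence of named waypoints that can be traversed in both directions, and it uses a plain breadth-first search. The location names and the functions `build_graph` and `plan` are hypothetical.

```python
from collections import deque


def build_graph(learned_paths):
    """Build an undirected adjacency map from demonstrated paths (assumed reversible)."""
    graph = {}
    for path in learned_paths:
        for a, b in zip(path, path[1:]):
            graph.setdefault(a, set()).add(b)
            graph.setdefault(b, set()).add(a)
    return graph


def plan(graph, source, goal):
    """Return the fewest-hop route from source to goal, or None if unreachable."""
    if source not in graph or goal not in graph:
        return None
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk back through the parents to recover the route.
            route = []
            while node is not None:
                route.append(node)
                node = parent[node]
            return route[::-1]
        for nxt in sorted(graph[node]):  # sorted for a deterministic result
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


# Hypothetical demonstrations: one taught path from the dock to each destination.
learned_paths = [
    ["dock", "hall", "kitchen"],
    ["dock", "hall", "bedroom"],
    ["dock", "hall", "corridor", "bathroom"],
]
graph = build_graph(learned_paths)
# A route that was never demonstrated, derived from the shared waypoints.
route = plan(graph, "kitchen", "bathroom")
print(route)
```

In this sketch the kitchen-to-bathroom route is never taught directly; it is inferred because the demonstrated paths share the waypoint `hall`.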
The service environment is usually vast and unknown, and the robot is expected to operate continuously for long periods. The environment can change with time, which may open new routes or permanently block old ones. It is therefore essential to account for environmental changes when developing a path planner for a service robot. A traditional path planner that relies on static maps or explores an unfamiliar environment suffices only if the environment remains constant over time. A machine-learning-based path planner can update itself dynamically in response to environmental changes. A Reinforcement Learning (RL) based system with Transfer Learning (TL) gives the proposed navigation framework the ability to adapt to a dynamic environment. The proposed system uses the Deep Q-Learning (DQL) algorithm to learn the initial paths from a topological map of the environment, i.e., the path tree generated by the TBP system. This work proposes a novel TL algorithm, called the beta-decay algorithm, to achieve lifelong learning. The proposed TL algorithm uses Experience vs. Exploration vs. Exploitation (EEE) factors for transfer learning.
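As a rough illustration of learning routes on a topological map, the sketch below uses tabular Q-learning in place of the deep Q-network, which keeps the example small. It does not reproduce the beta-decay transfer learning algorithm, whose details are not given here. The graph, the reward values (+10 at the goal, -1 per step), the hyperparameters, and the function names are all assumptions made for illustration.

```python
import random

# Hypothetical topological map: each node lists the nodes reachable in one move.
GRAPH = {
    "dock": ["hall"],
    "hall": ["dock", "kitchen", "bedroom", "corridor"],
    "kitchen": ["hall"],
    "bedroom": ["hall"],
    "corridor": ["hall", "bathroom"],
    "bathroom": ["corridor"],
}


def train(graph, goal, episodes=500, alpha=0.5, gamma=0.9, epsilon=0.3, seed=0):
    """Tabular Q-learning; an action is the neighbouring node to move to."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in graph for a in graph[s]}
    starts = [s for s in graph if s != goal]
    for _ in range(episodes):
        state = rng.choice(starts)
        for _ in range(50):  # cap the episode length
            actions = graph[state]
            if rng.random() < epsilon:
                action = rng.choice(actions)  # explore
            else:
                action = max(actions, key=lambda a: q[(state, a)])  # exploit
            done = action == goal
            reward = 10.0 if done else -1.0  # assumed reward scheme
            future = 0.0 if done else max(q[(action, a)] for a in graph[action])
            q[(state, action)] += alpha * (reward + gamma * future - q[(state, action)])
            if done:
                break
            state = action
    return q


def greedy_route(graph, q, source, goal, max_steps=20):
    """Follow the highest-valued action from source until the goal or the step cap."""
    route = [source]
    while route[-1] != goal and len(route) <= max_steps:
        state = route[-1]
        route.append(max(graph[state], key=lambda a: q[(state, a)]))
    return route


q_table = train(GRAPH, "bathroom")
learned_route = greedy_route(GRAPH, q_table, "kitchen", "bathroom")
print(learned_route)
```

If the map changes, for example when an edge is removed from `GRAPH`, a transfer learning scheme would start from the existing `q_table` rather than from zeros; how much of the old experience to keep is the question the EEE factors address.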
The proposed systems were implemented and tested individually as standalone systems and then integrated and tested as a complete navigation framework. The LfD techniques were implemented and tested in the 2D robotic simulator Player/Stage and on an indigenously built mobile robot platform, the Amrita Autonomous Mobile Robot (AAMoR). The system accuracy was tested on the AAMoR platform: a human operator taught the robot a particular trajectory, and the robot was able to replicate the trajectory autonomously and accurately. The LfD system and the TBP path planner were implemented and tested using the Player/Stage simulator to train the partial set of paths in three environments: a house layout, a hospital layout, and a benchmark maze layout. The test results show that the proposed path planner is scalable, trainable, accurate, and repeatable. The path planner was compared with various classical path planning algorithms, and the results show that it is on par with them in terms of accuracy and efficient path generation. The state-based obstacle avoidance system was tested successfully with obstacles of different form factors in the Player/Stage simulation environment and on the AAMoR mobile robot platform. The overall navigation framework with the DQL agent was implemented and tested using the ROS framework with a Turtlebot3 mobile robot in the Gazebo simulator. The experimental results show that the proposed RL system learns all the routes from the initial topological map of three different service environments with an accuracy of over 98%. The efficiency of the proposed TL algorithm was evaluated in a modified environment, and a comparative analysis of the TL and non-TL agents was performed using various evaluation metrics. The convergence time required for the TL agent is half that of the non-TL agent.
The overall outcome of this thesis is a navigation framework that accounts for the typical features of an autonomous mobile robot deployed in a service environment. The test results are promising, and the framework is suitable for real-time implementation. This work is a step toward deploying robots in real-world service applications.
Objectives of the research
The objective of the research work is to design and develop a navigation framework that is able to:
Achieve environmental awareness through user training by traversing one path to each possible destination
Map all the possible unknown logical paths to all the destinations using the learned paths
Perform path planning and navigation over the long term and cope with changes in the environment over time