Although RL is a learning strategy rather than a data model, it strongly influences the data requirement and the control performance; therefore, papers involving RL are summarized here. In general, RL copes with high-level tasks by exploring the environment and exploiting the data collected during exploration. This strategy can train an agent for complicated tasks but typically requires a long learning time.
Before NNs became the common agent in RL, Q-tables and statistical models were viable choices. With the help of NNs, RL not only accomplishes simple tasks such as position reaching and trajectory following but also addresses more complex problems such as gait design. Various RL-specific strategies, such as actuation- and state-space discretization, are applied to exploit the properties of soft robots and to cope with modeling and control difficulties. One of the greatest challenges of RL is exploration in the real world, which is time-consuming and may damage the robot; therefore, simulation is often utilized in the training process.
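As a concrete illustration of the tabular agents mentioned above, the sketch below shows a standard one-step Q-learning update over discretized states and actions. It is a generic example rather than the method of any specific paper in Table 5; the state and action counts, the hyperparameters, and the environment interface (reset/step) are hypothetical placeholders.

```python
# Minimal sketch of a tabular Q-learning agent on discretized state and action
# spaces, illustrating what "Q-table as the RL agent" means in practice.
# All sizes, hyperparameters, and the env interface are assumptions.
import numpy as np

n_states = 100      # e.g. a discretized workspace
n_actions = 4       # e.g. a discretized actuation set
alpha, gamma, epsilon = 0.1, 0.95, 0.1

Q = np.zeros((n_states, n_actions))  # the Q-table itself


def choose_action(state, rng):
    """Epsilon-greedy action selection over the current Q-table."""
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    return int(np.argmax(Q[state]))


def q_update(state, action, reward, next_state):
    """Standard one-step temporal-difference Q-learning update."""
    td_target = reward + gamma * np.max(Q[next_state])
    Q[state, action] += alpha * (td_target - Q[state, action])


# Training loop (env is assumed to expose reset() -> state and
# step(action) -> (next_state, reward, done)):
# rng = np.random.default_rng(0)
# for episode in range(n_episodes):
#     state = env.reset()
#     done = False
#     while not done:
#         action = choose_action(state, rng)
#         next_state, reward, done = env.step(action)
#         q_update(state, action, reward, next_state)
#         state = next_state
```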
Compared with other methods, RL requires the largest amount of data. More critically, a predefined agent and interaction with the environment are necessary. At this high cost, however, RL is able to fulfill complex and high-level tasks. With adaptation strategies such as discretization and simulation-to-real transfer learning, the time and resource costs can be reduced to some extent.
Table 5. Reinforcement Learning Paper Comparison.
Paper list:
A GPR-based method, the Gaussian process temporal difference method, is employed to control a soft arm.
As the RL agent, a GMM is trained to estimate robot shape and contact.
For robotic catheter control inside a narrow tube, a joint probability distribution is learned considering various variables like tip and entrance points, touch state, and action.
This paper exploits Q-learning for sophisticated control tasks such as turning a handwheel, unscrewing a bottle cap, and drawing a line with a ruler.
A soft snake robot is controlled to move on the ground, arrive at target positions, and avoid obstacles with an NN as the agent.
The authors fuse visual and shape information with an NN in RL and control a flexible endoscope during navigation.
In RL, the workspace is discretized into a 3D grid with a resolution of 0.01 m (see the grid-mapping sketch after this list).
With the help of RL, the soft robot in this work is able to keep its end position fixed while changing its orientation.
Constant curvature (CC), a soft-robot modeling method, first provides a simulation environment, and the NN agent then continues to learn in the real world using the deep deterministic policy gradient (DDPG) method.
LSTM is utilized for the forward modeling of segmented pneumatic robots; the RL agents are then trained and validated in reality.
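To make the workspace discretization mentioned in the paper list more concrete (the work using a 0.01 m grid), the following minimal sketch maps continuous end-effector positions to discrete grid cells and back. Only the 0.01 m resolution comes from that description; the workspace bounds and the function names are hypothetical assumptions.

```python
# Minimal sketch of discretizing a continuous workspace into a 3D grid for a
# tabular or discrete-state RL agent. Bounds and names are illustrative only.
import numpy as np

RESOLUTION = 0.01                        # grid cell size in metres (from the cited work)
LOWER = np.array([0.0, 0.0, 0.0])        # assumed workspace lower corner (m)
UPPER = np.array([0.3, 0.3, 0.3])        # assumed workspace upper corner (m)

n_cells = np.ceil((UPPER - LOWER) / RESOLUTION).astype(int)


def position_to_cell(p):
    """Map a continuous end-effector position (m) to a discrete grid index."""
    idx = np.floor((np.asarray(p) - LOWER) / RESOLUTION).astype(int)
    return tuple(np.clip(idx, 0, n_cells - 1))


def cell_to_position(idx):
    """Map a grid index back to the centre of its cell (m)."""
    return LOWER + (np.asarray(idx) + 0.5) * RESOLUTION


# Example: the discrete state seen by the RL agent for a given tip position.
# position_to_cell([0.12, 0.07, 0.21])  -> (12, 7, 21)
```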