RQ2: How do AI controllers perform in various robotics manipulation tasks?
The findings from RQ1 highlight the need for an industrial-level benchmark that helps users get started with Isaac Sim for developing AI-enabled robotics applications. To address this need, together with our industrial partners, we develop a benchmark of eight typical robotics manipulation tasks in Isaac Sim that fully covers rigid-body manipulation, deformable-object manipulation, prehensile manipulation, and non-prehensile manipulation.
Meanwhile, despite extensive research on DRL controllers in robotics, a systematic and detailed analysis of their performance across diverse robotics manipulation tasks is still lacking. This knowledge gap emphasizes the need to compare and evaluate different DRL controllers under various task requirements. We therefore design RQ2 to compare different DRL controllers across a wide range of manipulation tasks. To this end, we propose a set of metrics to evaluate the performance of different DRL controllers on the tasks in our benchmark. The results reveal the strengths and limitations of different AI controllers in robotics manipulation and allow researchers to identify the most effective methods for achieving specific control objectives.
Evaluation Metrics
To study RQ2, we perform the evaluation based on four categories of evaluation metrics that consider different aspects of the manipulation tasks:
Success Rate (SR): This metric measures the percentage of successful task completions among all attempted trials. A trial is counted as successful when the controlled system behavior satisfies a predefined task-relevant STL specification (see the following Table for the STL specifications used in our benchmark). A high SR indicates that the AI controller is capable of effectively completing the task.
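As a minimal illustration, SR can be computed from per-trial boolean outcomes, where each outcome stands in for the result of the actual STL monitor on that trial's trajectory:

```python
def success_rate(trial_outcomes):
    """SR (%): fraction of trials whose trajectory satisfies the task's
    STL specification, among all attempted trials.

    `trial_outcomes` is a list of booleans, one per trial, standing in
    for the output of the task-specific STL monitor.
    """
    return 100.0 * sum(trial_outcomes) / len(trial_outcomes)
```

For example, `success_rate([True, True, False, True])` yields 75.0.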
Dangerous Behavior Rate (DBR): DBR is computed as the percentage of time steps, i.e., control intervals, where the manipulator is close to failing the task, among all simulated time steps. A high DBR indicates that the AI controller is prone to generating unsafe or unstable control commands, which can potentially cause task failures. The STL specifications used in our benchmark for describing dangerous behaviors are also given in the following Table.
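A sketch of the DBR computation, where the predicate `is_dangerous` is a hypothetical stand-in for the task-specific STL check of dangerous behavior at a single time step:

```python
def dangerous_behavior_rate(trajectories, is_dangerous):
    """DBR (%): percentage of simulated time steps (control intervals) at
    which the manipulator is close to failing the task, pooled over all
    trials.

    `is_dangerous` is a placeholder for the per-step STL check of
    dangerous behavior; each trajectory is a sequence of states.
    """
    total_steps = sum(len(traj) for traj in trajectories)
    dangerous_steps = sum(
        is_dangerous(state) for traj in trajectories for state in traj
    )
    return 100.0 * dangerous_steps / total_steps
```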
Task Completion Time (TCT): This metric measures the number of time steps the manipulator needs to successfully complete the task. A shorter TCT indicates that the AI controller generates more efficient control commands for completing the task.
Training Time (TT): We use TT to measure the number of training time steps required to train the AI controller, i.e., until the policy reward converges to a constant level. A shorter TT indicates that the DRL algorithm learns the control policy for the given task more efficiently.
In real-world applications, system noise and uncertainty, such as sensor noise or model inaccuracies, are often inevitable. Thus, the ability of manipulators to remain robust under such conditions is crucial, especially for those designed for real-world tasks. Therefore, we include robustness as our final critical evaluation metric. Specifically, to assess the robustness of AI controllers, we introduce action noise to the trained AI controllers and measure how the SR, DBR, and TCT are impacted.
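The action-noise perturbation can be sketched as follows; the variance of 0.25 matches the white Gaussian noise used in our robustness experiments, while the function name and interface are illustrative assumptions rather than part of our tooling:

```python
import numpy as np

def perturb_action(action, rng, variance=0.25):
    """Add white Gaussian noise (variance 0.25, as in our robustness
    experiments) to the action produced by a trained AI controller.

    `action` is the controller's raw action vector; `rng` is a NumPy
    random generator.
    """
    action = np.asarray(action, dtype=float)
    noise = rng.normal(loc=0.0, scale=np.sqrt(variance), size=action.shape)
    return action + noise
```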
STL Specifications
Experiment Settings
To evaluate the performance of AI-enabled robotics manipulation, we first use various DRL algorithms, i.e., TRPO, PPO, SAC, TD3, and DDPG, to train multiple AI software controllers for each manipulation task. However, our experiments reveal that except for TRPO and PPO, the other DRL algorithms fail to produce a working controller capable of solving the manipulation tasks. One potential explanation is that on-policy algorithms are better suited than off-policy algorithms to Isaac Sim, which runs a large number of environments in parallel. In this setting, the collected samples are strongly correlated in time, and a considerable amount of data accumulates at each time step, posing challenges for off-policy algorithms. Further research is needed to confirm whether this is the root cause. Consequently, in our performance evaluation and in the falsification test presented in the next section, we use only the AI controllers trained by PPO and TRPO, as they are the only functional ones. For each task and AI controller, we conduct 100 trials separately under two conditions: without and with action noise, where the action noise is white Gaussian noise with a variance of 0.25.
For each trial, the initial configuration is generated randomly according to Table~\ref{table_performance}, and the simulation length is set to 300 time steps. The values of DBR and TCT are averaged over all trials.
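The per-trial evaluation loop can be sketched as below. The environment interface (`env_factory`, `reset`/`step`, and the `success`/`dangerous` info keys) is a hypothetical stand-in for our Isaac Sim task wrappers, not their actual API:

```python
def evaluate(controller, env_factory, n_trials=100, horizon=300):
    """Aggregate SR (%), mean DBR (%), and mean TCT over repeated trials.

    `env_factory` builds a fresh environment with a random initial
    configuration; `step` is assumed to return (obs, done, info) with
    boolean `dangerous` and `success` flags in the info dict.
    """
    successes, dbrs, tcts = 0, [], []
    for _ in range(n_trials):
        env = env_factory()              # random initial configuration
        obs = env.reset()
        dangerous_steps, steps, done, info = 0, 0, False, {}
        while steps < horizon and not done:
            obs, done, info = env.step(controller(obs))
            steps += 1
            dangerous_steps += int(info.get("dangerous", False))
        dbrs.append(100.0 * dangerous_steps / steps)
        if info.get("success", False):
            successes += 1
            tcts.append(steps)           # TCT only counts successful trials
    sr = 100.0 * successes / n_trials
    mean_dbr = sum(dbrs) / len(dbrs)
    mean_tct = sum(tcts) / len(tcts) if tcts else float("nan")
    return sr, mean_dbr, mean_tct
```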
Parallel Running of Environments
Experimental Results
The results are presented in the following Table and in the radar chart, where values are normalized such that a larger value indicates better performance.
Performance of AI controllers
It can be observed that both TRPO and PPO accomplish most of the manipulation tasks with an SR over 90\%, except for PPO in PH (85\%) and DO (89\%). While TRPO maintains an SR comparable to PPO in tasks such as CS, BC, and CP, it performs better in the remaining tasks, particularly those that require precise control (e.g., PH) or a multi-stage control process (e.g., DO). As a result, TRPO generally outperforms PPO in terms of SR. Similar trends can be observed for DBR and TCT, where TRPO usually performs better, with the exception of the task CP, for which PPO has a clearer advantage. However, TRPO often requires a longer TT, especially for the tasks PR, PH, and BP, where it needs up to twice the TT of PPO.
As expected, introducing action noise degrades the performance of the AI controllers across all metrics. However, both PPO and TRPO still accomplish the tasks with a satisfactory SR; the decrease is less than 5\% for most tasks. Tasks that require accurate execution of actions, such as PH, BB, and DO, exhibit a more noticeable decrease in SR. In terms of DBR, the action noise strongly impacts the CS and BB tasks. One potential reason is that these tasks leave less tolerance for manipulating objects, e.g., due to the size of the target cube in CS or the tool in BB, making imprecise actions more dangerous. The impact of the action noise on the DBR of the other tasks, as well as on the TCT, is marginal. Overall, both PPO and TRPO show good robustness against action noise.