In this RQ, our goal is to validate the effectiveness of our model-based analysis in identifying abnormal behavior across three trustworthiness perspectives, and to explore the relationship between semantic-wise metrics and analysis performance.
Because semantic-wise metrics are designed to gauge the quality of an abstract model from a semantic standpoint, we anticipate that they correlate strongly with task performance, and we aim to confirm this hypothesis in our investigation. Beyond serving as a performance check, this investigation also helps identify the most suitable abstract models for analysis; we further support model selection by examining the correlation between model-wise metrics and performance.
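Analysis performance in this RQ is reported as ROC AUC. To make explicit what that number measures, the following minimal Python sketch computes ROC AUC from per-sample abnormality scores using its rank-statistic definition: the probability that a randomly chosen abnormal sample receives a higher score than a randomly chosen normal one, with ties counting one half. The function name `roc_auc`, the 1 (abnormal) / 0 (normal) label convention, and the assumption that the abstract model yields one scalar abnormality score per sample are illustrative assumptions, not the paper's implementation; in practice a library routine such as scikit-learn's `roc_auc_score` would typically be used.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC via pairwise comparison (Mann-Whitney formulation).

    Assumption: label 1 marks an abnormal sample, label 0 a normal one,
    and a higher score means "more abnormal".
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one abnormal and one normal sample")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # abnormal sample correctly ranked higher
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    # Fraction of (abnormal, normal) pairs that are ranked correctly.
    return wins / (len(pos) * len(neg))


print(roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.75
```

The pairwise loop is quadratic and meant only to show the definition; a value of 0.5 corresponds to chance-level separation and 1.0 to perfect separation of abnormal from normal behavior.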
In the following visual analysis, we first present two plots, each comprising one subplot for each of the three tasks.
The first plot covers the abstract model-wise metrics: for each task, a radar plot shows the minimum, mean, and maximum value of every metric. This reveals how much each metric varies and where its central tendency lies, giving an overall view of the abstract models' performance and characteristics.
The second plot applies the same structure to the semantic metrics, showing their minimum, mean, and maximum values in one radar plot per task. This enables a direct comparison between the semantic quality of the abstract models and their performance across the tasks, which informs model selection and performance optimization.
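Both radar plots summarize each metric by three statistics per task. A minimal sketch of that aggregation follows, assuming the results are stored in a hypothetical mapping from task name to a list of per-setting metric dictionaries, where all settings of a task report the same metric names; `radar_summary` and the example task and metric names are illustrative only.

```python
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical layout: task name -> one {metric name: value} dict per
# abstract-model setting evaluated for that task.
Results = Dict[str, List[Dict[str, float]]]
Summary = Dict[str, Dict[str, Tuple[float, float, float]]]


def radar_summary(results: Results) -> Summary:
    """Return (min, mean, max) of every metric for each task.

    Each position in the triple corresponds to one polygon on that task's
    radar subplot. Assumes all settings of a task report the same metrics.
    """
    summary: Summary = {}
    for task, settings in results.items():
        if not settings:
            continue  # no settings evaluated for this task
        summary[task] = {}
        for metric in settings[0]:
            values = [s[metric] for s in settings]
            summary[task][metric] = (min(values), mean(values), max(values))
    return summary


example = {
    "task_a": [{"m1": 0.25, "m2": 1.0}, {"m1": 0.75, "m2": 3.0}],
}
print(radar_summary(example))
# {'task_a': {'m1': (0.25, 0.5, 0.75), 'm2': (1.0, 2.0, 3.0)}}
```

Each triple supplies one value per radar axis for the minimum, mean, and maximum polygons of a task's subplot; the plotting step itself is omitted. If metrics live on very different scales, they would usually be normalized before sharing one radar plot.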
A third plot examines the relationship between the best setting's metrics and ROC AUC as a bar chart of Pearson correlation coefficients. Each bar corresponds to one metric: its length indicates the strength of the metric's correlation with ROC AUC, and its sign (positive or negative) indicates the direction. The chart thus shows which metrics are most strongly associated with performance, clarifying how the different metrics relate to ROC AUC.
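The bars in this chart are Pearson correlation coefficients. The sketch below computes one coefficient per metric, assuming (hypothetically) that each metric's values and the corresponding ROC AUC scores are collected as aligned lists with one entry per evaluated abstract-model setting; `pearson` and `metric_correlations` are illustrative names, not functions from the paper. On Python 3.10+, `statistics.correlation` computes the same coefficient.

```python
from math import sqrt
from typing import Dict, List, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient r in [-1, 1]."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two aligned sequences of length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("constant input: correlation is undefined")
    return cov / sqrt(vx * vy)


def metric_correlations(
    metric_values: Dict[str, List[float]], auc: List[float]
) -> Dict[str, float]:
    """One coefficient per metric, i.e. one bar in the chart.

    Assumption: metric_values[name][i] and auc[i] come from the same setting.
    """
    return {name: pearson(vals, auc) for name, vals in metric_values.items()}


corr = metric_correlations(
    {"m1": [1.0, 2.0, 3.0], "m2": [3.0, 2.0, 1.0]}, [0.6, 0.7, 0.8]
)
print({k: round(v, 6) for k, v in corr.items()})  # {'m1': 1.0, 'm2': -1.0}
```

A positive coefficient means higher metric values tend to accompany higher ROC AUC and a negative one the opposite; the bar length corresponds to the absolute value. Pearson's r captures only linear association, so it indicates co-variation rather than a causal influence on performance.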