Finally, we present the performance of all hyperparameter combinations for the three methods discussed in the paper, along with the pooled version, on the KWS dataset.
Below, you will find examples of the explanations generated by each method, compared to the ground truth.
For this dataset, every method achieves perfect or near-perfect AUC. However, we can still explore what insights FF provides in this scenario.
A key finding is that, according to AUC, noise masking and zero masking as feature-masking strategies yield identical performance. When evaluated with faithfulness (FF), however, zero masking significantly outperforms noise masking; this holds across various FF percentages, though for brevity we present only FF top-adapt. This inconsistency supports our concerns about the usefulness of FF as a metric for explainability systems. To further explore the difference between zero and noise masking, we examine randomly selected examples from both approaches below.
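The contrast between the two masking strategies can be made concrete with a minimal sketch of how a faithfulness-style score is typically computed: mask the top fraction of features ranked by the attribution, then measure the drop in model confidence. The function name `faithfulness`, the toy `model`, and all parameters here are illustrative assumptions, not the exact procedure used in the paper.

```python
import numpy as np

def faithfulness(model, x, attribution, top_frac=0.2, masking="zero", rng=None):
    """Illustrative faithfulness (FF) score: confidence drop after masking the
    top-`top_frac` most important features. `masking` selects the strategy."""
    rng = rng or np.random.default_rng(0)
    k = max(1, int(top_frac * x.size))
    top_idx = np.argsort(attribution.ravel())[-k:]       # most important features
    x_masked = x.copy().ravel()
    if masking == "zero":
        x_masked[top_idx] = 0.0                          # zero masking
    else:
        x_masked[top_idx] = rng.normal(0.0, x.std(), k)  # noise masking
    x_masked = x_masked.reshape(x.shape)
    # A larger confidence drop means the attribution was more faithful.
    return model(x) - model(x_masked)

# Toy "model": confidence is the mean energy of the first half of the input.
model = lambda x: float(x[: x.size // 2].mean())
x = np.ones(10)
attr = np.concatenate([np.ones(5), np.zeros(5)])  # first half marked important
print(faithfulness(model, x, attr, top_frac=0.5, masking="zero"))  # → 1.0
```

With this toy model, zero masking and noise masking generally yield different drops for the same attribution, which is why FF can rank the two strategies differently even when their explanations look alike.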
The plots illustrate the importance assigned by each method to different segments. Additionally, the annotated ground truth is highlighted in color, allowing for a direct comparison between the model's attributions and the actual relevant segments.
As shown in the table at the end, the results for FF top-adapt do not align with the observation that both strategies yield very similar explanations. The AUC, on the other hand, is 1.0 for all examples and all methods.
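To clarify why AUC saturates at 1.0 here, the sketch below shows one common way to score an attribution against an annotated ground-truth mask: treat annotated segments as positives and compute the ROC AUC of the attribution scores via the Mann-Whitney statistic. The function name and toy data are assumptions for illustration only.

```python
import numpy as np

def attribution_auc(attr, gt_mask):
    """ROC AUC of attribution scores against a binary ground-truth relevance
    mask, computed via the Mann-Whitney U statistic (ties count as half)."""
    attr = np.asarray(attr, dtype=float)
    pos = attr[gt_mask.astype(bool)]      # scores on annotated segments
    neg = attr[~gt_mask.astype(bool)]     # scores elsewhere
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

attr = np.array([0.9, 0.8, 0.7, 0.1, 0.2, 0.0])
gt   = np.array([1, 1, 1, 0, 0, 0])
print(attribution_auc(attr, gt))  # → 1.0 (perfect separation)
```

As long as every annotated segment receives a higher score than every unannotated one, the AUC is exactly 1.0 regardless of how the scores are distributed, which is consistent with all methods reaching 1.0 on these examples.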
[Figure: example explanations, left column: Zero Masking, right column: Noise Masking]