We evaluated all 1,104 prompts from the three datasets with five safety checkers and the surrogate classifier, and counted the cases in which the surrogate classifier and the target safety checker reached different judgments on a prompt and its generated image. The results indicate that the decision boundaries of the safety checkers differ from that of the surrogate classifier.
We also compared the detection results of the different safety checkers on the same prompts, which further confirms that decision boundaries vary across classifiers with different model architectures and training data.
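The disagreement counting described above can be illustrated with a minimal sketch. It assumes each classifier's output has already been reduced to one binary verdict per prompt (True = flagged as unsafe). The names `disagreement_count`, `pairwise_disagreements`, `surrogate`, `checker_a`, and `checker_b` are hypothetical, and the verdicts are toy values, not results from the experiment.

```python
from itertools import combinations


def disagreement_count(a, b):
    """Count the positions where two equal-length verdict lists differ."""
    if len(a) != len(b):
        raise ValueError("verdict lists must cover the same prompts")
    return sum(x != y for x, y in zip(a, b))


def pairwise_disagreements(verdicts):
    """Map each unordered pair of classifier names to its disagreement count.

    verdicts: dict of classifier name -> list of bool (True = flagged unsafe),
    with every list indexed by the same prompts in the same order.
    """
    return {
        (m, n): disagreement_count(verdicts[m], verdicts[n])
        for m, n in combinations(verdicts, 2)
    }


# Toy verdicts for five prompts (illustrative only, not experimental data).
verdicts = {
    "surrogate": [True, False, True, True, False],
    "checker_a": [True, True, True, False, False],
    "checker_b": [False, False, True, True, False],
}

counts = pairwise_disagreements(verdicts)
for pair, n in counts.items():
    print(pair, n)
```

Dividing each count by the number of prompts would give a disagreement rate that can be compared across datasets of different sizes.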