Among the defenses, Bergeron exhibits comparatively better performance. We also underscore the need for a uniform baseline in jailbreak detection. The detailed results are listed on this page.
Performance of defenses on three models. Note: for readability, we intentionally enlarged the labels of the best-performing items (top-left corner).