In RQ4, using Llama-2-7b-Chat as the representative model, we examine the impact of hyperparameters by analyzing how the performance of AbnorDetector-Lite and AbnorDetector-Full varies across multiple tasks under different hyperparameter settings. This study centers on three key aspects: first, the influence of activation features from the attention and MLP layers on the performance of both variants across diverse tasks; second, the identification of optimal hyperparameter configurations for both variants in each abnormal behavior detection task; and third, a sensitivity analysis using ROC curves to evaluate the trade-off between detection rate and false alarm rate.
Comparative Experimental Results of AbnorDetector with Different Hyperparameter Configurations across Three Abnormal Behavior Detection Scenarios
ROC curves of AbnorDetector-Lite and AbnorDetector-Full on (a) Jailbreak, (b) Hallucination, and (c) Backdoor tasks.
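To make the ROC-based sensitivity analysis concrete, the following is a minimal sketch of how a ROC curve and its area under the curve (AUC) can be computed from detector scores. It is not the implementation used in this work. The function names `roc_points` and `auc_trapezoid`, the label convention (1 = abnormal, 0 = normal), and the example scores are all illustrative assumptions.

```python
from typing import List, Sequence, Tuple


def roc_points(labels: Sequence[int], scores: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Return (fpr, tpr) lists obtained by sweeping a threshold over the scores.

    Assumption: label 1 marks an abnormal sample, label 0 a normal one,
    and a higher score means "more likely abnormal".
    """
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one abnormal and one normal sample")

    # Visit samples from the highest score to the lowest.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    for rank, i in enumerate(order):
        if labels[i] == 1:
            tp += 1  # detection (true positive)
        else:
            fp += 1  # false alarm (false positive)
        # Emit a point only when the next score differs, so tied scores
        # are treated as a single threshold.
        is_last = rank == len(order) - 1
        if is_last or scores[order[rank + 1]] != scores[i]:
            fpr.append(fp / neg)
            tpr.append(tp / pos)
    return fpr, tpr


def auc_trapezoid(fpr: Sequence[float], tpr: Sequence[float]) -> float:
    """Area under the ROC curve using the trapezoidal rule."""
    return sum(
        (fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2.0
        for k in range(1, len(fpr))
    )


if __name__ == "__main__":
    # Hypothetical scores for six prompts; not data from the paper.
    labels = [1, 1, 0, 1, 0, 0]
    scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
    fpr, tpr = roc_points(labels, scores)
    print(list(zip(fpr, tpr)))
    print(auc_trapezoid(fpr, tpr))
```

Each point on the curve corresponds to one decision threshold: moving toward the upper right raises the detection rate at the cost of more false alarms. A curve closer to the upper-left corner, and hence a larger AUC, indicates a detector whose performance is less sensitive to the choice of threshold.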