We evaluate D-LLM on four open-source LLMs: Llama-2-7b-chat, Mistral-7b-instruct, Llama-3-8b-instruct, and gemma-2-9b-it, and compare it against four baselines. The comprehensive results are shown as follows:
We record the training time and the selection time for LLMs of different sizes and present them as follows. Training and selection are efficient, taking a total of three and a half hours for lightweight LLMs and less than four days even for a model as large as Llama-2-70B.
We investigate the effects of the scheme and evaluate the influence of the coefficient of ∆W on D-LLM.
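To make the ablation concrete, the sketch below shows one way such a coefficient could be swept. It assumes the learned update ∆W is merged into a frozen base weight as W_eff = W + α·∆W; this merging rule, the tensor shapes, and the placeholder evaluation step are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the D-LLM code) of ablating the coefficient of ∆W.
# Assumption: the update is merged as W_eff = W + alpha * dW.
import torch


def apply_delta(base_weight: torch.Tensor,
                delta_weight: torch.Tensor,
                alpha: float) -> torch.Tensor:
    """Scale the learned update before merging it into the frozen weight."""
    return base_weight + alpha * delta_weight


# Hypothetical sweep over the coefficient; in practice each setting would be
# followed by a full re-evaluation on the benchmark suite.
for alpha in (0.25, 0.5, 1.0, 2.0):
    w = torch.randn(8, 8)            # stands in for a frozen base weight W
    dw = 0.01 * torch.randn(8, 8)    # stands in for the learned update ∆W
    w_eff = apply_delta(w, dw, alpha)
    print(f"alpha={alpha}: ||W_eff - W|| = {(w_eff - w).norm().item():.4f}")
```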