We evaluate D-LLM on four open-source LLMs: Llama-2-7b-chat, Mistral-7b-instruct, Llama-3-8b-instruct, and gemma-2-9b-it, and compare it against four baselines. The comprehensive results are shown as follows:
We record the training time and the selection time for LLMs of different sizes and present them as follows. Training and selection are efficient, taking a total of three and a half hours for lightweight LLMs and less than four days even for a model as large as Llama-2-70B.
We investigate the effects of the scheme and evaluate the influence of the coefficient of ∆W on D-LLM.
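To make the ablation concrete, the sketch below shows one way such a coefficient could be swept. It assumes the learned update ∆W is merged into a frozen base weight as W_eff = W + α·∆W; this merging rule, the tensor shapes, and the placeholder evaluation step are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the D-LLM code) of ablating the coefficient of ∆W.
# Assumption: the update is merged as W_eff = W + alpha * dW.
import torch


def apply_delta(base_weight: torch.Tensor,
                delta_weight: torch.Tensor,
                alpha: float) -> torch.Tensor:
    """Scale the learned update before merging it into the frozen weight."""
    return base_weight + alpha * delta_weight


# Hypothetical sweep over the coefficient; in practice each setting would be
# followed by a full re-evaluation on the benchmark suite.
for alpha in (0.25, 0.5, 1.0, 2.0):
    w = torch.randn(8, 8)            # stands in for a frozen base weight W
    dw = 0.01 * torch.randn(8, 8)    # stands in for the learned update ∆W
    w_eff = apply_delta(w, dw, alpha)
    print(f"alpha={alpha}: ||W_eff - W|| = {(w_eff - w).norm().item():.4f}")
```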