To verify whether our findings from RQ1 generalize to other LLMs, we select two additional models, Qwen-7B-Chat and Mistral-7B-Instruct, to complement our empirical study.
In addition to the visualization methods described above, we conduct a series of experiments on the Mistral and Qwen models to quantitatively demonstrate this phenomenon, following the same experimental procedure used for Llama2 in RQ1. The results are as follows:
Across these experiments, we observe that normal and glitch tokens exhibit similar differences at the intermediate layers of these additional LLMs, consistent with our observations on Llama2.
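As a sketch of how such an intermediate-layer comparison could be reproduced with the Hugging Face transformers API, the snippet below extracts per-layer hidden states for a normal and a glitch token and reports their layer-wise cosine similarity. The repository identifiers, prompt template, and example tokens are illustrative assumptions, not the exact configuration used in RQ1.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative sketch: the repository IDs, prompt template, and example tokens
# are assumptions for demonstration, not the exact setup used in RQ1.
MODELS = ["Qwen/Qwen-7B-Chat", "mistralai/Mistral-7B-Instruct-v0.1"]

def layerwise_hidden_states(model, tokenizer, token_text):
    """Return the hidden state of the final input position at every layer."""
    prompt = f"Please repeat the string: '{token_text}'"  # assumed prompt format
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    # out.hidden_states is a tuple of (num_layers + 1) tensors,
    # each of shape (batch, seq_len, hidden_dim); keep the last position.
    return [h[0, -1, :].float().cpu() for h in out.hidden_states]

def layerwise_cosine(model_name, normal_token, glitch_token):
    """Compare a normal and a glitch token's representations layer by layer."""
    tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_name, trust_remote_code=True,
        torch_dtype=torch.float16, device_map="auto",
    ).eval()
    normal = layerwise_hidden_states(model, tokenizer, normal_token)
    glitch = layerwise_hidden_states(model, tokenizer, glitch_token)
    return [
        torch.nn.functional.cosine_similarity(a, b, dim=0).item()
        for a, b in zip(normal, glitch)
    ]

if __name__ == "__main__":
    for name in MODELS:
        # "SolidGoldMagikarp" is used only as a placeholder glitch token;
        # substitute tokens identified for each model's own vocabulary.
        sims = layerwise_cosine(name, "hello", "SolidGoldMagikarp")
        print(name, [round(s, 3) for s in sims])
```

A layer-by-layer divergence in these similarity curves between the two models' normal and glitch tokens would mirror the qualitative pattern reported above.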