To better understand the impact of glitch tokens on the model's internal output generation process, we conduct a series of experiments on the Llama-2-7b-chat model. We compare the intermediate results extracted from the Llama2 model when processing prompts containing normal tokens and prompts containing glitch tokens, and observe significant disparities in attention patterns and MLP states.
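For illustration, the following sketch shows one way such intermediate results can be extracted, assuming the Hugging Face transformers implementation of Llama-2-7b-chat (the meta-llama/Llama-2-7b-chat-hf checkpoint); it hooks the MLP gate projection of each layer and requests per-layer attention maps, and is not necessarily the exact instrumentation used in our experiments.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL_NAME = "meta-llama/Llama-2-7b-chat-hf"  # assumed checkpoint name

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    torch_dtype=torch.float16,
    attn_implementation="eager",  # so attention weights are returned
)
model.eval()

# Capture the output of each layer's MLP gate projection via forward hooks.
# (The gate activations discussed in the text may additionally apply the
# SiLU nonlinearity; hooking mlp.act_fn instead would capture that.)
gate_activations = {}

def make_hook(layer_idx):
    def hook(module, inputs, output):
        # output: (batch, seq_len, intermediate_size)
        gate_activations[layer_idx] = output.detach().cpu()
    return hook

handles = [
    layer.mlp.gate_proj.register_forward_hook(make_hook(i))
    for i, layer in enumerate(model.model.layers)
]

prompt = "Please repeat the string: SolidGoldMagikarp"  # example prompt
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs, output_attentions=True)

# outputs.attentions: one (batch, num_heads, seq_len, seq_len) tensor per layer
attention_maps = [a.detach().cpu() for a in outputs.attentions]

for h in handles:
    h.remove()
```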
While the differences may not always be apparent from visual plots, our analysis using Jensen-Shannon (JS) divergence reveals profound differences in how the model processes the two types of tokens. JS divergence compares the probability distributions of attention scores, and divergence values above 0.5 capture disparities that escape simple visual inspection.
Our examination further identifies the specific layers and attention heads where the divergence in attention scores between glitch and normal tokens is greatest. At these layers and heads, the JS divergence values approach the theoretical maximum (ln 2, approximately 0.693), indicating stark differences in the distributions of attention scores. Such high divergence underscores the model's distinct treatment of glitch and normal tokens, highlighting regions with almost no statistical overlap between the two attention distributions.
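As an illustration of how such a comparison can be performed, the sketch below bins the attention scores of a given head into histograms and computes their JS divergence with scipy; the binning scheme and the attention_maps_normal / attention_maps_glitch variables are illustrative assumptions, and scipy returns the square root of the divergence, which is therefore squared to recover the divergence itself.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

def attention_js_divergence(attn_normal, attn_glitch, num_bins=50):
    """Compare the attention-score distributions of one head for a normal
    vs. a glitch prompt via Jensen-Shannon divergence.

    attn_normal, attn_glitch: 1-D arrays of attention scores in [0, 1].
    Returns the JS divergence in nats (maximum ln 2, about 0.693).
    """
    bins = np.linspace(0.0, 1.0, num_bins + 1)
    p, _ = np.histogram(attn_normal, bins=bins)
    q, _ = np.histogram(attn_glitch, bins=bins)
    # Normalize histograms into probability distributions.
    p = p / p.sum()
    q = q / q.sum()
    # scipy returns the JS *distance* (sqrt of the divergence, natural log).
    return jensenshannon(p, q) ** 2

# Example (hypothetical variables): attention_maps_normal[layer][0, head] is a
# (seq_len, seq_len) matrix extracted as in the previous sketch.
# js = attention_js_divergence(
#     attention_maps_normal[5][0, 3].flatten().numpy(),
#     attention_maps_glitch[5][0, 3].flatten().numpy(),
# )
```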
We also use the Mean Absolute Distance to quantitatively measure the spread and coherence of MLP states across layers, revealing structured clustering for normal tokens and scattered, erratic distributions for glitch tokens.
Taking the MLP Gate activations as an example, the mean distances for normal tokens start at a relatively low value of 0.001 and gradually increase to a maximum of 0.294. This progression suggests centroid-based clustering in the MLP space as the layers advance, indicating the coherent and structured output distribution typical of normal processing. In contrast, the MLP Gate activations for glitch tokens exhibit a more erratic and dispersed pattern: starting from 0.002, the distances not only begin higher than their normal counterparts but also increase more steeply, culminating in a significantly higher maximum of 0.372. This indicates that the MLP states for glitch tokens are chaotic and spread widely across the MLP space, reflecting a lack of coherent structure and suggesting anomalies in processing.
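A minimal sketch of this measurement is given below, assuming per-layer gate activations have been collected for a set of tokens as above; it interprets the Mean Absolute Distance as the average absolute deviation of each token's activation vector from the layer centroid, which is an assumption about the exact metric definition.

```python
import numpy as np

def mean_absolute_distance(layer_activations):
    """Mean absolute distance of token activations from their layer centroid.

    layer_activations: (num_tokens, hidden_dim) array of MLP gate activations
    collected at one layer for a set of tokens.

    NOTE: this interprets "Mean Absolute Distance" as the average absolute
    deviation from the centroid; the exact definition is an assumption.
    """
    centroid = layer_activations.mean(axis=0)
    return np.abs(layer_activations - centroid).mean()

# Example across layers (activations_by_layer is a hypothetical dict
# mapping layer index -> (num_tokens, hidden_dim) array):
# spread = [mean_absolute_distance(activations_by_layer[l])
#           for l in sorted(activations_by_layer)]
```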
Through the experiments, we find that:
Llama2 shows a significant disparity in attention patterns and MLP states when processing glitch tokens compared with normal tokens.
The anomalous intermediate results caused by glitch tokens are not uniformly distributed across all layers of the model but are concentrated and amplified in specific key layers.