We analyzed how often glitch tokens occur in real-world scenarios; the results are presented in the table below. The three real-world datasets analyzed together comprise over seven hundred million tokens. From a macro perspective, more than 2% of the tokens across models and datasets are identified as glitch tokens, indicating that their presence in these datasets is not merely incidental.
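
As a rough illustration of the kind of count behind such a percentage, the sketch below computes the share of token occurrences in a tokenized corpus that belong to a known glitch-token set. It is a minimal sketch and not the procedure used for the reported results: the function name `glitch_token_rate`, the toy token IDs, and the glitch set are all hypothetical, and it assumes the corpus has already been tokenized and the glitch tokens of the model have already been identified.

```python
from collections import Counter
from typing import Iterable, Set


def glitch_token_rate(token_ids: Iterable[int], glitch_ids: Set[int]) -> float:
    """Return the fraction of token occurrences whose ID is in glitch_ids."""
    counts = Counter(token_ids)          # occurrences per token ID
    total = sum(counts.values())         # total tokens in the corpus
    if total == 0:
        return 0.0                       # empty corpus: define the rate as 0
    glitch = sum(c for t, c in counts.items() if t in glitch_ids)
    return glitch / total


# Toy data (hypothetical): a 10-token corpus and a 2-token glitch set.
corpus = [5, 17, 17, 42, 8, 5, 99, 17, 3, 8]
glitch_vocab = {17, 99}

rate = glitch_token_rate(corpus, glitch_vocab)
print(f"{rate:.1%}")  # prints 40.0%
```

Because glitch tokens are specific to a tokenizer and model, a count of this kind would be repeated for each model and dataset pair and then aggregated.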