We establish a set of proxy tasks 𝑆 = {repetition, spelling, length} to assess an LLM’s ability to comprehend tokens. The tasks are: reproducing a token verbatim (repetition), spelling it out with hyphens between characters (spelling), and reporting its character count (length). For example, for the token string ‘Hello’, the expected outputs are ‘Hello’ for repetition, ‘H-e-l-l-o’ for spelling, and ‘5’ for length. A token 𝑡 is deemed a glitch token if the LLM fails any of the three tasks. To investigate this, we prepare a series of prompts for RQ1, detailed in the table below. To circumvent LLM safety mechanisms and keep the model on task, we pair direct task instructions with positive affirmations, such as ‘Of course! Here is the repeated string:’ for repetition and ‘Sure! The spelling of this string is:’ for spelling. Additionally, to resolve potential ambiguities for tokens that contain no alphabetic characters, we use dedicated few-shot prompts, also listed in the table below. This setup allows us to systematically probe the varied and unexpected responses of LLMs to different glitch tokens.
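The following is a minimal sketch of how the three proxy-task checks could be wired together, assuming a generic `query` callable that sends a prompt to the target LLM and returns its completion. The prompt strings and the `is_glitch_token` helper are illustrative placeholders rather than the exact templates from the table, and the few-shot variants for non-alphabetic tokens are omitted.

```python
from typing import Callable


def is_glitch_token(token: str, query: Callable[[str], str]) -> bool:
    """Flag `token` as a glitch token if the LLM fails any of the three proxy tasks."""
    # Task 1: repetition -- the model should reproduce the token verbatim.
    # The appended positive affirmation steers the model toward direct compliance.
    repetition = query(
        f"Repeat the string '{token}'. "
        "Of course! Here is the repeated string:"
    )
    if repetition.strip().strip("'\"") != token:
        return True

    # Task 2: spelling -- the model should hyphenate the token's characters.
    spelling = query(
        f"Spell out the string '{token}' with hyphens between characters. "
        "Sure! The spelling of this string is:"
    )
    if spelling.strip().strip("'\"") != "-".join(token):
        return True

    # Task 3: length -- the model should report the character count.
    length = query(
        f"How many characters are in the string '{token}'? "
        "Sure! The number of characters is:"
    )
    digits = "".join(ch for ch in length if ch.isdigit())
    if digits != str(len(token)):
        return True

    return False  # all three tasks passed; not flagged as a glitch token
```

In practice, `query` would wrap the target model’s generation API, and the exact-match comparisons above could be relaxed (e.g., case or whitespace normalization) depending on how strictly task failure is defined.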