The overall workflow of GlitchHunter is shown below.
GlitchHunter first constructs the Token Embedding Graph (TEG) from all tokens and their embedding vectors. It then performs candidate clustering on this initial TEG to produce clusters of potential glitch tokens. Within each cluster, it runs a hypothesis test to determine whether the cluster contains glitch tokens. Tokens from the clusters that pass the test are used to build an updated TEG. This completes one iteration; GlitchHunter repeats the clustering and testing until the TEG no longer changes.
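The iterative loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not GlitchHunter's actual implementation. The text here does not specify how the TEG is built, how clusters are formed, or what the hypothesis test is, so the sketch substitutes simple stand-ins: a k-nearest-neighbor graph over cosine similarity for the TEG, connected components for candidate clustering, and a sampled majority check against a caller-supplied `is_glitch` oracle for the hypothesis test. All function names and parameters (`build_teg`, `candidate_clusters`, `cluster_passes`, `glitch_hunt`, `k`, `sample_size`, `threshold`, `max_iter`) are hypothetical.

```python
import numpy as np


def build_teg(tokens, emb, k=2):
    """Assumed TEG: link each token to its k most cosine-similar tokens.

    Assumes every embedding vector is non-zero.
    """
    graph = {t: set() for t in tokens}
    k = min(k, len(tokens) - 1)
    if k <= 0:
        return graph  # zero or one token: no edges to add
    sub = emb[tokens]
    unit = sub / np.linalg.norm(sub, axis=1, keepdims=True)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)  # exclude self-similarity
    for i, t in enumerate(tokens):
        for j in np.argsort(sim[i])[-k:]:
            graph[t].add(tokens[j])
            graph[tokens[j]].add(t)
    return graph


def candidate_clusters(graph):
    """Stand-in clustering: connected components of the graph."""
    seen, clusters = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        clusters.append(sorted(component))
    return clusters


def cluster_passes(cluster, is_glitch, rng, sample_size=3, threshold=0.5):
    """Placeholder test: sample tokens and keep the cluster if enough are glitches."""
    size = min(sample_size, len(cluster))
    sample = rng.choice(cluster, size=size, replace=False)
    hits = sum(bool(is_glitch(int(t))) for t in sample)
    return hits / size >= threshold


def glitch_hunt(emb, is_glitch, k=2, seed=0, max_iter=20):
    """Repeat build -> cluster -> test until the token set stops changing."""
    rng = np.random.default_rng(seed)
    tokens = list(range(len(emb)))
    for _ in range(max_iter):
        graph = build_teg(tokens, emb, k)
        kept = sorted(
            t
            for cluster in candidate_clusters(graph)
            if cluster_passes(cluster, is_glitch, rng)
            for t in cluster
        )
        if kept == tokens:  # TEG would not change: converged
            break
        tokens = kept
        if not tokens:
            break
    return tokens


# Toy data: tokens 0-3 point roughly along one axis, tokens 4-7 along the other.
emb = np.array([[1, 0.0], [1, 0.1], [1, 0.2], [1, 0.3],
                [0.0, 1], [0.1, 1], [0.2, 1], [0.3, 1]])
print(glitch_hunt(emb, is_glitch=lambda t: t < 4))
```

In this toy run the two groups form separate clusters, so only the cluster whose sampled tokens are flagged by the oracle survives into the next TEG. In practice the oracle would be a query to the model under test, which is why the sketch tests only a sample per cluster rather than every token.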