The code for GlitchHunter is available here.
The ground-truth glitch tokens for seven LLMs, collected in our empirical study, are available here.
We provide the embeddings of eight open-source LLMs: GPT2-small, GPT2-xl, Llama2-7b-chat, Llama2-13b-chat, ChatGLM-6b, ChatGLM2-6b, Mistral-7b-Instruct, and Vicuna-13b. The data and the usage of the .h5ad files are described here.