We selected a benchmark of 450 programs from the LLM Generated Dataset, all of which are of medium size and can be fully instrumented. Additionally, we conducted a detailed analysis of 28 programs from the official LeetCode solutions. The table below lists the LeetCode programs we used.