Experimental results

RQ2

Benchmark

We selected a benchmark of 450 medium-sized programs from the LLM Generated Dataset, all of which can be fully instrumented. In addition, we conducted a detailed analysis of 28 programs drawn from the official LeetCode solutions. The table below records which LeetCode programs we used.

Component-wise Performance

RQ4

Benchmark

Debugging Performance
