Task 1: EG outperforms both CG1 (p = 0.0781, r = 0.62) and CG2 (p = 0.0078, r = 0.94), the latter with statistical significance;
Task 2: EG outperforms both CG1 (p = 0.0156, r = 0.85) and CG2 (p = 0.0234, r = 0.80), both with statistical significance;
Task 3: CG2 outperforms both EG (p = 0.0390, r = 0.72) and CG1 (p = 0.0078, r = 0.94), both with statistical significance.
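These p and r values are consistent with a two-sided Wilcoxon signed-rank test on paired per-participant measures, with the effect size taken as r = |Z|/√n; notably, if the test is exact, p = 0.0078 is the smallest attainable value (2/2⁸) for n = 8 pairs. The paper's raw scores and exact procedure are not reproduced here, so the following is a stdlib-only sketch on invented data, assuming no zero differences and no tied magnitudes:

```python
from itertools import product
from math import sqrt

def wilcoxon_exact(diffs):
    """Exact two-sided Wilcoxon signed-rank test for small paired samples,
    plus effect size r = |Z| / sqrt(n) via the normal approximation.
    Assumes no zero differences and no ties among |diffs|."""
    n = len(diffs)
    order = sorted(abs(d) for d in diffs)
    ranks = [order.index(abs(d)) + 1 for d in diffs]   # rank of each |diff|
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    center = n * (n + 1) / 4                            # E[W+] under H0
    # Under H0 every sign pattern is equally likely: enumerate all 2^n of
    # them and count statistics at least as far from the center as w_plus.
    extreme = sum(
        1 for signs in product((0, 1), repeat=n)
        if abs(sum(r for s, r in zip(signs, ranks) if s) - center)
           >= abs(w_plus - center)
    )
    p = extreme / 2 ** n
    sigma = sqrt(n * (n + 1) * (2 * n + 1) / 24)        # SD of W+ under H0
    r_effect = abs(w_plus - center) / sigma / sqrt(n)
    return p, r_effect

# Illustration only: 8 paired time savings, all favoring one condition.
p, r = wilcoxon_exact([12, 30, 9, 45, 21, 6, 18, 27])
print(f"p = {p:.4f}, r = {r:.2f}")   # p = 0.0078, the floor for n = 8
```

With all eight pairs favoring the same condition, the exact p hits the 2/256 floor regardless of the magnitudes, which is why several comparisons above share p = 0.0078.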
Both TRACE (EG) and CoEdPilot (CG1) function as project-wide solutions capable of identifying edit locations across different files. Notably, TRACE outperforms CoEdPilot by invoking an LSP-based clone-detection tool, enabling efficient cross-file edit propagation.
In contrast, Cursor (CG2) Chat relies on users to specify files, which takes time, especially when users are unsure which files to modify. Additionally, rewriting multiple files with Cursor is time-consuming and can break users' mental flow.
Task 2 involves 9 edits across 2 files that are distant from each other yet syntactically coherent. By leveraging tool deduction, TRACE identifies the subsequent edit location via the LSP service, avoiding the exhaustive file scanning that CoEdPilot requires.
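As a rough illustration of how LSP-backed localization sidesteps file scanning: an editor client can ask the language server for every reference to the symbol just edited and treat the returned locations as candidate next-edit sites. The sketch below only constructs the standard `textDocument/references` JSON-RPC request; the helper name is hypothetical, and TRACE's actual clone-detection pipeline may differ.

```python
import json

def make_references_request(req_id, uri, line, character):
    """Build an LSP textDocument/references request (JSON-RPC 2.0).
    Sent to a running language server, the response lists every location
    where the symbol at (line, character) is used -- candidate sites for
    propagating the current edit, with no project-wide file scan."""
    return {
        "jsonrpc": "2.0",
        "id": req_id,
        "method": "textDocument/references",
        "params": {
            "textDocument": {"uri": uri},
            "position": {"line": line, "character": character},
            "context": {"includeDeclaration": True},
        },
    }

# Example: ask where the symbol at line 41, column 8 of parser.py is used.
req = make_references_request(1, "file:///src/parser.py", 41, 8)
print(json.dumps(req, indent=2))
```

Because the server resolves references semantically rather than textually, renames and signature changes surface the related edit sites even when they sit in distant files.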
In contrast, Cursor struggles with this task due to its need for full-file rewriting and its limited ability to localize cross-file edits. Take P22 for example: the user had to manually search for edit locations across files, ended up in the wrong file, and kept working on it until the time budget was used up.
Given the shared context among the 5 edits, Cursor lets users quickly trigger the next edit recommendation via the Tab key, or generate all correct edits in a single rewrite via Chat, which significantly improves editing efficiency.
Although this task involves no edit compositions, TRACE still outperforms CoEdPilot by producing fewer false-positive suggestions, owing to its improved prediction performance.