Task 1: EG outperforms both CG1 (p = 0.0781, r = 0.62) and CG2 (p = 0.0078, r = 0.94), the latter with statistical significance;
Task 2: EG outperforms both CG1 (p = 0.0156, r = 0.85) and CG2 (p = 0.0234, r = 0.80), both with statistical significance;
Task 3: CG2 outperforms both EG (p = 0.0390, r = 0.72) and CG1 (p = 0.0078, r = 0.94), both with statistical significance.
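These p and r values are consistent with a two-sided Wilcoxon signed-rank test on paired per-participant measures, with the effect size taken as r = |Z|/√n; notably, if the test is exact, p = 0.0078 is the smallest attainable value (2/2⁸) for n = 8 pairs. The paper's raw scores and exact procedure are not reproduced here, so the following is a stdlib-only sketch on invented data, assuming no zero differences and no tied magnitudes:

```python
from itertools import product
from math import sqrt

def wilcoxon_exact(diffs):
    """Exact two-sided Wilcoxon signed-rank test for small paired samples,
    plus effect size r = |Z| / sqrt(n) via the normal approximation.
    Assumes no zero differences and no ties among |diffs|."""
    n = len(diffs)
    order = sorted(abs(d) for d in diffs)
    ranks = [order.index(abs(d)) + 1 for d in diffs]   # rank of each |diff|
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    center = n * (n + 1) / 4                            # E[W+] under H0
    # Under H0 every sign pattern is equally likely: enumerate all 2^n of
    # them and count statistics at least as far from the center as w_plus.
    extreme = sum(
        1 for signs in product((0, 1), repeat=n)
        if abs(sum(r for s, r in zip(signs, ranks) if s) - center)
           >= abs(w_plus - center)
    )
    p = extreme / 2 ** n
    sigma = sqrt(n * (n + 1) * (2 * n + 1) / 24)        # SD of W+ under H0
    r_effect = abs(w_plus - center) / sigma / sqrt(n)
    return p, r_effect

# Illustration only: 8 paired time savings, all favoring one condition.
p, r = wilcoxon_exact([12, 30, 9, 45, 21, 6, 18, 27])
print(f"p = {p:.4f}, r = {r:.2f}")   # p = 0.0078, the floor for n = 8
```

With all eight pairs favoring the same condition, the exact p hits the 2/256 floor regardless of the magnitudes, which is why several comparisons above share p = 0.0078.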
Both TRACE (EG) and CoEdPilot (CG1) function as project-wide solutions capable of identifying edit locations across different files. Notably, TRACE outperforms CoEdPilot by invoking an LSP-based clone-detection tool, enabling efficient cross-file edit propagation.
In contrast, Cursor (CG2) Chat relies on users to specify files, which takes time, especially when users are unsure which files to modify. Additionally, rewriting multiple files with Cursor is time-consuming and can break users' mental flow.
Task 2 involves 9 edits across 2 files that are distant from each other yet syntactically coherent. By leveraging tool deduction, TRACE identifies the subsequent edit location via the LSP service, avoiding the exhaustive file scanning that CoEdPilot requires.
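As a rough illustration of how LSP-backed localization sidesteps file scanning: an editor client can ask the language server for every reference to the symbol just edited and treat the returned locations as candidate next-edit sites. The sketch below only constructs the standard `textDocument/references` JSON-RPC request; the helper name is hypothetical, and TRACE's actual clone-detection pipeline may differ.

```python
import json

def make_references_request(req_id, uri, line, character):
    """Build an LSP textDocument/references request (JSON-RPC 2.0).
    Sent to a running language server, the response lists every location
    where the symbol at (line, character) is used -- candidate sites for
    propagating the current edit, with no project-wide file scan."""
    return {
        "jsonrpc": "2.0",
        "id": req_id,
        "method": "textDocument/references",
        "params": {
            "textDocument": {"uri": uri},
            "position": {"line": line, "character": character},
            "context": {"includeDeclaration": True},
        },
    }

# Example: ask where the symbol at line 41, column 8 of parser.py is used.
req = make_references_request(1, "file:///src/parser.py", 41, 8)
print(json.dumps(req, indent=2))
```

Because the server resolves references semantically rather than textually, renames and signature changes surface the related edit sites even when they sit in distant files.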
In contrast, Cursor struggles with this task due to its need for full-file rewriting and its limited ability to localize cross-file edits. Take P22 for example: the user had to manually search for edit locations across files, ended up in the wrong file, and kept working on it until the time budget was used up.
Given the shared context among the 5 edits, Cursor lets users quickly trigger the next edit recommendation via the Tab key, or generate all correct edits in a single rewrite via Chat, which significantly improves editing efficiency.
Although this task involves no edit compositions, TRACE still outperforms CoEdPilot by producing fewer false-positive suggestions, owing to its improved prediction performance.