On this page, we show more examples of how the improvement actually takes place.
For each case, there are four key parts:
Prompt: The original, un-mutated prompt used to query the model.
Oracle: The ground truth provided in the dataset we use.
Code1 & Code2: One is the original completion output; the other is the repaired output.
In case1, code1 is the original completion output, and code2 is the close-to-average output.
This is an example of how we repair the completion output. The original output (code1) appears to be broken and cannot reach the target goal.
In case2, code2 is the original completion output, and code1 is the close-to-average output.
This is an example of how we repair the completion output. The original output (code2) returns the wrong number.
In case3, code1 is the original completion output, and code2 is the close-to-average output.
Although neither code1 nor code2 exactly matches the oracle, code2 is the better choice for users.
In case4, we show an example where the original completion output is the close-to-average output. In other words, the original output is returned as the repaired completion output.
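The cases above all rely on picking a "close-to-average" output. As a rough illustration only, the selection could be sketched as choosing the candidate with the highest mean similarity to all the others. Everything in this sketch is an assumption made for illustration: the `similarity` metric (a character-level ratio from Python's `difflib`), the function name `close_to_average`, and the toy candidates are hypothetical and are not the actual implementation or data.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Hypothetical metric: character-level similarity ratio in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


def close_to_average(candidates: list[str]) -> int:
    """Return the index of the candidate with the highest mean
    similarity to all other candidates (ties go to the lowest index)."""
    if len(candidates) < 2:
        return 0

    def mean_similarity(i: int) -> float:
        scores = [
            similarity(candidates[i], other)
            for j, other in enumerate(candidates)
            if j != i
        ]
        return sum(scores) / len(scores)

    return max(range(len(candidates)), key=mean_similarity)


# Toy data (assumed): index 0 is the original completion, the rest
# stand in for completions obtained from mutated prompts.
original = "def add(a, b):\n    return a - b\n"
from_mutated_prompts = [
    "def add(a, b):\n    return a + b\n",
    "def add(a, b):\n    return a + b\n",
    "def add(x, y):\n    return x + y\n",
]
candidates = [original] + from_mutated_prompts

chosen = close_to_average(candidates)
repaired = candidates[chosen]
print(repaired)
```

Under this sketch, a result of index 0 corresponds to the situation in case4: the original completion is itself the close-to-average output and is returned unchanged.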
Code2 is the repaired output.
Code2 is the repaired output.
Code2 is the repaired output.
Code1 is the repaired output.
Code2 is the repaired output.