To answer RQ1, we study the effectiveness of our imitation attack on three code-related tasks: CSyn, CT, and CSum. As described earlier in the paper, we use the proxy dataset to issue queries to the target LLM and evaluate the performance of the imitation model on the reference dataset.
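The query phase of this pipeline can be sketched as follows. This is a minimal illustration, not the actual attack implementation: `query_target_llm` is a hypothetical stand-in for the vendor API call, and the input strings are placeholders rather than samples from the proxy dataset.

```python
def query_target_llm(prompt: str) -> str:
    """Hypothetical stub for the target LLM API.

    A real attack would send `prompt` to the commercial API here and
    return the model's response text.
    """
    return f"<response to: {prompt}>"


def build_imitation_dataset(proxy_inputs):
    """Pair each proxy input with the target LLM's output.

    The collected (input, label) pairs serve as pseudo-labels for
    fine-tuning the medium-sized imitation model.
    """
    dataset = []
    for source in proxy_inputs:
        label = query_target_llm(source)
        dataset.append({"input": source, "label": label})
    return dataset


# Placeholder proxy inputs for illustration only.
proxy = ["sort a list in Python", "reverse a string"]
pairs = build_imitation_dataset(proxy)
```

The resulting pairs are then used as training data for the imitation model, substituting the LLM's outputs for ground-truth labels.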
From the table, it is evident that the imitation attacks are highly effective. The results on the CSyn and CT tasks are particularly promising: the imitation models outperform models trained directly on the proxy datasets by an average of 57.59% for CodeT5 and 56.62% for CodeBERT. Moreover, the imitation models achieve performance competitive with the target LLM APIs.
Upon manual inspection, we find that the APIs tend to return verbose responses containing extraneous information when no additional context is provided, which degrades performance.
This phenomenon has been noted in prior work; we leave explaining and mitigating it to future work.
Extracting specialized abilities of LLMs through medium-sized backbone models is effective for representative code-related tasks. The trained imitation models achieve comparable, if not better, performance than the original LLMs on these specialized abilities.