Leveraging recent advances in large language models, modern neural code completion models can generate highly accurate code suggestions. However, their massive size incurs substantial computational cost and environmental impact, hindering their adoption in practical scenarios. Dynamic inference emerges as a promising solution: it allocates minimal computation during inference while maintaining the model's performance. In this work, we explore dynamic inference in the context of code completion. We first conduct an empirical study on GPT-2, focusing on the inference capability of its intermediate layers for code completion. We find that 54.4% of tokens can be accurately generated using just the first layer, signifying significant potential for computational savings. Moreover, even when using all layers, the model still fails to predict 14.5% of tokens correctly, and the completions continued from these mispredictions are rarely considered helpful, with only a 4.2% Acceptance Rate. These findings motivate our exploration of dynamic inference for code completion and inspire us to enhance it with a decision-making mechanism that stops the generation of incorrect code. We thus propose a novel dynamic inference method tailored to code completion models, which aims not only to produce correct predictions with greatly reduced computation but also to prevent incorrect predictions proactively. Our extensive evaluation across various settings showcases the potential of the proposed method: on average, it skips 1.7 of the 16 layers in the models, yielding an 11.2% speedup in completion generation with only a marginal 1.1% reduction in ROUGE-L.
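As a concrete illustration of per-token early exit, the sketch below runs decoder layers one at a time and emits a token once an intermediate prediction is sufficiently confident. It is a minimal sketch, not the exact implementation evaluated in this work: the attribute names (layers, final_norm, lm_head), the max-probability confidence criterion, and the threshold value are all assumptions.

import torch

def next_token_with_early_exit(model, hidden, threshold=0.9):
    # Run decoder layers one at a time; emit the token as soon as the
    # intermediate prediction is confident enough, skipping the rest.
    # Assumes batch size 1 and hypothetical attribute names.
    n_layers = len(model.layers)
    for i, layer in enumerate(model.layers):
        hidden = layer(hidden)
        # Apply the shared output head to the intermediate hidden state.
        logits = model.lm_head(model.final_norm(hidden))
        probs = torch.softmax(logits[:, -1, :], dim=-1)
        confidence, token = probs.max(dim=-1)
        if confidence.item() >= threshold:
            return token, i + 1  # exited after i + 1 layers
    return token, n_layers  # fell through: all layers were used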
During the annotation in RQ3, the annotators found that Stop allows many completions that would have been discarded by the user if fully generated to instead be accepted.
Here, we present several cases illustrating this finding: after Stop prevents the erroneous part, the retained partial completion is still helpful to developers and is thus accepted.
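To make the stop behavior concrete, below is a minimal sketch of how such a stop decision could plug into greedy decoding so that the partial completion generated so far is retained. It assumes a Hugging Face-style causal LM interface, and the entropy-based should_stop predicate is a hypothetical stand-in for the actual decision mechanism, not our implementation.

import torch

def should_stop(logits, entropy_cap=3.0):
    # Hypothetical stand-in: halt when the next-token distribution is
    # too uncertain, i.e., the model is likely to guess incorrectly.
    probs = torch.softmax(logits, dim=-1)
    entropy = -(probs * probs.clamp_min(1e-9).log()).sum(dim=-1)
    return entropy.item() > entropy_cap

def generate_with_stop(model, tokenizer, prompt, max_new_tokens=48):
    # Greedy decoding that may stop early instead of emitting tokens
    # the model is likely to get wrong; the prefix is still returned.
    ids = tokenizer.encode(prompt, return_tensors="pt")
    new_tokens = []
    for _ in range(max_new_tokens):
        logits = model(ids).logits[:, -1, :]
        if should_stop(logits):
            break  # keep the (possibly helpful) partial completion
        token = logits.argmax(dim=-1, keepdim=True)
        ids = torch.cat([ids, token], dim=-1)
        new_tokens.append(token.item())
    return tokenizer.decode(new_tokens)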
Case 1:
private final void clearStrings() {
// Clear the string values
m_appParam = null;
m_comment = null;
m_fullName = null;
m_homeDir = null;
In this case, the SEC-enhanced GPT-2 generates only m_ and then stops, while the ground truth is m_homeDirDrive = null;.
Without sufficient information in the prompt, it is extremely hard for the LCM to generate homeDirDrive.
Case 2:
public void initFormValues(CmsSearchReplaceSettings settings) {
m_siteSelect.setValue(settings.getSiteRoot());
m_ignoreSubSites.setValue(new Boolean(settings.ignoreSubSites()));
m_searchType.setValue(settings.getType());
if (!settings.getPaths().isEmpty()) {
m_searchRoot.setValue(settings.getPaths().get(0));
}
if (CmsStringUtil.isNotEmptyOr
In this case, the SEC-enhanced CodeGen generates WhitespaceOnly(settings.get and then stops, while the ground truth is WhitespaceOnly(settings.getTypes().
The API getTypes does not appear in the prompt, so Stop avoids generating the rest of the completion, which would otherwise be a random guess.
Case 3:
public void setColor(Color aColor) {
if (debugLog()) {
info().log(toShortString() + " Setting color: " + aColor);
}
g
In this case, the SEC-enhanced CodeGen generates raphics. and then stops, while the ground truth is raphics.setColor(aColor);.
Similar to Case 2, the API setColor is not given in the prompt, so Stop avoids the unhelpful part.
In RQ2, due to the page limit, we reported only the results for four tolerance settings.
The full results can be downloaded here.