DSBox

A Data Selection Framework for Efficient Deep Code Learning

ASE 2025 Tool Demo

The Workflow Overview of DSBox

DSBox consists of three stages: method selection, data selection, and model training. Given a code model and a set of unlabeled data, DSBox uses a data selection method chosen by the user to rank the data samples by their potential contribution to model training. Then, given a labeling budget (for example, 10% of the data), DSBox selects that number of samples according to their ranking. The selected samples are then labeled manually by human annotators, and the labeled data is used as training data to optimize the parameters of the code model. Finally, DSBox reports the accuracy and F1 score of the model on the code classification task.
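The ranking and budget steps described above can be illustrated with a minimal Python sketch. This is not DSBox's actual API: the function names `least_confidence_scores` and `select_by_budget` are hypothetical, and least-confidence scoring is only one example of a data selection method, assumed here for illustration. The sketch also assumes that a higher score means a sample is more worth labeling and that the budget is rounded down to a whole number of samples.

```python
import math
from typing import List, Sequence


def least_confidence_scores(probs: Sequence[Sequence[float]]) -> List[float]:
    """Score each sample as 1 - max class probability (higher = more uncertain).

    Assumption: the code model exposes per-class probabilities for each
    unlabeled sample. This is one possible selection method, not DSBox's own.
    """
    return [1.0 - max(p) for p in probs]


def select_by_budget(scores: Sequence[float], budget_ratio: float) -> List[int]:
    """Return the indices of the highest-scoring samples within the budget."""
    if not 0.0 < budget_ratio <= 1.0:
        raise ValueError("budget_ratio must be in (0, 1]")
    # Number of samples to label, rounded down, but at least one.
    k = max(1, math.floor(len(scores) * budget_ratio))
    # Rank sample indices from highest to lowest score.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Hypothetical model outputs for five unlabeled code samples (two classes).
probs = [[0.9, 0.1], [0.55, 0.45], [0.6, 0.4], [0.99, 0.01], [0.5, 0.5]]
scores = least_confidence_scores(probs)
selected = select_by_budget(scores, budget_ratio=0.4)  # label 40% of the data
print(selected)  # [4, 1]
```

The returned indices identify the samples that would be sent to human annotators; the remaining samples stay unlabeled and are not used for training.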

Code: https://github.com/yinheeL/DSBox
