Base Model Fine-tuning
The BERT-style models (CodeBERT and GraphCodeBERT) follow the original fine-tuning procedure.
Train
lang=java
lr=5e-5
batch_size=32
beam_size=10
source_length=256
target_length=64
data_dir=../dataset
output_dir=model/$lang
train_file=$data_dir/$lang/train.jsonl
dev_file=$data_dir/$lang/valid.jsonl
epochs=40
pretrained_model=microsoft/graphcodebert-base  # alternatives: microsoft/codebert-base, roberta-base
python run.py \
  --do_train --do_eval \
  --model_type roberta \
  --model_name_or_path $pretrained_model \
  --train_filename $train_file \
  --dev_filename $dev_file \
  --output_dir $output_dir \
  --max_source_length $source_length \
  --max_target_length $target_length \
  --beam_size $beam_size \
  --train_batch_size $batch_size \
  --eval_batch_size $batch_size \
  --learning_rate $lr \
  --num_train_epochs $epochs
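The training and validation files are jsonl (one JSON object per line). The exact field names depend on how the dataset was built; assuming a CodeSearchNet-style layout, a quick sanity check such as the following hypothetical snippet shows what each record contains:

# Minimal sketch: inspect the first record of the training file.
# The path and field names are assumptions; adjust them to your dataset layout.
import json

with open("../dataset/java/train.jsonl") as f:
    first = json.loads(next(f))
print(sorted(first.keys()))                 # e.g. fields such as "code" and "docstring" are expected
print(first.get("docstring", "")[:80])      # preview of the target summary, if that field exists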
Test
lang=java
beam_size=10
batch_size=128
source_length=256
target_length=64
output_dir=model/$lang
data_dir=../dataset
dev_file=$data_dir/$lang/valid.jsonl
test_file=$data_dir/$lang/test.jsonl
test_model=$output_dir/checkpoint-best-bleu/pytorch_model.bin
python run.py \
  --do_test \
  --model_type roberta \
  --model_name_or_path microsoft/graphcodebert-base \
  --load_model_path $test_model \
  --dev_filename $dev_file \
  --test_filename $test_file \
  --output_dir $output_dir \
  --max_source_length $source_length \
  --max_target_length $target_length \
  --beam_size $beam_size \
  --eval_batch_size $batch_size &
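Testing typically writes generated summaries and references into $output_dir, and the original evaluation scripts report smoothed BLEU-4. If you want to recompute a comparable score yourself, here is a minimal NLTK sketch; the file names are assumptions (point them at the output/gold files actually produced), and it is not identical to the repository's own BLEU script.

# Sketch: sentence-level smoothed BLEU-4 over hypothesis/reference files,
# assuming one summary per line. File names below are hypothetical.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

smooth = SmoothingFunction().method4
with open("model/java/test.output") as hyp_f, open("model/java/test.gold") as ref_f:
    scores = [
        sentence_bleu([ref.split()], hyp.split(), smoothing_function=smooth)
        for hyp, ref in zip(hyp_f, ref_f)
    ]
print(f"Smoothed BLEU-4: {100 * sum(scores) / len(scores):.2f}")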
The CodeT5 model also follows its original fine-tuning procedure:
python run_exp.py --model_tag codet5_base --task summarize --sub_task java
Data preprocessing follows the previous related work; a sketch of these filtering rules is given after the list.
Split comments and code with the Python package javalang.
Remove examples whose code cannot be parsed into an abstract syntax tree.
Remove examples whose code or document has fewer than 2 or more than 512 tokens.
Remove examples whose documents contain special tokens.
Keep only the first paragraph of the documentation (remove the remaining paragraphs).
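The snippet below is a minimal sketch of these filtering rules, not the exact preprocessing script. It assumes CodeSearchNet-style jsonl records with "code" (a Java method) and "docstring" fields, wraps each method in a dummy class so that javalang can parse it, and uses a simple heuristic for "special tokens".

# Sketch of the filtering rules above (illustrative only).
import json
import javalang

def keep_example(example, min_tokens=2, max_tokens=512):
    code, doc = example["code"], example["docstring"]
    try:
        # javalang parses compilation units, so wrap the method in a dummy class
        # to check that the code can be turned into an AST.
        javalang.parse.parse("class _Dummy { " + code + " }")
        code_tokens = [tok.value for tok in javalang.tokenizer.tokenize(code)]
    except Exception:
        return False  # code cannot be parsed or tokenized
    doc_tokens = doc.split()
    # Token-count filter: keep examples with 2..512 code and document tokens.
    for tokens in (code_tokens, doc_tokens):
        if not (min_tokens <= len(tokens) <= max_tokens):
            return False
    # Drop documents containing special tokens (heuristic: URLs or HTML tags).
    if "http" in doc or "<" in doc:
        return False
    return True

with open("../dataset/java/train_raw.jsonl") as f:  # hypothetical raw input file
    kept = [ex for ex in map(json.loads, f) if keep_example(ex)]
print(f"kept {len(kept)} examples")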