Our anonymized code is available on GitHub.
This repository contains the code to reproduce the results of Look Before You Leap.
We split the code into two parts: the analysis for open-source models and the analysis for closed-source models.
utils.py provides common utilities for both parts.
eval_utils.py provides evaluation utilities for both parts.
code_analysis provides the necessary utilities from https://github.com/bigcode-project/bigcode-evaluation-harness to evaluate code generation tasks.
code_result_analysis.py and nlp_result_analysis.py help analyze the results of the code generation and NLP tasks, respectively.
For the open-source models, please refer to the scripts under the open_source_analysis directory:
code_evaluation.py: Evaluates models on the code generation datasets.
llm_evaluation.py: Evaluates models on the NLP tasks.
uncertainty_perturbation.py: Perturbation-based uncertainty computation.
uncertainty_sample.py: Sample-based uncertainty computation.
Typically, we first run the task evaluation scripts and then the uncertainty computation scripts; please refer to each script for example configurations. A minimal sketch of the sample-collection step is shown below.
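As a rough illustration of what uncertainty_sample.py does for open-source models (the exact interfaces, file formats, and hyperparameters differ; the model name, prompt, and sample count below are placeholders), sample-based uncertainty collection amounts to drawing several stochastic generations per input:

```python
# A minimal sketch of sample-based output collection for an open-source model.
# Model name, prompt, and sampling hyperparameters are placeholders; see
# uncertainty_sample.py for the configurations actually used.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; replace with the model under study
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "def fibonacci(n):"  # placeholder task input
inputs = tokenizer(prompt, return_tensors="pt")

# Draw multiple stochastic samples for the same input; their disagreement is
# what the sample-based uncertainty metrics are computed from.
outputs = model.generate(
    **inputs,
    do_sample=True,
    temperature=0.8,
    max_new_tokens=64,
    num_return_sequences=5,
    pad_token_id=tokenizer.eos_token_id,
)
samples = [tokenizer.decode(o, skip_special_tokens=True) for o in outputs]
```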
For the closed-source models, please refer to the openAI_analysis directory, especially the code in GPT3_utils.py.
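For the closed-source side, sample collection goes through the OpenAI API instead. The sketch below assumes the legacy openai Python package (< 1.0) and uses a placeholder engine name and parameters; GPT3_utils.py wraps the calls actually used:

```python
# A minimal sketch of sample collection for a closed-source model via the
# OpenAI API (legacy openai package < 1.0). Engine name and parameters are
# placeholders; see GPT3_utils.py for the actual wrappers.
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

prompt = "def fibonacci(n):"  # placeholder task input
response = openai.Completion.create(
    engine="text-davinci-003",  # placeholder engine name
    prompt=prompt,
    n=5,                        # number of samples per input
    temperature=0.8,
    max_tokens=64,
)
samples = [choice["text"] for choice in response["choices"]]
```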
Once the sampled/perturbed outputs are collected, both open-source and closed-source models can be analyzed together using the functions in utils.py, such as compute_VR and compute_VRO.
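For reference, the sketch below illustrates the general idea behind variation-ratio-style metrics using simple exact-match disagreement; the actual compute_VR and compute_VRO definitions live in utils.py and may differ (e.g., by using similarity-based rather than exact-match agreement):

```python
# Illustrative (simplified) variation-ratio computations over collected samples.
# The metrics used in the paper are implemented in utils.py (compute_VR,
# compute_VRO); this sketch only shows the idea of measuring disagreement.
from collections import Counter

def variation_ratio(samples):
    """Fraction of samples that disagree with the most frequent output."""
    counts = Counter(samples)
    mode_count = counts.most_common(1)[0][1]
    return 1.0 - mode_count / len(samples)

def variation_ratio_to_original(samples, original):
    """Fraction of samples that differ from the original (e.g., greedy) output."""
    mismatches = sum(s != original for s in samples)
    return mismatches / len(samples)

# Example usage with toy outputs.
samples = ["A", "A", "B", "A", "C"]
print(variation_ratio(samples))                   # 0.4
print(variation_ratio_to_original(samples, "A"))  # 0.4
```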