Our anonymized code is available on GitHub.
This repository contains the code to reproduce the results of Look Before You Leap.
We split the code into two parts: the analysis for open-source models and the analysis for closed-source models.
utils.py provides common utilities for both parts.
eval_utils.py provides evaluation utilities for both parts.
code_analysis provides the necessary utilities from https://github.com/bigcode-project/bigcode-evaluation-harness to evaluate code generation tasks.
code_result_analysis.py and nlp_result_analysis.py help analyze the results of the code generation and NLP tasks, respectively.
For the open-source models, please refer to the scripts under the open_source_analysis directory:
code_evaluation.py: Evaluates models on the code generation datasets.
llm_evaluation.py: Evaluates models on the NLP tasks.
uncertainty_perturbation.py: Perturbation-based uncertainty computation.
uncertainty_sample.py: Sample-based uncertainty computation.
Typically, we first run the task evaluation scripts and then the uncertainty computation scripts; please refer to each script for example configurations. A minimal sketch of the sample-collection step is shown below.
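As a rough illustration of what uncertainty_sample.py does for open-source models (the exact interfaces, file formats, and hyperparameters differ; the model name, prompt, and sample count below are placeholders), sample-based uncertainty collection amounts to drawing several stochastic generations per input:

```python
# A minimal sketch of sample-based output collection for an open-source model.
# Model name, prompt, and sampling hyperparameters are placeholders; see
# uncertainty_sample.py for the configurations actually used.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; replace with the model under study
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "def fibonacci(n):"  # placeholder task input
inputs = tokenizer(prompt, return_tensors="pt")

# Draw multiple stochastic samples for the same input; their disagreement is
# what the sample-based uncertainty metrics are computed from.
outputs = model.generate(
    **inputs,
    do_sample=True,
    temperature=0.8,
    max_new_tokens=64,
    num_return_sequences=5,
    pad_token_id=tokenizer.eos_token_id,
)
samples = [tokenizer.decode(o, skip_special_tokens=True) for o in outputs]
```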
For the closed-source models, please refer to the openAI_analysis directory, especially the code in GPT3_utils.py.
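For the closed-source side, sample collection goes through the OpenAI API instead. The sketch below assumes the legacy openai Python package (< 1.0) and uses a placeholder engine name and parameters; GPT3_utils.py wraps the calls actually used:

```python
# A minimal sketch of sample collection for a closed-source model via the
# OpenAI API (legacy openai package < 1.0). Engine name and parameters are
# placeholders; see GPT3_utils.py for the actual wrappers.
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

prompt = "def fibonacci(n):"  # placeholder task input
response = openai.Completion.create(
    engine="text-davinci-003",  # placeholder engine name
    prompt=prompt,
    n=5,                        # number of samples per input
    temperature=0.8,
    max_tokens=64,
)
samples = [choice["text"] for choice in response["choices"]]
```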
Once the sampled/perturbed outputs are collected, both open-source and closed-source models can be analyzed together using the functions in utils.py, such as compute_VR and compute_VRO.
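For reference, the sketch below illustrates the general idea behind variation-ratio-style metrics using simple exact-match disagreement; the actual compute_VR and compute_VRO definitions live in utils.py and may differ (e.g., by using similarity-based rather than exact-match agreement):

```python
# Illustrative (simplified) variation-ratio computations over collected samples.
# The metrics used in the paper are implemented in utils.py (compute_VR,
# compute_VRO); this sketch only shows the idea of measuring disagreement.
from collections import Counter

def variation_ratio(samples):
    """Fraction of samples that disagree with the most frequent output."""
    counts = Counter(samples)
    mode_count = counts.most_common(1)[0][1]
    return 1.0 - mode_count / len(samples)

def variation_ratio_to_original(samples, original):
    """Fraction of samples that differ from the original (e.g., greedy) output."""
    mismatches = sum(s != original for s in samples)
    return mismatches / len(samples)

# Example usage with toy outputs.
samples = ["A", "A", "B", "A", "C"]
print(variation_ratio(samples))                   # 0.4
print(variation_ratio_to_original(samples, "A"))  # 0.4
```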