Experiments And Reproduce
We here show more detailed results of settings and experiments in our paper.
Code is available here. [code.zip ]
We here show more detailed results of settings and experiments in our paper.
Code is available here. [code.zip ]
Here, we show the Overall 𝐴𝐶𝐶 for LLaMA models, and full debiasing results of the nine protected attributes in the BBQ dataset for the LLaMA-2-Chat 13B, LLaMA-3-Instruct 8B, and LLaMA-3.1-Instruct 8B models.
Across nine protected attributes, our approach eliminates bias more effectively than other baselines, achieving fairness improvements of 74.77% for $s_{\text{DIS}}$ and 75.70% for $s_{\text{AMB}}$ on LLaMA-2-Chat 13B; 79.04% for $s_{\text{DIS}}$ and 80.39% for $s_{\text{AMB}}$ on LLaMA-3-Instruct 8B; and 58.90% for $s_{\text{DIS}}$ and 72.21% for $s_{\text{AMB}}$ on LLaMA-3.1-Instruct 8B.
Here, we present the results with standard deviations for the ablation of Intervention Layer Number 𝑘 and Intervention Magnitude 𝜆.
1. BiasAsker. Here, we present the dataset introduction.
2. Adult. Here, we present the dataset introduction and its evaluation prompts.
3. Larger Models and LLM Architectures.
Here, we present the results of LLaMA-2-Chat 70B, BERT, and BART. BERT is accessed from https://huggingface.co/google-bert/bert-base-uncased, and BART is accessed from https://huggingface.co/facebook/bart-base.