Fairness Mediator - Supplementary Materials

Fairness Mediator

Experiments And Reproduce

We here show more detailed results of settings and experiments in our paper.

Code is available here. [code.zip ]

Experimental setup

Bias Mitigation Baselines (Section 4.1.3)

Social Groups (Section 4.1.4)

More evaluation results of RQ1 (Section 4.2)

Here, we show the Overall 𝐴𝐶𝐶 for LLaMA models, and full debiasing results of the nine protected attributes in the BBQ dataset for the LLaMA-2-Chat 13B, LLaMA-3-Instruct 8B, and LLaMA-3.1-Instruct 8B models.

Across nine protected attributes, our approach eliminates bias more effectively than other baselines, achieving fairness improvements of 74.77% for $s_{\text{DIS}}$ and 75.70% for $s_{\text{AMB}}$ on LLaMA-2-Chat 13B; 79.04% for $s_{\text{DIS}}$ and 80.39% for $s_{\text{AMB}}$ on LLaMA-3-Instruct 8B; and 58.90% for $s_{\text{DIS}}$ and 72.21% for $s_{\text{AMB}}$ on LLaMA-3.1-Instruct 8B.

Ablation Studies (Section 5.2)

Here, we present the results with standard deviations for the ablation of Intervention Layer Number 𝑘 and Intervention Magnitude 𝜆.

Bias Mitigation on Additional Datasets and Models (Section 5.3)

1. BiasAsker. Here, we present the dataset introduction.

2. Adult. Here, we present the dataset introduction and its evaluation prompts.

3. Larger Models and LLM Architectures.

Here, we present the results of LLaMA-2-Chat 70B, BERT, and BART. BERT is accessed from https://huggingface.co/google-bert/bert-base-uncased, and BART is accessed from https://huggingface.co/facebook/bart-base.

Page updated

Google Sites

Report abuse