Initial formulations of group and individual fairness.



Could we simply remove protected attributes so that the model becomes individually fair?

In fact, evaluating individual fairness requires protected attributes as explicit input. Individual fairness demands that the prediction does not change when only the protected attributes are adjusted. If the protected attributes are not included in the input features, we cannot manipulate them to measure individual fairness in the first place.
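To make this concrete, here is a minimal sketch of such a counterfactual check, assuming a binary protected attribute stored as an input column and a model exposing `predict_proba`; the function name, column layout, and threshold are illustrative assumptions, not a fixed API.

```python
import numpy as np

def individual_fairness_violations(model, X, protected_col, threshold=1e-3):
    """Fraction of samples whose score changes when only the protected
    attribute is flipped (a simple counterfactual consistency check).

    Assumes a binary protected attribute in column `protected_col` and a
    model with `predict_proba`; both are illustrative choices.
    """
    X_counterfactual = X.copy()
    # Flip the protected attribute (0 <-> 1) while keeping all other features fixed.
    X_counterfactual[:, protected_col] = 1 - X_counterfactual[:, protected_col]

    original = model.predict_proba(X)[:, 1]
    flipped = model.predict_proba(X_counterfactual)[:, 1]

    # A violation: flipping the protected attribute alone changes the score.
    gaps = np.abs(original - flipped)
    return np.mean(gaps > threshold), gaps
```

If the protected attribute is not an input column at all, this manipulation is impossible, which is exactly why the evaluation needs it as explicit input.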



Is individual fairness necessary for fairness evaluation, or is group fairness enough?

Suppose we build an income classifier with a DNN. An individual's gender can easily be inferred jointly from the height and hair-length attributes. If we remove the gender attribute from the training features but the training data suffers from unbalanced sampling (e.g. 90% of men have high income and 90% of women have low income), the model is still likely to capture the spurious correlation between gender proxies (height and hair length) and income. What we actually want is a model that relies on genuinely task-relevant attributes (e.g. education level) to classify income. If a tall individual with short hair, who is usually male, always receives a higher predicted income than a short individual with long hair regardless of their education levels, that is obviously unfair. Yet such unfairness is very hard to expose without explicit gender labels, because we cannot accurately separate the attributes into gender-relevant ones and task-relevant ones. Individual fairness is designed to expose exactly this kind of unfairness, and it requires explicit gender labels as input.
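As a rough sketch of how explicit gender labels enable this check: with the labels we can mark which columns act as gender proxies (height, hair length), build counterfactuals that swap only those columns toward the other group while leaving education untouched, and see whether predictions move. The column indices, the mean-swap construction, and the `predict_proba` interface are all assumptions made for illustration.

```python
import numpy as np

def proxy_counterfactual_gap(model, X, gender, proxy_cols):
    """Average prediction shift when gender-proxy columns are swapped to the
    opposite group's mean while task-relevant attributes stay fixed.

    `gender` is the explicit label (1 = male, 0 = female here, purely for
    illustration); without it we could not identify or swap the proxies.
    """
    X_cf = X.copy()
    for col in proxy_cols:
        male_mean = X[gender == 1, col].mean()
        female_mean = X[gender == 0, col].mean()
        # Replace each individual's proxy value with the opposite group's mean.
        X_cf[:, col] = np.where(gender == 1, female_mean, male_mean)

    original = model.predict_proba(X)[:, 1]
    counterfactual = model.predict_proba(X_cf)[:, 1]

    # A large shift suggests the model leans on gender proxies
    # rather than genuinely task-relevant attributes such as education.
    return float(np.mean(np.abs(original - counterfactual)))
```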