Initial formulations of group and individual fairness.
Group fairness measures.
Measures: Much of the current literature uses fairness measures based on statistical parity between protected groups (e.g. gender, race) across outcome classes. According to statistical parity, a classifier is fair if each protected group is represented in each outcome class in equal proportions.
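To make the measure concrete, here is a minimal sketch of a statistical-parity check in Python. The column names and the example data are hypothetical placeholders, not taken from any particular paper.

```python
import pandas as pd

def statistical_parity_gap(df: pd.DataFrame, group_col: str, pred_col: str) -> float:
    """Return the largest difference in positive-outcome rates between groups.

    A gap of 0 means every protected group receives the positive outcome
    at the same rate, i.e. the classifier satisfies statistical parity.
    """
    rates = df.groupby(group_col)[pred_col].mean()  # P(pred = 1 | group)
    return float(rates.max() - rates.min())

# Hypothetical usage: 'gender' is the protected attribute, 'hired' the binary prediction.
df = pd.DataFrame({"gender": ["m", "m", "f", "f"], "hired": [1, 0, 1, 1]})
print(statistical_parity_gap(df, "gender", "hired"))  # 0.5
```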
Problems: One problem often raised for group fairness measures is that they are suited only to a limited number of coarse-grained, prescribed protected groups. They may miss unfairness against people who sit at the intersection of multiple kinds of discrimination, or against groups that are not (yet) recognized in anti-discrimination law but may nevertheless need protection.
Individual fairness measures.
Measures: In light of the problems with group fairness, many researchers have turned to a different paradigm, known as individual fairness (IF). Its core requirement is that similar individuals should receive similar treatment; call this principle similar treatment. It is related to Aristotelian consistency principles, familiar from ethics and the philosophy of law, which require that like cases be treated alike. Proponents of individual fairness consider similar treatment to be the intuitive definition of fairness, and they seek to give the principle a precise mathematical treatment.
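One well-known formalization, in the spirit of Dwork et al.'s Lipschitz-style condition, requires that the distance between two individuals' predictions be bounded by their distance under a task-specific similarity metric. The sketch below assumes such a metric is already given; the function names are illustrative only.

```python
def is_individually_fair(model_probs, x1, x2, similarity_metric, lipschitz_const=1.0):
    """Check the similar-treatment condition for a single pair of individuals.

    model_probs: function mapping a feature vector to a predicted probability.
    similarity_metric: task-specific distance between individuals (an assumption
    here; choosing this metric is itself a hard, domain-dependent problem).
    """
    pred_distance = abs(model_probs(x1) - model_probs(x2))
    input_distance = similarity_metric(x1, x2)
    return pred_distance <= lipschitz_const * input_distance
```

In practice the difficult part is the similarity metric itself, which has to encode which differences between individuals are ethically relevant to the task.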
Could we make the model individually fair simply by removing protected attributes?
In fact, evaluating individual fairness requires protected attributes as explicit input. A standard way to evaluate it is to adjust an individual's protected attributes and check that the prediction does not change. If protected attributes are not included in the input features, we cannot manipulate them, and so we cannot compute individual fairness this way.
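A minimal sketch of that evaluation idea, assuming the protected attribute is an explicit binary column we can flip (the function and column names are hypothetical):

```python
import numpy as np
import pandas as pd

def flip_rate(model_predict, X: pd.DataFrame, protected_col: str) -> float:
    """Fraction of individuals whose prediction changes when only the
    protected attribute is flipped (binary attribute assumed)."""
    original = model_predict(X)
    X_flipped = X.copy()
    X_flipped[protected_col] = 1 - X_flipped[protected_col]  # flip 0 <-> 1
    counterfactual = model_predict(X_flipped)
    return float(np.mean(original != counterfactual))

# If the protected column were dropped from X, this manipulation would be
# impossible, which is why the evaluation needs it as explicit input.
```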
Is individual fairness necessary for fairness evaluation? Or is group fairness enough?
Now suppose we build an income classifier with a DNN model. An individual's gender can easily be inferred jointly from the height and hair-length attributes. If we remove the gender attribute from the training attributes and the training data suffers from unbalanced sampling (e.g. 90% of the men have higher income and 90% of the women have lower income), the model is still likely to capture the spurious correlation between gender proxies (e.g. height and hair length) and income. But we expect the trained model to rely on genuinely relevant attributes (e.g. education level) to classify income. If a tall individual with short hair, who is usually male, always receives a higher predicted income than a short individual with long hair regardless of their education levels, that is obviously unfair. Yet it is very hard to expose such unfairness without explicit gender labels as input, because we cannot accurately separate the attributes into gender-relevant and task-relevant ones. Individual fairness is designed to expose exactly this kind of unfairness, and it requires explicit gender labels as input.
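To make the example concrete, here is a toy sketch of how such proxy-driven unfairness could be probed once gender labels are available. All attribute names, the binary gender encoding, and the proxy-swapping strategy are hypothetical illustrations, not a prescribed method.

```python
import numpy as np
import pandas as pd

def gender_counterfactual_gap(model_predict, X: pd.DataFrame, gender_col: str,
                              proxy_cols=("height", "hair_length")) -> float:
    """Average change in predicted income when gender and its proxy attributes
    are swapped to the opposite group's typical values, while task-relevant
    attributes (e.g. education level) are held fixed."""
    X_cf = X.copy()
    X_cf[gender_col] = 1 - X_cf[gender_col]  # flip the (binary) gender label
    # Replace each proxy attribute with the mean value of the opposite gender group.
    for col in proxy_cols:
        group_means = X.groupby(gender_col)[col].mean()
        X_cf[col] = X_cf[gender_col].map(group_means)
    return float(np.mean(np.abs(model_predict(X) - model_predict(X_cf))))
```

A large gap would indicate that the model treats otherwise-identical individuals differently based on gender proxies alone, which is exactly the unfairness described above; the construction is only possible because explicit gender labels let us decide which attributes to swap.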