We design experiments to demonstrate the effectiveness of our methods in the image domain, using the MNIST dataset. We add a small box in the top-left corner of each image and use its color to simulate a gender-like sensitive attribute. We select the samples of two labels, “0” and “9”, from MNIST, and assign white boxes to 20% of the “0” samples and to 80% of the “9” samples. For every instance, we construct its counterpart by flipping the box color. If the classifications of the original instance and its counterpart are inconsistent, we regard the instance as unfair. After repairing, the repair rate (RR) is 100% and the accuracy is 99.3%, which is consistent with our conclusion that most unfair samples can be repaired with only a subtle degradation of accuracy. We also introduce a new metric to evaluate repair effectiveness: the logit difference between an instance and its counterpart, which directly quantifies unfairness; the lower the value, the fairer the model. Experiments show that this value decreases from 0.627 to 0.045 after repairing.
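The construction of the biased pairs and the logit-difference metric can be sketched as follows. This is illustrative PyTorch code rather than our exact implementation; the 4×4 box size, the helper names, and the aggregation of the metric as a mean absolute logit gap are assumptions made for the sketch.

```python
# Minimal sketch of the box-bias dataset and the logit-difference metric,
# assuming torchvision MNIST and a model that maps a batch of images to logits.
import torch
from torchvision import datasets, transforms

WHITE_RATIO = {0: 0.2, 9: 0.8}  # fraction of each digit given a white box


def add_box(img: torch.Tensor, white: bool) -> torch.Tensor:
    """Stamp a small box in the top-left corner (white = 1.0, black = 0.0)."""
    img = img.clone()
    img[..., :4, :4] = 1.0 if white else 0.0  # 4x4 box size is illustrative
    return img


def build_biased_pairs(mnist):
    """Keep digits 0/9 and return (biased image, counterpart, label) triples.

    The counterpart is the same image with the box color flipped.
    """
    pairs = []
    for img, label in mnist:
        if label not in WHITE_RATIO:
            continue
        white = torch.rand(1).item() < WHITE_RATIO[label]
        pairs.append((add_box(img, white), add_box(img, not white), label))
    return pairs


@torch.no_grad()
def logit_difference(model, pairs):
    """Mean absolute logit gap between each instance and its counterpart."""
    gaps = [(model(x.unsqueeze(0)) - model(x_cf.unsqueeze(0))).abs().mean().item()
            for x, x_cf, _ in pairs]
    return sum(gaps) / len(gaps)


mnist = datasets.MNIST(root="data", train=True, download=True,
                       transform=transforms.ToTensor())
pairs = build_biased_pairs(mnist)
```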
We also use the background color to simulate the gender attribute: similarly, 20% of the “0” samples and 80% of the “9” samples are given a white background. We find that the middle-layer representations carry so much background information that a background classifier trained on them reaches 100% accuracy after a single epoch. We again use the logit difference between every instance and its counterpart to evaluate the fairness improvement; it is reduced from 0.595 to 0.063 after repairing.
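The probing experiment can be sketched as follows, assuming a PyTorch model with an accessible middle layer and tensors of images and background-color labels; the linear-probe setup and its hyperparameters are illustrative, not our exact training configuration.

```python
# Minimal sketch of the background-probing experiment: a linear classifier is
# trained on frozen mid-layer features to predict the background color.
import torch
import torch.nn as nn


def extract_features(model, layer: nn.Module, x: torch.Tensor) -> torch.Tensor:
    """Collect the (frozen) activations of `layer` for the batch `x`."""
    feats = []
    handle = layer.register_forward_hook(
        lambda mod, inp, out: feats.append(out.detach().flatten(1)))
    with torch.no_grad():
        model(x)
    handle.remove()
    return feats[0]


def probe_background(model, layer, images, bg_labels, epochs=1, lr=1e-1):
    """Train a linear probe on frozen features; return its accuracy.

    `bg_labels` is a LongTensor with 0 = black background, 1 = white background.
    """
    z = extract_features(model, layer, images)
    probe = nn.Linear(z.shape[1], 2)
    opt = torch.optim.SGD(probe.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        nn.functional.cross_entropy(probe(z), bg_labels).backward()
        opt.step()
    with torch.no_grad():
        acc = (probe(z).argmax(dim=1) == bg_labels).float().mean().item()
    return acc
```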
For both experiments above, we set (k, t) to (0.3, 0.2) and (lb, ub) to (-0.05, 0.05).
Because previous work cannot be directly applied to generate additional unfair test cases in this setting, i.e., it cannot be adopted as a fairness testing method here, our evaluation in the image domain is based only on RR. These experiments motivate us to develop testing methods for such bias phenomena as future work.
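For reference, RR can be computed as sketched below, assuming it is defined as the fraction of originally-unfair instances whose predictions become consistent after repair; the `pairs` structure and the model interface follow the earlier sketch.

```python
# Minimal sketch of the repair rate (RR) under the assumed definition above.
import torch


@torch.no_grad()
def is_fair(model, x, x_cf):
    """An instance is fair if it and its counterpart receive the same prediction."""
    return model(x.unsqueeze(0)).argmax(1) == model(x_cf.unsqueeze(0)).argmax(1)


@torch.no_grad()
def repair_rate(original_model, repaired_model, pairs):
    """Fraction of instances unfair on the original model that become fair."""
    unfair = [(x, x_cf) for x, x_cf, _ in pairs
              if not is_fair(original_model, x, x_cf)]
    if not unfair:
        return 1.0
    repaired = sum(bool(is_fair(repaired_model, x, x_cf)) for x, x_cf in unfair)
    return repaired / len(unfair)
```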
Figure: the dataset we constructed to contain color-box bias.
Figure: the dataset we constructed to contain background-color bias.