This table presents more detailed experimental results, specifically the total time required by each fairness testing method across different repaired models and attribute combinations. Here, M_van denotes the vanilla model, M_dis the model repaired by retraining with discriminatory instances, M_flip the model repaired by flip-based retraining, M_mt the model repaired by multitask learning, M_Faire the model repaired by the Faire method, and Ours the model repaired by our method. To reduce the impact of randomness, we repeat each experiment ten times and report the average.
From the table, we can see that GRFT spends significantly less time than the baselines. The baselines usually require several days: for example, DICE takes an average of 4.9 days to complete the search for a single attribute on the Census dataset, whereas GRFT completes the tests in under 200 seconds. Even the relatively faster LIMI requires at least 1,000 seconds. Compared to the vanilla models, the testing methods generally complete the search faster on repaired models, because fairer models yield fewer discriminatory instances in the global phase, which limits the number of searches in the local phase. These results indicate that GRFT completes the search process on all models faster than all baselines.
We believe the primary reason for GRFT's efficiency is the combination of the GA and the random iterative search algorithm. First, the GA significantly enhances the effectiveness of the global phase, allowing us to adopt an efficient, less complex search strategy in the local phase while maintaining high effectiveness. Second, the unguided random iterative search in the local phase reduces time costs in two ways: it avoids the complex and time-consuming gradient computations required by existing methods, and it processes all seeds in batches. In contrast, the baselines mutate inputs based on the gradients of individual input pairs, which makes batch processing challenging.
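To make the batching argument concrete, the local phase can be sketched as follows. This is a minimal illustration, not GRFT's actual implementation: the function names, the perturbation scheme, and the binary protected attribute are all assumptions. The key point it shows is that an unguided random search needs only two batched forward passes per iteration (original batch and protected-attribute-flipped batch), with no per-instance gradient computation.

```python
import numpy as np

def batched_random_local_search(model_predict, seeds, protected_idx,
                                n_iters=10, step=1.0, rng=None):
    """Illustrative sketch of an unguided, batched random local search.

    model_predict: batched prediction function (array -> label array);
    seeds: discriminatory instances found in the global phase;
    protected_idx: column index of a binary (0/1) protected attribute.
    All names here are hypothetical, not GRFT's real API.
    """
    rng = rng or np.random.default_rng(0)
    batch = np.array(seeds, dtype=float)
    found = []
    for _ in range(n_iters):
        # Randomly perturb one NON-protected feature per instance.
        cols = rng.integers(0, batch.shape[1], size=batch.shape[0])
        cols[cols == protected_idx] = (protected_idx + 1) % batch.shape[1]
        batch[np.arange(batch.shape[0]), cols] += rng.choice(
            [-step, step], size=batch.shape[0])
        # Counterpart batch: identical except the protected attribute is flipped.
        flipped = batch.copy()
        flipped[:, protected_idx] = 1.0 - flipped[:, protected_idx]
        # Two batched forward passes; instances whose predictions differ
        # across the flip are discriminatory. No gradients are needed.
        diff = model_predict(batch) != model_predict(flipped)
        found.extend(batch[diff].tolist())
    return found
```

Because every mutation is gradient-free, all seeds advance through each iteration in a single vectorized step, which is the batching advantage the gradient-guided baselines cannot easily exploit.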