Experiments and Reproducibility
This section provides more detailed descriptions of the settings and experimental results in our paper.
We first report the training results of the three tested models (DNN, RF, and SVM). The DNN is a six-layer fully connected neural network with layer widths [64, 32, 16, 8, 4, 2]; the RF consists of 100 trees; and the SVM uses an RBF kernel.
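As a minimal sketch, the three models above can be instantiated with scikit-learn; this is our own illustrative construction, not the paper's exact training configuration. We interpret the final width-2 layer of the DNN as the binary output layer, so the hidden layers are [64, 32, 16, 8, 4].

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# Six-layer fully connected DNN: hidden widths [64, 32, 16, 8, 4],
# with the width-2 output layer handled implicitly by the classifier.
dnn = MLPClassifier(hidden_layer_sizes=(64, 32, 16, 8, 4), max_iter=500)

# Random forest with 100 trees, as stated in the paper.
rf = RandomForestClassifier(n_estimators=100)

# SVM with an RBF kernel.
svm = SVC(kernel="rbf")
```

Hyperparameters not stated in the paper (e.g., `max_iter`) are left at illustrative values.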
To ensure fair comparisons, we adopt the same approach as six state-of-the-art methods [5, 26, 64, 70, 72, 73], which involves training on the full dataset and saving the models at the final epoch.
We next report the results of testing the SVM models, which also achieve good performance, especially on the Credit and Meps datasets.
We then examine the naturalness of the discriminatory instances generated for the RF and SVM models. On all datasets, the ATN values of our method are significantly higher than those of the other baselines, indicating that the discriminatory instances generated by our LIMI exhibit better naturalness.
The new naturalness metrics can be categorized into two types: classifier-based and distance-based.
(1) Classifier-based evaluation assesses how difficult it is to distinguish real (natural) data from synthetic data. We quantify naturalness with the Area Under the Receiver Operating Characteristic Curve (AUC): a lower AUC means the classifier struggles to differentiate generated data from real data, i.e., the generated samples are more natural. The classifiers used in our study are Logistic Regression and Support Vector Classification (SVC), and we denote their respective scores as LogisticRegression(AUC) and SVC(AUC).
We randomly sample as many generated instances as there are real instances and merge the two sets, with labels indicating their authenticity. The resulting dataset is split into training and validation sets, on which we train the Logistic Regression and SVC classifiers and compute the validation AUC. We repeat this process three times and report the average AUC across the validation sets.
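The protocol above can be sketched as follows (a scikit-learn sketch with a Logistic Regression detector; function names and split sizes are our own illustrative choices, not taken from the paper):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split


def naturalness_auc(real, generated, seed=0):
    """AUC of a detector trained to separate real rows from generated rows.

    Lower AUC => the detector cannot tell the two apart => more natural data.
    """
    rng = np.random.default_rng(seed)
    n = min(len(real), len(generated))  # balance the two classes
    X = np.vstack([
        real[rng.choice(len(real), n, replace=False)],
        generated[rng.choice(len(generated), n, replace=False)],
    ])
    y = np.concatenate([np.ones(n), np.zeros(n)])  # 1 = real, 0 = generated
    X_tr, X_va, y_tr, y_va = train_test_split(
        X, y, test_size=0.3, random_state=seed, stratify=y)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    return roc_auc_score(y_va, clf.predict_proba(X_va)[:, 1])


def mean_naturalness_auc(real, generated, repeats=3):
    """Average the detector AUC over several random splits."""
    return float(np.mean([naturalness_auc(real, generated, seed=s)
                          for s in range(repeats)]))
```

Swapping `LogisticRegression` for `sklearn.svm.SVC(probability=True)` yields the SVC(AUC) variant of the same metric.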
(2) The distance-based naturalness metric computes the average Euclidean distance between each generated instance and its nearest neighbor in the original data, denoted Average Distance. This metric is commonly used to evaluate the naturalness of generated data because natural data tends to lie close to the original data points. A smaller average distance therefore indicates more natural and realistic generated data.
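This metric can be computed directly with a nearest-neighbor query (a scikit-learn sketch; the function name is our own):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def average_nn_distance(generated, original):
    """Mean Euclidean distance from each generated row to its nearest
    neighbor in the original data; smaller values suggest the generated
    data lies closer to the real data distribution."""
    nn = NearestNeighbors(n_neighbors=1).fit(original)
    dists, _ = nn.kneighbors(generated)  # shape (n_generated, 1)
    return float(dists.mean())
```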
Here, we present additional naturalness measures for the discriminatory instances generated for DNN models.
(1) For the classifier-based measures (i.e., Logistic Regression in Table 5 and SVC in Table 6), LIMI yields the lowest AUC on average across all datasets, indicating that the discriminatory instances generated by our LIMI are harder for the detector to distinguish (i.e., they are more natural).
(2) For the distance-based measure (i.e., Average Nearest Neighbor Euclidean distance), LIMI also achieves better naturalness on average, especially on the Adult, Bank, and Meps datasets.
We further report the results of retraining RF models with the selected instances together with the original dataset. On average (shown in red), LIMI achieves effective improvements in both individual fairness and group fairness.
The detailed results of testing with multiple protected attributes (gender&race, gender&age, and race&age on Adult; gender&age on Credit) are shown below. Our LIMI still outperforms the other methods.