Experiments and Reproducibility
This section provides more detailed descriptions of the settings and experimental results in our paper.
We first report the training results of the three tested models (DNN, RF, and SVM). The DNN is a six-layer fully connected neural network with layer widths [64, 32, 16, 8, 4, 2]; the RF consists of 100 trees; and the SVM uses an RBF kernel.
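As a minimal sketch, the three models above can be instantiated with scikit-learn; this is our own illustrative construction, not the paper's exact training configuration. We interpret the final width-2 layer of the DNN as the binary output layer, so the hidden layers are [64, 32, 16, 8, 4].

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# Six-layer fully connected DNN: hidden widths [64, 32, 16, 8, 4],
# with the width-2 output layer handled implicitly by the classifier.
dnn = MLPClassifier(hidden_layer_sizes=(64, 32, 16, 8, 4), max_iter=500)

# Random forest with 100 trees, as stated in the paper.
rf = RandomForestClassifier(n_estimators=100)

# SVM with an RBF kernel.
svm = SVC(kernel="rbf")
```

Hyperparameters not stated in the paper (e.g., `max_iter`) are left at illustrative values.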
To ensure fair comparisons, we adopt the same approach as six state-of-the-art methods [5, 26, 64, 70, 72, 73], which involves training on the full dataset and saving the models at the final epoch.
We next report the results of testing the SVM models, which also achieve good performance, especially on the Credit and Meps datasets.
We then examine the naturalness of the discriminatory instances generated for the RF and SVM models. On all datasets, the ATN values of our method are significantly higher than those of the other baselines, indicating that the discriminatory instances generated by our LIMI exhibit better naturalness.
The new naturalness metrics can be categorized into two types: classifier-based and distance-based.
(1) Classifier-based evaluation assesses how difficult it is to distinguish real (natural) data from synthetic data. We quantify naturalness with the Area Under the Receiver Operating Characteristic Curve (AUC): a lower AUC means the classifier struggles to differentiate generated data from real data, i.e., the generated samples are more natural. The classifiers used in our study are Logistic Regression and Support Vector Classification (SVC), and we denote their respective scores as LogisticRegression(AUC) and SVC(AUC).
We randomly sample as many generated instances as there are real instances and merge the two sets, with labels indicating their authenticity. The resulting dataset is split into training and validation sets, on which we train the Logistic Regression and SVC classifiers and compute the validation AUC. We repeat this process three times and report the average AUC across the validation sets.
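The protocol above can be sketched as follows (a scikit-learn sketch with a Logistic Regression detector; function names and split sizes are our own illustrative choices, not taken from the paper):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split


def naturalness_auc(real, generated, seed=0):
    """AUC of a detector trained to separate real rows from generated rows.

    Lower AUC => the detector cannot tell the two apart => more natural data.
    """
    rng = np.random.default_rng(seed)
    n = min(len(real), len(generated))  # balance the two classes
    X = np.vstack([
        real[rng.choice(len(real), n, replace=False)],
        generated[rng.choice(len(generated), n, replace=False)],
    ])
    y = np.concatenate([np.ones(n), np.zeros(n)])  # 1 = real, 0 = generated
    X_tr, X_va, y_tr, y_va = train_test_split(
        X, y, test_size=0.3, random_state=seed, stratify=y)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    return roc_auc_score(y_va, clf.predict_proba(X_va)[:, 1])


def mean_naturalness_auc(real, generated, repeats=3):
    """Average the detector AUC over several random splits."""
    return float(np.mean([naturalness_auc(real, generated, seed=s)
                          for s in range(repeats)]))
```

Swapping `LogisticRegression` for `sklearn.svm.SVC(probability=True)` yields the SVC(AUC) variant of the same metric.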
(2) The distance-based naturalness metric computes the average Euclidean distance between each generated instance and its nearest neighbor in the original data, denoted Average Distance. This metric is commonly used to evaluate the naturalness of generated data because natural data tends to lie close to the original data points. A smaller average distance therefore indicates more natural and realistic generated data.
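This metric can be computed directly with a nearest-neighbor query (a scikit-learn sketch; the function name is our own):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def average_nn_distance(generated, original):
    """Mean Euclidean distance from each generated row to its nearest
    neighbor in the original data; smaller values suggest the generated
    data lies closer to the real data distribution."""
    nn = NearestNeighbors(n_neighbors=1).fit(original)
    dists, _ = nn.kneighbors(generated)  # shape (n_generated, 1)
    return float(dists.mean())
```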
Here, we present additional naturalness measures for the discriminatory instances generated for DNN models.
(1) For the classifier-based measures (i.e., Logistic Regression in Table 5 and SVC in Table 6), LIMI yields the lowest AUC on average across all datasets, indicating that the discriminatory instances generated by our LIMI are harder for the detector to distinguish (i.e., they are more natural).
(2) For the distance-based measure (i.e., Average Nearest Neighbor Euclidean distance), LIMI also achieves better naturalness on average, especially on the Adult, Bank, and Meps datasets.
We further report the results of retraining RF models with the selected instances together with the original dataset. On average (shown in red), LIMI achieves effective improvements in both individual fairness and group fairness.
The detailed results of testing with multiple protected attributes (gender&race, gender&age, and race&age on Adult; gender&age on Credit) are shown below. Our LIMI still outperforms the other methods.