Appendix I: Results on Other Metrics
As noted in our main paper, when assessing the security of existing TSDP solutions, we consider seven metrics. This appendix first revisits the employed security metrics and then reports the results for the security metrics omitted from the main paper.
A. Security Metrics
Accuracy [81] measures the fraction of test samples that the attacker's surrogate model classifies correctly. Achieving high accuracy is a primary goal of model stealing attacks.
Fidelity [81] is the percentage of test samples on which the surrogate model and the victim model produce identical predictions, including samples that the victim model misclassifies.
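For concreteness, both metrics can be computed in a single pass over the test set. The sketch below assumes PyTorch models and a standard (input, label) test loader; all function and variable names are illustrative rather than taken from the evaluated artifacts.

```python
# Minimal sketch of the accuracy and fidelity metrics, assuming PyTorch
# models and a test loader yielding (input, label) batches.
import torch

@torch.no_grad()
def accuracy_and_fidelity(surrogate, victim, test_loader, device="cpu"):
    surrogate.eval(); victim.eval()
    correct, agree, total = 0, 0, 0
    for x, y in test_loader:
        x, y = x.to(device), y.to(device)
        pred_s = surrogate(x).argmax(dim=1)         # surrogate prediction
        pred_v = victim(x).argmax(dim=1)            # victim prediction
        correct += (pred_s == y).sum().item()       # accuracy numerator
        agree += (pred_s == pred_v).sum().item()    # fidelity numerator
        total += y.size(0)
    return correct / total, agree / total
```

Note that fidelity is measured against the victim's predictions rather than the ground-truth labels, so a surrogate that faithfully replicates a weak victim can score high fidelity despite modest accuracy.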
Attack Success Rate (ASR) [82] is the percentage of adversarial samples generated with the surrogate model that successfully mislead the victim model. It measures the transferability of adversarial samples [79], [82]. We use the popular PGD attack [58] to generate adversarial samples. Following [82], we use the L-infinity norm, epsilon = 0.03, and 7 iteration steps for all datasets. We adopt the PGD implementation from public tools [23].
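The sketch below illustrates this transferability measurement under the stated setting (L-infinity norm, epsilon = 0.03, 7 steps). The evaluation relies on a public PGD implementation [23]; here PGD is written out by hand only so the example is self-contained, and the step size alpha and the random start are our assumptions since the text does not specify them. A sample counts as a success when the victim mispredicts the adversarial example; some works instead restrict the count to samples the victim originally classified correctly.

```python
# Illustrative sketch of ASR: craft adversarial examples on the surrogate
# with PGD, then test whether they transfer to the victim model.
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=0.03, alpha=0.01, steps=7):
    # Random start in the epsilon-ball, then iterated sign-gradient ascent
    # with projection back onto the ball and the valid pixel range.
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()

def attack_success_rate(surrogate, victim, test_loader, device="cpu"):
    surrogate.eval(); victim.eval()
    fooled, total = 0, 0
    for x, y in test_loader:
        x, y = x.to(device), y.to(device)
        x_adv = pgd_attack(surrogate, x, y)       # craft on the surrogate
        with torch.no_grad():
            pred_v = victim(x_adv).argmax(dim=1)  # transfer to the victim
        fooled += (pred_v != y).sum().item()
        total += y.size(0)
    return fooled / total
```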
Generalization Gap [110] is the difference between the average accuracies on the victim model's training dataset and the test dataset. The more a model memorizes private information from the training dataset, the larger its generalization gap. The generalization gap correlates strongly with vulnerability to membership inference attacks [109].
Confidence Gap [110] is the difference in average prediction confidence between the victim model's training dataset and the test dataset. Similar to the generalization gap, the confidence gap positively correlates with the extent to which the surrogate model memorizes the training data.
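Both gaps compare an average statistic on the victim model's training data with the same statistic on the test data. Below is a minimal sketch, assuming a PyTorch model and loaders over the two datasets; defining confidence as the top-1 softmax probability is our assumption, as the text does not pin down the exact statistic.

```python
# Sketch of the generalization gap and confidence gap metrics.
import torch
import torch.nn.functional as F

@torch.no_grad()
def acc_and_confidence(model, loader, device="cpu"):
    # Average top-1 accuracy and average top-1 softmax confidence.
    model.eval()
    correct, conf_sum, total = 0, 0.0, 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        probs = F.softmax(model(x), dim=1)
        conf, pred = probs.max(dim=1)
        correct += (pred == y).sum().item()
        conf_sum += conf.sum().item()
        total += y.size(0)
    return correct / total, conf_sum / total

def gaps(model, train_loader, test_loader):
    # Generalization gap: train accuracy minus test accuracy.
    # Confidence gap: mean train confidence minus mean test confidence.
    train_acc, train_conf = acc_and_confidence(model, train_loader)
    test_acc, test_conf = acc_and_confidence(model, test_loader)
    return train_acc - test_acc, train_conf - test_conf
```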
Confidence-Attack Accuracy [69], [3] represents the accuracy of membership classification based on the model's output confidence. The attack algorithm takes the model posterior as input to infer the membership of a data sample. We use the "Black-Box/Shadow" implementation of ML-DOCTOR [2], [3].
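The following is a schematic of this shadow-model attack in the spirit of ML-DOCTOR's black-box/shadow setting [2], [3], not a reproduction of its code: the attack classifier architecture, the use of sorted posteriors as features, and the training loop are our simplifying assumptions.

```python
# Schematic confidence-based (black-box/shadow) membership attack:
# train an attack classifier on a shadow model's posteriors, then
# measure membership prediction accuracy against the target model.
import torch
import torch.nn as nn
import torch.nn.functional as F

@torch.no_grad()
def sorted_posteriors(model, loader, device="cpu"):
    # Softmax posteriors, sorted so features are class-order invariant.
    model.eval()
    probs = torch.cat([F.softmax(model(x.to(device)), dim=1).cpu()
                       for x, _ in loader])
    return probs.sort(dim=1, descending=True).values

def confidence_attack(shadow, target, s_members, s_nonmembers,
                      t_members, t_nonmembers):
    # Attack training set: shadow members labeled 1, non-members 0.
    f_in = sorted_posteriors(shadow, s_members)
    f_out = sorted_posteriors(shadow, s_nonmembers)
    x_tr = torch.cat([f_in, f_out])
    y_tr = torch.cat([torch.ones(len(f_in)), torch.zeros(len(f_out))]).long()
    attack = nn.Sequential(nn.Linear(x_tr.size(1), 64),
                           nn.ReLU(), nn.Linear(64, 2))
    opt = torch.optim.Adam(attack.parameters(), lr=1e-3)
    for _ in range(100):                   # train the attack classifier
        opt.zero_grad()
        F.cross_entropy(attack(x_tr), y_tr).backward()
        opt.step()
    # Evaluate membership prediction accuracy on the target model.
    g_in = sorted_posteriors(target, t_members)
    g_out = sorted_posteriors(target, t_nonmembers)
    x_te = torch.cat([g_in, g_out])
    y_te = torch.cat([torch.ones(len(g_in)), torch.zeros(len(g_out))]).long()
    with torch.no_grad():
        pred = attack(x_te).argmax(dim=1)
    return (pred == y_te).float().mean().item()
```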
Gradient-Attack Accuracy [70] represents the accuracy of the white-box membership attack based on the model's internal gradients. This attack uses gradient information and the loss value to predict data membership. We use the "White-Box/Shadow" attack implementation of ML-DOCTOR [2], [3].
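A condensed sketch of the white-box setting follows. Using only the per-sample loss and the gradient norm of the final classification layer as attack features is our simplification; ML-DOCTOR's white-box attack combines richer inputs. The shadow-then-target procedure mirrors the confidence attack above.

```python
# Schematic white-box (gradient-based) membership attack sketch.
import torch
import torch.nn as nn
import torch.nn.functional as F

def grad_features(model, last_layer, loader, device="cpu"):
    # Per-sample features: (loss value, gradient norm of the final
    # layer's weights). Samples are processed one at a time so each
    # gradient is truly per-sample.
    model.eval()
    feats = []
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        for i in range(x.size(0)):
            loss = F.cross_entropy(model(x[i:i + 1]), y[i:i + 1])
            g = torch.autograd.grad(loss, last_layer.weight)[0]
            feats.append([loss.item(), g.norm().item()])
    return torch.tensor(feats)

def gradient_attack(shadow, shadow_fc, target, target_fc,
                    s_members, s_nonmembers, t_members, t_nonmembers):
    # Fit a tiny logistic attack model on shadow features, then report
    # membership prediction accuracy on the target model.
    f_in = grad_features(shadow, shadow_fc, s_members)
    f_out = grad_features(shadow, shadow_fc, s_nonmembers)
    x_tr = torch.cat([f_in, f_out])
    y_tr = torch.cat([torch.ones(len(f_in)), torch.zeros(len(f_out))]).long()
    attack = nn.Linear(2, 2)
    opt = torch.optim.Adam(attack.parameters(), lr=1e-2)
    for _ in range(200):
        opt.zero_grad()
        F.cross_entropy(attack(x_tr), y_tr).backward()
        opt.step()
    g_in = grad_features(target, target_fc, t_members)
    g_out = grad_features(target, target_fc, t_nonmembers)
    x_te = torch.cat([g_in, g_out])
    y_te = torch.cat([torch.ones(len(g_in)), torch.zeros(len(g_out))]).long()
    with torch.no_grad():
        pred = attack(x_te).argmax(dim=1)
    return (pred == y_te).float().mean().item()
```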
B. TSDP Solution Evaluation
This part presents the evaluation results of the representative defense schemes from Sec. III-E. Tables XIV to XVIII report the results of the five metrics not included in the main paper; the overall findings are consistent with the findings and lessons summarized there. Note that the confidence gap and generalization gap of our approach and of random-guess are 0% because, for a surrogate model that never sees the victim model's training data, the scores on the training and test datasets are identical.