We further explored whether the adversarial samples generated by FCGHUNTER can be used to enhance model robustness. Specifically, we augmented the original training set (i.e., the dataset described in Section VII-A) with 90 adversarial samples and evaluated the retrained model's resilience on an independent set of 60 adversarial samples, verifying whether the previously identified vulnerabilities were mitigated. During retraining, we varied the number of adversarial samples injected into the training set from 10 to 90 and measured the attack success rate (ASR) on the test samples.
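To make the retraining protocol concrete, the sketch below mimics it on synthetic stand-in data: adversarial samples (kept with their true malware label) are added to the training set in increments of 10, the classifier is retrained, and ASR is measured on the held-out adversarial test set. The classifier choice, helper names, and feature dimensions here are illustrative assumptions, not FCGHUNTER's actual pipeline.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def attack_success_rate(model, X_adv):
    """Fraction of adversarial malware samples misclassified as benign (label 0)."""
    return float(np.mean(model.predict(X_adv) == 0))

def retrain_and_evaluate(X_train, y_train, X_adv_train, X_adv_test, n_adv):
    """Add n_adv adversarial samples (labeled as malware, 1) to the training
    set, retrain, and report ASR on the held-out adversarial test set."""
    X_aug = np.vstack([X_train, X_adv_train[:n_adv]])
    y_aug = np.concatenate([y_train, np.ones(n_adv, dtype=int)])
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    model.fit(X_aug, y_aug)
    return attack_success_rate(model, X_adv_test)

# Synthetic stand-ins for the real feature matrices (e.g., MaMadroid's
# 121-dimensional family features); replace with actual extracted features.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 121))
y_train = rng.integers(0, 2, size=1000)
X_adv_train = rng.normal(size=(90, 121))  # 90 adversarial training samples
X_adv_test = rng.normal(size=(60, 121))   # 60 adversarial test samples

# Sweep the number of injected adversarial samples from 10 to 90,
# mirroring the retraining protocol described above.
for n_adv in range(10, 100, 10):
    asr = retrain_and_evaluate(X_train, y_train, X_adv_train, X_adv_test, n_adv)
    print(f"{n_adv:2d} adversarial samples -> ASR = {asr:.2%}")
```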
As shown in the figure below, introducing adversarial samples into the training set leads to a sharp decline in ASR across all features. The drop is steepest within the first 10 added samples, indicating that retraining with even a small number of adversarial samples substantially improves model robustness.
However, different features show different levels of resilience. For example, MaMadroid exhibits higher and more erratic ASR values even as more adversarial samples are added, which may indicate that the model overfits the new samples and forgets previously learned patterns, since MaMadroid's family features are quite low-dimensional (i.e., 121 dimensions). APIGraph exhibits the same issue, though less severely than MaMadroid.
In contrast, features like MalScan remain more stable, likely because they capture a broader and more complex set of behaviors, making the model's performance less prone to drastic shifts when adversarial examples are introduced.
Thus, during retraining, it is crucial to choose a suitable ratio of adversarial samples for each feature type to mitigate this issue.
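One way to operationalize this choice, sketched below under the same assumptions as the previous snippet, is a small validation sweep: retrain at increasing adversarial ratios and keep the smallest ratio that drives ASR below a target without degrading clean accuracy beyond a tolerance. All thresholds, candidate ratios, and names are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def select_adv_ratio(X_train, y_train, X_adv, X_adv_test, X_clean, y_clean,
                     ratios=(0.01, 0.02, 0.05, 0.10),
                     asr_target=0.05, max_acc_drop=0.02):
    """Return the smallest adversarial ratio (and its model) that pushes ASR
    below asr_target while clean accuracy stays within max_acc_drop of the
    baseline. Illustrative procedure; not FCGHUNTER's actual selection rule."""
    base = RandomForestClassifier(n_estimators=100, random_state=0)
    base.fit(X_train, y_train)
    base_acc = base.score(X_clean, y_clean)
    # Try the smallest ratio first to limit overfitting on low-dimensional
    # features such as MaMadroid's family features.
    for r in ratios:
        n_adv = max(1, int(r * len(X_train)))
        X_aug = np.vstack([X_train, X_adv[:n_adv]])
        y_aug = np.concatenate([y_train, np.ones(n_adv, dtype=int)])
        model = RandomForestClassifier(n_estimators=100, random_state=0)
        model.fit(X_aug, y_aug)
        asr = float(np.mean(model.predict(X_adv_test) == 0))
        if asr <= asr_target and model.score(X_clean, y_clean) >= base_acc - max_acc_drop:
            return r, model
    return ratios[-1], model  # fall back to the largest candidate ratio
```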
In summary, the adversarial examples generated by FCGHUNTER can be used effectively to enhance the resilience of ML-based detection models.
Figure: Attack success rate after retraining