Recent studies have highlighted audio adversarial examples as a ubiquitous threat to state-of-the-art automatic speech recognition (ASR) systems. Thorough studies of how to generate adversarial examples effectively are essential to prevent potential attacks. Despite much research on this topic, the efficiency and robustness of existing methods are not yet satisfactory. In this paper, we propose weighted-sampling audio adversarial examples, which focus on the number and the weights of distortions to reinforce the attack. Furthermore, we apply a denoising method in the loss function to make the adversarial attack more imperceptible. Experiments show that our method is the first in the field to generate audio adversarial examples with low noise and high audio robustness, with a generation time on the order of minutes.
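The idea of controlling both how many sample positions are distorted and how those positions are weighted can be illustrated with a small NumPy sketch. It is not the paper's algorithm. It assumes, purely for illustration, that positions are drawn with probability proportional to the absolute amplitude of the clean waveform, that audio is in the int16 range, and that the perturbation is clipped to a fixed bound. The function names `weighted_sample_mask` and `apply_sparse_perturbation` are hypothetical, and the denoising term in the loss is not modeled.

```python
import numpy as np


def weighted_sample_mask(audio: np.ndarray, num_points: int,
                         rng: np.random.Generator) -> np.ndarray:
    """Return a 0/1 mask that selects `num_points` sample positions.

    Hypothetical weighting (an assumption, not the paper's scheme):
    positions are drawn without replacement with probability proportional
    to |audio|, so louder regions, where added noise is easier to mask,
    are favoured.
    """
    weights = np.abs(audio).astype(np.float64) + 1e-8  # keep every probability > 0
    probs = weights / weights.sum()
    idx = rng.choice(audio.shape[0], size=num_points, replace=False, p=probs)
    mask = np.zeros(audio.shape, dtype=np.float32)
    mask[idx] = 1.0
    return mask


def apply_sparse_perturbation(audio: np.ndarray, delta: np.ndarray,
                              mask: np.ndarray, bound: float) -> np.ndarray:
    """Add a perturbation, clipped to [-bound, bound], only where mask == 1.

    Assumes int16-range audio, so the result is clipped to [-32768, 32767].
    """
    perturbed = audio + np.clip(delta, -bound, bound) * mask
    return np.clip(perturbed, -32768, 32767)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    audio = rng.normal(0, 1000, size=16000).astype(np.float32)  # 1 s at 16 kHz
    mask = weighted_sample_mask(audio, num_points=4000, rng=rng)  # distort 25% of samples
    delta = rng.normal(0, 50, size=audio.shape).astype(np.float32)
    adv = apply_sparse_perturbation(audio, delta, mask, bound=100.0)
    print(int(mask.sum()), int(np.count_nonzero(adv != audio)))
```

In an actual attack, `delta` would be optimized against the ASR loss rather than drawn at random. Restricting the update to the masked positions is what keeps the number of distorted samples fixed.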
1. Samples generated by our adversarial attack method
[Transcription]: “he had surprised himself with the thought”
[Transcription]: “it's late in the evening a perfect time for coffee”
[Transcription]: “it was dropping off in flakes and raining down on the sand”
[Transcription]: “if i had told you you wouldn't have seen the pyramids”
[Transcription]: “most meteorites are more or less rounded”
[Transcription]: “isn't the party also to announce his engagement to joanna”
[Transcription]: “you'll take fifty and like it”
[Transcription]: “he thought of all the married shepherds he had known”
2. Evaluation of robustness
Original Audio: [Transcription]: “only a minority of literature is written this way”
EOT-based (∆=30): [Transcription]: “they were men of the desert and they were fearful of sorcerers”
SPT-based (prop.=75%): [Transcription]: “they were men of the desert and they were fearful of sorcerers”
SPT-EOT-based (75%, 30): [Transcription]: “they were men of the desert and they were fearful of sorcerers”
Rob. ↑ / WER ↓
1 / 0.0
1 / 0.0
1 / 0.0

Rob. ↑ / WER ↓
0 / 0.2
0 / 0.4
1 / 0.0

Rob. ↑ / WER ↓
0 / 0.4
0 / 0.4
1 / 1
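The robustness results use Rob. (higher is better) and WER (lower is better). The metrics are not defined here, so the following is a minimal sketch built on two assumptions. First, WER is the standard word-level Levenshtein distance between the attacker's target phrase and the ASR output, divided by the number of target words. Second, Rob. is 1 exactly when the output matches the target. The helper names `word_error_rate` and `is_robust` are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance (substitutions + deletions + insertions)
    divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        cur = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # deletion
                         cur[j - 1] + 1,      # insertion
                         prev[j - 1] + cost)  # substitution or match
        prev = cur
    return prev[len(hyp)] / max(len(ref), 1)


def is_robust(target: str, transcription: str) -> int:
    """Assumed definition: 1 if the ASR output equals the target phrase, else 0."""
    return int(word_error_rate(target, transcription) == 0.0)


target = "they were men of the desert and they were fearful of sorcerers"
print(is_robust(target, target), word_error_rate(target, target))
```

Under these assumptions, a single misrecognized word in this 12-word target gives a WER of 1/12 and a Rob. value of 0.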
3. Evaluation of different loss functions
[Transcription]: “the boy reminded the old man that he had said something about hidden treasure”
loss_0
loss_1
loss_2
loss_3
[Transcription]: “isn't the party also to announce his engagement to joanna”