Hear "No Evil", See "Kenansville": Efficient and Transferable Black-Box Attacks on Automatic Speech Recognition Systems


Our attack perturbs audio samples so that humans still understand them correctly while Automatic Speech Recognition (ASR) systems transcribe them incorrectly.

The easiest way for reviewers to confirm that these samples work is to submit them to the Google Speech API and compare the returned transcriptions with the ones listed on this page.
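For reviewers who prefer to script the check, the following is a minimal sketch of submitting one of the WAV files to Google Cloud Speech-to-Text with the `google-cloud-speech` Python client. It is not the authors' evaluation code. It assumes the client library is installed, that credentials are configured through `GOOGLE_APPLICATION_CREDENTIALS`, that the files are 16-bit PCM WAV, and that `en-US` is the appropriate language code. The helper names `read_wav` and `transcribe` are hypothetical.

```python
import wave


def read_wav(path):
    """Return (raw PCM bytes, sample rate, channel count) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            # Assumption: LINEAR16 below expects 16-bit samples.
            raise ValueError("expected 16-bit PCM audio")
        frames = wav.readframes(wav.getnframes())
        return frames, wav.getframerate(), wav.getnchannels()


def transcribe(path, language_code="en-US"):
    """Send one WAV file to the synchronous recognize endpoint."""
    # Imported lazily so read_wav stays usable without the client library.
    from google.cloud import speech

    pcm, sample_rate, channels = read_wav(path)
    client = speech.SpeechClient()
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=sample_rate,
        audio_channel_count=channels,
        language_code=language_code,  # assumption, not stated on this page
    )
    audio = speech.RecognitionAudio(content=pcm)
    response = client.recognize(config=config, audio=audio)
    # Join the top alternative of each result into one transcript string.
    return " ".join(r.alternatives[0].transcript.strip() for r in response.results)


# Example usage (requires network access and valid credentials):
# print(transcribe("google_1_og.wav"))
# print(transcribe("google_1_atk.wav"))
```

Transcriptions returned today may differ from those listed on this page, since the hosted model can change over time.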

To cite our work:

@INPROCEEDINGS{abdullah2019hear,
  title={{Hear ``no evil'', see ``Kenansville'': Efficient and Transferable Black-box Attacks on Speech Recognition and Voice Identification Systems}},
  author={Abdullah, Hadi and Rahman, Muhammad Sajidur and Garcia, Washington and Blue, Logan and Warren, Kevin and Yadav, Anurag Swarnim and Shrimpton, Tom and Traynor, Patrick},
  booktitle={IEEE Symposium on Security and Privacy (IEEE S\&P)},
  year={2021}
}

Original Audio:

google_1_og.wav

Transcription: don't ask me to carry an oily rag like that

Perturbed Audio:

google_1_atk.wav

Transcription: don't ask me cherimoya drag attack

Original Audio:

google_3_og.wav

Transcription: the emperor had a mean temper

Perturbed Audio:

google_3_atk.wav

Transcription: syempre Hanuman Temple

Original Audio:

google_2_og.wav

Transcription: then the choreographer must arbitrate

Perturbed Audio:

google_2_atk.wav

Transcription: Democrat ographer messed arbitrate
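One way to quantify how far each perturbed transcription drifts from the original is a word-level edit distance, the quantity underlying the standard word error rate. The sketch below is illustrative only and is not a metric reported on this page. It lowercases and splits on whitespace (an assumption about normalization), and the function name `word_edit_distance` is hypothetical.

```python
def word_edit_distance(reference, hypothesis):
    """Levenshtein distance over words (substitutions, insertions, deletions)."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # prev[j] is the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # delete a reference word
                            curr[j - 1] + 1,      # insert a hypothesis word
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


# Transcription pairs copied from the samples listed above.
pairs = [
    ("don't ask me to carry an oily rag like that",
     "don't ask me cherimoya drag attack"),
    ("the emperor had a mean temper",
     "syempre Hanuman Temple"),
    ("then the choreographer must arbitrate",
     "Democrat ographer messed arbitrate"),
]

for original, perturbed in pairs:
    edits = word_edit_distance(original, perturbed)
    print(f"{edits} edits over {len(original.split())} reference words")
# prints:
# 7 edits over 10 reference words
# 6 edits over 6 reference words
# 4 edits over 5 reference words
```

Dividing each edit count by the number of reference words gives the per-sample word error rate.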