Hear "No Evil", See "Kenansville":
Efficient and Transferable
Black-Box Attacks on Automatic Speech Recognition Systems
Our attack perturbs audio samples so that humans still understand them correctly while Automatic Speech Recognition (ASR) systems transcribe them incorrectly.
Reviewers can most easily confirm that these samples work by testing them against the Google Speech API.
To cite our work:
@INPROCEEDINGS{abdullah2019hear,
title={{Hear ``no evil'', see ``Kenansville'': Efficient and Transferable Black-box Attacks on Speech Recognition and Voice Identification Systems}},
author={Abdullah, Hadi and Rahman, Muhammad Sajidur and Garcia, Washington and Blue, Logan and Warren, Kevin and Yadav, Anurag Swarnim and Shrimpton, Tom and Traynor, Patrick},
booktitle={IEEE Symposium on Security and Privacy (IEEE S\&P)},
year={2021}
}
Original Audio:
Transcription: don't ask me to carry an oily rag like that
Perturbed Audio:
Transcription: don't ask me cherimoya drag attack
Original Audio:
Transcription: the emperor had a mean temper
Perturbed Audio:
Transcription: syempre Hanuman Temple
Original Audio:
Transcription: then the choreographer must arbitrate
Perturbed Audio:
Transcription: Democrat ographer messed arbitrate
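One way to quantify how far each perturbed transcription drifts from the original is a word error rate (WER): the word-level edit distance divided by the number of words in the original transcription. The sketch below is only an illustration applied to the three transcription pairs listed above. The function names `word_edit_distance` and `word_error_rate` are hypothetical, and lowercasing plus whitespace splitting is an assumed normalization, not necessarily the one used in the paper.

```python
def word_edit_distance(ref, hyp):
    """Levenshtein distance between two lists of words."""
    # prev[j] holds the distance between the ref words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """WER of hypothesis against reference (assumed normalization: lowercase, split on whitespace)."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcription is empty")
    return word_edit_distance(ref, hyp) / len(ref)


# Transcription pairs copied from the samples on this page.
samples = [
    ("don't ask me to carry an oily rag like that",
     "don't ask me cherimoya drag attack"),
    ("the emperor had a mean temper",
     "syempre Hanuman Temple"),
    ("then the choreographer must arbitrate",
     "Democrat ographer messed arbitrate"),
]

for original, perturbed in samples:
    wer = word_error_rate(original, perturbed)
    print(f"WER {wer:.2f}: {original!r} -> {perturbed!r}")
```

A WER of 0 means the two transcriptions are identical, and a value of 1 or more means that at least as many word edits are needed as there are words in the original.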