Legend:
Original Audio: normal noisy input to the model, x = r ∗ (y + b)
Target Audio: the attacker's desired output from the model, y'
Attacked Audio: original audio plus a perturbation, x + δ
Attacked Model Output: the model's output on the attacked audio, f(x + δ)
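
The quantities in the legend can be assembled as in the minimal sketch below. It rests on assumptions not stated in the legend: ∗ is read as discrete convolution, y as a clean signal, b as additive noise, and r as a short filter. The model `f` is a hypothetical placeholder (a moving average), the target y' is an arbitrary all-zero signal, and δ is just a small random perturbation. The sketch only shows how the four signals relate; it does not show how an attacker would actually choose δ.

```python
import random


def convolve(r, s):
    """Full discrete convolution of filter r with signal s."""
    out = [0.0] * (len(r) + len(s) - 1)
    for i, r_i in enumerate(r):
        for j, s_j in enumerate(s):
            out[i + j] += r_i * s_j
    return out


def f(x):
    """Hypothetical placeholder model: 3-tap moving average, edges padded."""
    padded = [x[0]] + list(x) + [x[-1]]
    return [(padded[i] + padded[i + 1] + padded[i + 2]) / 3.0
            for i in range(len(x))]


random.seed(0)
n = 16
y = [random.uniform(-1.0, 1.0) for _ in range(n)]   # assumed: clean signal
b = [random.uniform(-0.1, 0.1) for _ in range(n)]   # assumed: additive noise
r = [1.0, 0.5, 0.25]                                # assumed: short filter

# Original Audio: x = r * (y + b)
x = convolve(r, [y_i + b_i for y_i, b_i in zip(y, b)])

# Target Audio: y' (arbitrary illustrative choice: silence)
y_target = [0.0] * len(x)

# Attacked Audio: x + delta, with delta bounded by eps (random, not optimized)
eps = 0.01
delta = [random.uniform(-eps, eps) for _ in range(len(x))]
attacked = [x_i + d_i for x_i, d_i in zip(x, delta)]

# Attacked Model Output: f(x + delta)
output = f(attacked)
```

In a real attack δ would be optimized so that f(x + δ) approaches y'; here it is random only to keep the example self-contained.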
[Figure: four examples, each with panels labeled Original Audio, Target Audio, Attacked Audio, and Model Output.]