...a complex audio signal (S) includes an intended audio signal (S1) and at least one interfering audio signal (S2). The complex audio signal (S) is converted into text (F) which represents a plurality of words included in the complex audio signal (S), and at least some of the text is identified as representing words which correspond to the at least one interfering audio signal (S2).