Automatic Source Separation in Avian Duets and Countersongs using Deep Neural Networks
Dyadic acoustic interactions such as duetting and countersinging are used by birds in different behavioral contexts, including territorial defense, mate attraction, and pair bonding. Studying these interactions is challenging, as it requires distinguishing the vocalizations of each individual in a recording, a classic problem in signal processing known as blind source separation. Despite its potential for the study of animal behavior, blind source separation is rarely addressed in a bioacoustic context; existing work focuses mostly on human speech and anthropogenic sound sources. Here, we propose a convolutional neural network architecture for the automatic separation of avian vocalizations in duets and countersongs. Our architecture processes mixed vocalizations in a recording (e.g., a duet) and automatically isolates the sounds of each individual from the mixture. We tested our approach using vocalizations of three species of differing vocal complexity: the great spotted kiwi (Roroa) Apteryx maxima, the gray warbler (Riroriro) Gerygone igata, and the North Island kokako (Kōkako) Callaeas wilsoni. Our approach significantly facilitates the study of vocal behavior and social interactions in birds, which is otherwise time-consuming due to the extensive manual signal preprocessing required. This includes, but is not limited to, the study of rhythm, duetting and countersinging behavior, and cultural transmission.
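The abstract does not spell out how the network isolates each individual from the mixture. A common design for this kind of separation (an assumption here, not a detail confirmed by the text) is time-frequency masking: the network receives the mixture spectrogram and predicts one mask per source, which is multiplied with the mixture to recover each individual's spectrogram. The NumPy sketch below illustrates only the masking step, using an oracle ratio mask computed from two synthetic "birds" (pure tones) in place of a learned CNN prediction; all signal parameters are illustrative.

```python
import numpy as np

def stft(x, n_fft=256, hop=128):
    # Naive STFT: Hann-windowed frames -> complex spectrogram (freq, time).
    win = np.hanning(n_fft)
    frames = [x[i:i + n_fft] * win
              for i in range(0, len(x) - n_fft + 1, hop)]
    return np.fft.rfft(np.array(frames), axis=1).T

# Two synthetic "individuals": pure tones at different frequencies.
sr = 8000
t = np.arange(sr) / sr
bird_a = np.sin(2 * np.pi * 440 * t)
bird_b = np.sin(2 * np.pi * 1760 * t)
mix = bird_a + bird_b  # the recorded "duet"

S_mix = stft(mix)
S_a, S_b = stft(bird_a), stft(bird_b)

# Oracle ratio masks: in a learned system, a CNN would predict these
# masks from |S_mix| alone; here they are computed from the true sources.
eps = 1e-8
mask_a = np.abs(S_a) / (np.abs(S_a) + np.abs(S_b) + eps)
mask_b = 1.0 - mask_a

# Masking the mixture isolates each individual's spectrogram.
est_a = mask_a * S_mix
est_b = mask_b * S_mix

# Each estimate concentrates its energy near its own tone's frequency.
freqs = np.fft.rfftfreq(256, 1 / sr)
peak_a = freqs[np.argmax(np.abs(est_a).mean(axis=1))]
peak_b = freqs[np.argmax(np.abs(est_b).mean(axis=1))]
print(peak_a, peak_b)
```

In a trained separator, the masks would come from the network's output rather than from the ground-truth sources, and the estimated spectrograms would be inverted back to waveforms for each individual.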