Audio data is provided by the San Diego Zoo Research Center.
The audio was collected in Peru in 2019.
30-60 minutes of audio per AudioMoth from 5 different AudioMoths, split into 1-minute intervals.
Roughly two weeks of audio data per deployment (approximately 30 deployments), with the devices recording for one minute every 10 minutes (a rough per-deployment total is sketched after these notes).
The audio recordings capture vocalizations of birds, insects, bats, and other wildlife at various times of the day.
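As a rough back-of-the-envelope estimate (only the 14-day duration and the one-minute-every-10-minutes duty cycle are taken from the notes above; everything else is simple arithmetic), a single deployment yields on the order of 30+ hours of audio:

```python
# Rough estimate of recorded audio per deployment under the stated duty cycle.
DAYS_PER_DEPLOYMENT = 14           # "roughly two weeks"
MINUTES_RECORDED_PER_HOUR = 6      # one minute recorded every 10 minutes

hours_per_day = 24 * MINUTES_RECORDED_PER_HOUR / 60          # 2.4 hours of audio per day
hours_per_deployment = DAYS_PER_DEPLOYMENT * hours_per_day   # ~33.6 hours per deployment

print(f"~{hours_per_deployment:.1f} hours of audio per deployment")
```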
Audio data was preprocessed using the animal2vec model by Dr. Julian Schäfer-Zimmermann.
The model was trained using the Xeno-Canto bird dataset (https://xeno-canto.org/).
More on the approach: https://arxiv.org/abs/2406.01253.
Audio files were transformed into vector embeddings using the animal2vec transformer model. Each one-minute input recording, sampled at 384 kHz, is embedded at a rate of 200 Hz, i.e., one embedding vector every 5 ms.
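A minimal sketch of the resulting tensor shapes, assuming the 384 kHz input rate and 200 Hz embedding rate stated above; the embedding dimension and the model call are placeholders, not the actual animal2vec API:

```python
# Sketch of per-clip embedding shapes; the model output is mocked, not produced by animal2vec.
import numpy as np

SAMPLE_RATE_HZ = 384_000   # AudioMoth recording rate
CLIP_SECONDS = 60          # one-minute clips
EMBED_RATE_HZ = 200        # one embedding frame every 5 ms
EMBED_DIM = 768            # hypothetical embedding dimension (assumption)

n_samples = SAMPLE_RATE_HZ * CLIP_SECONDS             # 23,040,000 audio samples per clip
n_frames = EMBED_RATE_HZ * CLIP_SECONDS               # 12,000 embedding frames per clip
samples_per_frame = SAMPLE_RATE_HZ // EMBED_RATE_HZ   # 1,920 samples (5 ms) per frame

# In practice the clip is passed through the animal2vec encoder;
# here the output is zero-filled only to illustrate its shape.
embeddings = np.zeros((n_frames, EMBED_DIM), dtype=np.float32)
print(embeddings.shape)  # (12000, 768)
```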
Embeddings are high-dimensional vector representations of audio data.
The goal is to further train the model on the Peruvian bird recordings to improve its performance on this data.