Through our research, we explored three questions to better understand audio recordings of biodiversity activity:
Firstly, we examined how effectively unsupervised clustering methods could group these recordings. We implemented two distinct clustering methods and found that both grouped audio samples with similar acoustic characteristics. One notable success was that bird calls were grouped consistently across multiple audio files, demonstrating the methods' ability to capture meaningful acoustic similarities.
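To make this concrete, here is a minimal sketch of such a pipeline, assuming precomputed per-clip embeddings stored in `embeddings.npy`; the two algorithms shown (k-means and agglomerative clustering) and the cluster count are illustrative stand-ins rather than our exact configuration.

```python
# Minimal sketch: grouping per-clip audio embeddings with two clustering
# methods. "embeddings.npy" and n_clusters=8 are illustrative assumptions.
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.preprocessing import StandardScaler

embeddings = np.load("embeddings.npy")          # shape: (n_clips, embedding_dim)
X = StandardScaler().fit_transform(embeddings)  # zero-mean, unit-variance features

# Method 1: k-means partitions the space into k centroid-based clusters.
kmeans_labels = KMeans(n_clusters=8, n_init=10, random_state=42).fit_predict(X)

# Method 2: agglomerative clustering merges the closest clips bottom-up.
agglo_labels = AgglomerativeClustering(n_clusters=8).fit_predict(X)

for name, labels in (("k-means", kmeans_labels), ("agglomerative", agglo_labels)):
    print(name, "cluster sizes:", np.bincount(labels).tolist())
```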
Secondly, we visualized the acoustic data as spectrograms to better understand the patterns emerging within each cluster. This analysis was particularly insightful, revealing distinct and consistent acoustic signatures for each group. Using a Streamlit app, we could inspect these spectrogram patterns and, at the same time, verify audibly that each cluster was coherent, making our findings more intuitive and robust.
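A sketch of this kind of inspection view follows, assuming librosa for the mel spectrograms; the file paths and the cluster-to-file mapping are hypothetical placeholders for the outputs of the clustering step.

```python
# Minimal sketch of a Streamlit page that shows a clip's mel spectrogram and
# plays it back. cluster_files is a hypothetical mapping produced upstream.
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np
import streamlit as st

cluster_files = {0: ["clips/cluster0_example.wav"]}   # hypothetical paths

def show_clip(path: str) -> None:
    y, sr = librosa.load(path, sr=22050)
    mel_db = librosa.power_to_db(librosa.feature.melspectrogram(y=y, sr=sr),
                                 ref=np.max)
    fig, ax = plt.subplots()
    librosa.display.specshow(mel_db, sr=sr, x_axis="time", y_axis="mel", ax=ax)
    ax.set_title(path)
    st.pyplot(fig)   # visual check of the cluster's acoustic signature
    st.audio(path)   # audible check that the cluster is coherent

choice = st.selectbox("Cluster", sorted(cluster_files))
for clip in cluster_files[choice]:
    show_clip(clip)
```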
Lastly, we plotted histograms to examine the distribution of clustered vocalizations across different times of day. These histograms confirmed that each cluster exhibited a distinct temporal pattern, supporting the idea that specific biodiversity activities occur at distinct times.
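The sketch below shows one way to build such per-cluster histograms; the timestamps and labels are illustrative stand-ins for the recording metadata and the cluster assignments from the earlier steps.

```python
# Minimal sketch: one histogram per cluster over hour of day. The timestamps
# and labels below are illustrative stand-ins for the real pipeline outputs.
import matplotlib.pyplot as plt
import pandas as pd

timestamps = ["2024-05-01 05:12", "2024-05-01 06:03", "2024-05-01 19:45",
              "2024-05-02 05:30", "2024-05-02 20:10", "2024-05-02 12:00"]
labels = [0, 0, 1, 0, 1, 2]                  # hypothetical cluster ids

df = pd.DataFrame({"timestamp": pd.to_datetime(timestamps), "cluster": labels})
df["hour"] = df["timestamp"].dt.hour

clusters = sorted(df["cluster"].unique())
fig, axes = plt.subplots(len(clusters), 1, sharex=True,
                         figsize=(6, 2 * len(clusters)))
for ax, cluster_id in zip(axes, clusters):
    hours = df.loc[df["cluster"] == cluster_id, "hour"]
    ax.hist(hours, bins=range(25))           # one bin per hour, 0-23
    ax.set_ylabel(f"cluster {cluster_id}")
axes[-1].set_xlabel("hour of day")
plt.tight_layout()
plt.show()
```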
Because our current evaluation relies on unsupervised methods, its accuracy cannot be measured directly; even so, the strong patterns we observed show significant promise. In future research, we aim to apply our clustering approach to labeled data, allowing for more precise evaluation and refinement of our methodology.
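As a sketch of what that evaluation could look like, the snippet below scores cluster assignments against species annotations with two standard metrics; both value lists are hypothetical placeholders until labeled data is available.

```python
# Minimal sketch of scoring cluster assignments against labeled data.
# Both lists below are hypothetical; real species annotations would
# replace true_labels once labeled data is available.
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score

true_labels = [0, 0, 1, 0, 1, 2]   # hypothetical species annotations
predicted   = [1, 1, 0, 1, 0, 2]   # hypothetical cluster ids from the pipeline

# Both metrics ignore cluster-id permutations: 1.0 means perfect agreement,
# values near 0.0 mean no better than chance.
print("ARI:", adjusted_rand_score(true_labels, predicted))
print("NMI:", normalized_mutual_info_score(true_labels, predicted))
```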
Species identification and classification
Animal vocalization research
Animal behavior monitoring
Potential to create a library of labeled embeddings to promote further research efforts
Use a larger dataset to obtain more embeddings for clustering (currently, there are very few recordings from dawn and dusk)
Update the Animal2Vec model parameters to cover higher frequencies (e.g., for bat vocalizations)
Potential to incorporate a feature that lets users label sounds or clusters in real time, further enhancing the analysis; a sketch of such a widget appears below.
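One possible shape for that labeling feature, sketched with standard Streamlit components; the widget layout, cluster ids, and the output file `cluster_labels.json` are assumptions, not an existing implementation.

```python
# Minimal sketch of an in-app labeling widget using standard Streamlit
# components. The widget layout and "cluster_labels.json" are assumptions.
import json
import streamlit as st

if "labels" not in st.session_state:
    st.session_state["labels"] = {}          # cluster id -> user-supplied label

cluster_id = st.selectbox("Cluster to label", [0, 1, 2])   # illustrative ids
label = st.text_input("Label (e.g. species or sound type)")
if st.button("Save label") and label:
    st.session_state["labels"][cluster_id] = label

st.write(st.session_state["labels"])         # running record of assignments
if st.button("Export labels"):
    with open("cluster_labels.json", "w") as f:
        json.dump(st.session_state["labels"], f, indent=2)
```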