Non-speech acoustic event detection and classification models for audio processing
Tuomas Virtanen and Jort F. Gemmeke
Research in audio signal processing has been dominated by speech, but most of the sounds in our real-life environments are actually non-speech events such as cars passing by, wind, warning beeps, and animal sounds. These acoustic events carry much information about the environment and the physical events that take place in it, enabling novel application areas such as safety, health monitoring and the investigation of biodiversity. But while recent years have seen widespread adoption of applications such as speech recognition and song recognition, generic computer audition is still in its infancy.
Non-speech acoustic events differ from speech in several fundamental ways, but many of the core algorithms developed by speech researchers can be leveraged for generic audio analysis. The tutorial is a comprehensive review of the field of acoustic event detection as it currently stands. Its goal is to foster interest in the community, highlight the challenges and opportunities, and provide a starting point for new researchers. We will discuss what acoustic event detection entails, and the commonalities and differences with speech processing, such as the large variation in sounds and the possible overlap with other sounds. We will then discuss basic experimental and algorithm design, including descriptions of available databases and machine learning methods, before moving to more advanced topics such as methods to deal with temporally overlapping sounds and modelling the relations between sounds. We will finish with a discussion of avenues for future research.
Last updated: 13-09-2014
Download: [pdf]
Jort F. Gemmeke and Emre Yilmaz
It is well known that some types of calculations parallelize well on modern hardware such as Graphics Processing Units (GPUs). In this tutorial, we will focus on calculations carried out in Matlab, and explain how large speedups can be obtained with minimal code changes.
We discuss these topics:
1) Hardware: which GPUs can we use, experiences at ESAT-PSI, KU Leuven
2) Software: which Matlab toolboxes are available, how to inspect GPU usage
3) Step-by-step examples of simple programs
4) How to measure speedups
5) More advanced topics: parallel for loops, code compilation
6) Tips & tricks: common pitfalls, alternatives
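The tutorial itself uses Matlab and its GPU toolboxes; the slides are not reproduced here. As a stand-in, the following Python sketch (an assumption, not the tutorial's code) illustrates the measurement discipline behind topic 4: warm up, time repeatedly, take the best run, and only then report a speedup. Here the "fast" variant is a vectorized array expression, which plays the same role as rewriting a Matlab loop into a single `gpuArray` expression.

```python
import timeit
import numpy as np

def loop_sum_squares(x):
    # Scalar Python loop: the kind of code that benefits most from rewriting.
    total = 0.0
    for v in x:
        total += v * v
    return total

def vectorized_sum_squares(x):
    # Single array expression; in Matlab the analogous one-line change is
    # often all that is needed to run the computation on a gpuArray.
    return float(np.dot(x, x))

x = np.random.default_rng(0).random(100_000)

# Warm up happens implicitly on the first repeat; taking the minimum of
# several repeats filters out timing noise from other processes.
t_loop = min(timeit.repeat(lambda: loop_sum_squares(x), number=10, repeat=3))
t_vec = min(timeit.repeat(lambda: vectorized_sum_squares(x), number=10, repeat=3))
print(f"speedup: {t_loop / t_vec:.1f}x")
```

The same pattern applies unchanged when timing GPU code, with one extra caveat covered in the tutorial: GPU calls are asynchronous, so the device must be synchronized before stopping the timer.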
Last updated: 08-04-2013
Download: [pdf]
Jort F. Gemmeke and Tuomas Virtanen
Natural sounds in real-world environments are typically composed of multiple source signals, such as speech and noise, or multiple instruments in a music signal. Moreover, each of these source signals may be composed of parts, for example multiple notes played by a musical instrument. Compositional models, including those based on non-negative matrix factorization (NMF), explicitly consider the fact that sound components largely combine constructively in the composition of more complex sounds. The use of compositional models has yielded state-of-the-art results in many audio processing tasks, such as sound source separation, music content analysis and noise-robust automatic speech recognition. These methods are also closely related to the sparse representations popularized in Compressed Sensing.
In this tutorial we discuss both basic concepts, such as feature representations, dictionary learning and algorithms, as well as more advanced topics such as regularisation with sparsity, probabilistic formulations such as latent variable models, convolutive models and tensor factorization models. From the start, every topic will be illustrated with representative application examples, ranging from automatic music transcription to noise-robust automatic speech recognition.
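To make the compositional idea concrete, here is a minimal NMF sketch in Python (the tutorial's own material is not reproduced here; the toy matrix and function name are illustrative assumptions). It uses the standard Lee-Seung multiplicative updates to factorize a small non-negative "spectrogram" into two spectral patterns and their activations, showing how complex sounds decompose into constructively combining parts.

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-9):
    """Factorize non-negative V ~= W @ H with Lee-Seung multiplicative
    updates minimizing the Euclidean (Frobenius) reconstruction error."""
    rng = np.random.default_rng(0)
    n, m = V.shape
    W = rng.random((n, rank)) + eps  # spectral patterns (dictionary)
    H = rng.random((rank, m)) + eps  # per-frame activations
    for _ in range(n_iter):
        # Multiplicative updates keep W and H non-negative by construction.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy "spectrogram": two spectral patterns active in different time frames.
V = np.array([[1.0, 1.0, 0.0, 0.0],
              [1.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0]])
W, H = nmf(V, rank=2)
print(np.abs(V - W @ H).max())  # reconstruction error is small
```

Because the updates are purely multiplicative, non-negativity of the factors is preserved at every iteration, which is exactly the constructive-combination constraint that distinguishes compositional models from unconstrained factorizations.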
Last updated: 06-09-2013