This is the demo page for our paper submitted to Interspeech 2017:
Attention and Localization based on Convolutional Recurrent Model for Semi-supervised Audio Tagging
Source code here: https://github.com/yongxuUSTC/att_loc_cgrnn
Example 1: attention and localization [wav]
Panel (i) shows the logarithmic spectrogram, panel (ii) the predicted localization results, and panel (iii) the attention factors for an audio chunk labeled "child speech (c)" and "percussive sound (p)". The x-axes of the three panels share the same frame index. The wav file can be heard here: https://drive.google.com/open?id=0B5r5bvRpQ5DRZzIyYWctczhVUDA
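The interplay between the localization results (ii) and the attention factors (iii) can be sketched as attention-weighted temporal pooling: per-frame scores are aggregated into one chunk-level tag score, with the attention factors deciding how much each frame contributes. This is a minimal NumPy illustration under assumed shapes (240 frames, 7 tag classes, random values), not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
T, K = 240, 7  # hypothetical: 240 frames, 7 audio-tag classes

# (ii) per-frame localization predictions and (iii) per-frame
# attention factors, both assumed to lie in [0, 1].
loc = rng.random((T, K))
att = rng.random((T, K))

# Normalize the attention factors over time so the weights for each
# class sum to 1, then pool the frame-level predictions into a single
# chunk-level score per class.
weights = att / att.sum(axis=0, keepdims=True)
chunk_scores = (weights * loc).sum(axis=0)  # shape (K,)
print(chunk_scores.shape)  # → (7,)
```

Frames with near-zero attention (e.g. background noise) then contribute almost nothing to the final tag decision, which is the effect illustrated in Example 2 below.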
Example 2: attention [wav]
This attention example shows that the attention scheme can effectively ignore background noise while keeping the relevant information. The wav file can be heard here: https://drive.google.com/file/d/0B5r5bvRpQ5DRdERtTWFodlZMNzg/view?usp=sharing