Sound event synthesis using WaveNet
This page demonstrates sound event synthesis (SES) from event labels using a conditional WaveNet [1]. As the dataset, we used 10 sound event classes (manual coffee grinder, cup clinking, alarm clock ringing, whistle, maracas, drum, electric shaver, trash box banging, tearing paper, and bell ringing) from the RWCP-SSD (Real World Computing Partnership Sound Scene Database) [2].
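In the conditional WaveNet of [1], a global condition such as a class label enters every layer through the gated activation unit z = tanh(W_f * x + V_f^T h) ⊙ σ(W_g * x + V_g^T h), where * is a dilated causal convolution and h is the condition vector. The following is a minimal single-channel sketch, assuming the event label is given to the model as a one-hot vector h over the 10 classes. The label names, toy weights, and the functions `one_hot` and `gated_unit` are hypothetical and for illustration only; this is not the implementation used to produce the sounds on this page.

```python
import math
from typing import List, Sequence

# Hypothetical label order; the 10 classes follow the list above.
EVENT_LABELS = [
    "coffee_grinder", "cup", "clock", "whistle", "maracas",
    "drum", "shaver", "trash_box", "tearing_paper", "bell",
]


def one_hot(label: str) -> List[float]:
    """Encode an event label as a one-hot conditioning vector h."""
    h = [0.0] * len(EVENT_LABELS)
    h[EVENT_LABELS.index(label)] = 1.0
    return h


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(p * q for p, q in zip(a, b))


def gated_unit(
    x: Sequence[float],
    h: Sequence[float],
    dilation: int,
    w_f: Sequence[float],
    w_g: Sequence[float],
    v_f: Sequence[float],
    v_g: Sequence[float],
) -> List[float]:
    """Single-channel gated activation with global (label) conditioning.

    z[t] = tanh(w_f[0]*x[t-d] + w_f[1]*x[t] + v_f.h)
           * sigmoid(w_g[0]*x[t-d] + w_g[1]*x[t] + v_g.h)
    The convolution has kernel size 2 and dilation d, so z[t] only reads
    x[t] and x[t-d] and never looks at future samples (causal).
    """
    # Global conditioning: the label adds the same bias at every time step.
    cond_f = dot(v_f, h)
    cond_g = dot(v_g, h)
    z = []
    for t in range(len(x)):
        past = x[t - dilation] if t >= dilation else 0.0  # zero-pad the start
        f = w_f[0] * past + w_f[1] * x[t] + cond_f
        g = w_g[0] * past + w_g[1] * x[t] + cond_g
        z.append(math.tanh(f) * sigmoid(g))
    return z


# Usage with toy values: the same input, conditioned on two different labels.
x = [0.0, 0.5, -0.25, 0.75, 0.1, -0.6, 0.3, 0.0]
v_f = [0.1 * k for k in range(10)]          # one toy weight per class
v_g = [0.2 - 0.05 * k for k in range(10)]
z_drum = gated_unit(x, one_hot("drum"), 2, [0.4, 0.8], [0.3, -0.5], v_f, v_g)
z_bell = gated_unit(x, one_hot("bell"), 2, [0.4, 0.8], [0.3, -0.5], v_f, v_g)
```

Because the label only shifts the filter and gate pre-activations, the same network weights can produce different outputs for different event classes, which is what lets a single model synthesize all 10 sound types. A full WaveNet stacks many such units with increasing dilations, multiple channels, residual and skip connections, and a softmax over quantized sample values.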
You can download a zip file of the original and synthesized sounds from here.
・Manual coffee grinder
Original sound
Synthesized sound
・Cup
Original sound
Synthesized sound
・Clock
Original sound
Synthesized sound
・Whistle
Original sound
Synthesized sound
・Maracas
Original sound
Synthesized sound
・Drum
Original sound
Synthesized sound
・Shaver
Original sound
Synthesized sound
・Trash box
Original sound
Synthesized sound
・Tearing paper
Original sound
Synthesized sound
・Bell
Original sound
Synthesized sound
[1] A. van den Oord, S. Dieleman, H. Zen, K. Simonyan, O. Vinyals, A. Graves, N. Kalchbrenner, A. Senior, and K. Kavukcuoglu, “WaveNet: A Generative Model for Raw Audio,” arXiv preprint arXiv:1609.03499, 2016.
[2] S. Nakamura, K. Hiyane, F. Asano, and T. Endo, “Acoustical Sound Database in Real Environments for Sound Scene Understanding and Hands-free Speech Recognition,” Proc. Language Resources and Evaluation Conference (LREC), pp. 965–968, 2000.