Dataset for Environmental Sound Synthesis
ESC-50-Voice [2]
Impression caption dataset for environmental sound [4] (coming soon...)
RELATE [5]
Dataset for Environmental Sound Extraction
[1] Yuki Okamoto, Keisuke Imoto, Shinnosuke Takamichi, Ryosuke Yamanishi, Takahiro Fukumori, and Yoichi Yamashita, "RWCP-SSD-Onomatopoeia: Onomatopoeic Word Dataset for Environmental Sound Synthesis," Proc. Detection and Classification of Acoustic Scenes and Events (DCASE), pp. 125-129, 2020.
[2] Yuki Okamoto, Keisuke Imoto, Shinnosuke Takamichi, Ryotaro Nagase, Takahiro Fukumori, and Yoichi Yamashita, "Environmental Sound Synthesis from Vocal Imitations and Sound Event Labels," Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 411-415, 2024.
[3] Yuki Okamoto, Kanta Shimonishi, Keisuke Imoto, Kota Dohi, Shota Horiguchi, and Yohei Kawaguchi, "CAPTDURE: Captioned sound Dataset of Single Sources," Proc. INTERSPEECH, pp. 1683-1687, 2023.
[4] Yuki Okamoto, Ryotaro Nagase, Minami Okamoto, Yuki Saito, Keisuke Imoto, Takahiro Fukumori, and Yoichi Yamashita, "Construction and Analysis of Impression Caption Dataset for Environmental Sounds," arXiv preprint, arXiv:2410.15532, 2024.
[5] Yusuke Kanamori, Yuki Okamoto, Taisei Takano, Shinnosuke Takamichi, Yuki Saito, and Hiroshi Saruwatari, "RELATE: Subjective Evaluation Dataset for Automatic Evaluation of Relevance Between Text and Audio," Proc. INTERSPEECH, pp. 3155-3159, 2025.