Environmental Sound Segmentation

Integrated Approach to Sound Source Localization, Separation, and Classification Using Deep Learning

Proposal of Environmental Sound Segmentation (2018-2021)

The function blocks of conventional frameworks (sound source localization, sound source separation, and classification) have been developed and evaluated separately. As shown in the figure below, we have defined a task called "Environmental Sound Segmentation" that handles these blocks in an integrated manner, and we are studying it.

The proposed integrated framework, the "Environmental Sound Segmentation" method, prevents the accumulation of errors across blocks and recognizes the environment with higher accuracy than conventional deep learning models.

www.slideshare.net/ssuser49a7fe/ss-249462791

Journal

International conferences / Peer-reviewed conference papers

Y. Sudo, K. Itoyama, K. Nishida, and K. Nakadai, “Improvement of DOA estimation by using quaternion output in sound event localization and detection”, Workshop on Detection and Classification of Acoustic Scenes and Events, 2019, pp. 244-247.
Y. Sudo, K. Itoyama, K. Nishida, and K. Nakadai, “Environmental sound segmentation utilizing Mask U-Net”, IEEE/RSJ International Conference on Intelligent Robots and Systems, 2019.
M. Iwatsuki, Y. Sudo, K. Itoyama, K. Nishida, and K. Nakadai, “Listen and Tell: Acoustic Scene Caption Generation using Deep Learning”, The Third International Workshop on Symbolic-Neural Learning (SNL 2019), 2019.
Y. Sudo, K. Itoyama, K. Nishida, and K. Nakadai, “Multi-channel Environmental sound segmentation”, IEEE/SICE International Symposium on System Integration, 2020.
Y. Sudo, K. Itoyama, K. Nishida, and K. Nakadai, “Multi-channel Environmental sound segmentation utilizing Sound Source Localization and Separation U-Net”, IEEE/SICE International Symposium on System Integration, 2021.

Domestic conferences / Non-peer-reviewed conference papers

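Yanke Long, Riku Yasuda, Yui Sudo, Katsutoshi Itoyama, Kazuhiro Nakadai, Hideharu Amano, and Kenji Nishida, “An efficient end-to-end learning method for sound event localization and detection”, Proceedings of the 41st Annual Conference of the Robotics Society of Japan (RSJ 2023), Sep. 2023, Robotics Society of Japan.
M. Iwatsuki, Yui Sudo, Katsutoshi Itoyama, Kenji Nishida, and Kazuhiro Nakadai, “Construction of an environmental sound caption corpus toward realizing a robot that explains the sound environment” (in Japanese), Proceedings of the 37th Annual Conference of the Robotics Society of Japan (RSJ 2019), Sep. 2019, Robotics Society of Japan, Waseda University, https://ac.rsj-web.org/2019/index.html
Yui Sudo, Katsutoshi Itoyama, Kenji Nishida, and Kazuhiro Nakadai, “Proposal of environmental sound segmentation using Mask U-Net” (in Japanese), Proceedings of the 52nd JSAI Special Interest Group on AI Challenges, Dec. 2018, Japanese Society for Artificial Intelligence, Waseda University, http://www.osaka-kyoiku.ac.jp/~challeng/SIG-Challenge-052/
M. Iwatsuki, Yui Sudo, Katsutoshi Itoyama, Kenji Nishida, and Kazuhiro Nakadai, “Listen and Tell: Acoustic scene caption generation using deep learning” (in Japanese), Proceedings of the 81st National Convention of IPSJ, Mar. 2019, Information Processing Society of Japan, Waseda University, https://www.ipsj.or.jp/event/taikai/81/ipsj_web2019/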

Tech talk

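Yui Sudo, “Integration of sound source localization, sound source separation, and classification using deep learning: an introduction to the environmental sound segmentation method”, Tokyo BISH Bash #05, online, 2021.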