Tomoki Toda

Research

Publication list

Google Scholar Citations

Selected publications

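- Speech signal processing, speech waveform modeling, voice conversion, text-to-speech synthesis, spoken language processing
- Speech recognition, speech emotion recognition, spoken dialogue, speech translation, video scene retrieval
- Body-conducted speech processing, articulatory-acoustic mapping, alaryngeal speech processing, speech processing for articulation disorders
- Speech quality estimation, subjective evaluation methods, spoofed speech detection, security and privacy protection
- Singing voice conversion, singing voice synthesis, singing aids, automatic music transcription, music analysis
- Source separation and target signal enhancement, multichannel signal processing, sound event recognition, sound event and scene description, anomalous sound detection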

- Speech signal processing
  - S. Chen, T. Toda. QHARMA-GAN: quasi-harmonic neural vocoder based on autoregressive moving average model. IEEE Transactions on Audio, Speech and Language Processing, Vol. 33, pp. 3703-3719, Sep. 2025. [Open Access]
  - S. Chen, T. Toda. Sequence-wise speech waveform modeling via gradient descent optimization of quasi-harmonic parameters. IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 33, pp. 319-332, Jan. 2025. [Open Access]
  - A. Miyashita, T. Toda. Differentiable representation of warping based on Lie group theory. Proc. IEEE WASPAA, 5 pages, Oct. 2023. [Preprint]
  - A. Miyashita, T. Toda. Representation of vocal tract length transformation based on group theory. Proc. IEEE ICASSP, 5 pages, June 2023. [Preprint]
  - T. Toda, K. Tokuda. Statistical approach to vocal tract transfer function estimation based on factor analyzed trajectory HMM. Proc. IEEE ICASSP, pp. 3925-3928, Apr. 2008. [Preprint]
- Speech waveform modeling
  - R. Yoneyama, A. Miyashita, R. Yamamoto, T. Toda. Wavehax: aliasing-free neural waveform synthesis based on 2D convolution and harmonic prior for reliable complex spectrogram estimation. IEEE Transactions on Audio, Speech and Language Processing, Vol. 33, pp. 4454-4470, Oct. 2025. [Open Access]
  - R. Yoneyama, T. Toda. SiFi-GAN: combining source-filter modeling and upsampling-based high-fidelity neural vocoder for fast and pitch-controllable speech synthesis. IEICE Transactions on Information and Systems, Vol. E109-D, No. 6, pp. 945-956, June 2026. [Open Access]
  - R. Yoneyama, Y.-C. Wu, T. Toda. High-fidelity and pitch-controllable neural vocoder based on unified source-filter networks. IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 31, pp. 3717-3729, Oct. 2023. [Open Access]
  - Y.-C. Wu, T. Hayashi, T. Okamoto, H. Kawai, T. Toda. Quasi-periodic parallel WaveGAN: a non-autoregressive raw waveform generative model with pitch-dependent dilated convolution neural network. IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 29, pp. 792-806, Feb. 2021. [Open Access]
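  - 戸田智基. Machine learning and speech generation: speech signal modeling based on statistical methods. In: The Society of Instrument and Control Engineers (Ed.), The Potential of Machine Learning, Corona Publishing, Dec. 2022. [Open Access]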
- Voice conversion
  - C. Xie, T. Toda. Noisy-to-noisy voice conversion under variations of noisy condition. IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 31, pp. 3871-3882, Oct. 2023. [Open Access]
  - W.-C. Huang, S.-W. Yang, T. Hayashi, T. Toda. A comparative study of self-supervised speech representation based voice conversion. IEEE Journal of Selected Topics in Signal Processing, Vol. 16, No. 6, pp. 1308-1318, Oct. 2022. [Preprint]
  - W.-C. Huang, T. Hayashi, Y.-C. Wu, H. Kameoka, T. Toda. Pretraining techniques for sequence-to-sequence voice conversion. IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 29, pp. 745-755, Feb. 2021. [Open Access]
  - T. Toda, L.-H. Chen, D. Saito, F. Villavicencio, M. Wester, Z. Wu, J. Yamagishi. The Voice Conversion Challenge 2016. Proc. INTERSPEECH, pp. 1632-1636, Sep. 2016. [Open Access]
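  - 戸田智基. An introduction to voice conversion. Journal of the Acoustical Society of Japan, Vol. 72, No. 6, pp. 324-331, June 2016. [Open Access]
  - 戸田智基. Voice conversion techniques based on probabilistic models. Journal of the Acoustical Society of Japan, Vol. 67, No. 1, pp. 34-39, Jan. 2011. [Open Access]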
  - T. Toda, A.W. Black, K. Tokuda. Voice conversion based on maximum likelihood estimation of spectral parameter trajectory. IEEE Transactions on Audio, Speech and Language Processing, Vol. 15, No. 8, pp. 2222-2235, Nov. 2007. [Preprint]
- Text-to-speech synthesis
  - D. Yoshioka, Y. Nakata, Y. Yasuda, T. Toda. Text- and speech-style control for lecture speech generation focusing on disfluency. APSIPA Transactions on Signal and Information Processing, Vol. 14, No. 1, e26, pp. 1-31, Sep. 2025. [Open Access]
  - T. Hayashi, R. Yamamoto, K. Inoue, T. Yoshimura, S. Watanabe, T. Toda, K. Takeda, Y. Zhang, X. Tan. ESPnet-TTS: unified, reproducible, and integratable open source end-to-end text-to-speech toolkit. Proc. IEEE ICASSP, pp. 7654-7658, May 2020. [Preprint]
  - S. Takamichi, T. Toda, A.W. Black, G. Neubig, S. Sakti, S. Nakamura. Post-filters to modify the modulation spectrum for statistical parametric speech synthesis. IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 24, No. 4, pp. 755-767, Apr. 2016. [Preprint]
  - K. Tokuda, Y. Nankaku, T. Toda, H. Zen, J. Yamagishi, K. Oura. Speech synthesis based on hidden Markov models. Proceedings of the IEEE, Vol. 101, No. 5, pp. 1234-1252, May 2013. [Preprint]
  - T. Toda, K. Tokuda. A speech parameter generation algorithm considering global variance for HMM-based speech synthesis. IEICE Transactions on Information and Systems, Vol. E90-D, No. 5, pp. 816-824, May 2007. [Preprint]
  - 河井恒, 戸田智基, 山岸順一, 平井俊男, 倪晋富, 西澤信行, 津崎実, 徳田恵一. XIMERA: a speech synthesis system using a large-scale corpus. IEICE Transactions (Japanese Edition), Vol. J89-D-II, No. 12, pp. 2688-2698, Dec. 2006. [Preprint]
- Spoken language processing
  - D. Yoshioka, Y. Yasuda, T. Toda. Nonparallel spoken-text-style transfer for linguistic expression control in speech generation. IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 33, pp. 333-346, Jan. 2025. [Open Access]
  - Y. Yasuda, T. Toda. Investigation of Japanese PnG BERT language model in text-to-speech synthesis for pitch accent language. IEEE Journal of Selected Topics in Signal Processing, Vol. 16, No. 6, pp. 1319-1328, Oct. 2022. [Open Access]
  - T. Hayashi, S. Watanabe, T. Toda, K. Takeda, S. Toshniwal, K. Livescu. Pre-trained text embeddings for enhanced text-to-speech synthesis. Proc. INTERSPEECH, pp. 4430-4434, Sep. 2019. [Open Access]

- Speech recognition
  - J. He, N. Sawada, K. Miyazaki, T. Toda. CMT-LLM: context-aware multi-talker ASR utilizing large language models. Proc. INTERSPEECH, pp. 2575-2579, Rotterdam, the Netherlands, Aug. 2025. [Open Access]
  - J. He, N. Sawada, K. Miyazaki, T. Toda. PARCO: phoneme-augmented robust contextual ASR via contrastive entity disambiguation. Proc. IEEE ASRU, 7 pages, Dec. 2025. [Preprint]
  - J. He, T. Toda. PMF-CEC: phoneme-augmented multimodal fusion for context-aware ASR error correction with error-specific selective decoding. IEEE Transactions on Audio, Speech and Language Processing, Vol. 33, pp. 2402-2417, June 2025. [Open Access]
  - T. Hayashi, S. Watanabe, Y. Zhang, T. Toda, T. Hori, R. Astudillo, K. Takeda. Back-translation-style data augmentation for end-to-end ASR. Proc. IEEE SLT, pp. 426-433, Dec. 2018. [Preprint]
- Speech emotion recognition
  - J. He, X. Shi, C.-H. Hu, J. Mi, X. Li, T. Toda. M4SER: multimodal, multirepresentation, multitask, and multistrategy learning for speech emotion recognition. IEEE Transactions on Audio, Speech and Language Processing, Vol. 33, pp. 4055-4070, Sep. 2025. [Open Access]
  - J. He, J. Mi, T. Toda. GIA-MIC: multimodal emotion recognition with gated interactive attention and modality-invariant learning constraints. Proc. INTERSPEECH, pp. 2695-2699, Rotterdam, the Netherlands, Aug. 2025. [Open Access]
  - X. Shi, J. He, X. Li, T. Toda. A comprehensive study on the effectiveness of ASR representations for noise-robust speech emotion recognition. IEEE Transactions on Audio, Speech and Language Processing, Vol. 34, pp. 707-722, Jan. 2026. [Preprint]
  - X. Shi, X. Li, T. Toda. Who, When, and What: leveraging the "Three Ws" concept for emotion recognition in conversation. Proc. INTERSPEECH, pp. 1763-1767, Rotterdam, the Netherlands, Aug. 2025. [Open Access]
  - X. Shi, X. Li, T. Toda. Emotion awareness in multi-utterance turn for improving emotion prediction in multi-speaker conversation. Proc. INTERSPEECH, pp. 765-769, Aug. 2023. [Open Access]
  - A. Ando, T. Mori, S. Kobashikawa, T. Toda. Speech emotion recognition based on listener-dependent emotion perception models. APSIPA Transactions on Signal and Information Processing, Vol. 10, e6, pp. 1-11, Apr. 2021. [Open Access]
- Spoken dialogue
  - T. Hiraoka, G. Neubig, S. Sakti, T. Toda, S. Nakamura. Learning cooperative persuasive dialogue policies using framing. Speech Communication, Vol. 84, pp. 83-96, Nov. 2016. [Preprint]
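  - 山内祐輝, Graham Neubig, Sakriani Sakti, 戸田智基, 中村哲. Topic-guiding response generation using relations between terms in dialogue systems. Transactions of the Japanese Society for Artificial Intelligence, Vol. 29, No. 1, pp. 80-89, Jan. 2014. [Open Access]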
- Speech translation
  - Q. Truong Do, T. Toda, G. Neubig, S. Sakti, S. Nakamura. Preserving word-level emphasis in speech-to-speech translation. IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 25, No. 3, pp. 544-556, Mar. 2017. [Preprint]
  - Y. Oda, G. Neubig, S. Sakti, T. Toda, S. Nakamura. Optimizing segmentation strategies for simultaneous speech translation. Proc. ACL, pp. 551-556, June 2014. [Open Access]
- Video scene retrieval
  - J. He, T. Toda. 2DP-2MRC: 2-dimensional pointer-based machine reading comprehension method for multimodal moment retrieval. Proc. INTERSPEECH, pp. 5073-5077, Kos Island, Greece, Sep. 2024. [Open Access]

- Body-conducted speech processing
  - S. Seki, M. Takada, T. Toda. Semi-supervised self-produced speech enhancement and suppression based on joint source modeling of air- and body-conducted signals using variational autoencoder. Proc. INTERSPEECH, pp. 4039-4043, Oct. 2020. [Open Access]
  - Y. Tajiri, T. Toda. Nonaudible murmur enhancement based on statistical voice conversion and noise suppression with external noise monitoring. Proc. 9th ISCA Speech Synthesis Workshop (SSW9), pp. 54-60, Sep. 2016. [Open Access]
  - T. Toda, M. Nakagiri, K. Shikano. Statistical voice conversion techniques for body-conducted unvoiced speech enhancement. IEEE Transactions on Audio, Speech and Language Processing, Vol. 20, No. 9, pp. 2505-2517, Sep. 2012. [Preprint]
  - T. Toda, K. Nakamura, T. Nagai, T. Kaino, Y. Nakajima, K. Shikano. Technologies for processing body-conducted speech detected with non-audible murmur microphone. Proc. INTERSPEECH, pp. 632-635, Sep. 2009. [Open Access]
- Articulatory-acoustic mapping
  - P.L. Tobing, K. Kobayashi, T. Toda. Articulatory controllable speech modification based on statistical inversion and production mappings. IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol. 25, No. 12, pp. 2337-2350, Dec. 2017. [Preprint]
  - T. Toda, A.W. Black, K. Tokuda. Statistical mapping between articulatory movements and acoustic spectrum using a Gaussian mixture model. Speech Communication, Vol. 50, No. 3, pp. 215-227, Mar. 2008. [Preprint]
- Alaryngeal speech processing
  - 西尾直樹, 小林和弘, 戸田智基. Regaining one's own voice for laryngectomees: the Save the Voice Project. Journal of the Japan Broncho-Esophagological Society, Vol. 76, No. 5, pp. 255-263, Oct. 2025. [Link]
  - L.P. Violeta, W.-C. Huang, D. Ma, R. Yamamoto, K. Kobayashi, T. Toda. Resolving domain mismatches in electrolaryngeal speech enhancement with linguistic intermediates. IEEE Journal of Selected Topics in Signal Processing, Vol. 19, No. 5, pp. 827-839, June 2025. [Open Access]
  - D. Ma, L.P. Violeta, K. Kobayashi, T. Toda. Pretraining and fine-tuning techniques for electrolaryngeal speech enhancement based on sequence-to-sequence voice conversion. IEEE Transactions on Audio, Speech and Language Processing, Vol. 33, pp. 3189-3201, July 2025. [Open Access]
  - L.P. Violeta, D. Ma, W.-C. Huang, T. Toda. Pretraining and adaptation techniques for electrolaryngeal speech recognition. IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 32, pp. 2777-2789, May 2024. [Open Access]
  - K. Kobayashi, T. Toda. Implementation of low-latency electrolaryngeal speech enhancement based on multi-task CLDNN. Proc. EUSIPCO, pp. 396-400, Aug. 2020. [Open Access]
  - H. Doi, T. Toda, K. Nakamura, H. Saruwatari, K. Shikano. Alaryngeal speech enhancement based on one-to-many eigenvoice conversion. IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 22, No. 1, pp. 172-183, Jan. 2014. [Preprint]
  - K. Nakamura, T. Toda, H. Saruwatari, K. Shikano. Speaking-aid systems using GMM-based voice conversion for electrolaryngeal speech. Speech Communication, Vol. 54, No. 1, pp. 134-146, Jan. 2012. [Link]
- Speech processing for articulation disorders
  - B.M. Halpern, T.B. Tienkamp, T. Rebernik, R.J.J.H. van Son, S.A.H.J. de Visscher, M.J.H. Witjes, D. Abur, T. Toda. XPPG-PCA: reference-free automatic speech severity evaluation with principal components. IEEE Journal of Selected Topics in Signal Processing, Vol. 19, No. 5, pp. 783-795, Oct. 2025. [Preprint]
  - B.M. Halpern, W.-C. Huang, L.P. Violeta, T. Toda. Severity-controllable pathological text-to-speech synthesis for clinical applications. IEEE Transactions on Neural Systems and Rehabilitation Engineering, Vol. 34, pp. 573-582, Jan. 2026. [Open Access]
  - L.P. Violeta, W.-C. Huang, T. Toda. Investigating self-supervised pretraining frameworks for pathological speech recognition. Proc. INTERSPEECH, pp. 41-45, Incheon, Korea, Sep. 2022. [Open Access]
  - W.-C. Huang, K. Kobayashi, Y.-H. Peng, C.-F. Liu, Y. Tsao, H.-M. Wang, T. Toda. A preliminary study of a two-stage paradigm for preserving speaker identity in dysarthric voice conversion. Proc. INTERSPEECH, pp. 1329-1333, Aug.-Sep. 2021. [Open Access]

- Speech quality estimation
  - C.-H. Hu, Y. Yasuda, T. Toda. E2EPref: an end-to-end preference-based framework for speech quality assessment to alleviate bias in direct assessment scores. Computer Speech and Language, Vol. 93, Article 101799, pp. 1-17, Mar. 2025. [Open Access]
  - E. Cooper, W.-C. Huang, Y. Tsao, H.-M. Wang, T. Toda, J. Yamagishi. A review on subjective and objective evaluation of synthetic speech. Acoustical Science and Technology, Vol. 45, No. 4, pp. 161-183, July 2024. [Open Access]
  - W.-C. Huang, E. Cooper, Y. Tsao, H.-M. Wang, T. Toda, J. Yamagishi. The VoiceMOS Challenge 2022. Proc. INTERSPEECH, pp. 4536-4540, Incheon, Korea, Sep. 2022. [Open Access]
  - W.-C. Huang, E. Cooper, J. Yamagishi, T. Toda. LDNet: unified listener dependent modeling in MOS prediction for synthetic speech. Proc. IEEE ICASSP, pp. 896-900, May 2022. [Preprint]
  - E. Cooper, W.-C. Huang, T. Toda, J. Yamagishi. Generalization ability of MOS prediction networks. Proc. IEEE ICASSP, pp. 8442-8446, May 2022. [Preprint]
- Subjective evaluation methods
  - Y. Yasuda, T. Toda. Automatic design optimization of preference-based subjective evaluation with online learning in crowdsourcing environment. Computer Speech and Language, Vol. 96, Article 101888, pp. 1-16, Sep. 2025. [Open Access]
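  - 安田裕介, 戸田智基. Limitations of MOS-based speech evaluation and new possibilities for large-scale comparative evaluation. Journal of the Acoustical Society of Japan, Vol. 80, No. 7, pp. 393-400, Aug. 2024. [Open Access]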
- Spoofed speech detection
  - Y. Zang, J. Shi, Y. Zhang, R. Yamamoto, J. Han, Y. Tang, S. Xu, W. Zhao, J. Guo, T. Toda, Z. Duan. CtrSVDD: a benchmark dataset and baseline analysis for controlled singing voice deepfake detection. Proc. INTERSPEECH, pp. 4783-4787, Kos Island, Greece, Sep. 2024. [Open Access]
  - X. Wang, J. Yamagishi, M. Todisco, H. Delgado, A. Nautsch, N. Evans, M. Sahidullah, V. Vestman, T. Kinnunen, K.A. Lee, L. Juvela, P. Alku, Y.-H. Peng, H.-T. Hwang, Y. Tsao, H.-M. Wang, S. Le Maguer, M. Becker, F. Henderson, R. Clark, Y. Zhang, Q. Wang, Y. Jia, K. Onuma, K. Mushika, T. Kaneda, Y. Jiang, L.-J. Liu, Y.-C. Wu, W.-C. Huang, T. Toda, K. Tanaka, H. Kameoka, I. Steiner, D. Matrouf, J.-F. Bonastre, A. Govender, S. Ronanki, J.-X. Zhang, Z.-H. Ling. ASVspoof 2019: a large-scale public database of synthetic, converted and replayed speech. Computer Speech and Language, Vol. 64, Article 101114, 25 pages, Nov. 2020. [Preprint]
  - T. Kinnunen, J. Lorenzo-Trueba, J. Yamagishi, T. Toda, D. Saito, F. Villavicencio, Z. Ling. A spoofing benchmark for the 2018 voice conversion challenge: leveraging from spoofing countermeasures for speech artifact assessment. Proc. Odyssey 2018, pp. 187-194, June 2018. [Open Access]
  - Z. Wu, P. De Leon, C. Demiroglu, A. Khodabakhsh, S. King, Z.-H. Ling, D. Saito, B. Stewart, T. Toda, M. Wester, J. Yamagishi. Anti-spoofing for text-independent speaker verification: an initial database, comparison of countermeasures, and human performance. IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 24, No. 4, pp. 768-783, Apr. 2016. [Preprint]
- Security and privacy protection
  - L. Chen, K.A. Lee, Z.-H. Ling, X. Wang, R.K. Das, T. Toda, H. Li. Speaker privacy and security in the big data era: protection and defense against deepfake. Proc. APSIPA ASC, Perspective paper, pp. 2570-2575, Singapore, Oct. 2025. [Open Access]
  - S. Tang, Z. Liu, L. Chen, K. Lee, T. Toda, Z.-H. Ling. A preliminary study on sectional voice anonymization and detection. Proc. APSIPA ASC, pp. 2229-2234, Singapore, Oct. 2025. [Open Access]
  - D. Yoon, T. Toda. Neural semi-fragile watermarking for proactive deepfake speech detection. Proc. APSIPA ASC, pp. 2396-2401, Singapore, Oct. 2025. [Open Access]
  - W.-C. Huang, Y.-C. Wu, T. Toda. Multi-speaker text-to-speech training with speaker anonymized data. IEEE Signal Processing Letters, Vol. 31, pp. 2995-2999, Oct. 2024. [Open Access]

- Singing voice conversion
  - W.-C. Huang, L.P. Violeta, S. Liu, J. Shi, T. Toda. The Singing Voice Conversion Challenge 2023. Proc. IEEE ASRU, 8 pages, Dec. 2023. [Preprint]
  - R. Yamamoto, R. Yoneyama, L.P. Violeta, W.-C. Huang, T. Toda. A comparative study of voice conversion models with large-scale speech and singing data: the T13 systems for the Singing Voice Conversion Challenge 2023. Proc. IEEE ASRU, 6 pages, Taipei, Taiwan, Dec. 2023. [Preprint]
  - K. Kobayashi, T. Toda, S. Nakamura. Intra-gender statistical singing voice conversion with direct waveform modification using log-spectral differential. Speech Communication, Vol. 99, pp. 211-220, May 2018. [Open Access]
  - K. Kobayashi, T. Toda, H. Doi, T. Nakano, M. Goto, G. Neubig, S. Sakti, S. Nakamura. Voice timbre control based on perceived age in singing voice conversion. IEICE Transactions on Information and Systems, Vol. E97-D, No. 6, pp. 1419-1428, June 2014. [Open Access]
  - H. Doi, T. Toda, T. Nakano, M. Goto, S. Nakamura. Singing voice conversion method based on many-to-many eigenvoice conversion and training data generation using a singing-to-singing synthesis system. Proc. APSIPA ASC, Nov. 2012. [Open Access]
- Singing voice synthesis
  - K. Nishizawa, R. Yamamoto, W.-C. Huang, T. Toda. Investigating factors related to the naturalness of synthesized unison singing. Proc. IEEE ICASSP, 5 pages, Hyderabad, India, Apr. 2025. [Preprint]
  - R. Yamamoto, R. Yoneyama, T. Toda. NNSVS: a neural network based singing voice synthesis toolkit. Proc. IEEE ICASSP, 5 pages, June 2023. [Preprint]
- Singing aids
  - L. Li, T. Toda, K. Morikawa, K. Kobayashi, S. Makino. Improving singing aid system for laryngectomees with statistical voice conversion and VAE-SPACE. Proc. ISMIR, pp. 784-790, Delft, the Netherlands, Nov. 2019. [Open Access]
  - K. Morikawa, T. Toda. Electrolaryngeal speech modification towards singing aid system for laryngectomees. Proc. APSIPA, 4 pages, Dec. 2017. [Open Access]
- Automatic music transcription
  - J. Mi, S. Kim, T. Toda. Improved architecture for high-resolution piano transcription to efficiently capture acoustic characteristics of music signals. Proc. APSIPA ASC, 6 pages, Macau, China, Dec. 2024. [Preprint]
  - S. Kim, K. Takeda, T. Toda. Sequence-to-sequence network training methods for automatic guitar transcription with tokenized outputs. Proc. ISMIR, pp. 524-531, Nov. 2023. [Open Access]
  - S. Kim, T. Hayashi, T. Toda. Note-level automatic guitar transcription using attention mechanism. Proc. EUSIPCO, pp. 229-233, Aug.-Sep. 2022. [Open Access]
- Music analysis
  - Y. Hashizume, T. Toda. Investigation of perceptual music similarity based on individual instrumental parts by large-scale listening test. APSIPA Transactions on Signal and Information Processing, Vol. 15, No. 1, pp. 249-269, Apr. 2026. [Open Access]
  - T. Imamura, Y. Hashizume, W.-C. Huang, T. Toda. Music similarity representation learning focusing on individual instruments with source separation and human preference. APSIPA Transactions on Signal and Information Processing, Vol. 14, No. 4, e305, pp. 1-29, Oct. 2025. [Open Access]
  - Y. Hashizume, L. Li, A. Miyashita, T. Toda. Learning separated representations for instrument-based music similarity. APSIPA Transactions on Signal and Information Processing, Vol. 14, No. 1, e16, pp. 1-32, July 2025. [Open Access]
  - S. Seki, T. Toda, K. Takeda. Stereophonic music separation based on non-negative tensor factorization with cepstral distance regularization. IEICE Transactions on Fundamentals, Vol. E101-A, No. 7, pp. 1057-1064, July 2018. [Link]

- Source separation and target signal enhancement
  - T. Fujimura, T. Toda. Analysis and extension of noisy-target training for unsupervised target signal enhancement. APSIPA Transactions on Signal and Information Processing, Vol. 14, No. 1, e12, pp. 1-27, June 2025. [Open Access]
  - R. Wang, T. Fujimura, T. Toda. Target speaker extraction under noisy underdetermined conditions using conditional variational autoencoder, global style token, and neural postfilter. APSIPA Transactions on Signal and Information Processing, Vol. 14, No. 1, e2, pp. 1-26, Jan. 2025. [Open Access]
  - R. Wang, L. Li, T. Toda. Dual-channel target speaker extraction based on conditional variational autoencoder and directional information. IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 32, pp. 1968-1979, Mar. 2024. [Open Access]
  - S. Seki, H. Kameoka, L. Li, T. Toda, K. Takeda. Underdetermined source separation based on generalized multichannel variational autoencoder. IEEE Access, Vol. 7, No. 1, pp. 168104-168115, Nov. 2019. [Open Access]
- Multichannel signal processing
  - S. Luan, Y. Wakabayashi, T. Toda. Generalized sound field interpolation for freely spaced microphone arrays in rotation-robust beamforming. Applied Acoustics, Vol. 236, Article 110706, pp. 1-15, Apr. 2025. [Open Access]
  - S. Luan, Y. Wakabayashi, T. Toda. Unequally spaced sound field interpolation for rotation-robust beamforming. IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 32, pp. 3185-3199, June 2024. [Open Access]
  - H. Maki, T. Toda, S. Sakti, G. Neubig, S. Nakamura. Enhancing event-related potentials based on maximum a posteriori estimation with a spatial correlation prior. IEICE Transactions on Information and Systems, Vol. E99-D, No. 6, pp. 1410-1419, June 2016. [Open Access]
- Sound event recognition
  - K. Miyazaki, T. Komatsu, T. Hayashi, S. Watanabe, T. Toda, K. Takeda. Weakly-supervised sound event detection with self-attention. Proc. IEEE ICASSP, pp. 66-70, May 2020. [Link]
  - T. Hayashi, S. Watanabe, T. Toda, T. Hori, J. Le Roux, K. Takeda. Duration-controlled LSTM for polyphonic sound event detection. IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol. 25, No. 11, pp. 2059-2070, Nov. 2017. [Open Access]
- Sound event and scene description
  - T. Komatsu, H. Munakata, Y. Ishikawa, K. Takeda, T. Toda. Semi-supervised text-audio contrastive learning method using pseudo-text input. APSIPA Transactions on Signal and Information Processing, Vol. 15, No. 1, pp. 183-198, Apr. 2026. [Open Access]
  - T. Komatsu, K. Takeda, T. Toda. Audio difference learning framework for audio captioning. APSIPA Transactions on Signal and Information Processing, Vol. 14, No. 1, e34, pp. 1-18, Nov. 2025. [Open Access]
  - K. Miyazaki, T. Hayashi, T. Toda, K. Takeda. Connectionist temporal classification-based sound event encoder for converting sound events into onomatopoeia representations. Proc. EUSIPCO, pp. 857-861, Sep. 2018. [Open Access]
- Anomalous sound detection
  - K. Wilkinghoff, T. Fujimura, K. Imoto, J. Le Roux, Z.-H. Tan, T. Toda. Handling domain shifts for anomalous sound detection: a review of DCASE-related work. Proc. DCASE Workshop, pp. 20-24, Barcelona, Spain, Oct. 2025. [Open Access]
  - T. Fujimura, K. Wilkinghoff, K. Imoto, T. Toda. ASDKit: a toolkit for comprehensive evaluation of anomalous sound detection methods. Proc. DCASE Workshop, pp. 40-44, Barcelona, Spain, Oct. 2025. [Open Access]
  - I. Kuroyanagi, T. Fujimura, K. Takeda, T. Toda. Improving anomalous sound detection through pseudo-anomalous set selection and pseudo-label utilization under unlabeled conditions. APSIPA Transactions on Signal and Information Processing, Vol. 14, No. 1, e13, pp. 1-28, June 2025. [Open Access]
  - I. Kuroyanagi, T. Hayashi, K. Takeda, T. Toda. Serial-OE: Anomalous sound detection based on serial method with outlier exposure capable of using small amounts of anomalous data for training. APSIPA Transactions on Signal and Information Processing, Vol. 14, No. 1, e1, pp. 1-32, Jan. 2025. [Open Access]
  - T. Hayashi, T. Komatsu, R. Kondo, T. Toda, K. Takeda. Anomalous sound event detection based on WaveNet. Proc. EUSIPCO, pp. 2508-2512, Sep. 2018. [Open Access]

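Theses

Bachelor's thesis (Itakura Laboratory, Nagoya University)
- Quality improvement of the speech analysis-synthesis system STRAIGHT in noisy environments
Master's thesis (Shikano Laboratory, Nara Institute of Science and Technology)
- High-quality voice conversion using the STRAIGHT analysis-synthesis method
Doctoral thesis (Shikano Laboratory, Nara Institute of Science and Technology)
- High-Quality and Flexible Speech Synthesis with Segment Selection and Voice Conversion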
