Research
Selected Publications
Speech Analysis
A. Miyashita, T. Toda. Differentiable representation of warping based on Lie group theory. Proc. IEEE WASPAA, 5 pages, Oct. 2023. [Paper]
A. Miyashita, T. Toda. Representation of vocal tract length transformation based on group theory. Proc. IEEE ICASSP, 5 pages, June 2023. [Paper]
T. Toda, K. Tokuda. Statistical approach to vocal tract transfer function estimation based on factor analyzed trajectory HMM. Proc. IEEE ICASSP, pp. 3925-3928, Apr. 2008. [Paper]
Speech Synthesis
R. Yoneyama, Y.-C. Wu, T. Toda. Source-Filter HiFiGAN: fast and pitch controllable high-fidelity neural vocoder. Proc. IEEE ICASSP, 5 pages, June 2023. [Paper]
R. Yoneyama, Y.-C. Wu, T. Toda. High-fidelity and pitch-controllable neural vocoder based on unified source-filter networks. IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 31, pp. 3717-3729, Oct. 2023. [Paper]
Y.-C. Wu, T. Hayashi, T. Okamoto, H. Kawai, T. Toda. Quasi-periodic parallel WaveGAN: a non-autoregressive raw waveform generative model with pitch-dependent dilated convolution neural network. IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 29, pp. 792-806, Feb. 2021. [Paper]
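T. Toda. Machine learning and speech generation: speech signal modeling based on statistical methods (in Japanese). In Society of Instrument and Control Engineers (ed.), 機械学習の可能性 (The Potential of Machine Learning), Corona Publishing, Dec. 2022. [Paper]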
Voice Conversion
C. Xie, T. Toda. Noisy-to-noisy voice conversion under variations of noisy condition. IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 31, pp. 3871-3882, Oct. 2023. [Paper]
W.-C. Huang, S.-W. Yang, T. Hayashi, T. Toda. A comparative study of self-supervised speech representation based voice conversion. IEEE Journal of Selected Topics in Signal Processing, Vol. 16, No. 6, pp. 1308-1318, Oct. 2022. [Preprint]
W.-C. Huang, T. Hayashi, Y.-C. Wu, H. Kameoka, T. Toda. Pretraining techniques for sequence-to-sequence voice conversion. IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 29, pp. 745-755, Feb. 2021. [Paper]
T. Toda, L.-H. Chen, D. Saito, F. Villavicencio, M. Wester, Z. Wu, J. Yamagishi. The Voice Conversion Challenge 2016. Proc. INTERSPEECH, pp. 1632-1636, Sep. 2016. [Paper]
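T. Toda. A first introduction to voice conversion (in Japanese). Journal of the Acoustical Society of Japan, Vol. 72, No. 6, pp. 324-331, June 2016. [Paper]
T. Toda. Voice conversion techniques based on probabilistic models (in Japanese). Journal of the Acoustical Society of Japan, Vol. 67, No. 1, pp. 34-39, Jan. 2011. [Paper]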
T. Toda, A.W. Black, K. Tokuda. Voice conversion based on maximum likelihood estimation of spectral parameter trajectory. IEEE Transactions on Audio, Speech and Language Processing, Vol. 15, No. 8, pp. 2222-2235, Nov. 2007. [Paper]
Text-to-Speech Synthesis
T. Hayashi, R. Yamamoto, K. Inoue, T. Yoshimura, S. Watanabe, T. Toda, K. Takeda, Y. Zhang, X. Tan. ESPnet-TTS: unified, reproducible, and integratable open source end-to-end text-to-speech toolkit. Proc. IEEE ICASSP, pp. 7654-7658, May 2020. [Paper]
S. Takamichi, T. Toda, A.W. Black, G. Neubig, S. Sakti, S. Nakamura. Post-filters to modify the modulation spectrum for statistical parametric speech synthesis. IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 24, No. 4, pp. 755-767, Apr. 2016. [Paper]
K. Tokuda, Y. Nankaku, T. Toda, H. Zen, J. Yamagishi, K. Oura. Speech synthesis based on hidden Markov models. Proceedings of the IEEE, Vol. 101, No. 5, pp. 1234-1252, May 2013. [Link]
T. Toda, K. Tokuda. A speech parameter generation algorithm considering global variance for HMM-based speech synthesis. IEICE Transactions, Vol. E90-D, No. 5, pp. 816-824, May 2007. [Link]
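H. Kawai, T. Toda, J. Yamagishi, T. Hirai, J. Ni, N. Nishizawa, M. Tsuzaki, K. Tokuda. XIMERA: a speech synthesis system using a large-scale corpus (in Japanese). IEICE Transactions on Information and Systems (Japanese Edition), Vol. J89-D-II, No. 12, pp. 2688-2698, Dec. 2006. [Link]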
Speech Recognition
J. He, Z. Yang, T. Toda. ED-CEC: improving rare word recognition using ASR post-processing based on error detection and context-aware error correction. Proc. IEEE ASRU, 6 pages, Dec. 2023. [Paper]
T. Hayashi, S. Watanabe, Y. Zhang, T. Toda, T. Hori, R. Astudillo, K. Takeda. Back-translation-style data augmentation for end-to-end ASR. Proc. IEEE SLT, pp. 426-433, Dec. 2018. [Paper]
Speech Emotion Recognition
X. Shi, X. Li, T. Toda. Emotion awareness in multi-utterance turn for improving emotion prediction in multi-speaker conversation. Proc. INTERSPEECH, pp. 765-769, Aug. 2023. [Paper]
A. Ando, T. Mori, S. Kobashikawa, T. Toda. Speech emotion recognition based on listener-dependent emotion perception models. APSIPA Transactions on Signal and Information Processing, Vol. 10, e6, pp. 1-11, Apr. 2021. [Paper]
Spoken Language Processing
Y. Yasuda, T. Toda. Investigation of Japanese PnG BERT language model in text-to-speech synthesis for pitch accent language. IEEE Journal of Selected Topics in Signal Processing, Vol. 16, No. 6, pp. 1319-1328, Oct. 2022. [Paper]
T. Hayashi, S. Watanabe, T. Toda, K. Takeda, S. Toshniwal, K. Livescu. Pre-trained text embeddings for enhanced text-to-speech synthesis. Proc. INTERSPEECH, pp. 4430-4434, Sep. 2019. [Paper]
Spoken Dialogue
T. Hiraoka, G. Neubig, S. Sakti, T. Toda, S. Nakamura. Learning cooperative persuasive dialogue policies using framing. Speech Communication, Vol. 84, pp. 83-96, Nov. 2016. [Link]
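Y. Yamauchi, G. Neubig, S. Sakti, T. Toda, S. Nakamura. Topic-guiding response generation using relationships between terms in dialogue systems (in Japanese). Transactions of the Japanese Society for Artificial Intelligence, Vol. 29, No. 1, pp. 80-89, Jan. 2014. [Paper]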
Speech Translation
Q. Truong Do, T. Toda, G. Neubig, S. Sakti, S. Nakamura. Preserving word-level emphasis in speech-to-speech translation. IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 25, No. 3, pp. 544-556, Mar. 2017. [Link]
Y. Oda, G. Neubig, S. Sakti, T. Toda, S. Nakamura. Optimizing segmentation strategies for simultaneous speech translation. Proc. ACL, pp. 551-556, June 2014. [Paper]
Speaking Aids for People with Vocal Disorders
L.P. Violeta, D. Ma, W.-C. Huang, T. Toda. Intermediate fine-tuning using imperfect synthetic speech for improving electrolaryngeal speech recognition. Proc. IEEE ICASSP, 5 pages, June 2023. [Paper]
D. Ma, L.P. Violeta, K. Kobayashi, T. Toda. Two-stage training method for Japanese electrolaryngeal speech enhancement based on sequence-to-sequence voice conversion. Proc. IEEE SLT, pp. 949-954, Jan. 2023. [Paper]
K. Kobayashi, T. Toda. Implementation of low-latency electrolaryngeal speech enhancement based on multi-task CLDNN. Proc. EUSIPCO, pp. 396-400, Aug. 2020. [Paper]
K. Morikawa, T. Toda. Electrolaryngeal speech modification towards singing aid system for laryngectomees. Proc. APSIPA, 4 pages, Dec. 2017. [Paper]
H. Doi, T. Toda, K. Nakamura, H. Saruwatari, K. Shikano. Alaryngeal speech enhancement based on one-to-many eigenvoice conversion. IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 22, No. 1, pp. 172-183, Jan. 2014. [Paper]
K. Nakamura, T. Toda, H. Saruwatari, K. Shikano. Speaking-aid systems using GMM-based voice conversion for electrolaryngeal speech. Speech Communication, Vol. 54, No. 1, pp. 134-146, Jan. 2012. [Link]
Body-Conducted Speech Processing
Y. Tajiri, T. Toda. Nonaudible murmur enhancement based on statistical voice conversion and noise suppression with external noise monitoring. Proc. 9th ISCA Speech Synthesis Workshop (SSW9), pp. 54-60, Sep. 2016. [Paper]
T. Toda, M. Nakagiri, K. Shikano. Statistical voice conversion techniques for body-conducted unvoiced speech enhancement. IEEE Transactions on Audio, Speech and Language Processing, Vol. 20, No. 9, pp. 2505-2517, Sep. 2012. [Paper]
T. Toda, K. Nakamura, T. Nagai, T. Kaino, Y. Nakajima, K. Shikano. Technologies for processing body-conducted speech detected with non-audible murmur microphone. Proc. INTERSPEECH, pp. 632-635, Sep. 2009. [Paper]
Articulatory-Acoustic Mapping
P.L. Tobing, K. Kobayashi, T. Toda. Articulatory controllable speech modification based on statistical inversion and production mappings. IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol. 25, No. 12, pp. 2337-2350, Dec. 2017. [Paper]
T. Toda, A.W. Black, K. Tokuda. Statistical mapping between articulatory movements and acoustic spectrum using a Gaussian mixture model. Speech Communication, Vol. 50, No. 3, pp. 215-227, Mar. 2008. [Paper]
Speech Quality Estimation
Y. Yasuda, T. Toda. Analysis of mean opinion scores in subjective evaluation of synthetic speech based on tail probabilities. Proc. INTERSPEECH, pp. 5491-5495, Aug. 2023. [Paper]
W.-C. Huang, E. Cooper, Y. Tsao, H.-M. Wang, T. Toda, J. Yamagishi. The VoiceMOS Challenge 2022. Proc. INTERSPEECH, pp. 4536-4540, Sep. 2022. [Paper]
W.-C. Huang, E. Cooper, J. Yamagishi, T. Toda. LDNet: unified listener dependent modeling in MOS prediction for synthetic speech. Proc. IEEE ICASSP, pp. 896-900, May 2022. [Paper]
E. Cooper, W.-C. Huang, T. Toda, J. Yamagishi. Generalization ability of MOS prediction networks. Proc. IEEE ICASSP, pp. 8442-8446, May 2022. [Paper]
Spoofed Speech Detection
X. Wang, J. Yamagishi, M. Todisco, H. Delgado, A. Nautsch, N. Evans, M. Sahidullah, V. Vestman, T. Kinnunen, K.A. Lee, L. Juvela, P. Alku, Y.-H. Peng, H.-T. Hwang, Y. Tsao, H.-M. Wang, S. Le Maguer, M. Becker, F. Henderson, R. Clark, Y. Zhang, Q. Wang, Y. Jia, K. Onuma, K. Mushika, T. Kaneda, Y. Jiang, L.-J. Liu, Y.-C. Wu, W.-C. Huang, T. Toda, K. Tanaka, H. Kameoka, I. Steiner, D. Matrouf, J.-F. Bonastre, A. Govender, S. Ronanki, J.-X. Zhang, Z.-H. Ling. ASVspoof 2019: a large-scale public database of synthetic, converted and replayed speech. Computer Speech and Language, Vol. 64, Article 101114, 25 pages, Nov. 2020. [Link]
T. Kinnunen, J. Lorenzo-Trueba, J. Yamagishi, T. Toda, D. Saito, F. Villavicencio, Z. Ling. A spoofing benchmark for the 2018 voice conversion challenge: leveraging from spoofing countermeasures for speech artifact assessment. Proc. Odyssey 2018, pp. 187-194, June 2018. [Paper]
Z. Wu, P. De Leon, C. Demiroglu, A. Khodabakhsh, S. King, Z.-H. Ling, D. Saito, B. Stewart, T. Toda, M. Wester, J. Yamagishi. Anti-spoofing for text-independent speaker verification: an initial database, comparison of countermeasures, and human performance. IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 24, No. 4, pp. 768-783, Apr. 2016. [Link]
Singing Voice Conversion and Synthesis
W.-C. Huang, L.P. Violeta, S. Liu, J. Shi, T. Toda. The Singing Voice Conversion Challenge 2023. Proc. IEEE ASRU, 8 pages, Dec. 2023. [Preprint]
R. Yamamoto, R. Yoneyama, T. Toda. NNSVS: a neural network based singing voice synthesis toolkit. Proc. IEEE ICASSP, 5 pages, June 2023. [Paper]
K. Kobayashi, T. Toda, S. Nakamura. Intra-gender statistical singing voice conversion with direct waveform modification using log-spectral differential. Speech Communication, Vol. 99, pp. 211-220, May 2018. [Paper]
K. Kobayashi, T. Toda, H. Doi, T. Nakano, M. Goto, G. Neubig, S. Sakti, S. Nakamura. Voice timbre control based on perceived age in singing voice conversion. IEICE Transactions on Information and Systems, Vol. E97-D, No. 6, pp. 1419-1428, June 2014. [Paper]
H. Doi, T. Toda, T. Nakano, M. Goto, S. Nakamura. Singing voice conversion method based on many-to-many eigenvoice conversion and training data generation using a singing-to-singing synthesis system. Proc. APSIPA ASC, Nov. 2012. [Paper]
Automatic Music Transcription
S. Kim, K. Takeda, T. Toda. Sequence-to-sequence network training methods for automatic guitar transcription with tokenized outputs. Proc. ISMIR, pp. 524-531, Nov. 2023. [Paper]
S. Kim, T. Hayashi, T. Toda. Note-level automatic guitar transcription using attention mechanism. Proc. EUSIPCO, pp. 229-233, Aug.-Sep. 2022. [Paper]
Music Analysis
Y. Hashizume, L. Li, T. Toda. Music similarity calculation of individual instrumental sounds using metric learning. Proc. APSIPA ASC, pp. 33-38, Chiang Mai, Thailand, Nov. 2022. [Paper]
Music Source Separation
S. Seki, T. Toda, K. Takeda. Stereophonic music separation based on non-negative tensor factorization with cepstral distance regularization. IEICE Transactions on Fundamentals, Vol. E101-A, No. 7, pp. 1057-1064, July 2018. [Link]
Source Separation and Target Sound Enhancement
R. Wang, T. Toda. Directional target speaker extraction under noisy underdetermined conditions through conditional variational autoencoder with global style tokens. Proc. IEEE WASPAA, 5 pages, Oct. 2023. [Paper]
S. Seki, H. Kameoka, L. Li, T. Toda, K. Takeda. Underdetermined source separation based on generalized multichannel variational autoencoder. IEEE Access, Vol. 7, No. 1, pp. 168104-168115, Nov. 2019. [Paper]
Multichannel Signal Processing
S. Luan, Y. Wakabayashi, T. Toda. Sound field interpolation with unsupervised calibration for freely spaced circular microphone array in rotation-robust beamforming. Proc. EUSIPCO, pp. 21-25, Sep. 2023. [Paper]
S. Luan, Y. Wakabayashi, T. Toda. Modified sound field interpolation method for rotation-robust beamforming with unequally spaced circular microphone array. Proc. EUSIPCO, pp. 344-348, Aug.-Sep. 2022. [Paper]
H. Maki, T. Toda, S. Sakti, G. Neubig, S. Nakamura. Enhancing event-related potentials based on maximum a posteriori estimation with a spatial correlation prior. IEICE Transactions on Information and Systems, Vol. E99-D, No. 6, pp. 1410-1419, June 2016. [Paper]
Sound Event Recognition
K. Miyazaki, T. Komatsu, T. Hayashi, S. Watanabe, T. Toda, K. Takeda. Weakly-supervised sound event detection with self-attention. Proc. IEEE ICASSP, pp. 66-70, May 2020. [Paper]
T. Hayashi, S. Watanabe, T. Toda, T. Hori, J. Le Roux, K. Takeda. Duration-controlled LSTM for polyphonic sound event detection. IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol. 25, No. 11, pp. 2059-2070, Nov. 2017. [Paper]
Acoustic Environment Description
T. Komatsu, Y. Fujita, K. Takeda, T. Toda. Audio difference learning for audio captioning. Proc. IEEE ICASSP, Apr. 2024. [Preprint]
K. Miyazaki, T. Hayashi, T. Toda, K. Takeda. Connectionist temporal classification-based sound event encoder for converting sound events into onomatopoeia representations. Proc. EUSIPCO, pp. 857-861, Sep. 2018. [Paper]
Anomalous Sound Detection
I. Kuroyanagi, T. Hayashi, K. Takeda, T. Toda. Improvement of serial approach to anomalous sound detection by incorporating two binary cross-entropies for outlier exposure. Proc. EUSIPCO, pp. 294-298, Aug.-Sep. 2022. [Paper]
I. Kuroyanagi, T. Hayashi, K. Takeda, T. Toda. Anomalous sound detection using a binary classification model and class centroids. Proc. EUSIPCO, pp. 1995-1999, Aug. 2021. [Paper]
T. Hayashi, T. Komatsu, R. Kondo, T. Toda, K. Takeda. Anomalous sound event detection based on WaveNet. Proc. EUSIPCO, pp. 2508-2512, Sep. 2018. [Paper]
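Theses
Bachelor's thesis (Itakura Laboratory, Nagoya University)
Quality improvement of the speech analysis-synthesis system STRAIGHT in noisy environments
Master's thesis (Shikano Laboratory, Nara Institute of Science and Technology)
High-quality voice conversion using the STRAIGHT analysis-synthesis method
Doctoral dissertation (Shikano Laboratory, Nara Institute of Science and Technology)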