Tomoki Toda

Publication List (Journal Papers / Letters)


Journal Papers

1. C.-H. Hu, Y. Yasuda, T. Toda. Investigation of preference-based speech quality assessment by integrating adaptive pair selection and pseudo-labeling. APSIPA Transactions on Signal and Information Processing, Vol. **, No. *, pp. ***-***, ***. 2026. (Accepted)
2. J. Feng, Y. Yasuda, T. Toda. CTC score-based transcription quality annotation for stable text-to-speech synthesis training on noisy transcriptions. APSIPA Transactions on Signal and Information Processing, Vol. **, No. *, pp. ***-***, ***. 2026. (Accepted)
3. D. Ma, J. Mi, F. Li, L.P. Violeta, J. He, W.-C. Huang, K. Kobayashi, T. Toda. Advancing electrolaryngeal speech enhancement through speech-text representation learning. IEEE Transactions on Biomedical Engineering, Vol. **, pp. ***-***, *** 2026. (Accepted)
4. R. Yoneyama, T. Toda. SiFi-GAN: combining source-filter modeling and upsampling-based high-fidelity neural vocoder for fast and pitch-controllable speech synthesis. IEICE Transactions on Information and Systems, Vol. E109-D, No. 6, pp. 945-956, June 2026.
5. X. Shi, X. Li, T. Toda. Emotion similarity and shift: modeling temporal dynamic interactions for emotion prediction in conversation. IEEE Transactions on Audio, Speech and Language Processing, Vol. 34, pp. 2552-2567, Apr. 2026.
6. W.-C. Huang, E. Cooper, T. Toda. MOS-Bench: benchmarking generalization abilities of subjective speech quality assessment models. IEEE Transactions on Audio, Speech and Language Processing, Vol. 34, pp. 2385-2397, Apr. 2026.
7. J. Feng, Y. Yasuda, T. Toda. An investigation of the robustness of flow- and diffusion-based speech generation models on noisy transcriptions. APSIPA Transactions on Signal and Information Processing, Vol. 15, No. 1, pp. 270-292, Apr. 2026.
8. Y. Hashizume, T. Toda. Investigation of perceptual music similarity based on individual instrumental parts by large-scale listening test. APSIPA Transactions on Signal and Information Processing, Vol. 15, No. 1, pp. 249-269, Apr. 2026.
9. T. Komatsu, H. Munakata, Y. Ishikawa, K. Takeda, T. Toda. Semi-supervised text-audio contrastive learning method using pseudo-text input. APSIPA Transactions on Signal and Information Processing, Vol. 15, No. 1, pp. 183-198, Apr. 2026.
10. J. Mi, X. Shi, D. Ma, J. He, T. Fujimura, T. Toda. Robust speech emotion recognition under human speech noise. Computer Speech and Language, Vol. 100, Article 101987, pp. 1-16, Apr. 2026.
11. X. Shi, J. He, X. Li, T. Toda. A comprehensive study on the effectiveness of ASR representations for noise-robust speech emotion recognition. IEEE Transactions on Audio, Speech and Language Processing, Vol. 34, pp. 707-722, Jan. 2026.
12. B.M. Halpern, W.-C. Huang, L.P. Violeta, T. Toda. Severity-controllable pathological text-to-speech synthesis for clinical applications. IEEE Transactions on Neural Systems and Rehabilitation Engineering, Vol. 34, pp. 573-582, Jan. 2026.
13. H. Yamashita, T. Okamoto, R. Takashima, Y. Ohtani, T. Takiguchi, T. Toda, H. Kawai. Sequence-to-sequence voice conversion with weighted guided attention. IEEE Access, Vol. 13, pp. 216583-216595, Dec. 2025.
14. B.M. Halpern, T.B. Tienkamp, T. Rebernik, R.J.J.H. van Son, S.A.H.J. de Visscher, M.J.H. Witjes, D. Abur, T. Toda. XPPG-PCA: reference-free automatic speech severity evaluation with principal components. IEEE Journal of Selected Topics in Signal Processing, Vol. 19, No. 5, pp. 783-795, Oct. 2025.
15. L.P. Violeta, W.-C. Huang, D. Ma, R. Yamamoto, K. Kobayashi, T. Toda. Resolving domain mismatches in electrolaryngeal speech enhancement with linguistic intermediates. IEEE Journal of Selected Topics in Signal Processing, Vol. 19, No. 5, pp. 827-839, June 2025.
16. T. Komatsu, K. Takeda, T. Toda. Audio difference learning framework for audio captioning. APSIPA Transactions on Signal and Information Processing, Vol. 14, No. 1, e34, pp. 1-18, Nov. 2025.
17. R. Yoneyama, A. Miyashita, R. Yamamoto, T. Toda. Wavehax: aliasing-free neural waveform synthesis based on 2D convolution and harmonic prior for reliable complex spectrogram estimation. IEEE Transactions on Audio, Speech and Language Processing, Vol. 33, pp. 4454-4470, Oct. 2025.
18. T. Imamura, Y. Hashizume, W.-C. Huang, T. Toda. Music similarity representation learning focusing on individual instruments with source separation and human preference. APSIPA Transactions on Signal and Information Processing, Vol. 14, No. 4, e305, pp. 1-29, Oct. 2025.
19. 西尾直樹, 小林和弘, 戸田智基. 喉頭摘出者における自己音声の再獲得～Save the Voice Project～. 気管食道科学会会報, Vol. 76, No. 5, pp. 255-263, Oct. 2025.
20. J. He, X. Shi, C.-H. Hu, J. Mi, X. Li, T. Toda. M4SER: multimodal, multirepresentation, multitask, and multistrategy learning for speech emotion recognition. IEEE Transactions on Audio, Speech and Language Processing, Vol. 33, pp. 4055-4070, Sep. 2025.
21. D. Yoshioka, Y. Nakata, Y. Yasuda, T. Toda. Text- and speech-style control for lecture speech generation focusing on disfluency. APSIPA Transactions on Signal and Information Processing, Vol. 14, No. 1, e26, pp. 1-31, Sep. 2025.
22. Y. Yasuda, T. Toda. Automatic design optimization of preference-based subjective evaluation with online learning in crowdsourcing environment. Computer Speech and Language, Vol. 96, Article 101888, pp. 1-16, Sep. 2025.
23. S. Chen, T. Toda. QHARMA-GAN: quasi-harmonic neural vocoder based on autoregressive moving average model. IEEE Transactions on Audio, Speech and Language Processing, Vol. 33, pp. 3703-3719, Sep. 2025.
24. D. Ma, L.P. Violeta, K. Kobayashi, T. Toda. Pretraining and fine-tuning techniques for electrolaryngeal speech enhancement based on sequence-to-sequence voice conversion. IEEE Transactions on Audio, Speech and Language Processing, Vol. 33, pp. 3189-3201, July 2025.
25. Y. Hashizume, L. Li, A. Miyashita, T. Toda. Learning separated representations for instrument-based music similarity. APSIPA Transactions on Signal and Information Processing, Vol. 14, No. 1, e16, pp. 1-32, July 2025.
26. T. Ogura, T. Okamoto, Y. Ohtani, E. Cooper, T. Toda, H. Kawai. Phoneme-level duration controllable neural text-to-speech with phoneme embedding skip connection and modified Gaussian duration modeling. IEEE Access, Vol. 13, pp. 118369-118380, July 2025.
27. Y. Choi, C. Xie, T. Toda. Noise and reverberation-controllable voice conversion. IEEE Transactions on Audio, Speech and Language Processing, Vol. 33, pp. 2430-2443, June 2025.
28. J. He, T. Toda. PMF-CEC: phoneme-augmented multimodal fusion for context-aware ASR error correction with error-specific selective decoding. IEEE Transactions on Audio, Speech and Language Processing, Vol. 33, pp. 2402-2417, June 2025.
29. I. Kuroyanagi, T. Fujimura, K. Takeda, T. Toda. Improving anomalous sound detection through pseudo-anomalous set selection and pseudo-label utilization under unlabeled conditions. APSIPA Transactions on Signal and Information Processing, Vol. 14, No. 1, e13, pp. 1-28, June 2025.
30. T. Fujimura, T. Toda. Analysis and extension of noisy-target training for unsupervised target signal enhancement. APSIPA Transactions on Signal and Information Processing, Vol. 14, No. 1, e12, pp. 1-27, June 2025.
31. C. Xie, T. Toda. An investigation of noisy-to-noisy voice conversion performance in various noisy conditions. APSIPA Transactions on Signal and Information Processing, Vol. 14, No. 1, e10, pp. 1-30, June 2025.
32. D. Ma, Y. Choi, T. Fujimura, F. Li, C. Xie, K. Kobayashi, T. Toda. Sequence-to-sequence voice conversion-based techniques for electrolaryngeal speech enhancement in noisy and reverberant conditions. APSIPA Transactions on Signal and Information Processing, Vol. 14, No. 1, e8, pp. 1-40, May 2025.
33. Y. Ohtani, T. Okamoto, T. Toda, H. Kawai. Fast neural vocoder with fundamental frequency control using finite impulse response filters. IEEE Transactions on Audio, Speech and Language Processing, Vol. 33, pp. 1893-1906, Apr. 2025.
34. M. Eshghi, T. Toda. Predicting fundamental frequency patterns in electrolaryngeal speech using automated phoneme extraction. IEEE Access, Vol. 13, pp. 73831-73847, Apr. 2025.
35. S. Luan, Y. Wakabayashi, T. Toda. Generalized sound field interpolation for freely spaced microphone arrays in rotation-robust beamforming. Applied Acoustics, Vol. 236, Article 110706, pp. 1-15, Apr. 2025.
36. C.-H. Hu, Y. Yasuda, T. Toda. E2EPref: an end-to-end preference-based framework for speech quality assessment to alleviate bias in direct assessment scores. Computer Speech and Language, Vol. 93, Article 101799, pp. 1-17, Mar. 2025.
37. F. Li, F. Shen, D. Ma, J. Zhou, L. Wang, F. Fan, T. Liu, X. Chen, T. Toda, H. Niu. Mandarin speech reconstruction from surface electromyography based on generative adversarial networks. Medicine in Novel Technology and Devices, Vol. 26, Article 100359, pp. 1-7, Mar. 2025.
38. S. Chen, T. Toda. Sequence-wise speech waveform modeling via gradient descent optimization of quasi-harmonic parameters. IEEE Transactions on Audio, Speech and Language Processing, Vol. 33, pp. 319-332, Jan. 2025.
39. D. Yoshioka, Y. Yasuda, T. Toda. Nonparallel spoken-text-style transfer for linguistic expression control in speech generation. IEEE Transactions on Audio, Speech and Language Processing, Vol. 33, pp. 333-346, Jan. 2025.
40. R. Wang, T. Fujimura, T. Toda. Target speaker extraction under noisy underdetermined conditions using conditional variational autoencoder, global style token, and neural postfilter. APSIPA Transactions on Signal and Information Processing, Vol. 14, No. 1, e2, pp. 1-26, Jan. 2025.
41. I. Kuroyanagi, T. Hayashi, K. Takeda, T. Toda. Serial-OE: Anomalous sound detection based on serial method with outlier exposure capable of using small amounts of anomalous data for training. APSIPA Transactions on Signal and Information Processing, Vol. 14, No. 1, e1, pp. 1-32, Jan. 2025.
42. F. Li, F. Shen, D. Ma, J. Zhou, S. Zhang, L. Wang, F. Fan, T. Liu, X. Chen, T. Toda, H. Niu. End-to-end Mandarin speech reconstruction based on ultrasound tongue images using deep learning. IEEE Transactions on Neural Systems and Rehabilitation Engineering, Vol. 33, pp. 140-149, Dec. 2024.
43. S. Luan, Y. Wakabayashi, T. Toda. Unequally spaced sound field interpolation for rotation-robust beamforming. IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 32, pp. 3185-3199, June 2024.
44. L.P. Violeta, D. Ma, W.-C. Huang, T. Toda. Pretraining and adaptation techniques for electrolaryngeal speech recognition. IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 32, pp. 2777-2789, May 2024.
45. M. Eshghi, T. Toda. An investigation of fundamental frequency pattern prediction for Japanese electrolaryngeal speech enhancement based on frame-wise phoneme representations. IEEE Access, Vol. 12, pp. 50137-50153, Apr. 2024.
46. R. Wang, L. Li, T. Toda. Dual-channel target speaker extraction based on conditional variational autoencoder and directional information. IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 32, pp. 1968-1979, Mar. 2024.
47. H. Yamashita, T. Okamoto, R. Takashima, Y. Ohtani, T. Takiguchi, T. Toda, H. Kawai. Fast neural speech waveform generative models with fully-connected layer-based upsampling. IEEE Access, Vol. 12, pp. 31409-31421, Feb. 2024.
48. C. Xie, T. Toda. Noisy-to-noisy voice conversion under variations of noisy condition. IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 31, pp. 3871-3882, Oct. 2023.
49. R. Yoneyama, Y.-C. Wu, T. Toda. High-fidelity and pitch-controllable neural vocoder based on unified source-filter networks. IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 31, pp. 3717-3729, Oct. 2023.
50. K. Matsubara, T. Okamoto, R. Takashima, T. Takiguchi, T. Toda, H. Kawai. Harmonic-Net: fundamental frequency and speech rate controllable fast neural vocoder. IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 31, pp. 1902-1915, May 2023.
51. W.-C. Huang, S.-W. Yang, T. Hayashi, T. Toda. A comparative study of self-supervised speech representation based voice conversion. IEEE Journal of Selected Topics in Signal Processing, Vol. 16, No. 6, pp. 1308-1318, Oct. 2022.
52. Y. Yasuda, T. Toda. Investigation of Japanese Png BERT language model in text-to-speech synthesis for pitch accent language. IEEE Journal of Selected Topics in Signal Processing, Vol. 16, No. 6, pp. 1319-1328, Oct. 2022.
53. Y.-C. Wu, P.L. Tobing, K. Yasuhara, N. Matsunaga, Y. Ohtani, T. Toda. A cyclical approach to synthetic and natural speech mismatch refinement of neural post-filter for low-cost text-to-speech system. APSIPA Transactions on Signal and Information Processing, Vol. 11, No. 1, e30, pp. 1-32, Sep. 2022.
54. T. Okamoto, K. Matsubara, T. Toda, Y. Shiga, H. Kawai. Neural speech-rate conversion with multispeaker WaveNet vocoder. Speech Communication, Vol. 138, pp. 1-12, Mar. 2022.
55. K. Matsubara, T. Okamoto, R. Takashima, T. Takiguchi, T. Toda, Y. Shiga, H. Kawai. Full-band LPCNet: a real-time neural vocoder for 48 kHz audio with a CPU. IEEE Access, Vol. 9, pp. 94923-94933, July 2021.
56. A. Ando, T. Mori, S. Kobashikawa, T. Toda. Speech emotion recognition based on listener-dependent emotion perception models. APSIPA Transactions on Signal and Information Processing, Vol. 10, e6, pp. 1-11, Apr. 2021.
57. Y.-C. Wu, T. Hayashi, P.L. Tobing, K. Kobayashi, T. Toda. Quasi-periodic WaveNet: an autoregressive raw waveform generative model with pitch-dependent dilated convolution neural network. IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 29, pp. 1134-1148, Mar. 2021.
58. Y.-C. Wu, T. Hayashi, T. Okamoto, H. Kawai, T. Toda. Quasi-periodic parallel WaveGAN: a non-autoregressive raw waveform generative model with pitch-dependent dilated convolution neural network. IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 29, pp. 792-806, Feb. 2021.
59. W.-C. Huang, T. Hayashi, Y.-C. Wu, H. Kameoka, T. Toda. Pretraining techniques for sequence-to-sequence voice conversion. IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 29, pp. 745-755, Feb. 2021.【IEEE Signal Processing Society Japan Student Best Paper Award (Recipient: Wen-Chin Huang)】
60. H. Kameoka, W.-C. Huang, K. Tanaka, T. Kaneko, N. Hojo, T. Toda. Many-to-many voice transformer network. IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 29, pp. 656-670, Jan. 2021.
61. P.L. Tobing, Y.-C. Wu, T. Hayashi, K. Kobayashi, T. Toda. An evaluation of voice conversion with neural network spectral mapping models and WaveNet vocoder. APSIPA Transactions on Signal and Information Processing, Vol. 9, e26, pp. 1-14, Nov. 2020.
62. X. Wang, J. Yamagishi, M. Todisco, H. Delgado, A. Nautsch, N. Evans, M. Sahidullah, V. Vestman, T. Kinnunen, K.A. Lee, L. Juvela, P. Alku, Y.-H. Peng, H.-T. Hwang, Y. Tsao, H.-M. Wang, S. Le Maguer, M. Becker, F. Henderson, R. Clark, Y. Zhang, Q. Wang, Y. Jia, K. Onuma, K. Mushika, T. Kaneda, Y. Jiang, L.-J. Liu, Y.-C. Wu, W.-C. Huang, T. Toda, K. Tanaka, H. Kameoka, I. Steiner, D. Matrouf, J.-F. Bonastre, A. Govender, S. Ronanki, J.-X. Zhang, Z.-H. Ling. ASVspoof 2019: a large-scale public database of synthetic, converted and replayed speech. Computer Speech and Language, Vol. 64, Article 101114, 25 pages, Nov. 2020.
63. Y.-C. Wu, P.L. Tobing, T. Hayashi, K. Kobayashi, T. Toda. Non-parallel voice conversion system with WaveNet vocoder and collapsed speech suppression. IEEE Access, Vol. 8, No. 1, pp. 62094-62106, Apr. 2020.
64. 大平茂輝, 清谷峻也, 伊藤瑠哉, 岡本康佑, 谷川右京, 出口大輔, 戸田智基. LMS経由で手書きレポートを返却するWebサービス「かみレポ」の開発・評価. 情報処理学会論文誌：教育とコンピュータ, Vol. 6, No. 1, pp. 52-68, Feb. 2020.
65. A. Ando, R. Masumura, H. Kamiyama, S. Kobashikawa, Y. Aono, T. Toda. Customer satisfaction estimation in contact center calls based on a hierarchical multi-task model. IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol. 28, No. 1, pp. 715-728, Jan. 2020.
66. P.L. Tobing, Y.-C. Wu, T. Hayashi, K. Kobayashi, T. Toda. Voice conversion with CycleRNN-based spectral mapping and finely tuned WaveNet vocoder. IEEE Access, Vol. 7, No. 1, pp. 171114-171125, Dec. 2019.
67. S. Seki, H. Kameoka, L. Li, T. Toda, K. Takeda. Underdetermined source separation based on generalized multichannel variational autoencoder. IEEE Access, Vol. 7, No. 1, pp. 168104-168115, Nov. 2019.
68. A. Tamamori, T. Hayashi, T. Toda, K. Takeda. Daily activity recognition based on recurrent neural network using multi-modal signals. APSIPA Transactions on Signal and Information Processing, Vol. 7, e21, pp. 1-11, Dec. 2018.
69. T. Kano, S. Takamichi, S. Sakti, G. Neubig, T. Toda, S. Nakamura. An end-to-end model for cross-lingual transformation of paralinguistic information. Machine Translation, Vol. 32, No. 4, pp. 353-368, Dec. 2018.
70. S. Seki, T. Toda, K. Takeda. Stereophonic music separation based on non-negative tensor factorization with cepstral distance regularization. IEICE Transactions on Fundamentals, Vol. E101-A, No. 7, pp. 1057-1064, July 2018.
71. K. Kobayashi, T. Toda, S. Nakamura. Intra-gender statistical singing voice conversion with direct waveform modification using log-spectral differential. Speech Communication, Vol. 99, pp. 211-220, May 2018.
72. T. Hayashi, M. Nishida, N. Kitaoka, T. Toda, K. Takeda. Daily activity recognition with large-scaled real-life recording datasets based on deep neural network using multi-modal signals. IEICE Transactions on Fundamentals, Vol. E101-A, No. 1, pp. 199-210, Jan. 2018.
73. P.L. Tobing, K. Kobayashi, T. Toda. Articulatory controllable speech modification based on statistical inversion and production mappings. IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol. 25, No. 12, pp. 2337-2350, Dec. 2017.
74. T. Hayashi, S. Watanabe, T. Toda, T. Hori, J. Le Roux, K. Takeda. Duration-controlled LSTM for polyphonic sound event detection. IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol. 25, No. 11, pp. 2059-2070, Nov. 2017.【IEEE Signal Processing Society Japan Young Author Best Paper Award (Recipient: Tomoki Hayashi)】
75. K. Tanaka, T. Toda, S. Nakamura. A vibration control method of an electrolarynx based on statistical F0 pattern prediction. IEICE Transactions on Information and Systems, Vol. E100-D, No. 9, pp. 2165-2173, Sep. 2017.
76. Q. Truong Do, T. Toda, G. Neubig, S. Sakti, S. Nakamura. Preserving word-level emphasis in speech-to-speech translation. IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 25, No. 3, pp. 544-556, Mar. 2017.【IEEE Signal Processing Society Japan Student Best Paper Award (Recipient: Quoc Truong Do)】
77. 三浦明波, Graham Neubig, Sakriani Sakti, 戸田智基, 中村哲. 中間言語情報を記憶するピボット翻訳手法. 自然言語処理, Vol. 23, No. 5, pp. 499-528, Dec. 2016.
78. Y. Oshima, S. Takamichi, T. Toda, G. Neubig, S. Sakti, S. Nakamura. Non-native text-to-speech preserving speaker individuality based on partial correction of prosodic and phonetic characteristics. IEICE Transactions on Information and Systems, Vol. E99-D, No. 12, pp. 3132-3139, Dec. 2016.
79. K. Kobayashi, T. Toda, T. Nakano, M. Goto, S. Nakamura. Improvements of voice timbre control based on perceived age in singing voice conversion. IEICE Transactions on Information and Systems, Vol. E99-D, No. 11, pp. 2767-2777, Nov. 2016.
80. T. Hiraoka, G. Neubig, S. Sakti, T. Toda, S. Nakamura. Learning cooperative persuasive dialogue policies using framing. Speech Communication, Vol. 84, pp. 83-96, Nov. 2016.
81. S. Takamichi, T. Toda, G. Neubig, S. Sakti, S. Nakamura. A statistical sample-based approach to GMM-based voice conversion using tied-covariance acoustic models. IEICE Transactions on Information and Systems, Vol. E99-D, No. 10, pp. 2490-2498, Oct. 2016.
82. H. Tanaka, S. Sakti, G. Neubig, T. Toda, H. Negoro, H. Iwasaka, S. Nakamura. Teaching social communication skills through human-agent interaction. ACM Transactions on Interactive Intelligent Systems, Vol. 6, No. 2, 23 pages, Aug. 2016.
83. H. Maki, T. Toda, S. Sakti, G. Neubig, S. Nakamura. Enhancing event-related potentials based on maximum a posteriori estimation with a spatial correlation prior. IEICE Transactions on Information and Systems, Vol. E99-D, No. 6, pp. 1410-1419, June 2016.
84. S. Takamichi, T. Toda, A.W. Black, G. Neubig, S. Sakti, S. Nakamura. Post-filters to modify the modulation spectrum for statistical parametric speech synthesis. IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 24, No. 4, pp. 755-767, Apr. 2016.【IEEE Signal Processing Society Japan Young Author Best Paper Award (Recipient: Shinnosuke Takamichi)】
85. Z. Wu, P. De Leon, C. Demiroglu, A. Khodabakhsh, S. King, Z.-H. Ling, D. Saito, B. Stewart, T. Toda, M. Wester, J. Yamagishi. Anti-spoofing for text-independent speaker verification: an initial database, comparison of countermeasures, and human performance. IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 24, No. 4, pp. 768-783, Apr. 2016.
86. 赤部晃一, Graham Neubig, Sakriani Sakti, 戸田智基, 中村哲. 機械翻訳システムの誤り分析のための誤り箇所選択手法. 自然言語処理, Vol. 23, No. 1, pp. 88-117, Jan. 2016.
87. 水上雅博, Lasguido Nio, 木付英士, 野村敏男, Graham Neubig, 吉野幸一郎, Sakriani Sakti, 戸田智基, 中村哲. 快適度推定に基づく用例ベース対話システム. 人工知能学会論文誌, Vol. 31, No. 1, 12 pages, Jan. 2016.
88. P. Arthur, G. Neubig, S. Sakti, T. Toda, S. Nakamura. Semantic parsing of ambiguous input through paraphrasing and verification. Transactions of the Association for Computational Linguistics, Vol. 3, pp. 571-584, Dec. 2015.
89. H. Tanaka, S. Sakti, G. Neubig, T. Toda, S. Nakamura. NOCOA+: multimodal computer-based training for social and communication skills. IEICE Transactions on Information and Systems, Vol. E98-D, No. 8, pp. 1536-1544, Aug. 2015.
90. K. Kobayashi, T. Toda, H. Doi, T. Nakano, M. Goto, G. Neubig, S. Sakti, S. Nakamura. Voice timbre control based on perceived age in singing voice conversion. IEICE Transactions on Information and Systems, Vol. E97-D, No. 6, pp. 1419-1428, June 2014.
91. K. Tanaka, T. Toda, G. Neubig, S. Sakti, S. Nakamura. A hybrid approach to electrolaryngeal speech enhancement based on noise reduction and statistical excitation generation. IEICE Transactions on Information and Systems, Vol. E97-D, No. 6, pp. 1429-1437, June 2014.
92. K. Kubo, S. Sakti, G. Neubig, T. Toda, S. Nakamura. Structured adaptive regularization of weight vectors for a robust grapheme-to-phoneme conversion model. IEICE Transactions on Information and Systems, Vol. E97-D, No. 6, pp. 1468-1476, June 2014.
93. L. Nio, S. Sakti, G. Neubig, T. Toda, S. Nakamura. Utilizing human-to-human conversation examples for a multi domain chat-oriented dialog system. IEICE Transactions on Information and Systems, Vol. E97-D, No. 6, pp. 1497-1505, June 2014.
94. S. Takamichi, T. Toda, Y. Shiga, S. Sakti, G. Neubig, S. Nakamura. Parameter generation methods with rich context models for high-quality and flexible text-to-speech synthesis. IEEE Journal of Selected Topics in Signal Processing, Vol. 8, No. 2, pp. 239-250, Apr. 2014.【Telecommunications Advancement Foundation Award, 30th Telecom System Technology Student Award (Recipient: Shinnosuke Takamichi)】【IEEE Kansai Section Student Paper Award (Recipient: Shinnosuke Takamichi)】
95. H. Doi, T. Toda, K. Nakamura, H. Saruwatari, K. Shikano. Alaryngeal speech enhancement based on one-to-many eigenvoice conversion. IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 22, No. 1, pp. 172-183, Jan. 2014.
96. 山内祐輝, Graham Neubig, Sakriani Sakti, 戸田智基, 中村哲. 対話システムにおける用語間の関係性を用いた話題誘導応答文生成. 人工知能学会論文誌, Vol. 29, No. 1, pp. 80-89, Jan. 2014.
97. T. Toda, M. Nakagiri, K. Shikano. Statistical voice conversion techniques for body-conducted unvoiced speech enhancement. IEEE Transactions on Audio, Speech and Language Processing, Vol. 20, No. 9, pp. 2505-2517, Sep. 2012.
98. T. Nakamura, K. Sugiura, T. Nagai, N. Iwahashi, T. Toda, H. Okada, T. Omori. Learning novel objects for extended mobile manipulation. Journal of Intelligent and Robotic Systems, Vol. 66, No. 1-2, pp. 187-204, Apr. 2012.
99. 中村友昭, アッタミミムハンマド, 杉浦孔明, 長井隆行, 岩橋直人, 戸田智基, 岡田浩之, 大森隆司. 拡張モバイルマニピュレーションのための新規物体の学習. 日本ロボット学会誌, Vol. 30, No. 2, pp. 213-224, Mar. 2012.
100. T. Kubo, T. Toda, M. Yoshida, T. Hattori, K. Ikeda. Vowel recognition based on surface electromyography with electrode grid on submental region. Transactions of Japanese Society for Medical and Biological Engineering, Vol. 50, No. 1, pp. 38-46, Feb. 2012.
101. K. Nakamura, T. Toda, H. Saruwatari, K. Shikano. Speaking-aid systems using GMM-based voice conversion for electrolaryngeal speech. Speech Communication, Vol. 54, No. 1, pp. 134-146, Jan. 2012.
102. H. Doi, K. Nakamura, T. Toda, H. Saruwatari, K. Shikano. Esophageal speech enhancement based on statistical voice conversion with Gaussian mixture models. IEICE Transactions on Information and Systems, Vol. E93-D, No. 9, pp. 2472-2482, Sep. 2010.
103. Y. Ohtani, T. Toda, H. Saruwatari, K. Shikano. Improvements of the one-to-many eigenvoice conversion system. IEICE Transactions on Information and Systems, Vol. E93-D, No. 9, pp. 2491-2499, Sep. 2010.
104. K. Nakamura, T. Toda, H. Saruwatari, K. Shikano. Evaluation of extremely small sound source signals used in speaking-aid system with statistical voice conversion. IEICE Transactions on Information and Systems, Vol. E93-D, No. 7, pp. 1909-1917, July 2010.
105. Y. Ohtani, T. Toda, H. Saruwatari, K. Shikano. Adaptive training for voice conversion based on eigenvoices. IEICE Transactions on Information and Systems, Vol. E93-D, No. 6, pp. 1589-1598, June 2010.
106. T. Hirahara, M. Otani, S. Shimizu, T. Toda, K. Nakamura, Y. Nakajima, K. Shikano. Silent-speech enhancement using body-conducted vocal-tract resonance signals. Speech Communication, Vol. 52, No. 4, pp. 301-313, Apr. 2010.
107. V.-A. Tran, G. Bailly, H. Loevenbruck, T. Toda. Improvement to a NAM-captured whisper-to-speech system. Speech Communication, Vol. 52, No. 4, pp. 314-326, Apr. 2010.
108. J. Yamagishi, T. Nose, H. Zen, Z.-H. Ling, T. Toda, K. Tokuda, S. King, S. Renals. Robust speaker-adaptive HMM-based text-to-speech synthesis. IEEE Transactions on Audio, Speech and Language Processing, Vol. 17, No. 6, pp. 1208-1230, Aug. 2009.
109. R. Gomez, T. Toda, H. Saruwatari, K. Shikano. Techniques in rapid unsupervised speaker adaptation based on HMM-sufficient statistics. Speech Communication, Vol. 51, No. 1, pp. 42-57, Jan. 2009.
110. H. Zen, T. Toda, K. Tokuda. The Nitech-NAIST HMM-based speech synthesis system for the Blizzard Challenge 2006. IEICE Transactions on Information and Systems, Vol. E91-D, No. 6, pp. 1764-1773, June 2008.
111. 大谷大和, 戸田智基, 猿渡洋, 鹿野清宏. STRAIGHT混合励振源を用いた混合正規分布モデルに基づく最尤声質変換法. 電子情報通信学会論文誌, Vol. J91-D, No. 4, pp. 1082-1091, Apr. 2008.
112. T. Cincarek, T. Toda, H. Saruwatari, K. Shikano. Cost reduction of acoustic modeling for real-environment applications using unsupervised and selective training. IEICE Transactions on Information and Systems, Vol. E91-D, No. 3, pp. 499-507, Mar. 2008.
113. G. Nagino, M. Shozakai, T. Toda, H. Saruwatari, K. Shikano. Building an effective speech corpus by utilizing statistical multidimensional scaling method. IEICE Transactions on Information and Systems, Vol. E91-D, No. 3, pp. 607-614, Mar. 2008.
114. T. Toda, A.W. Black, K. Tokuda. Statistical mapping between articulatory movements and acoustic spectrum using a Gaussian mixture model. Speech Communication, Vol. 50, No. 3, pp. 215-227, Mar. 2008.【The 2013 EURASIP-ISCA Best Paper Award (Speech Communication Journal)】
115. T. Toda, A.W. Black, K. Tokuda. Voice conversion based on maximum likelihood estimation of spectral parameter trajectory. IEEE Transactions on Audio, Speech and Language Processing, Vol. 15, No. 8, pp. 2222-2235, Nov. 2007.【IEEE Signal Processing Society 2009 Young Author Best Paper Award】
116. T. Toda, K. Tokuda. A speech parameter generation algorithm considering global variance for HMM-based speech synthesis. IEICE Transactions on Information and Systems, Vol. E90-D, No. 5, pp. 816-824, May 2007.【Telecommunications Advancement Foundation Award, 23rd Telecom System Technology Award】【IEICE Information and Systems Society Best Paper Award, FY2007 (paper series)】
117. 中村圭吾, 戸田智基, 猿渡洋, 鹿野清宏. 肉伝導人工音声の変換に基づく喉頭全摘出者のための音声コミュニケーション支援システム. 電子情報通信学会論文誌, Vol. J90-D, No. 3, pp. 780-787, Mar. 2007.
118. R. Gomez, T. Toda, H. Saruwatari, K. Shikano. Reducing computation time of the rapid unsupervised speaker adaptation based on HMM-sufficient statistics. IEICE Transactions on Information and Systems, Vol. E90-D, No. 2, pp. 554-561, Feb. 2007.
119. H. Zen, T. Toda, M. Nakamura, K. Tokuda. Details of the Nitech HMM-based speech synthesis system for the Blizzard Challenge 2005. IEICE Transactions on Information and Systems, Vol. E90-D, No. 1, pp. 325-333, Jan. 2007.【Telecommunications Advancement Foundation Award, 23rd Telecom System Technology Award】【IEICE Information and Systems Society Best Paper Award, FY2007 (paper series)】
120. 河井恒, 戸田智基, 山岸順一, 平井俊男, 倪晋富, 西澤信行, 津崎実, 徳田恵一. 大規模コーパスを用いた音声合成システムXIMERA. 電子情報通信学会論文誌, Vol. J89-D-II, No. 12, pp. 2688-2698, Dec. 2006.
121. 平井俊男, 河井恒, 津崎実, 戸田智基. 音声合成システムXIMERAにおける日本語合成音の自然性劣化要因の分析. 日本音響学会誌, Vol. 62, No. 11, pp. 767-773, Nov. 2006.
122. T. Cincarek, T. Toda, H. Saruwatari, K. Shikano. Utterance-based selective training for the automatic creation of task-dependent acoustic models. IEICE Transactions on Information and Systems, Vol. E89-D, No. 3, pp. 962-969, Mar. 2006.
123. R. Gomez, A. Lee, T. Toda, H. Saruwatari, K. Shikano. Improving rapid unsupervised speaker adaptation based on HMM-sufficient statistics in noisy environments using multi-template models. IEICE Transactions on Information and Systems, Vol. E89-D, No. 3, pp. 998-1005, Mar. 2006.
124. T. Toda, H. Kawai, M. Tsuzaki, K. Shikano. An evaluation of cost functions sensitively capturing local degradation of naturalness for segment selection in concatenative speech synthesis. Speech Communication, Vol. 48, No. 1, pp. 45-56, Jan. 2006.
125. K. Adachi, T. Toda, H. Kawanami, H. Saruwatari, K. Shikano. Designing target cost function based on prosody of speech database. IEICE Transactions on Information and Systems, Vol. E88-D, No. 3, pp. 519-524, Mar. 2005.
126. 舛田剛志, 戸田智基, 川波弘道, 猿渡洋, 鹿野清宏. 韻律的に多重化した音声データベースの設計と発話速度におけるその評価. 電子情報通信学会論文誌, Vol. J87-D-II, No. 2, pp. 447-455, Feb. 2004.
127. 戸田智基, 河井恒, 津崎実, 鹿野清宏. 素片接続型日本語テキスト音声合成における音素単位とダイフォン単位に基づく素片選択. 電子情報通信学会論文誌, Vol. J85-D-II, No. 12, pp. 1760-1770, Dec. 2002.
128. M. Mashimo, T. Toda, H. Kawanami, K. Shikano, N. Campbell. Cross-language voice conversion evaluation using bilingual databases. IPSJ Journal, Vol. 43, No. 7, pp. 2177-2185, July 2002.
129. 戸田智基, 陸金林, 猿渡洋, 鹿野清宏. 周波数軸伸縮を用いた混合正規分布モデルに基づく声質変換法. 電子情報通信学会論文誌, Vol. J84-D-II, No. 10, pp. 2181-2189, Oct. 2001.【Telecommunications Advancement Foundation Award, 18th Telecom System Technology Student Award】
130. 戸田智基, 坂野秀樹, 梶田将司, 武田一哉, 板倉文忠, 鹿野清宏. 側抑制性重み付けを用いた雑音環境下におけるSTRAIGHT分析合成系の品質改善. 電子情報通信学会論文誌, Vol. J83-D-II, No. 11, pp. 2180-2189, Nov. 2000.

Letters

1. N. Nishio, K. Kobayashi, D. Ma, S. Mitani, M. Sone, T. Toda. A voice conversion system from electrolarynx speech to preoperative patient’s speech for total laryngectomy. OTO Open, Vol. 10, No. 1, Scientific Briefing, 5 pages, Feb. 2026.
2. W.-C. Huang, Y.-C. Wu, T. Toda. Multi-speaker text-to-speech training with speaker anonymized data. IEEE Signal Processing Letters, Vol. 31, pp. 2995-2999, Oct. 2024.
3. K. Matsubara, T. Okamoto, R. Takashima, T. Takiguchi, T. Toda, H. Kawai. Comparison of real-time multi-speaker neural vocoders on CPUs. Acoustical Science and Technology, Acoustical Letter, Vol. 43, No. 2, pp. 121-124, Mar. 2022.
4. K. Matsubara, T. Okamoto, R. Takashima, T. Takiguchi, T. Toda, Y. Shiga, H. Kawai. Investigation of training data size for real-time neural vocoders on CPUs. Acoustical Science and Technology, Acoustical Letter, Vol. 42, No. 1, pp. 65-68, Jan. 2021.
5. T. Okamoto, K. Tachibana, T. Toda, Y. Shiga, H. Kawai. Deep neural network-based power spectrum reconstruction to improve quality of vocoded speech with limited acoustic parameters. Acoustical Science and Technology, Acoustical Letter, Vol. 39, No. 2, pp. 163-166, Mar. 2018.
6. H. Tanaka, S. Sakti, G. Neubig, T. Toda, S. Nakamura. NOCOA: A Computer-Based Training Tool for Social and Communication Skills That Exploits Non-verbal Behaviors. The Journal of Information and Systems in Education (Short Note), Vol. 12, No. 1, pp. 19-26, Apr. 2014.
