CV

Tomoki Toda

Global Research Institute for Mobility in Society,
Institutes of Innovation for Future Society,
Nagoya University

Furo-cho, Chikusa-ku, Nagoya, 464-8601, JAPAN

E-mail: toda.tomoki.v6__at__f.mail.nagoya-u.ac.jp

TEL: +81-52-789-4346

Web: https://sites.google.com/site/tomokitoda/home

Research interests

Tomoki Toda is interested in sound media information processing. His research topics include speech processing (speech generation, speech analysis, speech recognition, speech assessment, speech translation, and spoken language processing); music processing (music analysis, music source separation, music generation, singing voice analysis, and singing voice synthesis); and sound environment processing (sound event recognition, anomalous sound detection, audio captioning, and multichannel sound signal processing).

Keywords: signal processing, machine learning, deep learning

Education

Apr. 1995 - Mar. 1999

- - School of Engineering, Nagoya University, Japan
    - - B.E. degree in Electrical and Electronic Engineering and Information Engineering, 1999

Apr. 1999 - Mar. 2003

- - Graduate School of Information Science, Nara Institute of Science and Technology, Japan
    - - Master's degree in Engineering, 2001
    - - Doctoral degree in Engineering, 2003

Professional Experience

Apr. 2003 - Mar. 2005

- - Japan Society for the Promotion of Science (JSPS)
    - - Research Fellow (Affiliation: Nagoya Institute of Technology)

Apr. 2005 - Mar. 2011

- - Graduate School of Information Science, Nara Institute of Science and Technology (NAIST), Japan
    - - Assistant Professor

Apr. 2011 - Aug. 2015

- - Graduate School of Information Science, NAIST, Japan
    - - Associate Professor

Sep. 2015 - Mar. 2026

- - Information Technology Center, Nagoya University, Japan
    - - Professor

Apr. 2026 - Present

- - Global Research Institute for Mobility in Society, Institutes of Innovation for Future Society, Nagoya University, Japan
    - - Professor

Mar. 2001 - Mar. 2003

- - Advanced Telecommunications Research Institute International (ATR), Spoken Language Translation Research Laboratories (SLT), Japan
    - - Intern Researcher

Apr. 2003 - Sep. 2003

- - ATR-SLT, Japan
    - - Visiting Researcher

Oct. 2004 - Mar. 2006

- - ATR, Spoken Language Communication Research Laboratories (SLC), Japan
    - - Visiting Researcher

May 2006 - Present

- - National Institute of Information and Communications Technology (NICT), Knowledge Creating Communication Research Center, Japan
    - - Visiting Researcher

July 2014 - Aug. 2015

- - Organization for Management and Outside Collaboration on R&D, National Institute of Informatics (NII), Japan
    - - Visiting Associate Professor

Sep. 2015 - Mar. 2017

- - Graduate School of Information Science, NAIST, Japan
    - - Visiting Professor

Sep. 2015 - Mar. 2018

- - Organization for Management and Outside Collaboration on R&D, NII, Japan
    - - Visiting Professor

Dec. 2016 - Mar. 2020

- - Fundamental Information Technologies toward Innovative Social System Design, PRESTO, JST, Japan
    - - PRESTO Researcher

Oct. 2003 - Sep. 2004

- - Language Technologies Institute, Carnegie Mellon University, USA
    - - Visiting Researcher

Mar. 2008 - Aug. 2008

- - Department of Engineering, University of Cambridge, UK
    - - Visiting Researcher

Professional Volunteer Work

Jan. 2007 - Dec. 2009

- - IEEE SPS Speech and Language Technical Committee Member

Apr. 2010 - Dec. 2016

- - APSIPA Speech, Language, and Audio Technical Committee Member

Feb. 2011 - Jan. 2013

- - IEEE Signal Processing Society Kansai Chapter, Secretary

Mar. 2011 - Dec. 2013

- - ACM Transactions on Speech and Language Processing, Associate Editor

Feb. 2013 - Jan. 2015

- - IEEE Signal Processing Society Kansai Chapter, Treasurer

Apr. 2013 - Mar. 2024

- - EURASIP Journal on Audio, Speech, and Music Processing, Associate Editor

Jan. 2014 - Dec. 2016

- - IEEE SPS Speech and Language Technical Committee Member

Nov. 2016 - Dec. 2020

- - IEEE Signal Processing Letters, Associate Editor

Jan. 2019 - Jan. 2021

- - IEEE Signal Processing Society Tokyo Joint Chapter, Treasurer

June 2020 - June 2023

- - JASA Express Letters, Associate Editor

Dec. 2020 - Present

- - IEEE Signal Processing Letters, Senior Area Editor

Jan. 2025 - Present

- - APSIPA Speech and Language Processing Technical Committee, Chair

Apr. 2025 - Present

- - APSIPA Transactions on Signal and Information Processing, Senior Editor

Others

- - Guest Editorial Committee Members
    - IEEE Transactions on Audio, Speech and Language Processing, Special Issue on Voice Transformation, Guest Editor
    - IEICE Transactions on Information and Systems, Special Section on Recent Advances in Machine Learning for Spoken Language Processing, Guest Editor
    - IEICE Transactions on Information and Systems, Special Section on Advances in Modeling for Real-world Speech Information Processing and its Application, Guest Associate Editor
  - International Conference Committee Members
    - IEEE ICASSP 2012, Organizing Committee Member
    - INTERSPEECH 2010, Organizing Committee Member (Student Award)
    - INTERSPEECH 2014, Technical Program Committee Member, Coordinating Area Chair
    - IEEE 9th International Symposium on Wearable Computers (ISWC2005), Local Committee Member
    - APSIPA ASC 2009-2010, 2014-2015, Technical Program Committee Member
    - IEEE ASRU 2015, Organizing Committee Member, Regional Publicity Chair
    - IEEE ASRU 2017, Organizing Committee Member, Challenge Chair
    - The 7th ISCA Speech Synthesis Workshop (SSW7), Organizing Committee Member
    - The 5th ISCA Speech Synthesis Workshop (SSW5), Local Committee Member
    - International Workshop on Statistical Machine Learning for Speech Processing (IWSML), Organizing Committee Member, Local Chair
    - International Workshop on Machine Learning in Spoken Language Processing (MLSLP), Organizing Committee Member, Technical Program Chair
    - DSP in vehicles 2018, Organizing Committee Member, Program Chair
    - Speech Processing Courses in Crete (SPCC) 2019, 2020, Technical Committee Member
    - Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, Organizing Committee Member
  - Review Committee Members
    - IEEE ICASSP (2007-), INTERSPEECH (2006-)
    - EUSIPCO (2012, 2014-), APSIPA ASC (2009-)
    - ISCSLP (2014-), SLSP (2015-)
    - ISCA Speech Synthesis Workshop (SSW6-), DCASE (2017-)
    - IEEE ASRU (2011, 2017), IEEE SLT (2016-), IEEE WASPAA (2019-), IEEE MLSP (2019-), IEEE ICME (2021)
    - ISMIR (2019), AAAI (2021), IJCAI (2021-), NAACL-HLT (2007, 2016), COLING (2012, 2014)
    - Others
    - Several transactions
  - Session Chair
    - IEEE ICASSP 2007-2010, 2012, 2014, 2015, 2019-
    - INTERSPEECH 2006, 2009-2014, 2016-
    - EUSIPCO 2017, 2020
    - APSIPA ASC 2009, 2014, 2015, 2020, 2022-
    - IEEE SLT 2018, SSW8 and SSW9, BC&VCC-WS 2020
    - Others
  - Other Activities
    - INTERSPEECH 2016, Voice Conversion Challenge 2016, Special Session Organizer
    - IEEE SLT 2018, Deep Learning for Speech Synthesis, Special Session Organizer
    - INTERSPEECH 2022, VoiceMOS Challenge, Special Session Organizer
    - IEEE ASRU 2023, The Singing Voice Conversion Challenge 2023, Challenge Special Session Organizer
    - IEEE ASRU 2023, The VoiceMOS Challenge 2023, Challenge Special Session Organizer
    - IEEE SLT 2024, The VoiceMOS Challenge 2024, Challenge Special Session Organizer
    - IEEE SLT 2024, Singing Voice Deepfake Detection Challenge 2024, Challenge Special Session Organizer
    - IEEE ASRU 2025, The AudioMOS Challenge 2025, Challenge Special Session Organizer
    - APSIPA ASC 2025, Privacy and Security in Speech AI, Special Session Organizer
    - Voice Conversion Challenge 2016, 2018, 2020, Organizer
    - VoiceMOS Challenge 2022, 2023, 2024, Organizer
    - Singing Voice Conversion Challenge 2023, 2025, Organizer
    - Singing Voice Deepfake Detection Challenge 2024, Organizer
    - AudioMOS Challenge 2025, Organizer
    - APSIPA Distinguished Lecturer for 2019-2020

Research Grants

Apr. 2003 - Mar. 2005

- - JSPS, Grant-in-Aid for Scientific Research, Grant-in-Aid for JSPS Fellows

Apr. 2006 - Mar. 2009

- - MEXT, Grant-in-Aid for Scientific Research, Grant-in-Aid for Young Scientists (A)

May 2006 - Feb. 2007

- - IPA, Exploratory Software Project

Apr. 2008 - Mar. 2011

- - MIC, Strategic Information and Communications R&D Promotion Programme (SCOPE)

Apr. 2009 - Mar. 2011

- - JSPS, Japan-France Integrated Action Program (SAKURA)

Apr. 2010 - Mar. 2014

- - MEXT, Grant-in-Aid for Scientific Research, Grant-in-Aid for Young Scientists (A)

Dec. 2011 - July 2012

- - JST, Adaptable and Seamless Technology transfer Program through target-driven R&D (A-STEP), FS stage

Apr. 2014 - Mar. 2017

- - MEXT, Grant-in-Aid for Scientific Research, Grant-in-Aid for Scientific Research (B)

Apr. 2015 - Mar. 2019

- - MEXT, Grant-in-Aid for Scientific Research, Grant-in-Aid for Challenging Exploratory Research

Dec. 2016 - Mar. 2020

- - JST, PRESTO, Fundamental Information Technologies toward Innovative Social System Design

Apr. 2017 - Mar. 2020

- - MEXT, Grant-in-Aid for Scientific Research, Grant-in-Aid for Scientific Research (B)

Oct. 2019 - Mar. 2025

- - JST, CREST, Creation and development of core technologies interfacing human and information environments

Apr. 2025 - Mar. 2026

- - JST, AIP Acceleration Research, Development of techniques for augmenting speech-production skills through international challenge activities

Apr. 2026 - Present

- - MEXT, Grant-in-Aid for Scientific Research, Grant-in-Aid for Scientific Research (A)

Awards

- The 18th TELECOM System Technology Award for Student from the Telecommunications Advancement Foundation (TAF) in 2003
- The 23rd TELECOM System Technology Award from the TAF in 2008
- The 2007 Information and Systems Society (ISS) Best Paper Award from the Institute of Electronics, Information and Communication Engineers (IEICE) in 2008
- The 10th Ericsson Young Scientist Award from Nippon Ericsson K.K. in 2008
- The 4th Itakura Prize Innovative Young Researcher Award from the Acoustical Society of Japan (ASJ) in 2009
- The 26th Awaya Prize Young Researcher Award from the ASJ in 2009
- The 2009 Young Author Best Paper Award from the IEEE Signal Processing Society in 2010
- The 2010 ISS Young Researcher's Award in Speech Field from IEICE in 2011
- The Best Paper Award (Short Paper in Regular Session Category) from APSIPA ASC 2012 in 2012
- The 2012 Kiyasu Special Industrial Achievement Award from the IPSJ in 2013
- The 2013 Best Paper Award (Speech Communication Journal) from EURASIP-ISCA in 2013
- The Best Paper Award from APSIPA ASC 2014 in 2014
- The Best Paper Award of the 21st Annual Meeting of the ANLP in 2015
- The Commendation for Science and Technology by the Minister of Education, Culture, Sports, Science and Technology, the Young Scientists' Prize in 2015
- The Paper Award of the 2017 Annual Conference of AXIES in 2018
- The Poster Award of the 2018 Annual Conference of AXIES in 2019
- The Best Paper Award from APSIPA ASC 2021 in 2021
- DCASE 2022 Challenge Task 2 Judges' Award in 2022
- The Special Award for Industry-Academia-Government Collaboration at the 35th Excellent New Technology and Product Award for Small and Medium-sized Enterprises in 2023
- IPSJ SIG Activity Contribution Award in 2023
- The Best Paper Award in SpandLDeteriorate Workshop of ACM MM Asia 2024 in 2024
- DCASE 2025 Challenge Task 2 Judges' Award in 2025

Memberships

- The Institute of Electrical and Electronics Engineers, Inc. (IEEE), Senior member
- Institute of Electronics, Information and Communication Engineers of Japan (IEICE), Member
- Information Processing Society of Japan (IPSJ), Member
- The Acoustical Society of Japan (ASJ), Member
- International Speech Communication Association (ISCA), Member
- The European Association for Signal Processing (EURASIP), Member
- The Asia-Pacific Signal and Information Processing Association (APSIPA), Member

Publications

Journal Papers

- 1. C.-H. Hu, Y. Yasuda, T. Toda. Investigation of preference-based speech quality assessment by integrating adaptive pair selection and pseudo-labeling. APSIPA Transactions on Signal and Information Processing, Vol. **, No. *, pp. ***-***, ***. 2026. (Accepted)
  2. J. Feng, Y. Yasuda, T. Toda. CTC score-based transcription quality annotation for stable text-to-speech synthesis training on noisy transcriptions. APSIPA Transactions on Signal and Information Processing, Vol. **, No. *, pp. ***-***, ***. 2026. (Accepted)
  3. D. Ma, J. Mi, F. Li, L.P. Violeta, J. He, W.-C. Huang, K. Kobayashi, T. Toda. Advancing electrolaryngeal speech enhancement through speech-text representation learning. IEEE Transactions on Biomedical Engineering, Vol. **, pp. ***-***, *** 2026. (Accepted)
  4. R. Yoneyama, T. Toda. SiFi-GAN: combining source-filter modeling and upsampling-based high-fidelity neural vocoder for fast and pitch-controllable speech synthesis. IEICE Transactions on Information and Systems, Vol. E109-D, No. 6, pp. 945-956, June 2026.
  5. X. Shi, X. Li, T. Toda. Emotion similarity and shift: modeling temporal dynamic interactions for emotion prediction in conversation. IEEE Transactions on Audio, Speech and Language Processing, Vol. 34, pp. 2552-2567, Apr. 2026.
  6. W.-C. Huang, E. Cooper, T. Toda. MOS-Bench: benchmarking generalization abilities of subjective speech quality assessment models. IEEE Transactions on Audio, Speech and Language Processing, Vol. 34, pp. 2385-2397, Apr. 2026. [Preprint]
  7. J. Feng, Y. Yasuda, T. Toda. An investigation of the robustness of flow- and diffusion-based speech generation models on noisy transcriptions. APSIPA Transactions on Signal and Information Processing, Vol. 15, No. 1, pp. 270-292, Apr. 2026. [Paper]
  8. Y. Hashizume, T. Toda. Investigation of perceptual music similarity based on individual instrumental parts by large-scale listening test. APSIPA Transactions on Signal and Information Processing, Vol. 15, No. 1, pp. 249-269, Apr. 2026. [Paper]
  9. T. Komatsu, H. Munakata, Y. Ishikawa, K. Takeda, T. Toda. Semi-supervised text-audio contrastive learning method using pseudo-text input. APSIPA Transactions on Signal and Information Processing, Vol. 15, No. 1, pp. 183-198, Apr. 2026. [Paper]
  10. J. Mi, X. Shi, D. Ma, J. He, T. Fujimura, T. Toda. Robust speech emotion recognition under human speech noise. Computer Speech and Language, Vol. 100, Article 101987, pp. 1-16, Apr. 2026. [Paper]
  11. X. Shi, J. He, X. Li, T. Toda. A comprehensive study on the effectiveness of ASR representations for noise-robust speech emotion recognition. IEEE Transactions on Audio, Speech and Language Processing, Vol. 34, pp. 707-722, Jan. 2026. [Preprint]
  12. B.M. Halpern, W.-C. Huang, L.P. Violeta, T. Toda. Severity-controllable pathological text-to-speech synthesis for clinical applications. IEEE Transactions on Neural Systems and Rehabilitation Engineering, Vol. 34, pp. 573-582, Jan. 2026. [Paper]
  13. H. Yamashita, T. Okamoto, R. Takashima, Y. Ohtani, T. Takiguchi, T. Toda, H. Kawai. Sequence-to-sequence voice conversion with weighted guided attention. IEEE Access, Vol. 13, pp. 216583-216595, Dec. 2025. [Paper]
  14. B.M. Halpern, T.B. Tienkamp, T. Rebernik, R.J.J.H. van Son, S.A.H.J. de Visscher, M.J.H. Witjes, D. Abur, T. Toda. XPPG-PCA: reference-free automatic speech severity evaluation with principal components. IEEE Journal of Selected Topics in Signal Processing, Vol. 19, No. 5, pp. 783-795, Oct. 2025. [Preprint]
  15. L.P. Violeta, W.-C. Huang, D. Ma, R. Yamamoto, K. Kobayashi, T. Toda. Resolving domain mismatches in electrolaryngeal speech enhancement with linguistic intermediates. IEEE Journal of Selected Topics in Signal Processing, Vol. 19, No. 5, pp. 827-839, June 2025. [Paper]
  16. T. Komatsu, K. Takeda, T. Toda. Audio difference learning framework for audio captioning. APSIPA Transactions on Signal and Information Processing, Vol. 14, No. 1, e34, pp. 1-18, Nov. 2025. [Paper]
  17. R. Yoneyama, A. Miyashita, R. Yamamoto, T. Toda. Wavehax: aliasing-free neural waveform synthesis based on 2D convolution and harmonic prior for reliable complex spectrogram estimation. IEEE Transactions on Audio, Speech and Language Processing, Vol. 33, pp. 4454-4470, Oct. 2025. [Paper]
  18. T. Imamura, Y. Hashizume, W.-C. Huang, T. Toda. Music similarity representation learning focusing on individual instruments with source separation and human preference. APSIPA Transactions on Signal and Information Processing, Vol. 14, No. 4, e305, pp. 1-29, Oct. 2025. [Paper]
  19. N. Nishio, K. Kobayashi, T. Toda. Voice restoration for laryngectomized patients using a voice conversion system: “Save the Voice” project. Nihon Kikan Shokudoka Gakkai Kaiho (Japanese Edition), Vol. 76, No. 5, pp. 255-263, Oct. 2025.
  20. J. He, X. Shi, C.-H. Hu, J. Mi, X. Li, T. Toda. M4SER: multimodal, multirepresentation, multitask, and multistrategy learning for speech emotion recognition. IEEE Transactions on Audio, Speech and Language Processing, Vol. 33, pp. 4055-4070, Sep. 2025. [Paper]
  21. D. Yoshioka, Y. Nakata, Y. Yasuda, T. Toda. Text- and speech-style control for lecture speech generation focusing on disfluency. APSIPA Transactions on Signal and Information Processing, Vol. 14, No. 1, e26, pp. 1-31, Sep. 2025. [Paper]
  22. Y. Yasuda, T. Toda. Automatic design optimization of preference-based subjective evaluation with online learning in crowdsourcing environment. Computer Speech and Language, Vol. 96, Article 101888, pp. 1-16, Sep. 2025. [Paper]
  23. S. Chen, T. Toda. QHARMA-GAN: quasi-harmonic neural vocoder based on autoregressive moving average model. IEEE Transactions on Audio, Speech and Language Processing, Vol. 33, pp. 3703-3719, Sep. 2025. [Paper]
  24. D. Ma, L.P. Violeta, K. Kobayashi, T. Toda. Pretraining and fine-tuning techniques for electrolaryngeal speech enhancement based on sequence-to-sequence voice conversion. IEEE Transactions on Audio, Speech and Language Processing, Vol. 33, pp. 3189-3201, July 2025. [Paper]
  25. Y. Hashizume, L. Li, A. Miyashita, T. Toda. Learning separated representations for instrument-based music similarity. APSIPA Transactions on Signal and Information Processing, Vol. 14, No. 1, e16, pp. 1-32, July 2025. [Paper]
  26. T. Ogura, T. Okamoto, Y. Ohtani, E. Cooper, T. Toda, H. Kawai. Phoneme-level duration controllable neural text-to-speech with phoneme embedding skip connection and modified Gaussian duration modeling. IEEE Access, Vol. 13, pp. 118369-118380, July 2025. [Paper]
  27. Y. Choi, C. Xie, T. Toda. Noise and reverberation-controllable voice conversion. IEEE Transactions on Audio, Speech and Language Processing, Vol. 33, pp. 2430-2443, June 2025. [Paper]
  28. J. He, T. Toda. PMF-CEC: phoneme-augmented multimodal fusion for context-aware ASR error correction with error-specific selective decoding. IEEE Transactions on Audio, Speech and Language Processing, Vol. 33, pp. 2402-2417, June 2025. [Paper]
  29. I. Kuroyanagi, T. Fujimura, K. Takeda, T. Toda. Improving anomalous sound detection through pseudo-anomalous set selection and pseudo-label utilization under unlabeled conditions. APSIPA Transactions on Signal and Information Processing, Vol. 14, No. 1, e13, pp. 1-28, June 2025. [Paper]
  30. T. Fujimura, T. Toda. Analysis and extension of noisy-target training for unsupervised target signal enhancement. APSIPA Transactions on Signal and Information Processing, Vol. 14, No. 1, e12, pp. 1-27, June 2025. [Paper]
  31. C. Xie, T. Toda. An investigation of noisy-to-noisy voice conversion performance in various noisy conditions. APSIPA Transactions on Signal and Information Processing, Vol. 14, No. 1, e10, pp. 1-30, June 2025. [Paper]
  32. D. Ma, Y. Choi, T. Fujimura, F. Li, C. Xie, K. Kobayashi, T. Toda. Sequence-to-sequence voice conversion-based techniques for electrolaryngeal speech enhancement in noisy and reverberant conditions. APSIPA Transactions on Signal and Information Processing, Vol. 14, No. 1, e8, pp. 1-40, May 2025. [Paper]
  33. Y. Ohtani, T. Okamoto, T. Toda, H. Kawai. Fast neural vocoder with fundamental frequency control using finite impulse response filters. IEEE Transactions on Audio, Speech and Language Processing, Vol. 33, pp. 1893-1906, Apr. 2025. [Paper]
  34. M. Eshghi, T. Toda. Predicting fundamental frequency patterns in electrolaryngeal speech using automated phoneme extraction. IEEE Access, Vol. 13, pp. 73831-73847, Apr. 2025. [Paper]
  35. S. Luan, Y. Wakabayashi, T. Toda. Generalized sound field interpolation for freely spaced microphone arrays in rotation-robust beamforming. Applied Acoustics, Vol. 236, Article 110706, pp. 1-15, Apr. 2025. [Paper]
  36. C.-H. Hu, Y. Yasuda, T. Toda. E2EPref: an end-to-end preference-based framework for speech quality assessment to alleviate bias in direct assessment scores. Computer Speech and Language, Vol. 93, Article 101799, pp. 1-17, Mar. 2025. [Paper]
  37. F. Li, F. Shen, D. Ma, J. Zhou, L. Wang, F. Fan, T. Liu, X. Chen, T. Toda, H. Niu. Mandarin speech reconstruction from surface electromyography based on generative adversarial networks. Medicine in Novel Technology and Devices, Vol. 26, Article 100359, pp. 1-7, Mar. 2025. [Paper]
  38. S. Chen, T. Toda. Sequence-wise speech waveform modeling via gradient descent optimization of quasi-harmonic parameters. IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 33, pp. 319-332, Jan. 2025. [Paper]
  39. D. Yoshioka, Y. Yasuda, T. Toda. Nonparallel spoken-text-style transfer for linguistic expression control in speech generation. IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 33, pp. 333-346, Jan. 2025. [Paper]
  40. R. Wang, T. Fujimura, T. Toda. Target speaker extraction under noisy underdetermined conditions using conditional variational autoencoder, global style token, and neural postfilter. APSIPA Transactions on Signal and Information Processing, Vol. 14, No. 1, e2, pp. 1-26, Jan. 2025. [Paper]
  41. I. Kuroyanagi, T. Hayashi, K. Takeda, T. Toda. Serial-OE: anomalous sound detection based on serial method with outlier exposure capable of using small amounts of anomalous data for training. APSIPA Transactions on Signal and Information Processing, Vol. 14, No. 1, e1, pp. 1-32, Jan. 2025. [Paper]
  42. F. Li, F. Shen, D. Ma, J. Zhou, S. Zhang, L. Wang, F. Fan, T. Liu, X. Chen, T. Toda, H. Niu. End-to-end Mandarin speech reconstruction based on ultrasound tongue images using deep learning. IEEE Transactions on Neural Systems and Rehabilitation Engineering, Vol. 33, pp. 140-149, Dec. 2024. [Paper]
  43. S. Luan, Y. Wakabayashi, T. Toda. Unequally spaced sound field interpolation for rotation-robust beamforming. IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 32, pp. 3185-3199, June 2024. [Paper]
  44. L.P. Violeta, D. Ma, W.-C. Huang, T. Toda. Pretraining and adaptation techniques for electrolaryngeal speech recognition. IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 32, pp. 2777-2789, May 2024. [Paper]
  45. M. Eshghi, T. Toda. An investigation of fundamental frequency pattern prediction for Japanese electrolaryngeal speech enhancement based on frame-wise phoneme representations. IEEE Access, Vol. 12, pp. 50137-50153, Apr. 2024. [Paper]
  46. R. Wang, L. Li, T. Toda. Dual-channel target speaker extraction based on conditional variational autoencoder and directional information. IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 32, pp. 1968-1979, Mar. 2024. [Paper]
  47. H. Yamashita, T. Okamoto, R. Takashima, Y. Ohtani, T. Takiguchi, T. Toda, H. Kawai. Fast neural speech waveform generative models with fully-connected layer-based upsampling. IEEE Access, Vol. 12, pp. 31409-31421, Feb. 2024. [Paper]
  48. C. Xie, T. Toda. Noisy-to-noisy voice conversion under variations of noisy condition. IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 31, pp. 3871-3882, Oct. 2023. [Paper]
  49. R. Yoneyama, Y.-C. Wu, T. Toda. High-fidelity and pitch-controllable neural vocoder based on unified source-filter networks. IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 31, pp. 3717-3729, Oct. 2023. [Paper]
  50. K. Matsubara, T. Okamoto, R. Takashima, T. Takiguchi, T. Toda, H. Kawai. Harmonic-Net: fundamental frequency and speech rate controllable fast neural vocoder. IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 31, pp. 1902-1915, May 2023. [Paper]
  51. W.-C. Huang, S.-W. Yang, T. Hayashi, T. Toda. A comparative study of self-supervised speech representation based voice conversion. IEEE Journal of Selected Topics in Signal Processing, Vol. 16, No. 6, pp. 1308-1318, Oct. 2022. [Preprint]
  52. Y. Yasuda, T. Toda. Investigation of Japanese PnG BERT language model in text-to-speech synthesis for pitch accent language. IEEE Journal of Selected Topics in Signal Processing, Vol. 16, No. 6, pp. 1319-1328, Oct. 2022. [Paper]
  53. Y.-C. Wu, P.L. Tobing, K. Yasuhara, N. Matsunaga, Y. Ohtani, T. Toda, Y. Shiga, H. Kawai. A cyclical approach to synthetic and natural speech mismatch refinement of neural post-filter for low-cost text-to-speech system. APSIPA Transactions on Signal and Information Processing, Vol. 11, e30, pp. 1-32, Sep. 2022. [Paper]
  54. T. Okamoto, K. Matsubara, T. Toda, Y. Shiga, H. Kawai. Neural speech-rate conversion with multispeaker WaveNet vocoder. Speech Communication, Vol. 138, pp. 1-12, Mar. 2022. [Paper]
  55. K. Matsubara, T. Okamoto, R. Takashima, T. Takiguchi, T. Toda, Y. Shiga, H. Kawai. Full-band LPCNet: a real-time neural vocoder for 48 kHz audio with a CPU. IEEE Access, Vol. 9, pp. 94923-94933, July 2021. [Paper]
  56. A. Ando, T. Mori, S. Kobashikawa, T. Toda. Speech emotion recognition based on listener-dependent emotion perception models. APSIPA Transactions on Signal and Information Processing, Vol. 10, e6, pp. 1-11, Apr. 2021. [Paper]
  57. Y.-C. Wu, T. Hayashi, P.L. Tobing, K. Kobayashi, T. Toda. Quasi-periodic WaveNet: an autoregressive raw waveform generative model with pitch-dependent dilated convolution neural network. IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 29, pp. 1134-1148, Mar. 2021. [Paper]
  58. Y.-C. Wu, T. Hayashi, T. Okamoto, H. Kawai, T. Toda. Quasi-periodic parallel WaveGAN: a non-autoregressive raw waveform generative model with pitch-dependent dilated convolution neural network. IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 29, pp. 792-806, Feb. 2021. [Paper]
  59. W.-C. Huang, T. Hayashi, Y.-C. Wu, H. Kameoka, T. Toda. Pretraining techniques for sequence-to-sequence voice conversion. IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 29, pp. 745-755, Feb. 2021. <IEEE Signal Processing Society Japan Student Best Paper Award (recipient: Wen-Chin Huang)> [Paper]
  60. H. Kameoka, W.-C. Huang, K. Tanaka, T. Kaneko, N. Hojo, T. Toda. Many-to-many voice transformer network. IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 29, pp. 656-670, Jan. 2021. [Paper]
  61. P.L. Tobing, Y.-C. Wu, T. Hayashi, K. Kobayashi, T. Toda. An evaluation of voice conversion with neural network spectral mapping models and WaveNet vocoder. APSIPA Transactions on Signal and Information Processing, Vol. 9, e26, pp. 1-14, Nov. 2020. [Paper]
  62. X. Wang, J. Yamagishi, M. Todisco, H. Delgado, A. Nautsch, N. Evans, M. Sahidullah, V. Vestman, T. Kinnunen, K.A. Lee, L. Juvela, P. Alku, Y.-H. Peng, H.-T. Hwang, Y. Tsao, H.-M. Wang, S. Le Maguer, M. Becker, F. Henderson, R. Clark, Y. Zhang, Q. Wang, Y. Jia, K. Onuma, K. Mushika, T. Kaneda, Y. Jiang, L.-J. Liu, Y.-C. Wu, W.-C. Huang, T. Toda, K. Tanaka, H. Kameoka, I. Steiner, D. Matrouf, J.-F. Bonastre, A. Govender, S. Ronanki, J.-X. Zhang, Z.-H. Ling. ASVspoof 2019: a large-scale public database of synthetic, converted and replayed speech. Computer Speech and Language, Vol. 64, Article 101114, 25 pages, Nov. 2020. [Paper]
  63. Y.-C. Wu, P.L. Tobing, T. Hayashi, K. Kobayashi, T. Toda. Non-parallel voice conversion system with WaveNet vocoder and collapsed speech suppression. IEEE Access, Vol. 8, No. 1, pp. 62094-62106, Apr. 2020. [Paper]
  64. S. Ohira, S. Seiya, R. Ito, K. Okamoto, U. Tanikawa, D. Deguchi, T. Toda. Development and evaluation of "KamiRepo" web service with return of handwritten assignments via LMS. IPSJ Transactions on Computers and Education (Japanese Edition), Vol. 6, No. 1, pp. 52-68, Feb. 2020. [Paper]
  65. A. Ando, R. Masumura, H. Kamiyama, S. Kobashikawa, Y. Aono, T. Toda. Customer satisfaction estimation in contact center calls based on a hierarchical multi-task model. IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol. 28, No. 1, pp. 715-728, Jan. 2020. [Link]
  66. P.L. Tobing, Y.-C. Wu, T. Hayashi, K. Kobayashi, T. Toda. Voice conversion with CycleRNN-based spectral mapping and finely tuned WaveNet vocoder. IEEE Access, Vol. 7, No. 1, pp. 171114-171125, Dec. 2019. [Paper]
  67. S. Seki, H. Kameoka, L. Li, T. Toda, K. Takeda. Underdetermined source separation based on generalized multichannel variational autoencoder. IEEE Access, Vol. 7, No. 1, pp. 168104-168115, Nov. 2019. [Paper]
  68. A. Tamamori, T. Hayashi, T. Toda, K. Takeda. Daily activity recognition based on recurrent neural network using multi-modal signals. APSIPA Transactions on Signal and Information Processing, Vol. 7, e21, pp. 1-11, Dec. 2018. [Paper]
  69. T. Kano, S. Takamichi, S. Sakti, G. Neubig, T. Toda, S. Nakamura. An end-to-end model for cross-lingual transformation of paralinguistic information. Machine Translation, Vol. 32, No. 4, pp. 353-368, Dec. 2018. [Link]
  70. S. Seki, T. Toda, K. Takeda. Stereophonic music separation based on non-negative tensor factorization with cepstral distance regularization. IEICE Transactions on Fundamentals, Vol. E101-A, No. 7, pp. 1057-1064, July 2018. [Link]
  71. K. Kobayashi, T. Toda, S. Nakamura. Intra-gender statistical singing voice conversion with direct waveform modification using log-spectral differential. Speech Communication, Vol. 99, pp. 211-220, May 2018. [Paper]
  72. T. Hayashi, M. Nishida, N. Kitaoka, T. Toda, K. Takeda. Daily activity recognition with large-scaled real-life recording datasets based on deep neural network using multi-modal signals. IEICE Transactions on Fundamentals, Vol. E101-A, No. 1, pp. 199-210, Jan. 2018. [Link]
  73. P.L. Tobing, K. Kobayashi, T. Toda. Articulatory controllable speech modification based on statistical inversion and production mappings. IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol. 25, No. 12, pp. 2337-2350, Dec. 2017. [Paper]
  74. T. Hayashi, S. Watanabe, T. Toda, T. Hori, J. Le Roux, K. Takeda. Duration-controlled LSTM for polyphonic sound event detection. IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol. 25, No. 11, pp. 2059-2070, Nov. 2017. <IEEE Signal Processing Society Japan Young Author Best Paper Award (recipient: Tomoki Hayashi)> [Paper]
  75. K. Tanaka, T. Toda, S. Nakamura. A vibration control method of an electrolarynx based on statistical F0 pattern prediction. IEICE Transactions on Information and Systems, Vol. E100-D, No. 9, pp. 2165-2173, Sep. 2017. [Paper]
  76. Q. Truong Do, T. Toda, G. Neubig, S. Sakti, S. Nakamura. Preserving word-level emphasis in speech-to-speech translation. IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 25, No. 3, pp. 544-556, Mar. 2017. <IEEE Signal Processing Society Japan Student Best Paper Award (recipient: Quoc Truong Do)> [Link]
  77. A. Miura, G. Neubig, S. Sakti, T. Toda, S. Nakamura. Improving pivot translation by remembering the pivot. Journal of Natural Language Processing (Japanese Edition), Vol. 23, No. 5, pp. 499-528, Dec. 2016. [Paper]
  78. Y. Oshima, S. Takamichi, T. Toda, G. Neubig, S. Sakti, S. Nakamura. Non-native text-to-speech preserving speaker individuality based on partial correction of prosodic and phonetic characteristics. IEICE Transactions on Information and Systems, Vol. E99-D, No. 12, pp. 3132-3139, Dec. 2016. [Paper]
  79. K. Kobayashi, T. Toda, T. Nakano, M. Goto, S. Nakamura. Improvements of voice timbre control based on perceived age in singing voice conversion. IEICE Transactions on Information and Systems, Vol. E99-D, No. 11, pp. 2767-2777, Nov. 2016. [Paper]
  80. T. Hiraoka, G. Neubig, S. Sakti, T. Toda, S. Nakamura. Learning cooperative persuasive dialogue policies using framing. Speech Communication, Vol. 84, pp. 83-96, Nov. 2016. [Link]
  81. S. Takamichi, T. Toda, G. Neubig, S. Sakti, S. Nakamura. A statistical sample-based approach to GMM-based voice conversion using tied-covariance acoustic models. IEICE Transactions on Information and Systems, Vol. E99-D, No. 10, pp. 2490-2498, Oct. 2016. [Paper]
  82. H. Tanaka, S. Sakti, G. Neubig, T. Toda, H. Negoro, H. Iwasaka, S. Nakamura. Teaching social communication skills through human-agent interaction. ACM Transactions on Interactive Intelligent Systems, Vol. 6, No. 2, 23 pages, Aug. 2016. [Link]
  83. H. Maki, T. Toda, S. Sakti, G. Neubig, S. Nakamura. Enhancing event-related potentials based on maximum a posteriori estimation with a spatial correlation prior. IEICE Transactions on Information and Systems, Vol. E99-D, No. 6, pp. 1410-1419, June 2016. [Paper]
  84. S. Takamichi, T. Toda, A.W. Black, G. Neubig, S. Sakti, S. Nakamura. Post-filters to modify the modulation spectrum for statistical parametric speech synthesis. IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 24, No. 4, pp. 755-767, Apr. 2016. <IEEE Signal Processing Society Japan Young Author Best Paper Award (recipient: Shinnosuke Takamichi)> [Paper]
  85. Z. Wu, P. De Leon, C. Demiroglu, A. Khodabakhsh, S. King, Z.-H. Ling, D. Saito, B. Stewart, T. Toda, M. Wester, J. Yamagishi. Anti-spoofing for text-independent speaker verification: an initial database, comparison of countermeasures, and human performance. IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 24, No. 4, pp. 768-783, Apr. 2016. [Link]
  86. K. Akabe, G. Neubig, S. Sakti, T. Toda, S. Nakamura. Error selection methods for machine translation error analysis. Journal of Natural Language Processing (Japanese Edition), Vol. 23, No. 1, pp. 88-117, Jan. 2016. [Paper]
  87. M. Mizukami, L. Nio, H. Kimura, T. Nomura, G. Neubig, K. Yoshino, S. Sakti, T. Toda, S. Nakamura. Example based dialogue system based on satisfaction prediction. Transactions of the Japanese Society for Artificial Intelligence (Japanese Edition), Vol. 31, No. 1, 12 pages, Jan. 2016. [Paper]
  88. P. Arthur, G. Neubig, S. Sakti, T. Toda, S. Nakamura. Semantic parsing of ambiguous input through paraphrasing and verification. Transactions of the Association for Computational Linguistics, Vol. 3, pp. 571-584, Dec. 2015. [Link]
  89. H. Tanaka, S. Sakti, G. Neubig, T. Toda, S. Nakamura. NOCOA+: multimodal computer-based training for social and communication skills. IEICE Transactions on Information and Systems, Vol. E98-D, No. 8, pp. 1536-1544, Aug. 2015. [Paper]
  90. K. Kobayashi, T. Toda, H. Doi, T. Nakano, M. Goto, G. Neubig, S. Sakti, S. Nakamura. Voice timbre control based on perceived age in singing voice conversion. IEICE Transactions on Information and Systems, Vol. E97-D, No. 6, pp. 1419-1428, June 2014. [Paper]
  91. K. Tanaka, T. Toda, G. Neubig, S. Sakti, S. Nakamura. A hybrid approach to electrolaryngeal speech enhancement based on noise reduction and statistical excitation generation. IEICE Transactions on Information and Systems, Vol. E97-D, No. 6, pp. 1429-1437, June 2014. [Paper]
  92. K. Kubo, S. Sakti, G. Neubig, T. Toda, S. Nakamura. Structured adaptive regularization of weight vectors for a robust grapheme-to-phoneme conversion model. IEICE Transactions on Information and Systems, Vol. E97-D, No. 6, pp. 1468-1476, June 2014. [Paper]
  93. L. Nio, S. Sakti, G. Neubig, T. Toda, S. Nakamura. Utilizing human-to-human conversation examples for a multi domain chat-oriented dialog system. IEICE Transactions on Information and Systems, Vol. E97-D, No. 6, pp. 1497-1505, June 2014. [Paper]
  94. S. Takamichi, T. Toda, Y. Shiga, S. Sakti, G. Neubig, S. Nakamura. Parameter generation methods with rich context models for high-quality and flexible text-to-speech synthesis. IEEE Journal of Selected Topics in Signal Processing, Vol. 8, No. 2, pp. 239-250, Apr. 2014. <The 30th TELECOM System Technology Award for Student from TAF (recipient: Shinnosuke Takamichi)> <IEEE Kansai Section Student Paper Award (recipient: Shinnosuke Takamichi)> [Link]
  95. H. Doi, T. Toda, K. Nakamura, H. Saruwatari, K. Shikano. Alaryngeal speech enhancement based on one-to-many eigenvoice conversion. IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 22, No. 1, pp. 172-183, Jan. 2014. [Paper]
  96. Y. Yamauchi, G. Neubig, S. Sakti, T. Toda, S. Nakamura. Answer sentence generation using relationships between terms for guiding users to new topics in dialog systems. Transactions of the Japanese Society for Artificial Intelligence (Japanese Edition), Vol. 29, No. 1, pp. 80-89, Jan. 2014. [Paper]
  97. T. Toda, M. Nakagiri, K. Shikano. Statistical voice conversion techniques for body-conducted unvoiced speech enhancement. IEEE Transactions on Audio, Speech and Language Processing, Vol. 20, No. 9, pp. 2505-2517, Sep. 2012. [Paper]
  98. T. Nakamura, K. Sugiura, T. Nagai, N. Iwahashi, T. Toda, H. Okada, T. Omori. Learning novel objects for extended mobile manipulation. Journal of Intelligent and Robotic Systems, Vol. 66, No. 1-2, pp. 187-204, Apr. 2012. [Link]
  99. T. Nakamura, M. Attamimi, K. Sugiura, T. Nagai, N. Iwahashi, T. Toda, H. Okada, T. Omori. An extended mobile manipulation robot learning novel objects. Journal of the Robotics Society of Japan, Vol. 30, No. 2, pp. 213-224, Mar. 2012. [Paper]
  100. T. Kubo, T. Toda, M. Yoshida, T. Hattori, K. Ikeda. Vowel recognition based on surface electromyography with electrode grid on submental region. Transactions of Japanese Society for Medical and Biological Engineering, Vol. 50, No. 1, pp. 38-46, Feb. 2012. [Paper]
  101. K. Nakamura, T. Toda, H. Saruwatari, K. Shikano. Speaking-aid systems using GMM-based voice conversion for electrolaryngeal speech. Speech Communication, Vol. 54, No. 1, pp. 134-146, Jan. 2012. [Link]
  102. H. Doi, K. Nakamura, T. Toda, H. Saruwatari, K. Shikano. Esophageal speech enhancement based on statistical voice conversion with Gaussian mixture models. IEICE Transactions on Information and Systems, Vol. E93-D, No. 9, pp. 2472-2482, Sep. 2010. [Paper]
  103. Y. Ohtani, T. Toda, H. Saruwatari, K. Shikano. Improvements of the one-to-many eigenvoice conversion system. IEICE Transactions on Information and Systems, Vol. E93-D, No. 9, pp. 2491-2499, Sep. 2010. [Paper]
  104. K. Nakamura, T. Toda, H. Saruwatari, K. Shikano. Evaluation of extremely small sound source signals used in speaking-aid system with statistical voice conversion. IEICE Transactions on Information and Systems, Vol. E93-D, No. 7, pp. 1909-1917, July 2010. [Paper]
  105. Y. Ohtani, T. Toda, H. Saruwatari, K. Shikano. Adaptive training for voice conversion based on eigenvoices. IEICE Transactions on Information and Systems, Vol. E93-D, No. 6, pp. 1589-1598, June 2010. [Paper]
  106. T. Hirahara, M. Otani, S. Shimizu, T. Toda, K. Nakamura, Y. Nakajima, K. Shikano. Silent-speech enhancement using body-conducted vocal-tract resonance signals. Speech Communication, Vol. 52, No. 4, pp. 301-313, Apr. 2010. [Link]
  107. V.-A. Tran, G. Bailly, H. Loevenbruck, T. Toda. Improvement to a NAM-captured whisper-to-speech system. Speech Communication, Vol. 52, No. 4, pp. 314-326, Apr. 2010. [Link]
  108. J. Yamagishi, T. Nose, H. Zen, Z.-H. Ling, T. Toda, K. Tokuda, S. King, S. Renals. Robust speaker-adaptive HMM-based text-to-speech synthesis. IEEE Transactions on Audio, Speech and Language Processing, Vol. 17, No. 6, pp. 1208-1230, Aug. 2009. [Link]
  109. R. Gomez, T. Toda, H. Saruwatari, K. Shikano. Techniques in rapid unsupervised speaker adaptation based on HMM-sufficient statistics. Speech Communication, Vol. 51, No. 1, pp. 42-57, Jan. 2009. [Link]
  110. H. Zen, T. Toda, K. Tokuda. The Nitech-NAIST HMM-based speech synthesis system for the Blizzard Challenge 2006. IEICE Transactions on Information and Systems, Vol. E91-D, No. 6, pp. 1764-1773, June 2008. [Paper]
  111. Y. Ohtani, T. Toda, H. Saruwatari, K. Shikano. Maximum likelihood voice conversion based on Gaussian mixture model with STRAIGHT mixed excitation. IEICE Transactions on Information and Systems (Japanese Edition), Vol. J91-D, No. 4, pp. 1082-1091, Apr. 2008. [Paper]
  112. T. Cincarek, T. Toda, H. Saruwatari, K. Shikano. Cost reduction of acoustic modeling for real-environment applications using unsupervised and selective training. IEICE Transactions on Information and Systems, Vol. E91-D, No. 3, pp. 499-507, Mar. 2008. [Paper]
  113. G. Nagino, M. Shozakai, T. Toda, H. Saruwatari, K. Shikano. Building an effective speech corpus by utilizing statistical multidimensional scaling method. IEICE Transactions on Information and Systems, Vol. E91-D, No. 3, pp. 607-614, Mar. 2008. [Paper]
  114. T. Toda, A.W. Black, K. Tokuda. Statistical mapping between articulatory movements and acoustic spectrum using a Gaussian mixture model. Speech Communication, Vol. 50, No. 3, pp. 215-227, Mar. 2008. <The 2013 Best Paper Award (Speech Communication Journal) from EURASIP-ISCA> [Paper]
  115. T. Toda, A.W. Black, K. Tokuda. Voice conversion based on maximum likelihood estimation of spectral parameter trajectory. IEEE Transactions on Audio, Speech and Language Processing, Vol. 15, No. 8, pp. 2222-2235, Nov. 2007. <The 2009 Young Author Best Paper Award from the IEEE Signal Processing Society> [Paper]
  116. T. Toda, K. Tokuda. A Speech parameter generation algorithm considering global variance for HMM-based speech synthesis. IEICE Transactions on Information and Systems, Vol. E90-D, No. 5, pp. 816-824, May 2007. <The 23rd TELECOM System Technology Award from the TAF> <The 2007 ISS Best Paper Award from the IEICE> [Paper]
  117. K. Nakamura, T. Toda, H. Saruwatari, K. Shikano. A speaking communication aid system for total laryngectomees using voice conversion of body transmitted artificial speech. IEICE Transactions on Information and Systems (Japanese Edition), Vol. J90-D, No. 3, pp. 780-787, Mar. 2007. [Paper]
  118. R. Gomez, T. Toda, H. Saruwatari, K. Shikano. Reducing computation time of the rapid unsupervised speaker adaptation based on HMM-sufficient statistics. IEICE Transactions on Information and Systems, Vol. E90-D, No. 2, pp. 554-561, Feb. 2007. [Paper]
  119. H. Zen, T. Toda, M. Nakamura, K. Tokuda. Details of the Nitech HMM-based speech synthesis system for the Blizzard Challenge 2005. IEICE Transactions on Information and Systems, Vol. E90-D, No. 1, pp. 325-333, Jan. 2007. <The 23rd TELECOM System Technology Award from the TAF> <The 2007 ISS Best Paper Award from the IEICE> [Paper]
  120. H. Kawai, T. Toda, J. Yamagishi, T. Hirai, J. Ni, N. Nishizawa, M. Tsuzaki, K. Tokuda. XIMERA: a concatenative speech synthesis system with large scale corpora. IEICE Transactions on Information and Systems (Japanese Edition), Vol. J89-D-II, No. 12, pp. 2688-2698, Dec. 2006. [Paper]
  121. T. Hirai, H. Kawai, M. Tsuzaki, T. Toda. Analysis of naturalness degradation factors in speech synthesis system XIMERA for Japanese. Journal of the Acoustical Society of Japan in Japanese, Vol. 62, No. 11, pp. 767-773, Nov. 2006. [Paper]
  122. T. Cincarek, T. Toda, H. Saruwatari, K. Shikano. Utterance-based selective training for the automatic creation of task-dependent acoustic models. IEICE Transactions on Information and Systems, Vol. E89-D, No. 3, pp. 962-969, Mar. 2006. [Paper]
  123. R. Gomez, A. Lee, T. Toda, H. Saruwatari, K. Shikano. Improving rapid unsupervised speaker adaptation based on HMM-sufficient statistics in noisy environments using multi-template models. IEICE Transactions on Information and Systems, Vol. E89-D, No. 3, pp. 998-1005, Mar. 2006. [Paper]
  124. T. Toda, H. Kawai, M. Tsuzaki, K. Shikano. An evaluation of cost functions sensitively capturing local degradation of naturalness for segment selection in concatenative speech synthesis. Speech Communication, Vol. 48, No. 1, pp. 45-56, Jan. 2006. [Paper]
  125. K. Adachi, T. Toda, H. Kawanami, H. Saruwatari, K. Shikano. Designing target cost function based on prosody of speech database. IEICE Transactions on Information and Systems, Vol. E88-D, No. 3, pp. 519-524, Mar. 2005. [Paper]
  126. T. Masuda, T. Toda, H. Kawanami, H. Saruwatari, K. Shikano. Speech databases with various prosody and its evaluation on speech rate. IEICE Transactions on Information and Systems (Japanese Edition), Vol. J87-D-II, No. 2, pp. 447-455, Feb. 2004. [Paper]
  127. T. Toda, H. Kawai, M. Tsuzaki, K. Shikano. A segment selection algorithm for Japanese concatenative speech synthesis based on both phoneme unit and diphone unit. IEICE Transactions on Information and Systems (Japanese Edition), Vol. J85-D-II, No. 12, pp. 1760-1770, Dec. 2002. [Paper]
  128. M. Mashimo, T. Toda, H. Kawanami, K. Shikano, N. Campbell. Cross-language voice conversion evaluation using bilingual databases. IPSJ Journal, Vol. 43, No. 7, pp. 2177-2185, July 2002. [Paper]
  129. T. Toda, J. Lu, H. Saruwatari, K. Shikano. Voice conversion algorithm based on Gaussian mixture model with dynamic frequency warping. IEICE Transactions on Information and Systems (Japanese Edition), Vol. J84-D-II, No. 10, pp. 2181-2189, Oct. 2001. <The 18th TELECOM System Technology Award for Student from TAF> [Paper]
  130. T. Toda, H. Banno, S. Kajita, K. Takeda, F. Itakura, K. Shikano. Improvement of STRAIGHT method under noisy conditions based on lateral inhibitive weighting. IEICE Transactions on Information and Systems (Japanese Edition), Vol. J83-D-II, No. 11, pp. 2180-2189, Nov. 2000. [Paper]

Letters

- 1. N. Nishio, K. Kobayashi, D. Ma, S. Mitani, M. Sone, T. Toda. A voice conversion system from electrolarynx speech to preoperative patient’s speech for total laryngectomy. OTO Open, Vol. 10, No. 1, Scientific Briefing, 5 pages, Feb. 2026.
  2. W.-C. Huang, Y.-C. Wu, T. Toda. Multi-speaker text-to-speech training with speaker anonymized data. IEEE Signal Processing Letters, Vol. 31, pp. 2995-2999, Oct. 2024. [Letter]
  3. K. Matsubara, T. Okamoto, R. Takashima, T. Takiguchi, T. Toda, H. Kawai. Comparison of real-time multi-speaker neural vocoders on CPUs. Acoustical Science and Technology, Acoustical Letter, Vol. 43, No. 2, pp. 121-124, Mar. 2022. [Letter]
  4. K. Matsubara, T. Okamoto, R. Takashima, T. Takiguchi, T. Toda, Y. Shiga, H. Kawai. Investigation of training data size for real-time neural vocoders on CPUs. Acoustical Science and Technology, Acoustical Letter, Vol. 42, No. 1, pp. 65-68, Jan. 2021. [Letter]
  5. T. Okamoto, K. Tachibana, T. Toda, Y. Shiga, H. Kawai. Deep neural network-based power spectrum reconstruction to improve quality of vocoded speech with limited acoustic parameters. Acoustical Science and Technology, Acoustical Letter, Vol. 39, No. 2, pp. 163-166, Mar. 2018. [Letter]
  6. H. Tanaka, S. Sakti, G. Neubig, T. Toda, S. Nakamura. NOCOA: A Computer-Based Training Tool for Social and Communication Skills That Exploits Non-verbal Behaviors. The Journal of Information and Systems in Education (Short Note), Vol. 12, No. 1, pp. 19-26, Apr. 2013.

International Conferences

- 1. T. Imamura, Y. Hashizume, W.-C. Huang, T. Toda. Incorporating signal processing-based knowledge for music similarity representation learning based on individual instrument sounds. Proc. EUSIPCO, pp. ***-***, Bruges, Belgium, Sep. 2026. (Accepted)
  2. K. Ogita, R. Yoneyama, W.-C. Huang, T. Toda. Evaluating VAE-SiFiGAN under large-scale training and noisy conditions with data selection using F0 extraction error estimation. Proc. EUSIPCO, pp. ***-***, Bruges, Belgium, Sep. 2026. (Accepted)
  3. T. Imamura, T. Komatsu, H. Munakata, T. Toda. Audio-visual feature fusion for re-scoring relevance scores of video moment retrieval. Proc. IEEE ICASSP, pp. 5551-5555, Barcelona, Spain, May 2026.
  4. L.P. Violeta, X. Zhang, J. Shi, Y. Yasuda, W.-C. Huang, Z. Wu, T. Toda. The singing voice conversion challenge 2025: from singer identity conversion to singing style conversion. Proc. IEEE ICASSP, pp. 17707-17711, Barcelona, Spain, May 2026.
  5. J. Wang, T. Toda. From fixed positions to free-form signals: Virtual Microphone signal estimation for general-purpose spatial audio processing. Proc. IEEE ICASSP, pp. 21011-21015, Barcelona, Spain, May 2026.
  6. J. He, N. Sawada, K. Miyazaki, T. Toda. PARCO: phoneme-augmented robust contextual ASR via contrastive entity disambiguation. Proc. IEEE ASRU, 7 pages, Honolulu, USA, Dec. 2025.
  7. E. Cooper, T. Okamoto, Y. Ohtani, T. Toda, H. Kawai. Layer-wise analysis for quality of multilingual synthesized speech. Proc. IEEE ASRU, 7 pages, Honolulu, USA, Dec. 2025.
  8. W.-C. Huang, H. Wang, C. Liu, Y.-C. Wu, A. Tjandra, W.-N. Hsu, E. Cooper, Y. Qin, T. Toda. The AudioMOS Challenge 2025. Proc. IEEE ASRU, 8 pages, Challenge Paper, Honolulu, USA, Dec. 2025.
  9. Y. Ohtani, T. Okamoto, T. Toda, H. Kawai. Voice factor control using FIR-based fast neural vocoder for speech generation applications. Proc. IEEE ASRU, 4 pages, Demo Paper, Honolulu, USA, Dec. 2025.
  10. K. Mizukami, D. Deguchi, T. Toda, H. Murase, H. Kyutoku, T. Minematsu. Study on automatic generation of lecture videos based on content analysis of lecture slides. Proc. CELDA, 4 pages, Porto, Portugal, Nov. 2025.
  11. K. Wilkinghoff, T. Fujimura, K. Imoto, J. Le Roux, Z.-H. Tan, T. Toda. Handling domain shifts for anomalous sound detection: a review of DCASE-related work. Proc. DCASE Workshop, pp. 20-24, Barcelona, Spain, Oct. 2025.
  12. M. Matsumoto, T. Fujimura, W.-C. Huang, T. Toda. Adjusting bias in anomaly scores via variance minimization for domain-generalized discriminative anomalous sound detection. Proc. DCASE Workshop, pp. 25-29, Barcelona, Spain, Oct. 2025.
  13. T. Fujimura, K. Wilkinghoff, K. Imoto, T. Toda. ASDKit: a toolkit for comprehensive evaluation of anomalous sound detection methods. Proc. DCASE Workshop, pp. 40-44, Barcelona, Spain, Oct. 2025.
  14. T. Fujimura, I. Kuroyanagi, T. Toda. Discriminative anomalous sound detection using pseudo labels, target signal enhancement, and ensemble feature extractors. Proc. DCASE Workshop, pp. 180-184, Barcelona, Spain, Oct. 2025.
  15. K. Sawada, W.-C. Huang, T. Toda. Hierarchical symbolic music generation with variational autoencoder-based bar-wise feature sequences. Proc. APSIPA ASC, pp. 299-304, Singapore, Oct. 2025.
  16. S. Tang, Z. Liu, L. Chen, K. Lee, T. Toda, Z.-H. Ling. A preliminary study on sectional voice anonymization and detection. Proc. APSIPA ASC, pp. 318-323, Singapore, Oct. 2025.
  17. K. Hattori, W.-C. Huang, K. Takeda, T. Toda. An evaluation of supervised virtual microphone estimators in reverberant sound fields. Proc. APSIPA ASC, pp. 517-522, Singapore, Oct. 2025.
  18. M. Kaneko, W.-C. Huang, T. Toda. Estimating speaker's seating position from monaural speech in a simulated vehicle interior sound field. Proc. APSIPA ASC, pp. 625-629, Singapore, Oct. 2025.
  19. H. Miyaji, K. Sawada, W.-C. Huang, T. Toda. Designing a music difficulty measure for controllable automatic piano rearrangement. Proc. APSIPA ASC, pp. 834-839, Singapore, Oct. 2025.
  20. K. Niwa, K. Kobayashi, T. Toda. Investigation of the effectiveness of converted speech auditory feedback in low-latency real-time voice conversion. Proc. APSIPA ASC, pp. 905-910, Singapore, Oct. 2025.
  21. Y. Nakata, D. Yoshioka, W.-C. Huang, T. Toda. Disfluency disentanglement enhancement in spoken-text-style transfer for spontaneous speech synthesis. Proc. APSIPA ASC, pp. 2254-2259, Singapore, Oct. 2025.
  22. D. Yoon, T. Toda. Neural semi-fragile watermarking for proactive deepfake speech detection. Proc. APSIPA ASC, pp. 2396-2401, Singapore, Oct. 2025.
  23. L. Chen, K.A. Lee, Z.-H. Ling, X. Wang, R.K. Das, T. Toda, H. Li. Speaker privacy and security in the big data era: protection and defense against deepfake. Proc. APSIPA ASC, Perspective paper, pp. 2565-2570, Singapore, Oct. 2025.
  24. L.P. Violeta, W.-C. Huang, T. Toda. Serenade: a singing style conversion framework based on audio infilling. Proc. EUSIPCO, pp. 411-415, Palermo, Italy, Sep. 2025.
  25. K. Ogita, R. Yoneyama, W.-C. Huang, T. Toda. VAE-SiFiGAN: source-filter HiFi-GAN based on variational autoencoder representations with enhanced pitch controllability. Proc. EUSIPCO, pp. 531-535, Palermo, Italy, Sep. 2025.
  26. Y. Yasuda, J. Yamagishi, T. Toda. Continual subjective evaluation method of speech by merging sort-based preference tests towards ever-expanding corpus of human ratings. Proc. SSW, pp. 14-20, Leeuwarden, the Netherlands, Aug. 2025.
  27. T. Ogura, T. Okamoto, Y. Ohtani, E. Cooper, T. Toda, H. Kawai. GST-BERT-TTS: prosody prediction without accentual labels for multi-speaker TTS using BERT with global style tokens. Proc. INTERSPEECH, pp. 444-448, Rotterdam, the Netherlands, Aug. 2025.
  28. X. Shi, X. Li, T. Toda. Who, When, and What: leveraging the "Three Ws" concept for emotion recognition in conversation. Proc. INTERSPEECH, pp. 1763-1767, Rotterdam, the Netherlands, Aug. 2025.
  29. W.-C. Huang, E. Cooper, T. Toda. SHEET: a multi-purpose open-source speech human evaluation estimation toolkit. Proc. INTERSPEECH, pp. 2355-2359, Rotterdam, the Netherlands, Aug. 2025.
  30. J. He, N. Sawada, K. Miyazaki, T. Toda. CMT-LLM: context-aware multi-talker ASR utilizing large language models. Proc. INTERSPEECH, pp. 2575-2579, Rotterdam, the Netherlands, Aug. 2025.
  31. J. He, J. Mi, T. Toda. GIA-MIC: multimodal emotion recognition with gated interactive attention and modality-invariant learning constraints. Proc. INTERSPEECH, pp. 2695-2699, Rotterdam, the Netherlands, Aug. 2025.
  32. B. Halpern, T. Tienkamp, T. Rebernik, R. van Son, M. Wieling, D. Abur, T. Toda. Relationship between objective and subjective perceptual measures of speech in individuals with head and neck cancer. Proc. INTERSPEECH, pp. 3733-3737, Rotterdam, the Netherlands, Aug. 2025.
  33. X. Shi, X. Li, T. Toda. Speaker-aware multi-task learning for speech emotion recognition. Proc. INTERSPEECH, pp. 4333-4337, Rotterdam, the Netherlands, Aug. 2025.
  34. X. Shi, J. Mi, X. Li, T. Toda. Advancing emotion recognition via ensemble learning: integrating speech, context, and text representations. Proc. INTERSPEECH, pp. 4693-4697, Rotterdam, the Netherlands, Aug. 2025.
  35. R. Yoneyama, M. Kawamura, R. Terashima, R. Yamamoto, T. Toda. Comparative analysis of fast and high-fidelity neural vocoders for low-latency streaming synthesis in resource-constrained environments. Proc. INTERSPEECH, pp. 4888-4892, Rotterdam, the Netherlands, Aug. 2025.
  36. C.-H. Hu, Y. Yasuda, A. Yoshimoto, T. Toda. Unifying listener scoring scales: comparison learning framework for speech quality assessment and continuous speech emotion recognition. Proc. INTERSPEECH, pp. 5428-5432, Rotterdam, the Netherlands, Aug. 2025.
  37. M. Murata, K. Miyazaki, T. Koriyama, T. Toda. Eigenvoice synthesis based on model editing for speaker generation. Proc. INTERSPEECH, pp. 5523-5527, Rotterdam, the Netherlands, Aug. 2025.
  38. D. Ma, J. Mi, F. Li, L.P. Violeta, K. Kobayashi, T. Toda. Improving electrolaryngeal speech enhancement via a representation learning method based on integrated text and speech representations. Proc. IEEE EMBC, 6 pages, Copenhagen, Denmark, July 2025. <3rd Place Award in EMBC 2025 Student Paper Competition (recipient: Ding Ma)>
  39. Y. Hashizume, T. Toda. Investigation of perceptual music similarity focusing on each instrumental part. Proc. IEEE ICASSP, 5 pages, Hyderabad, India, Apr. 2025.
  40. T. Fujimura, I. Kuroyanagi, T. Toda. Improvements of discriminative feature space training for anomalous sound detection in unlabeled conditions. Proc. IEEE ICASSP, 5 pages, Hyderabad, India, Apr. 2025.
  41. K. Nishizawa, R. Yamamoto, W.-C. Huang, T. Toda. Investigating factors related to the naturalness of synthesized unison singing. Proc. IEEE ICASSP, 5 pages, Hyderabad, India, Apr. 2025.
  42. T. Ogura, T. Okamoto, Y. Ohtani, E. Cooper, T. Toda, H. Kawai. Mora-level prosody prediction for text-to-speech using Japanese BERT without accentual labels. Proc. IEEE ICASSP, 5 pages, Hyderabad, India, Apr. 2025.
  43. X. Shi, Y. Gao, J. He, J. Mi, X. Li, T. Toda. A study on multimodal fusion and layer adapter in emotion recognition. Proc. APSIPA ASC, 6 pages, Macau, China, Dec. 2024. [Paper]
  44. T. Imamura, Y. Hashizume, T. Toda. Multi-task learning approaches for music similarity representation learning based on individual instrument sounds. Proc. APSIPA ASC, 6 pages, Macau, China, Dec. 2024. [Paper]
  45. Z. Yang, J. He, T. Toda. Multi-modal video summarization based on two-stage fusion of audio, visual, and recognized text information. Proc. APSIPA ASC, 6 pages, Macau, China, Dec. 2024. [Paper]
  46. J. Mi, S. Kim, T. Toda. Improved architecture for high-resolution piano transcription to efficiently capture acoustic characteristics of music signals. Proc. APSIPA ASC, 6 pages, Macau, China, Dec. 2024. [Paper]
  47. J. Mi, X. Shi, D. Ma, J. He, T. Fujimura, T. Toda. Two-stage framework for robust speech emotion recognition using target speaker extraction in human speech noise conditions. Proc. APSIPA ASC, 6 pages, Macau, China, Dec. 2024. [Paper]
  48. B. Halpern, T. Toda. Reference-free automatic speech severity evaluation using acoustic unit language modelling. Proc. SpandLDeteriorate Workshop of ACM Multimedia Asia (Workshop on Multi-Biological Sensing Data for Speech and Language Deterioration Prediction), 5 pages, Auckland, New Zealand, Dec. 2024. <Best Paper Award>
  49. Y. Zhang, Y. Zang, J. Shi, R. Yamamoto, T. Toda, Z. Duan. SVDD 2024: The Inaugural Singing Voice Deepfake Detection Challenge. Proc. IEEE SLT, pp. 792-797, Macau, China, Dec. 2024.
  50. W.-C. Huang, S.-W. Fu, E. Cooper, R. Zezario, T. Toda, H.-M. Wang, J. Yamagishi, Y. Tsao. The VoiceMOS Challenge 2024: beyond speech quality prediction. Proc. IEEE SLT, pp. 813-820, Macau, China, Dec. 2024.
  51. T. Okamoto, Y. Ohtani, S. Shimizu, T. Toda, H. Kawai. Challenge of singing voice synthesis using only text-to-speech corpus with FIRNet Source-Filter Neural Vocoder. Proc. INTERSPEECH, pp. 1870-1874, Kos Island, Greece, Sep. 2024. [Paper]
  52. C.-H. Hu, Y. Yasuda, T. Toda. Embedding learning for preference-based speech quality assessment. Proc. INTERSPEECH, pp. 2685-2689, Kos Island, Greece, Sep. 2024. [Paper]
  53. B. Halpern, T. Tienkamp, W.-C. Huang, L.P. Violeta, T. Rebernik, S. de Visscher, M.J.H. Witjes, M. Wieling, D. Abur, T. Toda. Quantifying the effect of speech pathology on automatic and human speaker verification. Proc. INTERSPEECH, pp. 3015-3019, Kos Island, Greece, Sep. 2024. [Paper]
  54. X. Shi, X. Li, T. Toda. Multimodal fusion of music theory-inspired and self-supervised representations for improved emotion recognition. Proc. INTERSPEECH, pp. 3724-3728, Kos Island, Greece, Sep. 2024. [Paper]
  55. S. Chen, T. Toda. QHM-GAN: neural vocoder based on quasi-harmonic modeling. Proc. INTERSPEECH, pp. 3889-3893, Kos Island, Greece, Sep. 2024. [Paper]
  56. J. Feng, Y. Yasuda, T. Toda. Exploring the robustness of text-to-speech synthesis based on diffusion probabilistic models to heavily noisy transcriptions. Proc. INTERSPEECH, pp. 4408-4412, Kos Island, Greece, Sep. 2024. [Paper]
  57. Y. Zang, J. Shi, Y. Zhang, R. Yamamoto, J. Han, Y. Tang, S. Xu, W. Zhao, J. Guo, T. Toda, Z. Duan. CtrSVDD: a benchmark dataset and baseline analysis for controlled singing voice deepfake detection. Proc. INTERSPEECH, pp. 4783-4787, Kos Island, Greece, Sep. 2024. [Paper]
  58. J. He, T. Toda. 2DP-2MRC: 2-dimensional pointer-based machine reading comprehension method for multimodal moment retrieval. Proc. INTERSPEECH, pp. 5073-5077, Kos Island, Greece, Sep. 2024. [Paper]
  59. J. Wang, T. Toda. Unsupervised training of neural network-based virtual microphone estimator. Proc. EUSIPCO, pp. 256-260, Lyon, France, Aug. 2024. [Paper]
  60. T. Fujimura, K. Imoto, T. Toda. Discriminative neighborhood smoothing for generative anomalous sound detection. Proc. EUSIPCO, pp. 156-160, Lyon, France, Aug. 2024. [Paper]
  61. D. Ma, Y. Choi, F. Li, C. Xie, K. Kobayashi, T. Toda. Robust sequence-to-sequence voice conversion for electrolaryngeal speech enhancement in noisy and reverberant conditions. Proc. IEEE EMBC, 4 pages, Orlando, USA, July 2024. [Paper]
  62. F. Li, F. Shen, D. Ma, S. Zhang, J. Zhou, L. Wang, F. Fan, T. Liu, X. Chen, T. Toda, H. Niu. Mandarin speech reconstruction from tongue motion ultrasound images based on generative adversarial networks. Proc. IEEE EMBC, 4 pages, Orlando, USA, July 2024. [Paper]
  63. T. Komatsu, Y. Fujita, K. Takeda, T. Toda. Audio difference learning for audio captioning. Proc. IEEE ICASSP, pp. 1456-1460, Seoul, Korea, Apr. 2024. [Paper]
  64. Y. Ohtani, T. Okamoto, T. Toda, H. Kawai. FIRNET: fundamental frequency controllable fast neural vocoder with trainable finite impulse response filter. Proc. IEEE ICASSP, pp. 10871-10875, Seoul, Korea, Apr. 2024. [Paper]
  65. L.P. Violeta, W.-C. Huang, D. Ma, R. Yamamoto, K. Kobayashi, T. Toda. Electrolaryngeal speech intelligibility enhancement through robust linguistic encoders. Proc. IEEE ICASSP, pp. 10961-10965, Seoul, Korea, Apr. 2024. [Paper]
  66. J. He, X. Shi, X. Li, T. Toda. MF-AED-AEC: speech emotion recognition by leveraging multimodal fusion, ASR error detection, and ASR error correction. Proc. IEEE ICASSP, pp. 11066-11070, Seoul, Korea, Apr. 2024. [Paper]
  67. T. Okamoto, Y. Ohtani, T. Toda, H. Kawai. ConvNeXt-TTS and ConvNeXt-VC: ConvNeXt-based fast end-to-end sequence-to-sequence text-to-speech and voice conversion. Proc. IEEE ICASSP, pp. 12456-12460, Seoul, Korea, Apr. 2024. [Paper]
  68. W.-C. Huang, L.P. Violeta, S. Liu, J. Shi, T. Toda. The Singing Voice Conversion Challenge 2023. Proc. IEEE ASRU, 8 pages, Taipei, Taiwan, Dec. 2023. [Paper] <Selected as Top 3% Papers>
  69. J. He, Z. Yang, T. Toda. ED-CEC: improving rare word recognition using ASR post-processing based on error detection and context-aware error correction. Proc. IEEE ASRU, 6 pages, Taipei, Taiwan, Dec. 2023. [Paper]
  70. B. Halpern, W.-C. Huang, L.P. Violeta, R. van Son, T. Toda. Improving severity preservation of healthy-to-pathological voice conversion with global style tokens. Proc. IEEE ASRU, 7 pages, Taipei, Taiwan, Dec. 2023. [Paper]
  71. R. Yamamoto, R. Yoneyama, L.P. Violeta, W.-C. Huang, T. Toda. A comparative study of voice conversion models with large-scale speech and singing data: the T13 systems for the Singing Voice Conversion Challenge 2023. Proc. IEEE ASRU, 6 pages, Taipei, Taiwan, Dec. 2023. [Paper]
  72. E. Cooper, W.-C. Huang, Y. Tsao, H.-M. Wang, T. Toda, J. Yamagishi. The VoiceMOS Challenge 2023: zero-shot subjective speech quality prediction for multiple domains. Proc. IEEE ASRU, 7 pages, Taipei, Taiwan, Dec. 2023. [Paper] <Selected as Top 3% Papers>
  73. T. Okamoto, H. Yamashita, Y. Ohtani, T. Toda, H. Kawai. WaveNeXt: ConvNeXt-based fast neural vocoder without iSTFT layer. Proc. IEEE ASRU, 8 pages, Taipei, Taiwan, Dec. 2023. [Paper]
  74. S. Kim, K. Takeda, T. Toda. Sequence-to-sequence network training methods for automatic guitar transcription with tokenized outputs. Proc. ISMIR, pp. 524-531, Milan, Italy, Nov. 2023. [Paper]
  75. W.-C. Huang, T. Toda. Evaluating methods for ground-truth-free foreign accent conversion. Proc. APSIPA ASC, pp. 1136-1141, Taipei, Taiwan, Nov. 2023. [Paper]
  76. L.P. Violeta, T. Toda. An analysis of personalized speech recognition system development for the deaf and hard-of-hearing. Proc. APSIPA ASC, pp. 1851-1856, Taipei, Taiwan, Nov. 2023. [Paper]
  77. J. Tian, D. Hu, X. Shi, J. He, X. Li, Y. Gao, T. Toda, X. Xu, X. Hu. Semi-supervised multimodal emotion recognition with consensus decision-making and label correction. Proc. 1st International Workshop on Multimodal and Responsible Affective Computing (MRAC), pp. 67-73, Ottawa, Canada, Oct. 2023. [Paper]
  78. A. Miyashita, T. Toda. Differentiable representation of warping based on Lie group theory. Proc. IEEE WASPAA, 5 pages, New Paltz, USA, Oct. 2023. [Paper]【IEEE WASPAA 2023 Best Paper Award (recipient: Atsushi Miyashita)】
  79. R. Wang, T. Toda. Directional target speaker extraction under noisy underdetermined conditions through conditional variational autoencoder with global style tokens. Proc. IEEE WASPAA, 5 pages, New Paltz, USA, Oct. 2023. [Paper]
  80. S. Luan, Y. Wakabayashi, T. Toda. Sound field interpolation with unsupervised calibration for freely spaced circular microphone array in rotation-robust beamforming. Proc. EUSIPCO, pp. 21-25, Sep. 2023. [Paper]
  81. C.H. Hu, Y. Yasuda, T. Toda. Preference-based training framework for automatic speech quality assessment using deep neural network. Proc. INTERSPEECH, pp. 546-550, Dublin, Ireland, Aug. 2023. [Paper]
  82. X. Shi, X. Li, T. Toda. Emotion awareness in multi-utterance turn for improving emotion prediction in multi-speaker conversation. Proc. INTERSPEECH, pp. 765-769, Dublin, Ireland, Aug. 2023. [Paper]
  83. T. Okamoto, H. Yamashita, T. Toda, H. Kawai. E2E-S2S-VC: end-to-end sequence-to-sequence voice conversion. Proc. INTERSPEECH, pp. 2043-2047, Dublin, Ireland, Aug. 2023. [Paper]
  84. Y. Choi, C. Xie, T. Toda. Reverberation-controllable voice conversion using reverberation time estimator. Proc. INTERSPEECH, pp. 2103-2107, Dublin, Ireland, Aug. 2023. [Paper]
  85. Y. Yasuda, T. Toda. Analysis of mean opinion scores in subjective evaluation of synthetic speech based on tail probabilities. Proc. INTERSPEECH, pp. 5491-5495, Dublin, Ireland, Aug. 2023. [Paper]
  86. Y. Yasuda, T. Toda. Text-to-speech synthesis based on latent variable conversion using diffusion probabilistic model and variational autoencoder. Proc. IEEE ICASSP, 5 pages, June 2023. [Paper]
  87. K. Kobayashi, T. Hayashi, T. Toda. Low-latency electrolaryngeal speech enhancement based on FastSpeech2-based voice conversion and self-supervised speech representation. Proc. IEEE ICASSP, 5 pages, June 2023. [Paper]
  88. R. Yamamoto, R. Yoneyama, T. Toda. NNSVS: a neural network based singing voice synthesis toolkit. Proc. IEEE ICASSP, 5 pages, June 2023. [Paper]
  89. R. Yoneyama, Y.-C. Wu, T. Toda. Source-Filter HiFiGAN: fast and pitch controllable high-fidelity neural vocoder. Proc. IEEE ICASSP, 5 pages, June 2023. [Paper]
  90. L.P. Violeta, D. Ma, W.-C. Huang, T. Toda. Intermediate fine-tuning using imperfect synthetic speech for improving electrolaryngeal speech recognition. Proc. IEEE ICASSP, 5 pages, June 2023. [Paper]
  91. T. Fujimura, T. Toda. Analysis of noisy-target training for DNN-based speech enhancement. Proc. IEEE ICASSP, 5 pages, June 2023. [Paper]
  92. A. Miyashita, T. Toda. Representation of vocal tract length transformation based on group theory. Proc. IEEE ICASSP, 5 pages, June 2023. [Paper]
  93. D. Ma, L.P. Violeta, K. Kobayashi, T. Toda. Two-stage training method for Japanese electrolaryngeal speech enhancement based on sequence-to-sequence voice conversion. Proc. IEEE SLT, pp. 949-954, Doha, Qatar, Jan. 2023. [Paper]
  94. Y. Hashizume, L. Li, T. Toda. Music similarity calculation of individual instrumental sounds using metric learning. Proc. APSIPA ASC, pp. 33-38, Chiang Mai, Thailand, Nov. 2022. [Paper]
  95. J. Feng, T. Yoshikawa, T. Toda. Interpretable control for emotional text-to-speech system toward development of sympathetic educational-support robots. Proc. APSIPA ASC, pp. 342-346, Chiang Mai, Thailand, Nov. 2022. [Paper]
  96. R. Wang, L. Li, T. Toda. Direction-aware target speaker extraction with a dual-channel system based on conditional variational autoencoders under underdetermined conditions. Proc. APSIPA ASC, pp. 347-353, Chiang Mai, Thailand, Nov. 2022. [Paper]
  97. S. Chen, T. Toda. Sequence-wise optimization for quasi-harmonic speech waveform modeling. Proc. APSIPA ASC, pp. 1658-1663, Chiang Mai, Thailand, Nov. 2022. [Paper]
  98. I. Kuroyanagi, T. Hayashi, K. Takeda, T. Toda. Improvement of anomalous sound detection method considering the distribution of embedding. Proc. ICA, ABS-0189, 5 pages, Gyeongju, Korea, Oct. 2022 (Invited in structured session "A13-02: Anomalous sound detection and classification for condition monitoring").
  99. C. Xie, T. Toda. Noisy-to-noisy voice conversion with pre-training strategy. Proc. ICA, ABS-0801, 5 pages, Gyeongju, Korea, Oct. 2022 (Invited in structured session "A15-06: Voice conversion").
  100. L.P. Violeta, W.-C. Huang, T. Toda. Investigating self-supervised pretraining frameworks for pathological speech recognition. Proc. INTERSPEECH, pp. 41-45, Incheon, Korea, Sep. 2022. [Paper]
  101. R. Yoneyama, Y.-C. Wu, T. Toda. Unified source-filter GAN with harmonic-plus-noise source excitation generation. Proc. INTERSPEECH, pp. 848-852, Incheon, Korea, Sep. 2022. [Paper]
  102. W.-C. Huang, E. Cooper, Y. Tsao, H.-M. Wang, T. Toda, J. Yamagishi. The VoiceMOS Challenge 2022. Proc. INTERSPEECH, pp. 4536-4540, Incheon, Korea, Sep. 2022. [Paper]
  103. D. Yoshioka, Y. Yasuda, N. Matsunaga, Y. Ohtani, T. Toda. Spoken-text-style transfer with conditional variational autoencoder and content word storage. Proc. INTERSPEECH, pp. 4576-4580, Incheon, Korea, Sep. 2022. [Paper]
  104. Y. Choi, C. Xie, T. Toda. An evaluation of three-stage voice conversion framework for noisy and reverberant conditions. Proc. INTERSPEECH, pp. 4910-4914, Incheon, Korea, Sep. 2022. [Paper]
  105. S. Kim, T. Hayashi, T. Toda. Note-level automatic guitar transcription using attention mechanism. Proc. EUSIPCO, pp. 229-233, Belgrade, Serbia, Aug.-Sep. 2022. [Paper]
  106. I. Kuroyanagi, T. Hayashi, K. Takeda, T. Toda. Improvement of serial approach to anomalous sound detection by incorporating two binary cross-entropies for outlier exposure. Proc. EUSIPCO, pp. 294-298, Belgrade, Serbia, Aug.-Sep. 2022. [Paper]
  107. S. Luan, Y. Wakabayashi, T. Toda. Modified sound field interpolation method for rotation-robust beamforming with unequally spaced circular microphone array. Proc. EUSIPCO, pp. 344-348, Belgrade, Serbia, Aug.-Sep. 2022. [Paper]
  108. W.-C. Huang, E. Cooper, J. Yamagishi, T. Toda. LDNet: unified listener dependent modeling in MOS prediction for synthetic speech. Proc. IEEE ICASSP, pp. 896-900, May 2022. [Paper]
  109. W.-C. Huang, S.-W. Yang, T. Hayashi, H.-Y. Lee, S. Watanabe, T. Toda. S3PRL-VC: open-source voice conversion framework with self-supervised speech representations. Proc. IEEE ICASSP, pp. 6552-6556, May 2022. [Paper]
  110. W.-C. Huang, B.M. Halpern, L.P. Violeta, O. Scharenborg, T. Toda. Towards identity preserving normal to dysarthric voice conversion. Proc. IEEE ICASSP, pp. 6672-6676, May 2022. [Paper]
  111. C. Xie, Y.-C. Wu, P.L. Tobing, W.-C. Huang, T. Toda. Direct noisy speech modeling for noisy-to-noisy voice conversion. Proc. IEEE ICASSP, pp. 6787-6791, May 2022. [Paper]
  112. T. Hayashi, K. Kobayashi, T. Toda. An investigation of streaming non-autoregressive sequence-to-sequence voice conversion. Proc. IEEE ICASSP, pp. 6802-6806, May 2022. [Paper]
  113. E. Cooper, W.-C. Huang, T. Toda, J. Yamagishi. Generalization ability of MOS prediction networks. Proc. IEEE ICASSP, pp. 8442-8446, May 2022. [Paper]
  114. W.-C. Huang, S.-W. Yang, T. Hayashi, H.-Y. Lee, S. Watanabe, T. Toda. S3PRL-VC: open-source voice conversion framework with self-supervised speech representations. Proc. AAAI-22 Workshop, W35: Self-Supervised Learning for Audio and Speech Processing, 5 pages, Feb. 2022. [Paper]
  115. Z. Qian, H. Niu, L. Wang, K. Kobayashi, S. Zhang, T. Toda. Mandarin electro-laryngeal speech enhancement based on statistical voice conversion and manual tone control. Proc. APSIPA ASC, pp. 546-552, Dec. 2021. [Paper]
  116. C. Xie, Y.-C. Wu, P.L. Tobing, W.-C. Huang, T. Toda. Noisy-to-noisy voice conversion framework with denoising model. Proc. APSIPA ASC, pp. 814-820, Dec. 2021. [Paper]
  117. D. Ma, W.-C. Huang, T. Toda. Investigation of text-to-speech-based synthetic parallel data for sequence-to-sequence non-parallel voice conversion. Proc. APSIPA ASC, pp. 870-877, Dec. 2021. <APSIPA ASC 2021 The Best Paper Award> [Paper]
  118. Y.-S. Liou, W.-C. Huang, M.-C. Yen, S.-W. Tsai, Y.-H. Peng, T. Toda, Y. Tsao, H.-M. Wang. Time alignment using lip images for frame-based electrolaryngeal voice conversion. Proc. APSIPA ASC, pp. 1234-1238, Dec. 2021. [Paper]
  119. T. Okamoto, T. Toda, H. Kawai. Multi-stream HiFi-GAN with data-driven waveform decomposition. Proc. IEEE ASRU, pp. 610-617, Dec. 2021. [Paper]
  120. W.-C. Huang, T. Hayashi, X. Li, S. Watanabe, T. Toda. On prosody modeling for ASR+TTS based voice conversion. Proc. IEEE ASRU, pp. 642-649, Dec. 2021. [Paper]
  121. M.-C. Yen, W.-C. Huang, K. Kobayashi, Y.-H. Peng, S.-W. Tsai, Y. Tsao, T. Toda, J.-S. R. Jang, H.-M. Wang. Mandarin electrolaryngeal speech voice conversion with sequence-to-sequence modeling. Proc. IEEE ASRU, pp. 650-657, Dec. 2021. [Paper]
  122. H.-T. Chiang, Y.-C. Wu, C. Yu, T. Toda, H.-M. Wang, Y.-C. Hu, Y. Tsao. HASA-Net: a non-intrusive hearing-aid speech assessment network. Proc. IEEE ASRU, pp. 907-913, Dec. 2021. [Paper]
  123. I. Kuroyanagi, T. Hayashi, Y. Adachi, T. Yoshimura, K. Takeda, T. Toda. An ensemble approach to anomalous sound detection based on conformer-based autoencoder and binary classifier incorporated with metric learning. Proc. DCASE 2021 Workshop, pp. 110-114, Nov. 2021. [Paper]
  124. S. Seki, H. Taga, T. Toda. Singing fundamental frequency contour generation using generalized command response model and score-conditional variational autoencoder. Proc. IEEE MLSP, 6 pages, Oct. 2021. [Paper]
  125. W.-C. Huang, K. Kobayashi, Y.-H. Peng, C.-F. Liu, Y. Tsao, H.-M. Wang, T. Toda. A preliminary study of a two-stage paradigm for preserving speaker identity in dysarthric voice conversion. Proc. INTERSPEECH, pp. 1329-1333, Aug.-Sep. 2021. [Paper]
  126. R. Yoneyama, Y.-C. Wu, T. Toda. Unified source-filter GAN: unified source-filter network based on factorization of quasi-periodic parallel WaveGAN. Proc. INTERSPEECH, pp. 2187-2191, Aug.-Sep. 2021. [Paper]
  127. P.L. Tobing, T. Toda. High-fidelity and low-latency universal neural vocoder based on multiband WaveRNN with data-driven linear prediction for discrete waveform modeling. Proc. INTERSPEECH, pp. 2217-2221, Aug.-Sep. 2021. [Paper]
  128. Y.-C. Wu, C.-H. Hu, H.-S. Lee, Y.-H. Peng, W.-C. Huang, Y. Tsao, H.-M. Wang, T. Toda. Relational data selection for data augmentation of speaker-dependent multi-band MelGAN vocoder. Proc. INTERSPEECH, pp. 3630-3634, Aug.-Sep. 2021. [Paper]
  129. P.L. Tobing, T. Toda. Low-latency real-time non-parallel voice conversion based on cyclic variational autoencoder and multiband WaveRNN with data-driven linear prediction. Proc. 11th ISCA Speech Synthesis Workshop (SSW11), pp. 142-147, Aug. 2021. [Paper]
  130. I. Kuroyanagi, T. Hayashi, K. Takeda, T. Toda. Anomalous sound detection using a binary classification model and class centroids. Proc. EUSIPCO, pp. 1995-1999, Aug. 2021. [Paper]
  131. K. Kobayashi, W.-C. Huang, Y.-C. Wu, P.L. Tobing, T. Hayashi, T. Toda. Crank: an open-source software for nonparallel voice conversion based on vector-quantized variational autoencoder. Proc. IEEE ICASSP, pp. 5934-5938, June 2021. [Paper]
  132. W.-C. Huang, Y.-C. Wu, T. Hayashi, T. Toda. Any-to-one sequence-to-sequence voice conversion using self-supervised discrete speech representations. Proc. IEEE ICASSP, pp. 5944-5948, June 2021. [Paper]
  133. T. Okamoto, T. Toda, Y. Shiga, H. Kawai. Noise level limited sub-modeling for diffusion probabilistic vocoders. Proc. IEEE ICASSP, pp. 6029-6033, June 2021. [Paper]
  134. A. Ando, R. Masumura, H. Sato, T. Moriya, T. Ashihara, Y. Ijima, T. Toda. Speech emotion recognition based on listener adaptive models. Proc. IEEE ICASSP, pp. 6274-6278, June 2021. [Paper]
  135. K. Matsubara, T. Okamoto, R. Takashima, T. Takiguchi, T. Toda, Y. Shiga, H. Kawai. High-intelligibility speech synthesis for dysarthric speakers with LPCNet-based TTS and CycleVAE-based VC. Proc. IEEE ICASSP, pp. 7058-7062, June 2021. [Paper]
  136. T. Hayashi, W.-C. Huang, K. Kobayashi, T. Toda. Non-autoregressive sequence-to-sequence voice conversion. Proc. IEEE ICASSP, pp. 7068-7072, June 2021. [Paper]
  137. W.-C. Huang, C.-H. Wu, S.-B. Luo, K.-Y. Chen, H.-M. Wang, T. Toda. Speech recognition by simply fine-tuning BERT. Proc. IEEE ICASSP, pp. 7343-7347, June 2021. [Paper]
  138. H. Nakatani, P.L. Tobing, K. Takeda, T. Toda. Cross-lingual voice conversion using cyclic variational auto-encoder and a WaveNet vocoder. Proc. APSIPA ASC, pp. 520-526, Dec. 2020. [Paper]
  139. M. Eshghi, K. Kobayashi, K. Tanaka, H. Kameoka, T. Toda. Phoneme embeddings on predicting fundamental frequency pattern for electrolaryngeal speech. Proc. APSIPA ASC, pp. 572-577, Dec. 2020. [Paper]
  140. K. Miyazaki, T. Komatsu, T. Hayashi, S. Watanabe, T. Toda, K. Takeda. Conformer-based sound event detection with semi-supervised learning and data augmentation. Proc. DCASE 2020 Workshop, pp. 100-104, Nov. 2020. [Paper]
  141. Z. Yi, W.-C. Huang, X. Tian, J. Yamagishi, R.K. Das, T. Kinnunen, Z. Ling, T. Toda. Voice Conversion Challenge 2020: intra-lingual semi-parallel and cross-lingual voice conversion. Proc. Joint workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, pp. 80-98, Oct. 2020. [Paper]
  142. R.K. Das, T. Kinnunen, W.-C. Huang, Z. Ling, J. Yamagishi, Z. Yi, X. Tian, T. Toda. Predictions of subjective ratings and spoofing assessments of Voice Conversion Challenge 2020 submissions. Proc. Joint workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, pp. 99-120, Oct. 2020. [Paper]
  143. P.L. Tobing, Y.-C. Wu, T. Toda. Baseline system of Voice Conversion Challenge 2020 with cyclic variational autoencoder and parallel WaveGAN. Proc. Joint workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, pp. 155-159, Oct. 2020. [Paper]
  144. W.-C. Huang, T. Hayashi, S. Watanabe, T. Toda. The sequence-to-sequence baseline for the Voice Conversion Challenge 2020: cascading ASR and TTS. Proc. Joint workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, pp. 160-164, Oct. 2020. [Paper]
  145. W.-C. Huang, P.L. Tobing, Y.-C. Wu, K. Kobayashi, T. Toda. The NU voice conversion system for the Voice Conversion Challenge 2020: on the effectiveness of sequence-to-sequence models and autoregressive neural vocoders. Proc. Joint workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, pp. 165-169, Oct. 2020. [Paper]
  146. Y.-C. Wu, T. Hayashi, T. Okamoto, H. Kawai, T. Toda. Quasi-periodic parallel WaveGAN vocoder: a non-autoregressive pitch-dependent dilated convolution model for parametric speech generation. Proc. INTERSPEECH, pp. 3535-3539, Oct. 2020. [Paper]
  147. Y.-C. Wu, P.L. Tobing, K. Yasuhara, N. Matsunaga, Y. Ohtani, T. Toda. A cyclical post-filtering approach to mismatch refinement of neural vocoder for text-to-speech systems. Proc. INTERSPEECH, pp. 3540-3544, Oct. 2020. [Paper]
  148. S. Seki, M. Takada, T. Toda. Semi-supervised self-produced speech enhancement and suppression based on joint source modeling of air- and body-conducted signals using variational autoencoder. Proc. INTERSPEECH, pp. 4039-4043, Oct. 2020. [Paper]
  149. S. Hikosaka, S. Seki, T. Hayashi, K. Kobayashi, K. Takeda, H. Banno, T. Toda. Intelligibility enhancement based on speech waveform modification using hearing impairment simulator. Proc. INTERSPEECH, pp. 4059-4063, Oct. 2020. [Paper]
  150. W.-C. Huang, T. Hayashi, Y.-C. Wu, H. Kameoka, T. Toda. Voice transformer network: sequence-to-sequence voice conversion using transformer with text-to-speech pretraining. Proc. INTERSPEECH, pp. 4676-4680, Oct. 2020. [Paper]
  151. P.L. Tobing, T. Hayashi, Y.-C. Wu, K. Kobayashi, T. Toda. Cyclic spectral modeling for unsupervised unit discovery into voice conversion with excitation and waveform modeling. Proc. INTERSPEECH, pp. 4861-4865, Oct. 2020. [Paper]
  152. K. Kobayashi, T. Toda. Implementation of low-latency electrolaryngeal speech enhancement based on multi-task CLDNN. Proc. EUSIPCO, pp. 396-400, Aug. 2020. [Paper]
  153. M. Takada, S. Seki, P.L. Tobing, T. Toda. Semi-supervised enhancement and suppression of self-produced speech using correspondence between air- and body-conducted signals. Proc. EUSIPCO, pp. 456-460, Aug. 2020. [Paper]
  154. K. Miyazaki, T. Komatsu, T. Hayashi, S. Watanabe, T. Toda, K. Takeda. Weakly-supervised sound event detection with self-attention. Proc. IEEE ICASSP, pp. 66-70, virtual, May 2020. [Paper]
  155. T. Okamoto, T. Toda, Y. Shiga, H. Kawai. Transformer-based text-to-speech with weighted forced attention. Proc. IEEE ICASSP, pp. 6729-6733, virtual, May 2020. [Paper]
  156. P.L. Tobing, Y.-C. Wu, T. Hayashi, K. Kobayashi, T. Toda. Efficient shallow WaveNet vocoder using multiple samples output based on Laplacian distribution and linear prediction. Proc. IEEE ICASSP, pp. 7204-7208, virtual, May 2020. [Paper]
  157. T. Hayashi, R. Yamamoto, K. Inoue, T. Yoshimura, S. Watanabe, T. Toda, K. Takeda, Y. Zhang, X. Tan. ESPnet-TTS: unified, reproducible, and integratable open source end-to-end text-to-speech toolkit. Proc. IEEE ICASSP, pp. 7654-7658, virtual, May 2020. [Paper]
  158. P.L. Tobing, T. Hayashi, T. Toda. Investigation of shallow WaveNet vocoder with Laplacian distribution output. Proc. IEEE ASRU, pp. 176-183, Sentosa, Singapore, Dec. 2019. [Paper]
  159. T. Okamoto, T. Toda, Y. Shiga, H. Kawai. Tacotron-based acoustic model using phoneme alignment for practical neural text-to-speech synthesis. Proc. IEEE ASRU, pp. 214-221, Sentosa, Singapore, Dec. 2019. [Paper]
  160. L. Li, T. Toda, K. Morikawa, K. Kobayashi, S. Makino. Improving singing aid system for laryngectomees with statistical voice conversion and VAE-SPACE. Proc. ISMIR, pp. 784-790, Delft, the Netherlands, Nov. 2019. [Paper]
  161. F. Ahmadi, K. Kobayashi, T. Toda. Development of a real-time bionic voice generation system based on statistical excitation prediction. Proc. ACM ASSETS, pp. 655-657, Posters and Demos, Pittsburgh, USA, Oct. 2019. [Paper]
  162. W.-C. Huang, Y.-C. Wu, K. Kobayashi, Y.-H. Peng, H.-T. Hwang, P.L. Tobing, Y. Tsao, H.-M. Wang, T. Toda. Generalization of spectrum differential based direct waveform modification for voice conversion. Proc. 10th ISCA Speech Synthesis Workshop (SSW10), pp. 57-62, Vienna, Austria, Sep. 2019. [Paper]
  163. Y.-C. Wu, P.L. Tobing, T. Hayashi, K. Kobayashi, T. Toda. Statistical voice conversion with quasi-periodic WaveNet vocoder. Proc. 10th ISCA Speech Synthesis Workshop (SSW10), pp. 63-68, Vienna, Austria, Sep. 2019. [Paper]
  164. M. Eshghi, K. Tanaka, K. Kobayashi, H. Kameoka, T. Toda. An investigation of features for fundamental frequency pattern prediction in electrolaryngeal speech enhancement. Proc. 10th ISCA Speech Synthesis Workshop (SSW10), pp. 251-256, Vienna, Austria, Sep. 2019. [Paper]
  165. Y.-C. Wu, T. Hayashi, P.L. Tobing, K. Kobayashi, T. Toda. Quasi-periodic WaveNet vocoder: a pitch dependent dilated convolution model for parametric speech generation. Proc. INTERSPEECH, pp. 196-200, Graz, Austria, Sep. 2019. [Paper]
  166. P.L. Tobing, Y.-C. Wu, T. Hayashi, K. Kobayashi, T. Toda. Non-parallel voice conversion with cyclic variational autoencoder. Proc. INTERSPEECH, pp. 674-678, Graz, Austria, Sep. 2019. [Paper]
  167. Y. Kurita, K. Kobayashi, K. Takeda, T. Toda. Robustness of statistical voice conversion based on direct waveform modification against background sounds. Proc. INTERSPEECH, pp. 684-688, Graz, Austria, Sep. 2019. [Paper]
  168. W.-C. Huang, Y.-C. Wu, C.-C. Lo, P.L. Tobing, T. Hayashi, K. Kobayashi, T. Toda, Y. Tsao, H.-M. Wang. Investigation of F0 conditioning and fully convolutional networks in variational autoencoder based voice conversion. Proc. INTERSPEECH, pp. 709-713, Graz, Austria, Sep. 2019. [Paper]
  169. T. Okamoto, T. Toda, Y. Shiga, H. Kawai. Real-time neural text-to-speech with sequence-to-sequence acoustic model and WaveGlow or single Gaussian WaveRNN vocoders. Proc. INTERSPEECH, pp. 1308-1312, Graz, Austria, Sep. 2019. [Paper]
  170. T. Hayashi, S. Watanabe, T. Toda, K. Takeda, S. Toshniwal, K. Livescu. Pre-trained text embeddings for enhanced text-to-speech synthesis. Proc. INTERSPEECH, pp. 4430-4434, Graz, Austria, Sep. 2019. [Paper]
  171. S. Seki, H. Kameoka, L. Li, T. Toda, K. Takeda. Generalized multichannel variational autoencoder for underdetermined source separation. Proc. EUSIPCO, 5 pages, A Coruna, Spain, Sep. 2019. [Paper]
  172. W.-C. Huang, Y.-C. Wu, H.-T. Hwang, P.L. Tobing, T. Hayashi, K. Kobayashi, T. Toda, Y. Tsao, H.-M. Wang. Refined WaveNet vocoder for variational autoencoder based voice conversion. Proc. EUSIPCO, 5 pages, A Coruna, Spain, Sep. 2019. [Paper]
  173. T. Komatsu, T. Hayashi, R. Kondo, T. Toda, K. Takeda. Scene-dependent anomalous acoustic-event detection based on conditional WaveNet and i-Vector. Proc. IEEE ICASSP, pp. 870-874, Brighton, UK, May 2019. [Paper]
  174. P.L. Tobing, Y. Wu, T. Hayashi, K. Kobayashi, T. Toda. Voice conversion with cyclic recurrent neural network and fine-tuned WaveNet vocoder. Proc. IEEE ICASSP, pp. 6815-6819, Brighton, UK, May 2019. [Paper]
  175. T. Okamoto, T. Toda, Y. Shiga, H. Kawai. Investigations of real-time Gaussian FFTNet and parallel WaveNet neural vocoders with simple acoustic features. Proc. IEEE ICASSP, pp. 7020-7024, Brighton, UK, May 2019. [Paper]
  176. P.L. Tobing, T. Hayashi, Y. Wu, K. Kobayashi, T. Toda. An evaluation of deep spectral mappings and WaveNet vocoder for voice conversion. Proc. IEEE SLT, pp. 297-303, Athens, Greece, Dec. 2018. [Paper]
  177. T. Okamoto, T. Toda, Y. Shiga, H. Kawai. Improving FFTNet vocoder with noise shaping and subband approaches. Proc. IEEE SLT, pp. 304-311, Athens, Greece, Dec. 2018. [Paper]
  178. T. Hayashi, S. Watanabe, Y. Zhang, T. Toda, T. Hori, R. Astudillo, K. Takeda. Back-translation-style data augmentation for end-to-end ASR. Proc. IEEE SLT, pp. 426-433, Athens, Greece, Dec. 2018. [Paper]
  179. M. Takada, S. Seki, T. Toda. Self-produced speech enhancement and suppression method using air- and body-conductive microphones. Proc. APSIPA ASC, pp. 1240-1245, Hawaii, USA, Nov. 2018. [Paper]
  180. K. Miyazaki, T. Hayashi, T. Toda, K. Takeda. Connectionist temporal classification-based sound event encoder for converting sound events into onomatopoeia representations. Proc. EUSIPCO, pp. 857-861, Rome, Italy, Sep. 2018. [Paper]
  181. K. Kobayashi, T. Toda. Electrolaryngeal speech enhancement with statistical voice conversion based on CLDNN. Proc. EUSIPCO, pp. 2129-2133, Rome, Italy, Sep. 2018. [Paper]
  182. T. Hayashi, T. Komatsu, R. Kondo, T. Toda, K. Takeda. Anomalous sound event detection based on WaveNet. Proc. EUSIPCO, pp. 2508-2512, Rome, Italy, Sep. 2018. [Paper]
  183. T. Hayashi, S. Watanabe, T. Toda, K. Takeda. Multi-Head Decoder for end-to-end speech recognition. Proc. INTERSPEECH, pp. 801-805, Hyderabad, India, Sep. 2018. [Paper]
  184. Y. Wu, K. Kobayashi, T. Hayashi, P.L. Tobing, T. Toda. Collapsed segment detection and reduction for WaveNet vocoder. Proc. INTERSPEECH, pp. 1988-1992, Hyderabad, India, Sep. 2018. [Paper]
  185. H. Kawahara, K. Sakakibara, M. Morise, H. Banno, T. Toda, T. Irino. Frequency domain variants of velvet noise and their application to speech processing and synthesis. Proc. INTERSPEECH, pp. 2027-2031, Hyderabad, India, Sep. 2018. [Paper]
  186. S. Tamura, K. Horio, H. Endo, S. Hayamizu, T. Toda. Audio-visual voice conversion using deep canonical correlation analysis for deep bottleneck features. Proc. INTERSPEECH, pp. 2469-2473, Hyderabad, India, Sep. 2018. [Paper]
  187. F. Ahmadi, T. Toda. Designing a pneumatic bionic voice prosthesis - statistical approach for source excitation generation. Proc. INTERSPEECH, pp. 3142-3146, Hyderabad, India, Sep. 2018. [Paper]
  188. T. Kinnunen, J. Lorenzo-Trueba, J. Yamagishi, T. Toda, D. Saito, F. Villavicencio, Z. Ling. A spoofing benchmark for the 2018 voice conversion challenge: leveraging from spoofing countermeasures for speech artifact assessment. Proc. Odyssey 2018, pp. 187-194, Les Sables d'Olonne, France, June 2018. [Paper]
  189. J. Lorenzo-Trueba, J. Yamagishi, T. Toda, D. Saito, F. Villavicencio, T. Kinnunen, Z. Ling. The voice conversion challenge 2018: promoting development of parallel and nonparallel methods. Proc. Odyssey 2018, pp. 195-202, Les Sables d'Olonne, France, June 2018. [Paper]
  190. K. Kobayashi, T. Toda. sprocket: open-source voice conversion software. Proc. Odyssey 2018, pp. 203-210, Les Sables d'Olonne, France, June 2018. [Paper]
  191. Y. Wu, P.L. Tobing, T. Hayashi, K. Kobayashi, T. Toda. The NU non-parallel voice conversion system for the voice conversion challenge 2018. Proc. Odyssey 2018, pp. 211-218, Les Sables d'Olonne, France, June 2018. [Paper]
  192. P.L. Tobing, Y. Wu, T. Hayashi, K. Kobayashi, T. Toda. NU voice conversion system for the voice conversion challenge 2018. Proc. Odyssey 2018, pp. 219-226, Les Sables d'Olonne, France, June 2018. [Paper]
  193. S. Seiya, R. Ito, K. Okamoto, U. Tanikawa, S. Ohira, D. Deguchi, T. Toda. Development of "KamiRepo" system with automatic student identification to handle handwritten assignments on LMS. Proc. EDUCON, pp. 841-848, Canary Islands, Spain, Apr. 2018. [Paper]
  194. T. Okamoto, K. Tachibana, T. Toda, Y. Shiga, H. Kawai. An investigation of subband WaveNet vocoder covering entire audible frequency range with limited acoustic features. Proc. IEEE ICASSP, pp. 5654-5658, Calgary, Canada, Apr. 2018. [Paper]
  195. K. Tachibana, T. Toda, Y. Shiga, H. Kawai. An investigation of noise shaping with perceptual weighting for WaveNet-based speech generation. Proc. IEEE ICASSP, pp. 5664-5668, Calgary, Canada, Apr. 2018. [Paper]
  196. T. Okamoto, K. Tachibana, T. Toda, Y. Shiga, H. Kawai. Subband WaveNet with overlapped single-sideband filterbanks. Proc. IEEE ASRU, pp. 698-704, Okinawa, Japan, Dec. 2017. [Paper]
  197. T. Hayashi, A. Tamamori, K. Kobayashi, K. Takeda, T. Toda. An investigation of multi-speaker training for WaveNet vocoder. Proc. IEEE ASRU, pp. 712-718, Okinawa, Japan, Dec. 2017. [Paper]
  198. K. Morikawa, T. Toda. Electrolaryngeal speech modification towards singing aid system for laryngectomees. Proc. APSIPA ASC, 4 pages, Kuala Lumpur, Malaysia, Dec. 2017. [Paper]
  199. P.L. Tobing, H. Kameoka, T. Toda. Deep acoustic-to-articulatory inversion mapping with latent trajectory modeling. Proc. APSIPA ASC, 4 pages, Kuala Lumpur, Malaysia, Dec. 2017. [Paper]
  200. A. Tamamori, T. Hayashi, T. Toda, K. Takeda. Investigation of effectiveness on recurrent neural network for daily activity recognition using multi-modal signals. Proc. APSIPA ASC, 7 pages, Kuala Lumpur, Malaysia, Dec. 2017 (Invited Talk in Special Session). [Paper]
  201. K. Kubo, K. Kobayashi, T. Toda, G. Neubig, S. Sakti, S. Nakamura. An investigation of how to design control parameters for statistical voice timbre control. Proc. APSIPA ASC, 4 pages, Kuala Lumpur, Malaysia, Dec. 2017. [Paper]
  202. H. Kawahara, K. Sakakibara, M. Morise, H. Banno, T. Toda. Accurate estimation of fo and aperiodicity based on periodicity detector residuals and deviations of phase derivatives. Proc. APSIPA ASC, 9 pages, Kuala Lumpur, Malaysia, Dec. 2017. [Paper]
  203. S. Seki, H. Kameoka, T. Toda, K. Takeda. Missing component restoration for masked speech signals based on time-domain spectrogram factorization. Proc. IEEE MLSP, 6 pages, Tokyo, Japan, Sep. 2017. [Paper]
  204. S. Seki, T. Toda, K. Takeda. Stereophonic music separation based on non-negative tensor factorization with cepstrum regularization. Proc. EUSIPCO, pp. 1011-1015, Kos Island, Greece, Aug. 2017. [Paper]
  205. H. Kawahara, K. Sakakibara, M. Morise, H. Banno, T. Toda. A modulation property of time-frequency derivatives of filtered phase and its application to aperiodicity and fo estimation. Proc. INTERSPEECH, pp. 424-428, Stockholm, Sweden, Aug. 2017. [Paper]
  206. K. Tanaka, H. Kameoka, T. Toda, S. Nakamura. Physically constrained statistical F0 prediction for electrolaryngeal speech enhancement. Proc. INTERSPEECH, pp. 1069-1073, Stockholm, Sweden, Aug. 2017. [Paper]
  207. A. Tamamori, T. Hayashi, K. Kobayashi, K. Takeda, T. Toda. Speaker-dependent WaveNet vocoder. Proc. INTERSPEECH, pp. 1118-1122, Stockholm, Sweden, Aug. 2017. [Paper]
  208. K. Kobayashi, T. Hayashi, A. Tamamori, T. Toda. Statistical voice conversion with WaveNet-based waveform generation. Proc. INTERSPEECH, pp. 1138-1142, Stockholm, Sweden, Aug. 2017. [Paper]
  209. H. Kawahara, K. Sakakibara, H. Banno, M. Morise, T. Toda, T. Irino. A new cosine series antialiasing function and its application to aliasing-free glottal source models for speech and singing synthesis. Proc. INTERSPEECH, pp. 1358-1362, Stockholm, Sweden, Aug. 2017. [Paper]
  210. L. Li, H. Kameoka, T. Toda, S. Makino. Speech enhancement using non-negative spectrogram models with mel-generalized cepstral regularization. Proc. INTERSPEECH, pp. 1998-2002, Stockholm, Sweden, Aug. 2017. [Paper]
  211. T. Hayashi, S. Watanabe, T. Toda, T. Hori, J. Le Roux, K. Takeda. BLSTM-HMM hybrid system combined with sound activity detection network for polyphonic sound event detection. Proc. IEEE ICASSP, pp. 766-770, New Orleans, USA, Mar. 2017. [Paper]
  212. Y. Tajiri, H. Kameoka, T. Toda. A noise suppression method for body-conducted soft speech based on non-negative tensor factorization of air- and body-conducted signals. Proc. IEEE ICASSP, pp. 4960-4964, New Orleans, USA, Mar. 2017. [Paper]
  213. K. Kobayashi, T. Toda, S. Nakamura. F0 transformation techniques for statistical voice conversion with direct waveform modification with spectral differential. Proc. IEEE SLT, pp. 693-700, San Diego, USA, Dec. 2016. [Paper]
  214. A. Tamamori, T. Hayashi, T. Toda, K. Takeda. Investigation on recurrent neural network architectures for daily activity recognition. Proc. UV2016, 4 pages, Aichi, Japan, Oct. 2016.
  215. Y. Tajiri, T. Toda. Nonaudible murmur enhancement based on statistical voice conversion and noise suppression with external noise monitoring. Proc. 9th ISCA Speech Synthesis Workshop (SSW9), pp. 54-60, Sunnyvale, USA, Sep. 2016. [Paper]
  216. P.L. Tobing, T. Toda, H. Kameoka, S. Nakamura. Acoustic-to-articulatory inversion mapping based on latent trajectory Gaussian mixture model. Proc. INTERSPEECH, pp. 953-957, San Francisco, USA, Sep. 2016. [Paper]
  217. T. Toda, L.-H. Chen, D. Saito, F. Villavicencio, M. Wester, Z. Wu, J. Yamagishi. The Voice Conversion Challenge 2016. Proc. INTERSPEECH, pp. 1632-1636, San Francisco, USA, Sep. 2016. [Paper]
  218. K. Kobayashi, S. Takamichi, S. Nakamura, T. Toda. The NU-NAIST voice conversion system for the Voice Conversion Challenge 2016. Proc. INTERSPEECH, pp. 1667-1671, San Francisco, USA, Sep. 2016. <2017 Outstanding Paper Award for Young C&C Researchers (recipient: Kazuhiro Kobayashi)> [Paper]
  219. K. Tachibana, T. Toda, Y. Shiga, H. Kawai. Model integration for HMM- and DNN-based speech synthesis using Product-of-Experts framework. Proc. INTERSPEECH, pp. 2288-2292, San Francisco, USA, Sep. 2016. [Paper]
  220. Q. Truong Do, T. Toda, G. Neubig, S. Sakti, S. Nakamura. A hybrid system for continuous word-level emphasis modeling based on HMM state clustering and adaptive training. Proc. INTERSPEECH, pp. 3196-3200, San Francisco, USA, Sep. 2016. [Paper]
  221. T. Hayashi, S. Watanabe, T. Toda, T. Hori, J. Le Roux, K. Takeda. Bidirectional LSTM-HMM hybrid system for polyphonic sound event detection. Proc. DCASE 2016 Workshop, 5 pages, Budapest, Hungary, Sep. 2016. [Paper]
  222. K. Tanaka, T. Toda, G. Neubig, S. Nakamura. Real-time vibration control of an electrolarynx based on statistical F0 contour prediction. Proc. EUSIPCO, pp. 1333-1337, Budapest, Hungary, Aug. 2016. [Paper]
  223. H. Maki, T. Toda, S. Sakti, G. Neubig, S. Nakamura. Removing noise from event-related potentials using a probabilistic generative model with grouped covariance matrices. Proc. EMBC, 4 pages, Orlando, USA, Aug. 2016. [Paper]
  224. S. Yamane, K. Kobayashi, T. Toda, T. Nakano, M. Goto, S. Nakamura. An estimation method of voice timbre evaluation values using feature extraction with Gaussian mixture model based on reference singer. Proc. IEEE ICASSP, pp. 5265-5269, Shanghai, China, Mar. 2016. [Paper]
  225. K. Tanaka, H. Kameoka, T. Toda, S. Nakamura. Statistical F0 prediction for electrolaryngeal speech enhancement considering generative process of F0 contours within product of experts framework. Proc. IEEE ICASSP, pp. 5665-5669, Shanghai, China, Mar. 2016. [Paper]
  226. K. Kobayashi, T. Toda, S. Nakamura. Implementation of F0 transformation for statistical singing voice conversion based on direct waveform modification. Proc. IEEE ICASSP, pp. 5670-5674, Shanghai, China, Mar. 2016. [Paper]
  227. Y. Tajiri, T. Toda, S. Nakamura. Noise suppression method for body-conducted soft speech enhancement based on external noise monitoring. Proc. IEEE ICASSP, pp. 5935-5939, Shanghai, China, Mar. 2016. [Paper]
  228. T. Hiraoka, G. Neubig, K. Yoshino, T. Toda, S. Nakamura. Active learning for example-based dialog systems. Proc. IWSDS, 11 pages, Saariselka, Finland, Jan. 2016. [Paper]
  229. Y. Tsunomori, G. Neubig, T. Hiraoka, M. Mizukami, S. Sakti, T. Toda, S. Nakamura. A dialog system to detect deception. Proc. IWSDS, 6 pages, Saariselka, Finland, Jan. 2016. [Paper]
  230. S. Sakti, F. Ilham, G. Neubig, T. Toda, A. Purwarianti, S. Nakamura. Incremental sentence compression using LSTM recurrent networks. Proc. IEEE ASRU, pp. 252-258, Scottsdale, USA, Dec. 2015. [Paper]
  231. Q. Truong Do, M. Heck, S. Sakti, G. Neubig, T. Toda, S. Nakamura. The NAIST ASR system for the 2015 Multi-Genre Broadcast Challenge: on combination of deep learning systems using a rank-score function. Proc. IEEE ASRU, pp. 654-659, Scottsdale, USA, Dec. 2015. [Paper]
  232. N. Lubis, S. Sakti, G. Neubig, K. Yoshino, T. Toda, S. Nakamura. A study of social-affective communication: automatic prediction of emotion triggers and responses in television talk shows. Proc. IEEE ASRU, pp. 777-783, Scottsdale, USA, Dec. 2015. [Paper]
  233. M. Mizukami, H. Kizuki, T. Nomura, G. Neubig, K. Yoshino, S. Sakti, T. Toda, S. Nakamura. Adaptive selection from multiple response candidates in example-based dialogue. Proc. IEEE ASRU, pp. 784-790, Scottsdale, USA, Dec. 2015. [Paper]
  234. H. Kawahara, K. Sakakibara, H. Banno, M. Morise, T. Toda, T. Irino. Aliasing-free implementation of discrete-time glottal source models and their applications to speech synthesis and F0 extractor evaluation. Proc. APSIPA ASC, pp. 520-529, Hong Kong, China, Dec. 2015. [Paper]
  235. Q. Truong Do, S. Sakti, G. Neubig, T. Toda, S. Nakamura. Improving translation of emphasis with pause prediction in speech-to-speech translation systems. Proc. IWSLT, pp. 204-208, Da Nang, Vietnam, Dec. 2015. [Paper]
  236. Y. Oda, H. Fudaba, G. Neubig, H. Hata, S. Sakti, T. Toda, S. Nakamura. Learning to generate pseudo-code from source code using statistical machine translation. Proc. ASE, pp. 574-584, Lincoln, USA, Nov. 2015. [Paper]
  237. H. Fudaba, Y. Oda, K. Akabe, G. Neubig, H. Hata, S. Sakti, T. Toda, S. Nakamura. Pseudogen: a tool to automatically generate pseudo-code from source code. Proc. ASE, Tool Demos, pp. 824-829, Lincoln, USA, Nov. 2015. [Paper]
  238. N. Lubis, S. Sakti, G. Neubig, T. Toda, S. Nakamura. Construction and analysis of social-affective interaction corpus in English and Indonesian. Proc. O-COCOSDA, pp. 202-206, Shanghai, China, Oct. 2015. [Paper]
  239. K. Tanaka, T. Toda, G. Neubig, S. Sakti, S. Nakamura. An enhanced electrolarynx with automatic fundamental frequency control based on statistical prediction. Proc. ACM ASSETS, Demonstration paper, pp. 435-436, Lisbon, Portugal, Oct. 2015. [Paper]
  240. K. Sugiyama, M. Mizukami, G. Neubig, K. Yoshino, S. Sakti, T. Toda, S. Nakamura. An investigation of machine translation evaluation metrics in cross-lingual question answering. Proc. 10th Workshop on Statistical Machine Translation (WMT), pp. 442-449, Lisbon, Portugal, Sep. 2015. [Paper]
  241. Y. Nishigaki, S. Takamichi, T. Toda, G. Neubig, S. Sakti, S. Nakamura. Prosody-controllable HMM-based speech synthesis using speech input. Proc. MLSLP, 5 pages, Fukushima, Japan, Sep. 2015. [Paper]
  242. S. Takamichi, K. Kobayashi, K. Tanaka, T. Toda, S. Nakamura. The NAIST text-to-speech system for the Blizzard Challenge 2015. Proc. Blizzard Challenge 2015 Workshop, 4 pages, Berlin, Germany, Sep. 2015. [Paper]
  243. Y. Oshima, S. Takamichi, T. Toda, G. Neubig, S. Sakti, S. Nakamura. Non-native speech synthesis preserving speaker individuality based on partial correction of prosodic and phonetic characteristics. Proc. INTERSPEECH, pp. 299-303, Dresden, Germany, Sep. 2015. [Paper]
  244. S. Takamichi, T. Toda, A.W. Black, S. Nakamura. Modulation spectrum-constrained trajectory training algorithm for HMM-based speech synthesis. Proc. INTERSPEECH, pp. 1206-1210, Dresden, Germany, Sep. 2015. [Paper]
  245. T. Mieno, G. Neubig, S. Sakti, T. Toda, S. Nakamura. Speed or accuracy? A study in evaluation of simultaneous speech translation. Proc. INTERSPEECH, pp. 2267-2271, Dresden, Germany, Sep. 2015. [Paper]
  246. T.T. Nguyen, G. Neubig, H. Shindo, S. Sakti, T. Toda, S. Nakamura. A latent variable model for joint pause prediction and dependency parsing. Proc. INTERSPEECH, pp. 2719-2723, Dresden, Germany, Sep. 2015. [Paper]
  247. K. Kobayashi, T. Toda, G. Neubig, S. Sakti, S. Nakamura. Statistical singing voice conversion based on direct waveform modification with global variance. Proc. INTERSPEECH, pp. 2754-2758, Dresden, Germany, Sep. 2015. [Paper]
  248. Y. Tajiri, K. Tanaka, T. Toda, G. Neubig, S. Sakti, S. Nakamura. Non-audible murmur enhancement based on statistical conversion using air- and body-conductive microphones in noisy environments. Proc. INTERSPEECH, pp. 2769-2773, Dresden, Germany, Sep. 2015. [Paper]
  249. P.L. Tobing, K. Kobayashi, T. Toda, G. Neubig, S. Sakti, S. Nakamura. Articulatory controllable speech modification based on Gaussian mixture models with direct waveform modification using spectrum differential. Proc. INTERSPEECH, pp. 3350-3354, Dresden, Germany, Sep. 2015. [Paper]
  250. D.Q. Truong, S. Takamichi, S. Sakti, G. Neubig, T. Toda, S. Nakamura. Preserving word-level emphasis in speech-to-speech translation using linear regression HSMMs. Proc. INTERSPEECH, pp. 3665-3669, Dresden, Germany, Sep. 2015. [Paper]
  251. H. Maki, T. Toda, S. Sakti, G. Neubig, S. Nakamura. Evaluation of EEG ocular artifact removal with a multi-channel Wiener filter based on probabilistic generative model. Proc. EMBC, 4 pages, Milan, Italy, Aug. 2015. [Paper]
  252. Y. Oda, G. Neubig, S. Sakti, T. Toda, S. Nakamura. Syntax-based simultaneous translation through prediction of unseen syntactic constituents. Proc. ACL, pp. 198-207, Beijing, China, July 2015. [Paper]
  253. A. Miura, G. Neubig, S. Sakti, T. Toda, S. Nakamura. Improving pivot translation by remembering the pivot. Proc. ACL, pp. 573-577, Beijing, China, July 2015. [Paper]
  254. Y. Oda, G. Neubig, S. Sakti, T. Toda, S. Nakamura. Ckylark: a more robust PCFG-LA parser. Proc. NAACL HLT, Demo Track, pp. 41-45, Denver, USA, June 2015. [Paper]
  255. H. Maki, T. Toda, S. Sakti, G. Neubig, S. Nakamura. EEG signal enhancement using multichannel Wiener filter with a spatial correlation prior. Proc. IEEE ICASSP, pp. 2639-2643, Brisbane, Australia, Apr. 2015. [Paper]
  256. S. Takamichi, T. Toda, A.W. Black, S. Nakamura. Parameter generation algorithm considering modulation spectrum for HMM-based speech synthesis. Proc. IEEE ICASSP, pp. 4210-4214, Brisbane, Australia, Apr. 2015. [Paper]
  257. Z. Wu, A. Khodabakhsh, C. Demiroglu, J. Yamagishi, D. Saito, T. Toda, S. King. SAS: a speaker verification spoofing database containing diverse attacks. Proc. IEEE ICASSP, pp. 4440-4444, Brisbane, Australia, Apr. 2015. [Paper]
  258. A. Tjandra, S. Sakti, G. Neubig, T. Toda, M. Adriani, S. Nakamura. Combination of two-dimensional cochleogram and spectrogram features for deep learning-based ASR. Proc. IEEE ICASSP, pp. 4525-4529, Brisbane, Australia, Apr. 2015. [Paper]
  259. S. Takamichi, T. Toda, A.W. Black, S. Nakamura. Modulation spectrum-constrained trajectory training algorithm for GMM-based voice conversion. Proc. IEEE ICASSP, pp. 4859-4863, Brisbane, Australia, Apr. 2015. [Paper]
  260. H. Tanaka, S. Sakti, G. Neubig, T. Toda, H. Negoro, H. Iwasaka, S. Nakamura. Automated social skills trainer. Proc. IUI, pp. 17-27, Atlanta, USA, Mar. 2015. [Paper]
  261. M. Mizukami, G. Neubig, S. Sakti, T. Toda, S. Nakamura. Linguistic individuality transformation for spoken language. Proc. IWSDS, 12 pages, Busan, South Korea, Jan. 2015. [Paper]
  262. F. Koto, S. Sakti, G. Neubig, T. Toda, M. Adriani, S. Nakamura. A study on natural expressive speech: automatic memorable spoken quote detection. Proc. IWSDS, 6 pages, Busan, South Korea, Jan. 2015. [Paper]
  263. T. Hiraoka, G. Neubig, S. Sakti, T. Toda, S. Nakamura. Evaluation of a fully automatic cooperative persuasive dialogue system. Proc. IWSDS, 12 pages, Busan, South Korea, Jan. 2015. [Paper]
  264. T. Sasakura, S. Sakti, G. Neubig, T. Toda, S. Nakamura. Unknown word detection based on event-related brain desynchronization responses. Proc. IWSDS, 6 pages, Busan, South Korea, Jan. 2015. [Paper]
  265. Y. Tsunomori, G. Neubig, S. Sakti, T. Toda, S. Nakamura. An analysis towards dialogue-based deception detection. Proc. IWSDS, 11 pages, Busan, South Korea, Jan. 2015. [Paper]
  266. H. Kawahara, M. Morise, T. Toda, H. Banno, R. Nisimura, T. Irino. Excitation source design for high-quality speech manipulation systems based on a temporally static group delay representation of periodic signals. Proc. APSIPA ASC, 10 pages, Siem Reap, Cambodia, Dec. 2014. [Paper]
  267. S. Sakti, Y. Odagaki, T. Sasakura, G. Neubig, T. Toda, S. Nakamura. An event-related brain potential study on the impact of speech recognition errors. Proc. APSIPA ASC, 4 pages, Siem Reap, Cambodia, Dec. 2014. [Paper]
  268. S. Tsuruta, K. Tanaka, T. Toda, G. Neubig, S. Sakti, S. Nakamura. An evaluation of target speech for a nonaudible murmur enhancement system in noisy environments. Proc. APSIPA ASC, 4 pages, Siem Reap, Cambodia, Dec. 2014. [Paper]
  269. K. Kobayashi, T. Toda, T. Nakano, M. Goto, G. Neubig, S. Sakti, S. Nakamura. Gender-dependent spectrum differential models for perceived age control based on direct waveform modification in singing voice conversion. Proc. APSIPA ASC, 4 pages, Siem Reap, Cambodia, Dec. 2014. [Paper]
  270. L. Nio, S. Sakti, G. Neubig, T. Toda, S. Nakamura. Recursive neural network paraphrase identification for example-based dialog retrieval. Proc. APSIPA ASC, 4 pages, Siem Reap, Cambodia, Dec. 2014. [Paper]
  271. K. Tanaka, T. Toda, G. Neubig, S. Sakti, S. Nakamura. An inter-speaker evaluation through simulation of electrolarynx control based on statistical F0 prediction. Proc. APSIPA ASC, 4 pages, Siem Reap, Cambodia, Dec. 2014. [Paper]
  272. R. Yoshida, T. Hiraoka, G. Neubig, S. Sakti, T. Toda, S. Nakamura. Unnecessary utterance detection for avoiding digressions in discussion. Proc. APSIPA ASC, 4 pages, Siem Reap, Cambodia, Dec. 2014. [Paper]
  273. F. Koto, S. Sakti, G. Neubig, T. Toda, M. Adriani, S. Nakamura. The use of semantic and acoustic features for open-domain TED talk summarization. Proc. APSIPA ASC, 4 pages, Siem Reap, Cambodia, Dec. 2014. [Paper]
  274. S. Takamichi, T. Toda, A.W. Black, S. Nakamura. Modulation spectrum-based post-filter for GMM-based voice conversion. Proc. APSIPA ASC, 4 pages, Siem Reap, Cambodia, Dec. 2014. <APSIPA ASC 2014 The Best Paper Award> [Paper]
  275. L. Nio, S. Sakti, G. Neubig, T. Toda, S. Nakamura. Improving the robustness of example-based dialog retrieval using recursive neural network paraphrase identification. Proc. IEEE SLT, pp. 306-311, South Lake Tahoe, USA, Dec. 2014. [Paper]
  276. S. Takamichi, T. Toda, A.W. Black, S. Nakamura. Modified post-filter to recover modulation spectrum for HMM-based speech synthesis. Proc. GlobalSIP, pp. 710-714, Atlanta, USA, Dec. 2014. [Paper]
  277. T. Toda. Augmented speech production based on real-time statistical voice conversion. Proc. GlobalSIP, pp. 755-759, Atlanta, USA, Dec. 2014 (Invited Talk). [Paper]
  278. Y. Hatakoshi, G. Neubig, S. Sakti, T. Toda, S. Nakamura. Rule-based syntactic preprocessing for syntax-based machine translation. Proc. 8th Workshop on Syntax, Semantics and Structure in Statistical Translation (SSST-8), pp. 34-42, Doha, Qatar, Oct. 2014. [Paper]
  279. K. Tanaka, T. Toda, G. Neubig, S. Sakti, S. Nakamura. Direct F0 control of an electrolarynx based on statistical excitation feature prediction and its evaluation through simulation. Proc. INTERSPEECH, pp. 31-35, MAX Atria, Singapore, Sep. 2014. [Paper]
  280. N. Jinbo, S. Takamichi, T. Toda, G. Neubig, S. Sakti, S. Nakamura. A hearing impairment simulation method using audiogram-based approximation of auditory characteristics. Proc. INTERSPEECH, pp. 490-494, MAX Atria, Singapore, Sep. 2014. [Paper]
  281. K. Kubo, S. Sakti, G. Neubig, T. Toda, S. Nakamura. Structured soft margin confidence weighted learning for grapheme-to-phoneme conversion. Proc. INTERSPEECH, pp. 1263-1267, MAX Atria, Singapore, Sep. 2014. [Paper]
  282. S. Matsumiya, S. Sakti, G. Neubig, T. Toda, S. Nakamura. Data-driven generation of text balloons based on linguistic and acoustic features of a comics-anime corpus. Proc. INTERSPEECH, pp. 1801-1805, MAX Atria, Singapore, Sep. 2014. [Paper]
  283. H. Kawahara, M. Morise, T. Toda, H. Banno, R. Nisimura, T. Irino. Excitation source analysis for high-quality speech manipulation systems based on an interference-free representation of group delay with minimum phase response compensation. Proc. INTERSPEECH, pp. 2243-2247, MAX Atria, Singapore, Sep. 2014. [Paper]
  284. P.L. Tobing, T. Toda, G. Neubig, S. Sakti, S. Nakamura, A. Purwarianti. Articulatory controllable speech modification based on statistical feature mapping with Gaussian mixture models. Proc. INTERSPEECH, pp. 2298-2302, MAX Atria, Singapore, Sep. 2014. [Paper]
  285. K. Kobayashi, T. Toda, G. Neubig, S. Sakti, S. Nakamura. Statistical singing voice conversion with direct waveform modification based on the spectrum differential. Proc. INTERSPEECH, pp. 2514-2518, MAX Atria, Singapore, Sep. 2014. [Paper]
  286. D.Q. Truong, S. Sakti, G. Neubig, T. Toda, S. Nakamura. Collection and analysis of a Japanese-English emphasized speech corpus. Proc. O-COCOSDA, pp. 77-82, Phuket, Thailand, Sep. 2014. [Paper]
  287. M. Mizukami, G. Neubig, S. Sakti, T. Toda, S. Nakamura. Building a free, general-domain paraphrase database for Japanese. Proc. O-COCOSDA, pp. 129-133, Phuket, Thailand, Sep. 2014. [Paper]
  288. F. Koto, S. Sakti, G. Neubig, T. Toda, S. Nakamura. Memorable spoken quote corpora of TED public speaking. Proc. O-COCOSDA, pp. 140-143, Phuket, Thailand, Sep. 2014. [Paper]
  289. L. Nio, S. Sakti, G. Neubig, T. Toda, S. Nakamura. Conversation dialog corpora from drama television and movie scripts. Proc. O-COCOSDA, pp. 144-148, Phuket, Thailand, Sep. 2014. [Paper]
  290. K. Akabe, G. Neubig, S. Sakti, T. Toda, S. Nakamura. Discriminative language models as a tool for machine translation error analysis. Proc. COLING, pp. 1124-1132, Dublin, Ireland, Aug. 2014. [Paper]
  291. T. Hiraoka, G. Neubig, S. Sakti, T. Toda, S. Nakamura. Reinforcement learning of cooperative persuasive dialogue policies using framing. Proc. COLING, pp. 1706-1717, Dublin, Ireland, Aug. 2014. [Paper]
  292. Y. Oda, G. Neubig, S. Sakti, T. Toda, S. Nakamura. Optimizing segmentation strategies for simultaneous speech translation. Proc. ACL, pp. 551-556, Baltimore, USA, June 2014. [Paper]
  293. H. Tanaka, S. Sakti, G. Neubig, T. Toda, S. Nakamura. Linguistic and acoustic features for automatic identification of autism spectrum disorders in children's narrative. Proc. Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality, pp. 88-96, Baltimore, USA, June 2014. [Paper]
  294. H. Shimizu, G. Neubig, S. Sakti, T. Toda, S. Nakamura. Collection of a simultaneous translation corpus for comparative analysis. Proc. LREC, pp. 670-673, Reykjavik, Iceland, May 2014. [Paper]
  295. S. Sakti, K. Kubo, S. Matsumiya, G. Neubig, T. Toda, S. Nakamura, F. Adachi, R. Isotani. Towards multilingual conversations in the medical domain: development of multilingual medical data and a network-based ASR system. Proc. LREC, pp. 2639-2643, Reykjavik, Iceland, May 2014. [Paper]
  296. S. Takamichi, T. Toda, G. Neubig, S. Sakti, S. Nakamura. A postfilter to modify the modulation spectrum in HMM-based speech synthesis. Proc. IEEE ICASSP, pp. 290-294, Florence, Italy, May 2014. <IEEE Signal Processing Society Japan Outstanding Student Conference Paper Award (recipient: Shinnosuke Takamichi)> [Paper]
  297. K. Kubo, S. Sakti, G. Neubig, T. Toda, S. Nakamura. NARROW adaptive regularization of weights for grapheme-to-phoneme conversion. Proc. IEEE ICASSP, pp. 2608-2612, Florence, Italy, May 2014. [Paper]
  298. K. Tanaka, T. Toda, G. Neubig, S. Sakti, S. Nakamura. An evaluation of excitation feature prediction in a hybrid approach to electrolaryngeal speech enhancement. Proc. IEEE ICASSP, pp. 4521-4525, Florence, Italy, May 2014. [Paper]
  299. K. Kobayashi, T. Toda, T. Nakano, M. Goto, G. Neubig, S. Sakti, S. Nakamura. Regression approaches to perceptual age control in singing voice conversion. Proc. IEEE ICASSP, pp. 7954-7958, Florence, Italy, May 2014. [Paper]
  300. H.T. Vu, G. Neubig, S. Sakti, T. Toda, S. Nakamura. Acquiring a dictionary of emotion-provoking events. Proc. EACL, pp. 128-132, Gothenburg, Sweden, Apr. 2014. [Paper]
  301. T. Hiraoka, G. Neubig, S. Sakti, T. Toda, S. Nakamura. Construction and analysis of a persuasive dialogue corpus. Proc. IWSDS, pp. 213-223, Napa, USA, Jan. 2014. [Paper]
  302. N. Lubis, S. Sakti, G. Neubig, T. Toda, A. Purwarianti, S. Nakamura. Emotion and its triggers in human spoken dialogue: recognition and analysis. Proc. IWSDS, pp. 224-229, Napa, USA, Jan. 2014. [Paper]
  303. H. Tanaka, S. Sakti, G. Neubig, T. Toda, S. Nakamura. Modality and contextual differences in computer based non-verbal communication training. Proc. CogInfoCom, pp. 127-132, Budapest, Hungary, Dec. 2013. [Paper]
  304. H. Shimizu, G. Neubig, S. Sakti, T. Toda, S. Nakamura. Constructing a speech translation system using simultaneous interpretation data. Proc. IWSLT, 7 pages, Heidelberg, Germany, Dec. 2013. [Paper]
  305. S. Sakti, K. Kubo, G. Neubig, T. Toda, S. Nakamura. The NAIST English speech recognition system for IWSLT 2013. Proc. IWSLT, 5 pages, Heidelberg, Germany, Dec. 2013. [Paper]
  306. T. Hiraoka, Y. Yamauchi, G. Neubig, S. Sakti, T. Toda, S. Nakamura. Dialogue management for leading the conversation in persuasive dialogue systems. Proc. IEEE ASRU, pp. 114-119, Olomouc, Czech Republic, Dec. 2013. [Paper]
  307. H. Tanaka, S. Sakti, G. Neubig, T. Toda, S. Nakamura. Non-verbal communication training with an interactive multimedia application. Proc. ACE, Osaka, Japan, Oct. 2013. [Paper]
  308. Lasguido, S. Sakti, G. Neubig, T. Toda, S. Nakamura. Combination of example-based and SMT-based approaches in a chat-oriented dialog system. Proc. ICE-ID, 6 pages, Bali, Indonesia, Oct. 2013. [Paper]
  309. G. Neubig, S. Sakti, T. Toda, S. Nakamura, Y. Matsumoto, R. Isotani, Y. Ikeda. Towards high-reliability speech translation in the medical domain. Proc. MedNLP-WS, 8 pages, Aichi, Japan, Oct. 2013. [Paper]
  310. P. Arthur, G. Neubig, S. Sakti, T. Toda, S. Nakamura. Inter-sentence features and thresholded minimum error rate training: NAIST at CLEF 2013 QA4MRE. Proc. CLEF, 11 pages, Valencia, Spain, Sep. 2013. [Paper]
  311. T. Toda, H. Doi. Statistical voice conversion techniques for alaryngeal speech enhancement. Proc. SICE 2013, pp. 1602-1603, Aichi, Japan, Sep. 2013 (Invited Talk in Special Session). [Paper]
  312. T. Inukai, T. Toda, G. Neubig, S. Sakti, S. Nakamura. Investigation of intra-speaker spectral parameter variation and its prediction towards improvement of spectral conversion metric. Proc. 8th ISCA Speech Synthesis Workshop (SSW8), pp. 89-94, Barcelona, Spain, Aug. 2013. [Paper]
  313. H. Kawahara, M. Morise, T. Toda, R. Nisimura, T. Irino. Beyond bandlimited sampling of speech spectral envelope imposed by the harmonic structure of voiced sounds. Proc. INTERSPEECH, pp. 34-38, Lyon, France, Aug. 2013. [Paper]
  314. S. Takamichi, T. Toda, Y. Shiga, S. Sakti, G. Neubig, S. Nakamura. Improvements to HMM-based speech synthesis based on parameter generation with rich context models. Proc. INTERSPEECH, pp. 364-368, Lyon, France, Aug. 2013. [Paper]
  315. K. Kobayashi, H. Doi, T. Toda, T. Nakano, M. Goto, G. Neubig, S. Sakti, S. Nakamura. An investigation of acoustic features for singing voice conversion based on perceptual age. Proc. INTERSPEECH, pp. 1057-1061, Lyon, France, Aug. 2013. [Paper]
  316. H. Doi, T. Toda, T. Nakano, M. Goto, S. Nakamura. Evaluation of a singing voice conversion method based on many-to-many eigenvoice conversion. Proc. INTERSPEECH, pp. 1067-1071, Lyon, France, Aug. 2013. [Paper]
  317. K. Kubo, S. Sakti, G. Neubig, T. Toda, S. Nakamura. Grapheme-to-phoneme conversion based on adaptive regularization of weight vectors. Proc. INTERSPEECH, pp. 1946-1950, Lyon, France, Aug. 2013. [Paper]
  318. T. Kano, S. Takamichi, S. Sakti, G. Neubig, T. Toda, S. Nakamura. Generalizing continuous-space translation of paralinguistic information. Proc. INTERSPEECH, pp. 2614-2618, Lyon, France, Aug. 2013. [Paper]
  319. M. Ohgushi, G. Neubig, S. Sakti, T. Toda, S. Nakamura. An empirical comparison of joint optimization techniques for speech translation. Proc. INTERSPEECH, pp. 2619-2623, Lyon, France, Aug. 2013. [Paper]
  320. K. Tanaka, T. Toda, G. Neubig, S. Sakti, S. Nakamura. Hybrid approach to electrolaryngeal speech enhancement based on spectral subtraction and statistical voice conversion. Proc. INTERSPEECH, pp. 3067-3071, Lyon, France, Aug. 2013. [Paper]
  321. T. Moriguchi, T. Toda, M. Sano, H. Sato, G. Neubig, S. Sakti, S. Nakamura. A digital signal processor implementation of silent/electrolaryngeal speech enhancement based on real-time statistical voice conversion. Proc. INTERSPEECH, pp. 3072-3076, Lyon, France, Aug. 2013. [Paper]
  322. T. Fujita, G. Neubig, S. Sakti, T. Toda, S. Nakamura. Simple, lexicalized choice of translation timing for simultaneous speech translation. Proc. INTERSPEECH, pp. 3487-3491, Lyon, France, Aug. 2013. [Paper]
  323. M. Itoi, R. Miyazaki, T. Toda, H. Saruwatari, K. Shikano. Blind speech extraction for non-audible murmur speech with speaker's movement noise. Proc. ISSPIT, 6 pages, Ho Chi Minh City, Vietnam, Dec. 2012. [Paper]
  324. A. Sani, S. Sakti, G. Neubig, T. Toda, A. Mulyanto, S. Nakamura. Towards language preservation: preliminary collection and vowel analysis of Indonesian ethnic speech data. Proc. Oriental COCOSDA, pp. 118-122, Macau, China, Dec. 2012. <Best Student Paper Award (recipient: Auliya Sani)> [Paper]
  325. G. Neubig, K. Duh, M. Ogushi, T. Kano, T. Kiso, S. Sakti, T. Toda, S. Nakamura. The NAIST machine translation system for IWSLT 2012. Proc. IWSLT, pp. 54-60, Hong Kong, China, Dec. 2012. [Paper]
  326. C. Saam, C. Mohr, K. Kilgour, M. Heck, M. Sperber, K. Kubo, S. Stueker, S. Sakti, G. Neubig, T. Toda, S. Nakamura, A. Waibel. The 2012 KIT and KIT-NAIST English ASR systems for the IWSLT evaluation. Proc. IWSLT, pp. 87-90, Hong Kong, China, Dec. 2012. [Paper]
  327. M. Heck, K. Kubo, M. Sperber, S. Sakti, S. Stueker, C. Saam, K. Kilgour, C. Mohr, G. Neubig, T. Toda, S. Nakamura, A. Waibel. The KIT-NAIST (contrastive) English ASR system for IWSLT 2012. Proc. IWSLT, pp. 91-95, Hong Kong, China, Dec. 2012. [Paper]
  328. T. Kano, S. Sakti, S. Takamichi, G. Neubig, T. Toda, S. Nakamura. A method for translation of paralinguistic information. Proc. IWSLT, pp. 158-163, Hong Kong, China, Dec. 2012. [Paper]
  329. H. Doi, T. Toda, T. Nakano, M. Goto, S. Nakamura. Singing voice conversion method based on many-to-many eigenvoice conversion and training data generation using a singing-to-singing synthesis system. Proc. APSIPA ASC, 6 pages, Hollywood, USA, Nov. 2012. <APSIPA ASC 2012 The Best Paper Award (Short Paper in Regular Session Category)> [Paper]
  330. H. Tanaka, S. Sakti, G. Neubig, T. Toda, N. Campbell, S. Nakamura. Non-verbal cognitive skills and autistic conditions: an analysis and training tool. Proc. CogInfoCom, pp. 41-46, Kosice, Slovakia, Dec. 2012. [Paper]
  331. Lasguido, S. Sakti, G. Neubig, T. Toda, M. Adriani, S. Nakamura. Developing non-goal dialog system based on examples of drama television. Proc. IWSDS, pp. 315-320, Paris, France, Nov. 2012. [Paper]
  332. M. Kishimoto, T. Toda, H. Doi, S. Sakti, S. Nakamura. Model training using parallel data with mismatched pause positions in statistical esophageal speech enhancement. Proc. ICSP, pp. 590-594, Beijing, China, Oct. 2012 (Invited Talk in Special Session). [Paper]
  333. T. Toda, T. Muramatsu, H. Banno. Implementation of computationally efficient real-time voice conversion. Proc. INTERSPEECH, 4 pages, Portland, USA, Sep. 2012. [Paper]
  334. S. Takamichi, T. Toda, Y. Shiga, H. Kawai, S. Sakti, S. Nakamura. An evaluation of parameter generation methods with rich context models in HMM-based speech synthesis. Proc. INTERSPEECH, 4 pages, Portland, USA, Sep. 2012. [Paper]
  335. T. Toda. Statistical approaches to enhancement of body-conducted speech detected with non-audible murmur microphone. Proc. ICME CME, pp. 623-628, Hyogo, Japan, July 2012 (Invited Poster in Special Session). [Paper]
  336. K. Yamamoto, T. Toda, H. Doi, H. Saruwatari, K. Shikano. Statistical approach to voice quality control in esophageal speech enhancement. Proc. IEEE ICASSP, pp. 4497-4500, Kyoto, Japan, Mar. 2012. [Paper]
  337. S. Ishii, T. Toda, H. Saruwatari, S. Sakti, S. Nakamura. Blind noise suppression for non-audible murmur recognition with stereo signal processing. Proc. IEEE ASRU, pp. 494-499, Hawaii, USA, Dec. 2011. <Elected as Panel Member in "New Applications in Speech Processing" Session> [Paper]
  338. D. Deguchi, T. Toda, H. Doi, H. Saruwatari, K. Shikano. Computationally efficient body-conducted voice conversion with original excitation signals. Proc. APSIPA ASC, 4 pages, Xi'an, China, Oct. 2011. [Paper]
  339. N. Hattori, T. Toda, H. Kawai, H. Saruwatari, K. Shikano. Speaker-adaptive speech synthesis based on eigenvoice conversion and language-dependent prosodic conversion in speech-to-speech translation. Proc. INTERSPEECH, pp. 2769-2772, Florence, Italy, Aug. 2011. [Paper]
  340. H. Doi, K. Nakamura, T. Toda, H. Saruwatari, K. Shikano. An evaluation of alaryngeal speech enhancement methods based on voice conversion techniques. Proc. IEEE ICASSP, pp. 5136-5139, Prague, Czech Republic, May 2011. [Paper]
  341. D. Babani, T. Toda, H. Saruwatari, K. Shikano. Acoustic model training for non-audible murmur recognition using transformed normal speech data. Proc. IEEE ICASSP, pp. 5224-5227, Prague, Czech Republic, May 2011. [Paper]
  342. H. Doi, K. Nakamura, T. Toda, H. Saruwatari, K. Shikano. Speaking-aid systems based on one-to-many eigenvoice conversion for total laryngectomees. Proc. APSIPA ASC, pp. 498-501, Biopolis, Singapore, Dec. 2010. [Paper]
  343. D. Deguchi, H. Doi, T. Toda, H. Saruwatari, K. Shikano. Acoustic compensation method for accepting different recording devices in body-conducted voice conversion. Proc. APSIPA ASC, pp. 502-505, Biopolis, Singapore, Dec. 2010. [Paper]
  344. Y. Shiga, T. Toda, S. Sakai, H. Kawai. Improved training of excitation for HMM-based parametric speech synthesis. Proc. INTERSPEECH, pp. 809-812, Chiba, Japan, Sep. 2010. [Paper]
  345. K. Nakamura, T. Toda, H. Saruwatari, K. Shikano. The use of air-pressure sensor in electrolaryngeal speech enhancement based on statistical voice conversion. Proc. INTERSPEECH, pp. 1628-1631, Chiba, Japan, Sep. 2010. [Paper]
  346. K. Ohta, T. Toda, Y. Ohtani, H. Saruwatari, K. Shikano. Adaptive voice-quality control based on one-to-many eigenvoice conversion. Proc. INTERSPEECH, pp. 2158-2161, Chiba, Japan, Sep. 2010. [Paper]
  347. Y. Shiga, T. Toda, S. Sakai, H. Kawai, K. Tokuda, M. Tsuzaki, S. Nakamura. The NICT Blizzard Challenge 2010 entry. Proc. Blizzard Challenge 2010 Workshop, 6 pages, Kyoto, Japan, Sep. 2010. [Paper]
  348. C. Hayashida, T. Toda, Y. Ohtani, H. Saruwatari, K. Shikano. Linear transformation approaches to many-to-one voice conversion. Proc. 7th ISCA Speech Synthesis Workshop (SSW7), pp. 74-79, Kyoto, Japan, Sep. 2010. [Paper]
  349. H. Doi, K. Nakamura, T. Toda, H. Saruwatari, K. Shikano. Statistical approach to enhancing esophageal speech based on Gaussian mixture models. Proc. IEEE ICASSP, pp. 4250-4253, Dallas, USA, Mar. 2010. <Best Student Paper Award (1st Place) (recipients: Hironori Doi and Keigo Nakamura)> [Paper]
  350. Y. Ohtani, T. Toda, H. Saruwatari, K. Shikano. Non-parallel training for many-to-many eigenvoice conversion. Proc. IEEE ICASSP, pp. 4822-4825, Dallas, USA, Mar. 2010. [Paper]
  351. H. Zen, K. Oura, T. Nose, J. Yamagishi, S. Sako, T. Toda, T. Masuko, A.W. Black, K. Tokuda. Recent development of the HMM-based speech synthesis system (HTS). Proc. APSIPA ASC, pp. 121-130, Sapporo, Japan, Oct. 2009 (Invited Talk in Special Session). [Paper]
  352. H. Doi, K. Nakamura, T. Toda, H. Saruwatari, K. Shikano. Enhancement of esophageal speech using statistical voice conversion. Proc. APSIPA ASC, pp. 805-808, Sapporo, Japan, Oct. 2009. [Paper]
  353. T. Toda, K. Nakamura, T. Nagai, T. Kaino, Y. Nakajima, K. Shikano. Technologies for processing body-conducted speech detected with non-audible murmur microphone. Proc. INTERSPEECH, pp. 632-635, Brighton, UK, Sep. 2009 (Keynote in Special Session). [Paper]
  354. V.-A. Tran, G. Bailly, H. Loevenbruck, T. Toda. Multimodal HMM-based NAM-to-speech conversion. Proc. INTERSPEECH, pp. 656-659, Brighton, UK, Sep. 2009. [Paper]
  355. K. Nakamura, T. Toda, H. Saruwatari, K. Shikano. Electrolaryngeal speech enhancement based on statistical voice conversion. Proc. INTERSPEECH, pp. 1431-1434, Brighton, UK, Sep. 2009. [Paper]
  356. Y. Ohtani, T. Toda, H. Saruwatari, K. Shikano. Many-to-many eigenvoice conversion with reference voice. Proc. INTERSPEECH, pp. 1623-1626, Brighton, UK, Sep. 2009. [Paper]
  357. M. Charlier, Y. Ohtani, T. Toda, A. Moinet, T. Dutoit. Cross-language voice conversion based on eigenvoices. Proc. INTERSPEECH, pp. 1635-1638, Brighton, UK, Sep. 2009. [Paper]
  358. R. Maia, T. Toda, K. Tokuda, S. Sakai, S. Nakamura. A decision tree-based clustering approach to state definition in an excitation modeling framework for HMM-based speech synthesis. Proc. INTERSPEECH, pp. 1783-1786, Brighton, UK, Sep. 2009. [Paper]
  359. R. Maia, T. Toda, S. Sakai, Y. Shiga, J. Ni, H. Kawai, K. Tokuda, M. Tsuzaki, S. Nakamura. The NICT entry for the Blizzard Challenge 2009: an enhanced HMM-based speech synthesis system with trajectory training considering global variance and state-dependent mixed excitation. Proc. Blizzard Challenge 2009 Workshop, 6 pages, Edinburgh, UK, Sep. 2009. [Paper]
  360. T. Toda. Eigenvoice-based approach to voice conversion and voice quality control. Proc. NCMMSC, International Symposium, pp. 492-497, Lanzhou, China, Aug. 2009 (Invited Talk in Special Session). [Paper]
  361. K. Morizane, K. Nakamura, T. Toda, H. Saruwatari, K. Shikano. Emphasized speech synthesis based on hidden Markov models. Proc. Oriental COCOSDA, 6 pages, O2-4, Beijing, China, Aug. 2009. [Paper]
  362. T. Toda, K. Nakamura, H. Sekimoto, K. Shikano. Voice conversion for various types of body transmitted speech. Proc. IEEE ICASSP, pp. 3601-3604, Taipei, Taiwan, Apr. 2009 (Invited Talk in Special Session). [Paper]
  363. K. Yu, T. Toda, M. Gasic, S. Keizer, F. Mairesse, B. Thomson, S. Young. Probabilistic modelling of F0 in unvoiced regions in HMM based speech synthesis. Proc. IEEE ICASSP, pp. 3773-3776, Taipei, Taiwan, Apr. 2009. [Paper]
  364. D. Miyamoto, K. Nakamura, T. Toda, H. Saruwatari, K. Shikano. Acoustic compensation methods for body transmitted speech conversion. Proc. IEEE ICASSP, pp. 3901-3904, Taipei, Taiwan, Apr. 2009. [Paper]
  365. T. Toda, S. Young. Trajectory training considering global variance for HMM-based speech synthesis. Proc. IEEE ICASSP, pp. 4025-4028, Taipei, Taiwan, Apr. 2009. [Paper]
  366. K. Oura, Y. Nankaku, T. Toda, K. Tokuda, R. Maia, S. Sakai, S. Nakamura. Simultaneous phrasing, prosody, and acoustic model training for Text-to-Speech conversion. Proc. ISCSLP, pp. 1-4, Kunming, China, Dec. 2008. <Best Student Paper Award (recipient: Keiichiro Oura)> [Paper]
  367. K. Yutani, Y. Uto, Y. Nankaku, T. Toda, K. Tokuda. Simultaneous conversion of duration and spectrum based on statistical models including time-sequence matching. Proc. INTERSPEECH, pp. 1072-1075, Brisbane, Australia, Sep. 2008. [Paper]
  368. T. Muramatsu, Y. Ohtani, T. Toda, H. Saruwatari, K. Shikano. Low-delay voice conversion based on maximum likelihood estimation of spectral parameter trajectory. Proc. INTERSPEECH, pp. 1076-1079, Brisbane, Australia, Sep. 2008. [Paper]
  369. Y. Ohtani, T. Toda, H. Saruwatari, K. Shikano. An improved one-to-many eigenvoice conversion system. Proc. INTERSPEECH, pp. 1080-1083, Brisbane, Australia, Sep. 2008. [Paper]
  370. D. Tani, T. Toda, Y. Ohtani, H. Saruwatari, K. Shikano. Maximum a posteriori adaptation for many-to-one eigenvoice conversion. Proc. INTERSPEECH, pp. 1461-1464, Brisbane, Australia, Sep. 2008. [Paper]
  371. K. Nakamura, T. Toda, Y. Nakajima, H. Saruwatari, K. Shikano. Evaluation of speaking-aid system with voice conversion for laryngectomees toward its use in practical environments. Proc. INTERSPEECH, pp. 2209-2212, Brisbane, Australia, Sep. 2008. [Paper]
  372. R. Maia, J. Ni, S. Sakai, T. Toda, K. Tokuda, T. Shimizu, S. Nakamura. The NICT/ATR speech synthesis system for the Blizzard Challenge 2008. Proc. Blizzard Challenge 2008 Workshop, 6 pages, Brisbane, Australia, Sep. 2008. [Paper]
  373. J. Yamagishi, H. Zen, Y.-J. Wu, T. Toda, K. Tokuda. The HTS-2008 system: yet another evaluation of the speaker-adaptive HMM-based speech synthesis system in the 2008 Blizzard Challenge. Proc. Blizzard Challenge 2008 Workshop, 6 pages, Brisbane, Australia, Sep. 2008. [Paper]
  374. V.-A. Tran, G. Bailly, H. Loevenbruck, T. Toda. Predicting F0 and voicing from NAM-captured whispered speech. Proc. Speech Prosody, 4 pages, Campinas, Brazil, May 2008. [Paper]
  375. T. Toda, K. Tokuda. Statistical approach to vocal tract transfer function estimation based on factor analyzed trajectory HMM. Proc. IEEE ICASSP, pp. 3925-3928, Las Vegas, USA, Apr. 2008. [Paper]
  376. J. Yamagishi, T. Nose, H. Zen, T. Toda, K. Tokuda. Performance evaluation of the speaker-independent HMM-based speech synthesis system "HTS-2007" for the Blizzard Challenge 2007. Proc. IEEE ICASSP, pp. 3957-3960, Las Vegas, USA, Apr. 2008. [Paper]
  377. R. Maia, T. Toda, K. Tokuda, S. Sakai, S. Nakamura. On the state definition for a trainable excitation model in HMM-based speech synthesis. Proc. IEEE ICASSP, pp. 3965-3968, Las Vegas, USA, Apr. 2008. [Paper]
  378. W. Fujitsuru, H. Sekimoto, T. Toda, H. Saruwatari, K. Shikano. Bandwidth extension of cellular phone speech based on maximum likelihood estimation with GMM. Proc. NCSP, pp. 283-286, Gold Coast, Australia, Mar. 2008. [Paper]
  379. R. Gomez, T. Toda, H. Saruwatari, K. Shikano. Rapid unsupervised speaker adaptation using single utterance based on MLLR and speaker selection. Proc. INTERSPEECH, pp. 262-265, Antwerp, Belgium, Aug. 2007. [Paper]
  380. T. Cincarek, I. Shindo, T. Toda, H. Saruwatari, K. Shikano. Development of preschool children subsystem for ASR and Q&A in a real-environment speech-oriented guidance task. Proc. INTERSPEECH, pp. 1469-1472, Antwerp, Belgium, Aug. 2007. [Paper]
  381. R. Maia, T. Toda, H. Zen, Y. Nankaku, K. Tokuda. A trainable excitation model for HMM-based speech synthesis. Proc. INTERSPEECH, pp. 1909-1912, Antwerp, Belgium, Aug. 2007. [Paper]
  382. Y. Ohtani, T. Toda, H. Saruwatari, K. Shikano. Speaker adaptive training for one-to-many eigenvoice conversion based on Gaussian mixture model. Proc. INTERSPEECH, pp. 1981-1984, Antwerp, Belgium, Aug. 2007. [Paper]
  383. K. Nakamura, T. Toda, H. Saruwatari, K. Shikano. Impact of various small sound source signals on voice conversion accuracy in speech communication aid for laryngectomees. Proc. INTERSPEECH, pp. 2517-2520, Antwerp, Belgium, Aug. 2007. [Paper]
  384. J. Ni, T. Hirai, H. Kawai, T. Toda, K. Tokuda, M. Tsuzaki, S. Sakai, R. Maia, S. Nakamura. ATRECSS - ATR English speech corpus for speech synthesis. Proc. Blizzard Challenge 2007 Workshop, 4 pages, Bonn, Germany, Aug. 2007. [Paper]
  385. J. Yamagishi, H. Zen, T. Toda, K. Tokuda. Speaker-independent HMM-based speech synthesis system - HTS-2007 system for the Blizzard Challenge 2007. Proc. Blizzard Challenge 2007 Workshop, 6 pages, Bonn, Germany, Aug. 2007. [Paper]
  386. S. Sakai, J. Ni, R. Maia, K. Tokuda, M. Tsuzaki, T. Toda, H. Kawai, S. Nakamura. Communicative speech synthesis with XIMERA. Proc. 6th ISCA Speech Synthesis Workshop (SSW6), pp. 28-33, Bonn, Germany, Aug. 2007. [Paper]
  387. K. Ohta, Y. Ohtani, T. Toda, H. Saruwatari, K. Shikano. Regression approaches to voice quality control based on one-to-many eigenvoice conversion. Proc. 6th ISCA Speech Synthesis Workshop (SSW6), pp. 101-106, Bonn, Germany, Aug. 2007. [Paper]
  388. D. Tani, Y. Ohtani, T. Toda, H. Saruwatari, K. Shikano. An evaluation of many-to-one voice conversion algorithms with pre-stored speaker data sets. Proc. 6th ISCA Speech Synthesis Workshop (SSW6), pp. 107-112, Bonn, Germany, Aug. 2007. [Paper]
  389. J. Yamagishi, T. Kobayashi, S. Renals, S. King, H. Zen, T. Toda, K. Tokuda. Improved average-voice-based speech synthesis using gender-mixed modeling and a parameter generation algorithm considering GV. Proc. 6th ISCA Speech Synthesis Workshop (SSW6), pp. 125-130, Bonn, Germany, Aug. 2007. [Paper]
  390. R. Maia, T. Toda, H. Zen, Y. Nankaku, K. Tokuda. Excitation model for HMM-based speech synthesis based on residual modeling. Proc. 6th ISCA Speech Synthesis Workshop (SSW6), pp. 131-136, Bonn, Germany, Aug. 2007. [Paper]
  391. Y. Nankaku, K. Nakamura, T. Toda, K. Tokuda. Spectral conversion based on statistical models including time-sequence matching. Proc. 6th ISCA Speech Synthesis Workshop (SSW6), pp. 333-338, Bonn, Germany, Aug. 2007. [Paper]
  392. T. Toda, Y. Ohtani, K. Shikano. One-to-many and many-to-one voice conversion based on eigenvoices. Proc. IEEE ICASSP, pp. 1249-1252, Hawaii, USA, Apr. 2007 (Invited Talk in Special Session). [Paper]
  393. K. Nakamura, T. Toda, H. Saruwatari, K. Shikano. Speaking aid system for total laryngectomees using voice conversion of body transmitted artificial speech. Proc. INTERSPEECH, pp. 1395-1398, Pittsburgh, USA, Sep. 2006. [Paper]
  394. T. Cincarek, T. Toda, H. Saruwatari, K. Shikano. Acoustic modeling for spoken dialogue systems based on unsupervised utterance-based selective training. Proc. INTERSPEECH, pp. 1722-1725, Pittsburgh, USA, Sep. 2006. [Paper]
  395. Y. Ohtani, T. Toda, H. Saruwatari, K. Shikano. Maximum likelihood voice conversion based on GMM with STRAIGHT mixed excitation. Proc. INTERSPEECH, pp. 2266-2269, Pittsburgh, USA, Sep. 2006. [Paper]
  396. M. Nakagiri, T. Toda, H. Kashioka, K. Shikano. Improving body transmitted unvoiced speech with statistical voice conversion. Proc. INTERSPEECH, pp. 2270-2273, Pittsburgh, USA, Sep. 2006. [Paper]
  397. Y. Uto, Y. Nankaku, T. Toda, A. Lee, K. Tokuda. Voice conversion based on mixtures of factor analyzers. Proc. INTERSPEECH, pp. 2278-2281, Pittsburgh, USA, Sep. 2006. [Paper]
  398. T. Toda, Y. Ohtani, K. Shikano. Eigenvoice conversion based on Gaussian mixture model. Proc. INTERSPEECH, pp. 2446-2449, Pittsburgh, USA, Sep. 2006. [Paper]
  399. H. Zen, T. Toda, K. Tokuda. The Nitech-NAIST HMM-based speech synthesis system for the Blizzard Challenge 2006. Proc. Blizzard Challenge 2006 Workshop, 4 pages, Pittsburgh, USA, Sep. 2006. [Paper]
  400. T. Toda, H. Kawai, T. Hirai, J. Ni, N. Nishizawa, J. Yamagishi, M. Tsuzaki, K. Tokuda, S. Nakamura. Developing a test bed of English Text-to-Speech system XIMERA for the Blizzard Challenge 2006. Proc. Blizzard Challenge 2006 Workshop, 4 pages, Pittsburgh, USA, Sep. 2006. [Paper]
  401. T. Kato, T. Toda, H. Saruwatari, K. Shikano. Transcription cost reduction for constructing acoustic models using acoustic likelihood selection criteria. Proc. LREC2006, pp. 789-792, Genoa, Italy, May 2006. [Paper]
  402. T. Cincarek, T. Toda, H. Saruwatari, K. Shikano. Utterance-based selective training for cost-effective task-adaptation of acoustic models. Proc. SRIV2006, pp. 71-76, Toulouse, France, May 2006. [Paper]
  403. K. Nakamura, T. Toda, Y. Nankaku, K. Tokuda. On the use of phonetic information for mapping from articulatory movements to vocal tract spectrum. Proc. IEEE ICASSP, pp. 93-96, Toulouse, France, May 2006. [Paper]
  404. R. Gomez, T. Toda, H. Saruwatari, K. Shikano. Improving rapid unsupervised speaker adaptation based on HMM sufficient statistics. Proc. IEEE ICASSP, pp. 1001-1004, Toulouse, France, May. 2006. [Paper]
  405. T. Cincarek, T. Toda, H. Saruwatari, K. Shikano. Selective EM training of acoustic models based on sufficient statistics of single utterances. Proc. IEEE ASRU, pp. 168-173, San Juan, Puerto Rico, Nov. 2005. [Paper]
  406. H. Zen, T. Toda. An overview of Nitech HMM-Based speech synthesis system for Blizzard Challenge 2005. Proc. INTERSPEECH, pp. 93-96, Lisbon, Portugal, Sep. 2005. [Paper]
  407. T. Toda, K. Shikano. NAM-to-speech conversion with Gaussian mixture models. Proc. INTERSPEECH, pp. 1957-1960, Lisbon, Portugal, Sep. 2005. [Paper]
  408. T. Toda, K. Tokuda. Speech parameter generation algorithm considering global variance for HMM-based speech synthesis. Proc. INTERSPEECH, pp. 2801-2804, Lisbon, Portugal, Sep. 2005. [Paper]
  409. T. Toda, A.W. Black, K. Tokuda. Spectral conversion based on maximum likelihood estimation considering global variance of converted parameter. Proc. IEEE ICASSP, Vol. 1, pp. 9-12, Philadelphia, USA, Mar. 2005. [Paper]
  410. T. Toda, A.W. Black, K. Tokuda. Acoustic-to-articulatory inversion mapping with Gaussian mixture model. Proc. INTERSPEECH, pp. 1129-1132, Jeju, Korea, Oct. 2004. [Paper]
  411. T. Toda, A.W. Black, K. Tokuda. Mapping from articulatory movements to vocal tract spectrum with Gaussian mixture model for articulatory speech synthesis. Proc. 5th ISCA Speech Synthesis Workshop (SSW5), pp. 31-36, Pittsburgh, USA, June 2004. [Paper]
  412. H. Kawai, T. Toda, J. Ni, M. Tsuzaki, K. Tokuda. XIMERA: a new TTS from ATR based on corpus-based technologies. Proc. 5th ISCA Speech Synthesis Workshop (SSW5), pp. 179-184, Pittsburgh, USA, June 2004. [Paper]
  413. K. Adachi, T. Toda, H. Kawanami, H. Saruwatari, K. Shikano. Perceptual evaluation of quality deterioration owing to prosody modification. Proc. LREC2004, pp. 2159-2162, Lisbon, Portugal, May 2004. [Paper]
  414. T. Toda, H. Kawai, M. Tsuzaki. Optimizing sub-cost functions for segment selection based on perceptual evaluations in concatenative speech synthesis. Proc. IEEE ICASSP, pp. 657-660, Montreal, Canada, May 2004. [Paper]
  415. H. Kawai, T. Toda. An evaluation of automatic phone segmentation for concatenative speech synthesis. Proc. IEEE ICASSP, pp. 677-680, Montreal, Canada, May 2004. [Paper]
  416. T. Toda, H. Kawai, M. Tsuzaki. Optimizing integrated cost function for segment selection in concatenative speech synthesis based on perceptual evaluations. Proc. INTERSPEECH, pp. 297-300, Geneva, Switzerland, Sep. 2003. [Paper]
  417. T. Shiraishi, T. Toda, H. Kawanami, H. Saruwatari, K. Shikano. Simple designing methods of corpus-based visual speech synthesis. Proc. INTERSPEECH, pp. 2241-2244, Geneva, Switzerland, Sep. 2003. [Paper]
  418. H. Kawanami, Y. Iwami, T. Toda, H. Saruwatari, K. Shikano. GMM-based voice conversion applied to emotional speech synthesis. Proc. INTERSPEECH, pp. 2401-2404, Geneva, Switzerland, Sep. 2003. [Paper]
  419. T. Toda, H. Kawai, M. Tsuzaki, K. Shikano. Segment selection considering local degradation of naturalness in concatenative speech synthesis. Proc. IEEE ICASSP, pp. 696-699, Hong Kong, China, Apr. 2003. [Paper]
  420. M. Mashimo, T. Toda, H. Kawanami, H. Kashioka, K. Shikano, N. Campbell. Evaluation of cross-language voice conversion using bilingual and non-bilingual databases. Proc. INTERSPEECH, pp. 293-296, Denver, USA, Sep. 2002. [Paper]
  421. H. Kawanami, T. Masuda, T. Toda, K. Shikano. Designing Japanese speech database covering wide range in prosody. Proc. INTERSPEECH, pp. 2425-2428, Denver, USA, Sep. 2002. [Paper]
  422. T. Toda, H. Kawai, M. Tsuzaki, K. Shikano. Perceptual evaluation of cost for segment selection in concatenative speech synthesis. Proc. IEEE 2002 Workshop on Speech Synthesis, 4 pages, Santa Monica, USA, Sep. 2002. [Paper]
  423. H. Kawanami, T. Masuda, T. Toda, K. Shikano. Designing speech database with prosodic variety for expressive TTS system. Proc. LREC2002, pp. 2039-2042, Las Palmas, Spain, May 2002. [Paper]
  424. T. Toda, H. Kawai, M. Tsuzaki, K. Shikano. Unit selection algorithm for Japanese speech synthesis based on both phoneme unit and diphone unit. Proc. IEEE ICASSP, pp. 465-468, Orlando, USA, May 2002. [Paper]
  425. T. Toda, H. Saruwatari, K. Shikano. High quality voice conversion based on Gaussian mixture model with dynamic frequency warping. Proc. INTERSPEECH, pp. 349-352, Aalborg, Denmark, Sep. 2001. [Paper]
  426. M. Mashimo, T. Toda, K. Shikano, N. Campbell. Evaluation of cross-language voice conversion based on GMM and STRAIGHT. Proc. INTERSPEECH, pp. 361-364, Aalborg, Denmark, Sep. 2001. [Paper]
  427. T. Toda, H. Saruwatari, K. Shikano. Voice conversion algorithm based on Gaussian mixture model with dynamic frequency warping of STRAIGHT spectrum. Proc. IEEE ICASSP, pp. 841-844, Salt Lake City, USA, May 2001. [Paper]
  428. T. Toda, J. Lu, H. Saruwatari, K. Shikano. STRAIGHT-based voice conversion algorithm based on Gaussian mixture model. Proc. INTERSPEECH, pp. 279-282, Beijing, China, Oct. 2000. [Paper]
  429. T. Toda, J. Lu, S. Nakamura, K. Shikano. Voice conversion algorithm based on Gaussian mixture model applied to STRAIGHT. Proc. WESTPRAC VII, pp. 169-172, Kumamoto, Japan, Oct. 2000. [Paper]

Review Papers or Book Chapters

- 1. E. Cooper, W.-C. Huang, Y. Tsao, H.-M. Wang, T. Toda, J. Yamagishi. A review on subjective and objective evaluation of synthetic speech. Acoustical Science and Technology, Vol. 45, No. 4, pp. 161-183, July 2024. [Paper]
  2. K. Miyazaki, T. Toda, T. Hayashi, K. Takeda. Environmental sound processing and its applications. IEEJ Transactions on Electronics, Information and Systems, Vol. 14, No. 3, pp. 340-351, Mar. 2019. [Paper]
  3. K. Vijayan, H. Li, T. Toda. Speech-to-singing voice conversion: the challenges and strategies for improving vocal conversion processes. IEEE Signal Processing Magazine, Vol. 36, No. 1, pp. 95-102, Jan. 2019. [Link]
  4. K. Tokuda, Y. Nankaku, T. Toda, H. Zen, J. Yamagishi, K. Oura. Speech synthesis based on hidden Markov models. Proceedings of the IEEE, Vol. 101, No. 5, pp. 1234-1252, May 2013. [Link]
  5. T. Toda. Modeling of speech parameter sequence considering global variance for HMM-based speech synthesis. Hidden Markov Models, Theory and Applications, Editor: Przemyslaw Dymarski, InTech, pp. 131-150, Apr. 2011 (ISBN 978-953-307-208-1). [Paper]

Invited Talks and Tutorials

- 1. T. Toda. Lessons learned from research in speech signal processing. Symposium on Speech and Behavior Informatics, Honolulu, USA, Dec. 2025. (Invited Talk)
  2. T. Toda. Personalized speech generation. APSIPA ASC, Panel session "Voice Privacy and Security," Singapore, Oct. 2025. (Perspective Talk)
  3. T. Toda. Recent advances and future directions in voice conversion. INTERSPEECH, Rotterdam, the Netherlands, Aug. 2025. (Survey Talk)
  4. T. Toda. Voice conversion techniques to separately control static and dynamic speech characteristics. Frontier Forum on Intelligent Speech Analysis and Generation, University of Science and Technology of China, Hefei, China, July 2024 (Invited Talk).
  5. T. Toda. Challenges in leveraging large models for augmented speech production. RASDAP, TCSDAP, Suzhou, China, Apr. 2024 (Invited Talk).
  6. T. Toda. Interactive voice conversion for augmented speech production. SNL, July 2021 (Invited Talk).
  7. T. Toda. Recent progress on voice conversion: what is next? IEEE SLT, Jan. 2021 (Invited Talk).
  8. T. Toda. Recent trend of voice conversion research and its possible future direction. APSIPA Distinguished Lecture in ROCLING (32nd Annual Conference on Computational Linguistics and Speech Processing), Taipei, Taiwan, Sep. 2020 (Keynote).
  9. T. Toda. Speech waveform modeling for advanced voice conversion. APSIPA Distinguished Lecture in Winter Seminar Series on Human Language Technology, National University of Singapore, Singapore, Dec. 2019.
  10. T. Toda. Speech waveform modeling for advanced voice conversion. APSIPA Distinguished Lecture, Carnegie Mellon University, Pittsburgh, USA, Oct. 2019.
  11. T. Toda, K. Kobayashi, T. Hayashi. Statistical voice conversion with direct waveform modeling. INTERSPEECH 2019, Graz, Austria, Sep. 2019 (Tutorial).
  12. T. Toda. Advanced voice conversion. Speech Processing Courses in Crete (SPCC), University of Crete, Heraklion, Greece, July 2019 (Invited Lecture).
  13. T. Toda. Hands on voice conversion. Speech Processing Courses in Crete (SPCC), University of Crete, Heraklion, Greece, July 2019 (Invited Lecture).
  14. T. Toda. Augmented vocal production towards new singing style development. Dagstuhl Seminar, Stimulus Talk at Seminar 19052: computational methods for melody and voice processing in music recordings, Wadern, Germany, Jan. 2019 (Invited Talk).
  15. T. Toda. Advanced voice conversion. Speech Processing Courses in Crete (SPCC), University of Crete, Heraklion, Greece, July 2018 (Invited Lecture).
  16. T. Toda. Hands on voice conversion. Speech Processing Courses in Crete (SPCC), University of Crete, Heraklion, Greece, July 2018 (Invited Lecture).
  17. T. Toda. Statistical voice conversion and its application to augmented speech production. Talk at FRIIS Seminar, Frontier Research Institute for Information Science, Nagoya Institute of Technology, Aichi, Japan, Nov. 2016 (Invited Talk).
  18. T. Toda. Voice conversion. Winter School on Speech and Audio Processing (WiSSAP 2013), IIT Madras, Chennai, India, Feb. 2013 (Invited Lecture).
  19. T. Toda. Statistical voice conversion and its real-time applications. Workshop on Frontiers in Speech and Language Technologies and Their Applications, University of Science and Technology of China, Hefei, China, Dec. 2012 (Invited Talk).
  20. T. Toda. Statistical approach to voice conversion and its applications for augmented human communication. The 8th International Symposium on Chinese Spoken Language Processing (ISCSLP-2012), Hong Kong, China, Dec. 2012 (Tutorial).
  21. T. Toda. General concepts and framework of HMM-based speech synthesis. Tutorial on HMM-based statistical speech synthesis in Workshop, Shanghai Jiao Tong University, Shanghai, China, Oct. 2012 (Tutorial).
  22. T. Toda. Voice conversion for enhancing various types of body-conducted speech detected with non-audible murmur microphone. Joint Meeting: 159th Meeting of the ASA and NOISE-CON 2010, Baltimore, USA, Apr. 2010 (Invited Talk).
  23. T. Toda. Statistical conversion of speech parameter trajectory for mapping between features of different modalities. Acoustics'08 Paris (the 2nd ASA-EAA joint conference), Paris, France, July 2008 (Invited Talk).
  24. T. Toda. Overview of voice conversion. 5th ISCA Speech Synthesis Workshop (SSW5), Pittsburgh, USA, June 2004 (Tutorial).

Others

- 1. S. Chen, T. Toda. QHARMA-GAN: quasi-harmonic neural vocoder based on autoregressive moving average model. IEEE ICASSP, Presentation of an SPS Journal Paper, Barcelona, Spain, May 2026.
  2. D. Ma, L.P. Violeta, K. Kobayashi, T. Toda. Pretraining and fine-tuning techniques for electrolaryngeal speech enhancement based on sequence-to-sequence voice conversion. IEEE ICASSP, Presentation of an SPS Journal Paper, Barcelona, Spain, May 2026.
  3. J. He, X. Shi, C.-H. Hu, J. Mi, X. Li, T. Toda. M4SER: multimodal, multirepresentation, multitask, and multistrategy learning for speech emotion recognition. IEEE ICASSP, Presentation of an SPS Journal Paper, Barcelona, Spain, May 2026.
  4. B.M. Halpern, T.B. Tienkamp, T. Rebernik, R.J.J.H. van Son, S.A.H.J. de Visscher, M.J.H. Witjes, D. Abur, T. Toda. XPPG-PCA: reference-free automatic speech severity evaluation with principal components. IEEE ICASSP, Presentation of an SPS Journal Paper, Barcelona, Spain, May 2026.
  5. L.P. Violeta, D. Ma, W.-C. Huang, T. Toda. Pretraining and adaptation techniques for electrolaryngeal speech recognition. EUSIPCO, Presentation of an SPS Journal Paper, Palermo, Italy, Sep. 2025.
  6. F. Li, F. Shen, D. Ma, J. Zhou, L. Wang, F. Fan, X. Chen, T. Toda, H. Niu. Mandarin speech reconstruction from neck and facial surface electromyography. IEEE EMBC, Research poster presentation, Copenhagen, Denmark, July 2025.
  7. T. Fujimura, I. Kuroyanagi, T. Toda. The NU systems for DCASE 2025 Challenge Task 2. Technical report, DCASE Task 2, 5 pages, July 2025. <DCASE 2025 Challenge Task 2 Judges' Award>
  8. T. Fujimura, I. Kuroyanagi, T. Toda. The NU systems for DCASE 2024 Challenge Task 2. Technical report, DCASE Task 2, 5 pages, July 2024.
  9. R. Yoneyama, Y.-C. Wu, T. Toda. High-fidelity and pitch-controllable neural vocoder based on unified source-filter networks. IEEE ICASSP, Presentation of an SPS Journal Paper, Apr. 2024.
  10. T. Fujimura, I. Kuroyanagi, T. Hayashi, T. Toda. Anomalous sound detection by end-to-end training of outlier exposure and normalizing flow with domain generalization techniques. Technical report, DCASE Task 2, 5 pages, July 2023.
  11. W.-C. Huang, S.-W. Yang, T. Hayashi, T. Toda. A comparative study of self-supervised speech representation based voice conversion. IEEE ICASSP, Presentation of an SPS Journal Paper, Rhodes Island, Greece, June 2023.
  12. Y. Yasuda, T. Toda. Investigation of Japanese PnG BERT language model in text-to-speech synthesis for pitch accent language. IEEE ICASSP, Presentation of an SPS Journal Paper, Rhodes Island, Greece, June 2023.
  13. I. Kuroyanagi, T. Hayashi, K. Takeda, T. Toda. Two-stage anomalous sound detection systems using domain generalization and specialization techniques. Technical report, DCASE Task 2, 5 pages, July 2022. <DCASE 2022 Challenge Task 2 Judges' Award>
  14. I. Kuroyanagi, T. Hayashi, Y. Adachi, T. Yoshimura, K. Takeda, T. Toda. Anomalous sound detection with ensemble of autoencoder and binary classification approaches. Technical report, DCASE Task 2, 4 pages, July 2021.
  15. C.-H. Hu, Y.-C. Wu, W.-C. Huang, Y.-H. Peng, Y.-W. Chen, P.-J. Ku, T. Toda, Y. Tsao, H.-M. Wang. The AS-NU system for the M2VoC challenge. Technical report, arXiv:2104.03009, 5 pages, Apr. 2021.
  16. K. Miyazaki, T. Komatsu, T. Hayashi, S. Watanabe, T. Toda, K. Takeda. Convolution-augmented Transformer for semi-supervised sound event detection. Technical report, DCASE Task 4, 4 pages, June 2020.
  17. S. Takamichi, T. Toda, A.W. Black, S. Nakamura. Post-filter using modulation spectrum as a metric to quantify over-smoothing effects in statistical parametric speech synthesis. APSIPA newsletter, No. 9, pp. 14-16, Apr. 2015.
  18. Y. Stylianou, T. Toda, C.-H. Wu, A. Kain, O. Rosec. Introduction to the special section on voice transformation. IEEE Transactions on Audio, Speech and Language Processing, Vol. 18, No. 5, pp. 909-911, July 2010.
  19. T. Toda. Voice conversion and its application in speech-to-speech translation. Asian Forum on Information and Communications Technology (AFICT), Kuala Lumpur, Malaysia, Dec. 2009.
  20. T. Toda. Voice conversion (spectral conversion). Lecture, Speech: Phonetics, prosody, perception and synthesis, Language Technologies Institute, Carnegie Mellon University, Pittsburgh, USA, Apr. 2004.
  21. T. Toda, H. Kawai, M. Tsuzaki, K. Shikano. Optimizing segment selection for high-quality Text-to-Speech. ATR Technical Report, TR-SLT-0033, Unpublished report, Mar. 2003.
  22. H. Kawanami, Y. Iwami, T. Toda, K. Shikano. Synthesizing emotional speech using voice conversion technique based on GMM with DFW and its evaluation. Demo presentation, IEEE 2002 Workshop on Speech Synthesis, Santa Monica, USA, Sep. 2002.
