CV
Tomoki Toda
Information Technology Center, Nagoya University
Furo-cho, Chikusa-ku, Nagoya, 464-8601, JAPAN
E-mail: tomoki__at__icts.nagoya-u.ac.jp
TEL: +81-52-789-4346
Web: https://sites.google.com/site/tomokitoda/home
Research interests
Tomoki Toda is interested in speech, music, and sound information processing. His research interests include statistical approaches to speech processing, such as voice conversion, speech synthesis, speech analysis, speech recognition, and spoken dialogue; music processing, such as music source separation and music signal generation; and sound information processing, such as polyphonic sound event detection and sound event symbolization.
Keywords: speech and language processing, signal processing, machine learning
Education
Apr. 1995 - Mar. 1999
School of Engineering, Nagoya University, Japan
B.E. degree in Electrical and Electronic Engineering and Information Engineering, 1999
Apr. 1999 - Mar. 2003
Graduate School of Information Science, Nara Institute of Science and Technology, Japan
Master's degree in Engineering, 2001
Doctoral degree in Engineering, 2003
Professional Experience
Apr. 2003 - Mar. 2005
Japan Society for the Promotion of Science (JSPS)
Research Fellow (Affiliation: Nagoya Institute of Technology)
Apr. 2005 - Mar. 2011
Graduate School of Information Science, Nara Institute of Science and Technology (NAIST), Japan
Assistant Professor
Apr. 2011 - Aug. 2015
Graduate School of Information Science, NAIST, Japan
Associate Professor
Sep. 2015 - Present
Information Technology Center, Nagoya University, Japan
Professor
Mar. 2001 - Mar. 2003
Advanced Telecommunications Research Institute International (ATR), Spoken Language Translation Research Laboratories (SLT), Japan
Intern Researcher
Apr. 2003 - Sep. 2003
ATR-SLT, Japan
Visiting Researcher
Oct. 2004 - Mar. 2006
ATR, Spoken Language Communication Research Laboratories (SLC), Japan
Visiting Researcher
May 2006 - Present
National Institute of Information and Communications Technology (NICT), Knowledge Creating Communication Research Center, Japan
Visiting Researcher
July 2014 - Aug. 2015
Organization for Management and Outside Collaboration on R&D, National Institute of Informatics (NII), Japan
Visiting Associate Professor
Sep. 2015 - Mar. 2017
Graduate School of Information Science, NAIST, Japan
Visiting Professor
Sep. 2015 - Mar. 2018
Organization for Management and Outside Collaboration on R&D, NII, Japan
Visiting Professor
Dec. 2016 - Mar. 2020
Fundamental Information Technologies toward Innovative Social System Design, PRESTO, JST, Japan
PRESTO Researcher
Oct. 2003 - Sep. 2004
Language Technologies Institute, Carnegie Mellon University, USA
Visiting Researcher
Mar. 2008 - Aug. 2008
Department of Engineering, University of Cambridge, UK
Visiting Researcher
Professional Volunteer Work
Jan. 2007 - Dec. 2009
IEEE SPS Speech and Language Technical Committee Member
Apr. 2008 - Mar. 2012
IPSJ SIG-SLP Organizing Committee Member
Apr. 2010 - Dec. 2016
APSIPA Speech, Language, and Audio Technical Committee Member
Feb. 2011 - Jan. 2013
IEEE Signal Processing Society Kansai Chapter, Secretary
Mar. 2011 - Dec. 2013
ACM Transactions on Speech and Language Processing, Associate Editor
Feb. 2013 - Jan. 2015
IEEE Signal Processing Society Kansai Chapter, Treasurer
Apr. 2013 - Mar. 2017
IPSJ SIG-MUS Organizing Committee Member
Apr. 2013 - Mar. 2024
EURASIP Journal on Audio, Speech, and Music Processing, Associate Editor
May 2013 - May 2015
IEICE/ASJ Speech Research Committee, Secretary
June 2013 - July 2017
ASJ Editorial Committee, Associate Editor
Jan. 2014 - Dec. 2016
IEEE SPS Speech and Language Technical Committee Member
June 2015 - Present
ASJ Representative
Nov. 2016 - Dec. 2020
IEEE Signal Processing Letters, Associate Editor
Jan. 2019 - Jan. 2021
IEEE Signal Processing Society Tokyo Joint Chapter, Treasurer
June 2020 - June 2023
JASA Express Letters, Associate Editor
Dec. 2020 - Present
IEEE Signal Processing Letters, Senior Area Editor
Others
Guest Editorial Committee Members
IEEE Transactions on Audio, Speech and Language Processing, Special Issue on Voice Transformation, Guest Editor
IEICE Transactions on Information and Systems, Special Section on Recent Advances in Machine Learning for Spoken Language Processing, Guest Editor
IEICE Transactions on Information and Systems, Special Section on Advances in Modeling for Real-world Speech Information Processing and its Application, Guest Associate Editor
International Conference Committee Members
IEEE ICASSP 2012, Organizing Committee Member
INTERSPEECH 2010, Organizing Committee Member (Student Award)
INTERSPEECH 2014, Technical Program Committee Member, Coordinating Area Chair
IEEE 9th International Symposium on Wearable Computers (ISWC2005), Local Committee Member
APSIPA ASC 2009-2010, 2014-2015, Technical Program Committee Member
IEEE ASRU 2015, Organizing Committee Member, Regional Publicity Chair
IEEE ASRU 2017, Organizing Committee Member, Challenge Chair
The 7th ISCA Speech Synthesis Workshop (SSW7), Organizing Committee Member
The 5th ISCA Speech Synthesis Workshop (SSW5), Local Committee Member
International Workshop on Statistical Machine Learning for Speech Processing (IWSML), Organizing Committee Member, Local Chair
International Workshop on Machine Learning in Spoken Language Processing (MLSLP), Organizing Committee Member, Technical Program Chair
DSP in vehicles 2018, Organizing Committee Member, Program Chair
Speech Processing Courses in Crete (SPCC) 2019, 2020, Technical Committee Member
Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, Organizing Committee Member
Review Committee Members
IEEE ICASSP (2007-), INTERSPEECH (2006-)
EUSIPCO (2012, 2014-), APSIPA ASC (2009-)
ISCSLP (2014-), SLSP (2015-)
ISCA Speech Synthesis Workshop (SSW6-), DCASE (2017-)
IEEE ASRU (2011, 2017), IEEE SLT (2016-), IEEE WASPAA (2019-), IEEE MLSP (2019-), IEEE ICME (2021)
ISMIR (2019), AAAI (2021), IJCAI (2021-), NAACL-HLT (2007, 2016), COLING (2012, 2014)
Others
Several transactions
Session Chair
IEEE ICASSP 2007-2010, 2012, 2014, 2015, 2019-
INTERSPEECH 2006, 2009-2014, 2016-
EUSIPCO 2017, 2020
APSIPA ASC 2009, 2014, 2015, 2020, 2022-
IEEE SLT 2018, SSW 8th-9th, BC&VCC-WS 2020
Others
Other Activities
INTERSPEECH 2016, Voice Conversion Challenge 2016, Special Session Organizer
IEEE SLT 2018, Deep Learning for Speech Synthesis, Special Session Organizer
Voice Conversion Challenge 2016, 2018, 2020, Organizer
APSIPA Distinguished Lecturer for 2019-2020
INTERSPEECH 2022, VoiceMOS Challenge, Special Session Organizer
VoiceMOS Challenge 2022, 2023, Organizer
Singing Voice Conversion Challenge 2023, Organizer
IEEE ASRU 2023, The Singing Voice Conversion Challenge 2023, Challenge Special Session Organizer
IEEE ASRU 2023, The VoiceMOS Challenge 2023, Challenge Special Session Organizer
Singing Voice Deepfake Detection Challenge 2024, Organizer
Research Grants
Apr. 2003 - Mar. 2005
JSPS, Grant-in-Aid for Scientific Research, Grant-in-Aid for JSPS Fellows
Apr. 2006 - Mar. 2009
MEXT, Grant-in-Aid for Scientific Research, Grant-in-Aid for Young Scientists (A)
May 2006 - Feb. 2007
IPA, Exploratory Software Project
Apr. 2008 - Mar. 2011
MIC, Strategic Information and Communications R&D Promotion Programme (SCOPE)
Apr. 2009 - Mar. 2011
JSPS, Japan-France Integrated Action Program (SAKURA)
Apr. 2010 - Mar. 2014
MEXT, Grant-in-Aid for Scientific Research, Grant-in-Aid for Young Scientists (A)
Dec. 2011 - July 2012
JST, Adaptable and Seamless Technology transfer Program through target-driven R&D (A-STEP), FS stage
Apr. 2014 - Mar. 2017
MEXT, Grant-in-Aid for Scientific Research, Grant-in-Aid for Scientific Research (B)
Apr. 2015 - Mar. 2019
MEXT, Grant-in-Aid for Scientific Research, Grant-in-Aid for Challenging Exploratory Research
Dec. 2016 - Mar. 2020
JST, PRESTO, Fundamental Information Technologies toward Innovative Social System Design
Apr. 2017 - Mar. 2020
MEXT, Grant-in-Aid for Scientific Research, Grant-in-Aid for Scientific Research (B)
Oct. 2019 - Present
JST, CREST, Creation and development of core technologies interfacing human and information environments
Awards
The 18th TELECOM System Technology Award for Student from the Telecommunications Advancement Foundation (TAF) in 2003
The 23rd TELECOM System Technology Award from the TAF in 2008
The 2007 Information and Systems Society (ISS) Best Paper Award from the Institute of Electronics, Information and Communication Engineers (IEICE) in 2008
The 10th Ericsson Young Scientist Award from Nippon Ericsson K.K. in 2008
The 4th Itakura Prize Innovative Young Researcher Award from the Acoustical Society of Japan (ASJ) in 2009
The 26th Awaya Prize Young Researcher Award from the ASJ in 2009
The 2009 Young Author Best Paper Award from the IEEE Signal Processing Society in 2010
The 2010 ISS Young Researcher's Award in Speech Field from IEICE in 2011
The Best Paper Award (Short Paper in Regular Session Category) from APSIPA ASC 2012 in 2012
The 2012 Kiyasu Special Industrial Achievement Award from the IPSJ in 2013
The 2013 Best Paper Award (Speech Communication Journal) from EURASIP-ISCA in 2013
The Best Paper Award from APSIPA ASC 2014 in 2014
The Best Paper Award of the 21st Annual Meeting of the ANLP in 2015
The Commendation for Science and Technology by the Minister of Education, Culture, Sports, Science and Technology, the Young Scientists' Prize in 2015
Paper Award of 2017 Annual Conference of AXIES in 2018
Poster Award of 2018 Annual Conference of AXIES in 2019
The Best Paper Award from APSIPA ASC 2021 in 2021
DCASE 2022 Challenge Task 2 Judges' Award in 2022
Memberships
The Institute of Electrical and Electronics Engineers, Inc. (IEEE), Senior member
Institute of Electronics, Information and Communication Engineers of Japan (IEICE), Member
Information Processing Society of Japan (IPSJ), Member
The Acoustical Society of Japan (ASJ), Member
International Speech Communication Association (ISCA), Member
The European Association for Signal Processing (EURASIP), Member
The Asia-Pacific Signal and Information Processing Association (APSIPA), Member
Publications
Journal Papers
S. Luan, Y. Wakabayashi, T. Toda. Unequally spaced sound field interpolation for rotation-robust beamforming. IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 32, pp. 3185-3199, June 2024. [Paper]
L.P. Violeta, D. Ma, W.-C. Huang, T. Toda. Pretraining and adaptation techniques for electrolaryngeal speech recognition. IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 32, pp. 2777-2789, May 2024. [Paper]
M. Eshghi, T. Toda. An investigation of fundamental frequency pattern prediction for Japanese electrolaryngeal speech enhancement based on frame-wise phoneme representations. IEEE Access, Vol. 12, pp. 50137-50153, Apr. 2024. [Paper]
R. Wang, L. Li, T. Toda. Dual-channel target speaker extraction based on conditional variational autoencoder and directional information. IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 32, pp. 1968-1979, Mar. 2024. [Paper]
H. Yamashita, T. Okamoto, R. Takashima, Y. Ohtani, T. Takiguchi, T. Toda, H. Kawai. Fast neural speech waveform generative models with fully-connected layer-based upsampling. IEEE Access, Vol. 12, pp. 31409-31421, Feb. 2024. [Paper]
C. Xie, T. Toda. Noisy-to-noisy voice conversion under variations of noisy condition. IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 31, pp. 3871-3882, Oct. 2023. [Paper]
R. Yoneyama, Y.-C. Wu, T. Toda. High-fidelity and pitch-controllable neural vocoder based on unified source-filter networks. IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 31, pp. 3717-3729, Oct. 2023. [Paper]
K. Matsubara, T. Okamoto, R. Takashima, T. Takiguchi, T. Toda, H. Kawai. Harmonic-Net: fundamental frequency and speech rate controllable fast neural vocoder. IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 31, pp. 1902-1915, May 2023. [Paper]
W.-C. Huang, S.-W. Yang, T. Hayashi, T. Toda. A comparative study of self-supervised speech representation based voice conversion. IEEE Journal of Selected Topics in Signal Processing, Vol. 16, No. 6, pp. 1308-1318, Oct. 2022. [Preprint]
Y. Yasuda, T. Toda. Investigation of Japanese PnG BERT language model in text-to-speech synthesis for pitch accent language. IEEE Journal of Selected Topics in Signal Processing, Vol. 16, No. 6, pp. 1319-1328, Oct. 2022. [Paper]
Y.-C. Wu, P.L. Tobing, K. Yasuhara, N. Matsunaga, Y. Ohtani, T. Toda, Y. Shiga, H. Kawai. A cyclical approach to synthetic and natural speech mismatch refinement of neural post-filter for low-cost text-to-speech system. APSIPA Transactions on Signal and Information Processing, Vol. 11, e30, pp. 1-32, Sep. 2022. [Paper]
T. Okamoto, K. Matsubara, T. Toda, Y. Shiga, H. Kawai. Neural speech-rate conversion with multispeaker WaveNet vocoder. Speech Communication, Vol. 138, pp. 1-12, Mar. 2022. [Paper]
K. Matsubara, T. Okamoto, R. Takashima, T. Takiguchi, T. Toda, Y. Shiga, H. Kawai. Full-band LPCNet: a real-time neural vocoder for 48 kHz audio with a CPU. IEEE Access, Vol. 9, pp. 94923-94933, July 2021. [Paper]
A. Ando, T. Mori, S. Kobashikawa, T. Toda. Speech emotion recognition based on listener-dependent emotion perception models. APSIPA Transactions on Signal and Information Processing, Vol. 10, e6, pp. 1-11, Apr. 2021. [Paper]
Y.-C. Wu, T. Hayashi, P.L. Tobing, K. Kobayashi, T. Toda. Quasi-periodic WaveNet: an autoregressive raw waveform generative model with pitch-dependent dilated convolution neural network. IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 29, pp. 1134-1148, Mar. 2021. [Paper]
Y.-C. Wu, T. Hayashi, T. Okamoto, H. Kawai, T. Toda. Quasi-periodic parallel WaveGAN: a non-autoregressive raw waveform generative model with pitch-dependent dilated convolution neural network. IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 29, pp. 792-806, Feb. 2021. [Paper]
W.-C. Huang, T. Hayashi, Y.-C. Wu, H. Kameoka, T. Toda. Pretraining techniques for sequence-to-sequence voice conversion. IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 29, pp. 745-755, Feb. 2021. <IEEE Signal Processing Society Japan Student Best Paper Award (recipient: Wen-Chin Huang)> [Paper]
H. Kameoka, W.-C. Huang, K. Tanaka, T. Kaneko, N. Hojo, T. Toda. Many-to-many voice transformer network. IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 29, pp. 656-670, Jan. 2021. [Paper]
P.L. Tobing, Y.-C. Wu, T. Hayashi, K. Kobayashi, T. Toda. An evaluation of voice conversion with neural network spectral mapping models and WaveNet vocoder. APSIPA Transactions on Signal and Information Processing, Vol. 9, e26, pp. 1-14, Nov. 2020. [Paper]
X. Wang, J. Yamagishi, M. Todisco, H. Delgado, A. Nautsch, N. Evans, M. Sahidullah, V. Vestman, T. Kinnunen, K.A. Lee, L. Juvela, P. Alku, Y.-H. Peng, H.-T. Hwang, Y. Tsao, H.-M. Wang, S. Le Maguer, M. Becker, F. Henderson, R. Clark, Y. Zhang, Q. Wang, Y. Jia, K. Onuma, K. Mushika, T. Kaneda, Y. Jiang, L.-J. Liu, Y.-C. Wu, W.-C. Huang, T. Toda, K. Tanaka, H. Kameoka, I. Steiner, D. Matrouf, J.-F. Bonastre, A. Govender, S. Ronanki, J.-X. Zhang, Z.-H. Ling. ASVspoof 2019: a large-scale public database of synthetic, converted and replayed speech. Computer Speech and Language, Vol. 64, Article 101114, 25 pages, Nov. 2020. [Paper]
Y.-C. Wu, P.L. Tobing, T. Hayashi, K. Kobayashi, T. Toda. Non-parallel voice conversion system with WaveNet vocoder and collapsed speech suppression. IEEE Access, Vol. 8, No. 1, pp. 62094-62106, Apr. 2020. [Paper]
S. Ohira, S. Seiya, R. Ito, K. Okamoto, U. Tanikawa, D. Deguchi, T. Toda. Development and Evaluation of "KamiRepo" Web Service with Return of Handwritten Assignments via LMS. IPSJ Transactions on Computers and Education (Japanese Edition), Vol. 6, No. 1, pp. 52-68, Feb. 2020. [Paper]
A. Ando, R. Masumura, H. Kamiyama, S. Kobashikawa, Y. Aono, T. Toda. Customer satisfaction estimation in contact center calls based on a hierarchical multi-task model. IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol. 28, No. 1, pp. 715-728, Jan. 2020. [Link]
P.L. Tobing, Y.-C. Wu, T. Hayashi, K. Kobayashi, T. Toda. Voice conversion with CycleRNN-based spectral mapping and finely tuned WaveNet vocoder. IEEE Access, Vol. 7, No. 1, pp. 171114-171125, Dec. 2019. [Paper]
S. Seki, H. Kameoka, L. Li, T. Toda, K. Takeda. Underdetermined source separation based on generalized multichannel variational autoencoder. IEEE Access, Vol. 7, No. 1, pp. 168104-168115, Nov. 2019. [Paper]
A. Tamamori, T. Hayashi, T. Toda, K. Takeda. Daily activity recognition based on recurrent neural network using multi-modal signals. APSIPA Transactions on Signal and Information Processing, Vol. 7, e21, pp. 1-11, Dec. 2018. [Paper]
T. Kano, S. Takamichi, S. Sakti, G. Neubig, T. Toda, S. Nakamura. An end-to-end model for cross-lingual transformation of paralinguistic information. Machine Translation, Vol. 32, No. 4, pp. 353-368, Dec. 2018. [Link]
S. Seki, T. Toda, K. Takeda. Stereophonic music separation based on non-negative tensor factorization with cepstral distance regularization. IEICE Transactions on Fundamentals, Vol. E101-A, No. 7, pp. 1057-1064, July 2018. [Link]
K. Kobayashi, T. Toda, S. Nakamura. Intra-gender statistical singing voice conversion with direct waveform modification using log-spectral differential. Speech Communication, Vol. 99, pp. 211-220, May 2018. [Paper]
T. Hayashi, M. Nishida, N. Kitaoka, T. Toda, K. Takeda. Daily activity recognition with large-scaled real-life recording datasets based on deep neural network using multi-modal signals. IEICE Transactions on Fundamentals, Vol. E101-A, No. 1, pp. 199-210, Jan. 2018. [Link]
P.L. Tobing, K. Kobayashi, T. Toda. Articulatory controllable speech modification based on statistical inversion and production mappings. IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol. 25, No. 12, pp. 2337-2350, Dec. 2017. [Paper]
T. Hayashi, S. Watanabe, T. Toda, T. Hori, J. Le Roux, K. Takeda. Duration-controlled LSTM for polyphonic sound event detection. IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol. 25, No. 11, pp. 2059-2070, Nov. 2017. <IEEE Signal Processing Society Japan Young Author Best Paper Award (recipient: Tomoki Hayashi)> [Paper]
K. Tanaka, T. Toda, S. Nakamura. A vibration control method of an electrolarynx based on statistical F0 pattern prediction. IEICE Transactions on Information and Systems, Vol. E100-D, No. 9, pp. 2165-2173, Sep. 2017. [Paper]
Q. Truong Do, T. Toda, G. Neubig, S. Sakti, S. Nakamura. Preserving word-level emphasis in speech-to-speech translation. IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 25, No. 3, pp. 544-556, Mar. 2017. <IEEE Signal Processing Society Japan Student Best Paper Award (recipient: Quoc Truong Do)> [Link]
A. Miura, G. Neubig, S. Sakti, T. Toda, S. Nakamura. Improving pivot translation by remembering the pivot. Journal of Natural Language Processing (Japanese Edition), Vol. 23, No. 5, pp. 499-528, Dec. 2016. [Paper]
Y. Oshima, S. Takamichi, T. Toda, G. Neubig, S. Sakti, S. Nakamura. Non-native text-to-speech preserving speaker individuality based on partial correction of prosodic and phonetic characteristics. IEICE Transactions on Information and Systems, Vol. E99-D, No. 12, pp. 3132-3139, Dec. 2016. [Paper]
K. Kobayashi, T. Toda, T. Nakano, M. Goto, S. Nakamura. Improvements of voice timbre control based on perceived age in singing voice conversion. IEICE Transactions on Information and Systems, Vol. E99-D, No. 11, pp. 2767-2777, Nov. 2016. [Paper]
T. Hiraoka, G. Neubig, S. Sakti, T. Toda, S. Nakamura. Learning cooperative persuasive dialogue policies using framing. Speech Communication, Vol. 84, pp. 83-96, Nov. 2016. [Link]
S. Takamichi, T. Toda, G. Neubig, S. Sakti, S. Nakamura. A statistical sample-based approach to GMM-based voice conversion using tied-covariance acoustic models. IEICE Transactions on Information and Systems, Vol. E99-D, No. 10, pp. 2490-2498, Oct. 2016. [Paper]
H. Tanaka, S. Sakti, G. Neubig, T. Toda, H. Negoro, H. Iwasaka, S. Nakamura. Teaching social communication skills through human-agent interaction. ACM Transactions on Interactive Intelligent Systems, Vol. 6, No. 2, 23 pages, Aug. 2016. [Link]
H. Maki, T. Toda, S. Sakti, G. Neubig, S. Nakamura. Enhancing event-related potentials based on maximum a posteriori estimation with a spatial correlation prior. IEICE Transactions on Information and Systems, Vol. E99-D, No. 6, pp. 1410-1419, June 2016. [Paper]
S. Takamichi, T. Toda, A.W. Black, G. Neubig, S. Sakti, S. Nakamura. Post-filters to modify the modulation spectrum for statistical parametric speech synthesis. IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 24, No. 4, pp. 755-767, Apr. 2016. <IEEE Signal Processing Society Japan Young Author Best Paper Award (recipient: Shinnosuke Takamichi)> [Paper]
Z. Wu, P. De Leon, C. Demiroglu, A. Khodabakhsh, S. King, Z.-H. Ling, D. Saito, B. Stewart, T. Toda, M. Wester, J. Yamagishi. Anti-spoofing for text-independent speaker verification: an initial database, comparison of countermeasures, and human performance. IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 24, No. 4, pp. 768-783, Apr. 2016. [Link]
K. Akabe, G. Neubig, S. Sakti, T. Toda, S. Nakamura. Error selection methods for machine translation error analysis. Journal of Natural Language Processing (Japanese Edition), Vol. 23, No. 1, pp. 88-117, Jan. 2016. [Paper]
M. Mizukami, L. Nio, H. Kimura, T. Nomura, G. Neubig, K. Yoshino, S. Sakti, T. Toda, S. Nakamura. Example based dialogue system based on satisfaction prediction. Transactions of the Japanese Society for Artificial Intelligence (Japanese Edition), Vol. 31, No. 1, 12 pages, Jan. 2016. [Paper]
P. Arthur, G. Neubig, S. Sakti, T. Toda, S. Nakamura. Semantic parsing of ambiguous input through paraphrasing and verification. Transactions of the Association for Computational Linguistics, Vol. 3, pp. 571-584, Dec. 2015. [Link]
H. Tanaka, S. Sakti, G. Neubig, T. Toda, S. Nakamura. NOCOA+: multimodal computer-based training for social and communication skills. IEICE Transactions on Information and Systems, Vol. E98-D, No. 8, pp. 1536-1544, Aug. 2015. [Paper]
K. Kobayashi, T. Toda, H. Doi, T. Nakano, M. Goto, G. Neubig, S. Sakti, S. Nakamura. Voice timbre control based on perceived age in singing voice conversion. IEICE Transactions on Information and Systems, Vol. E97-D, No. 6, pp. 1419-1428, June 2014. [Paper]
K. Tanaka, T. Toda, G. Neubig, S. Sakti, S. Nakamura. A hybrid approach to electrolaryngeal speech enhancement based on noise reduction and statistical excitation generation. IEICE Transactions on Information and Systems, Vol. E97-D, No. 6, pp. 1429-1437, June 2014. [Paper]
K. Kubo, S. Sakti, G. Neubig, T. Toda, S. Nakamura. Structured adaptive regularization of weight vectors for a robust grapheme-to-phoneme conversion model. IEICE Transactions on Information and Systems, Vol. E97-D, No. 6, pp. 1468-1476, June 2014. [Paper]
L. Nio, S. Sakti, G. Neubig, T. Toda, S. Nakamura. Utilizing human-to-human conversation examples for a multi domain chat-oriented dialog system. IEICE Transactions on Information and Systems, Vol. E97-D, No. 6, pp. 1497-1505, June 2014. [Paper]
S. Takamichi, T. Toda, Y. Shiga, S. Sakti, G. Neubig, S. Nakamura. Parameter generation methods with rich context models for high-quality and flexible text-to-speech synthesis. IEEE Journal of Selected Topics in Signal Processing, Vol. 8, No. 2, pp. 239-250, Apr. 2014. <The 30th TELECOM System Technology Award for Student from TAF (recipient: Shinnosuke Takamichi)> <IEEE Kansai Section Student Paper Award (recipient: Shinnosuke Takamichi)> [Link]
H. Doi, T. Toda, K. Nakamura, H. Saruwatari, K. Shikano. Alaryngeal speech enhancement based on one-to-many eigenvoice conversion. IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 22, No. 1, pp. 172-183, Jan. 2014. [Paper]
Y. Yamauchi, G. Neubig, S. Sakti, T. Toda, S. Nakamura. Answer sentence generation using relationships between terms for guiding users to new topics in dialog systems. Transactions of the Japanese Society for Artificial Intelligence (Japanese Edition), Vol. 29, No. 1, pp. 80-89, Jan. 2014. [Paper]
T. Toda, M. Nakagiri, K. Shikano. Statistical voice conversion techniques for body-conducted unvoiced speech enhancement. IEEE Transactions on Audio, Speech and Language Processing, Vol. 20, No. 9, pp. 2505-2517, Sep. 2012. [Paper]
T. Nakamura, K. Sugiura, T. Nagai, N. Iwahashi, T. Toda, H. Okada, T. Omori. Learning novel objects for extended mobile manipulation. Journal of Intelligent and Robotic Systems, Vol. 66, No. 1-2, pp. 187-204, Apr. 2012. [Link]
T. Nakamura, M. Attamimi, K. Sugiura, T. Nagai, N. Iwahashi, T. Toda, H. Okada, T. Omori. An extended mobile manipulation robot learning novel objects. Journal of the Robotics Society of Japan, Vol. 30, No. 2, pp. 213-224, Mar. 2012. [Paper]
T. Kubo, T. Toda, M. Yoshida, T. Hattori, K. Ikeda. Vowel recognition based on surface electromyography with electrode grid on submental region. Transactions of Japanese Society for Medical and Biological Engineering, Vol. 50, No. 1, pp. 38-46, Feb. 2012. [Paper]
K. Nakamura, T. Toda, H. Saruwatari, K. Shikano. Speaking-aid systems using GMM-based voice conversion for electrolaryngeal speech. Speech Communication, Vol. 54, No. 1, pp. 134-146, Jan. 2012. [Link]
H. Doi, K. Nakamura, T. Toda, H. Saruwatari, K. Shikano. Esophageal speech enhancement based on statistical voice conversion with Gaussian mixture models. IEICE Transactions on Information and Systems, Vol. E93-D, No. 9, pp. 2472-2482, Sep. 2010. [Paper]
Y. Ohtani, T. Toda, H. Saruwatari, K. Shikano. Improvements of the one-to-many eigenvoice conversion system. IEICE Transactions on Information and Systems, Vol. E93-D, No. 9, pp. 2491-2499, Sep. 2010. [Paper]
K. Nakamura, T. Toda, H. Saruwatari, K. Shikano. Evaluation of extremely small sound source signals used in speaking-aid system with statistical voice conversion. IEICE Transactions on Information and Systems, Vol. E93-D, No. 7, pp. 1909-1917, July 2010. [Paper]
Y. Ohtani, T. Toda, H. Saruwatari, K. Shikano. Adaptive training for voice conversion based on eigenvoices. IEICE Transactions on Information and Systems, Vol. E93-D, No. 6, pp. 1589-1598, June 2010. [Paper]
T. Hirahara, M. Otani, S. Shimizu, T. Toda, K. Nakamura, Y. Nakajima, K. Shikano. Silent-speech enhancement using body-conducted vocal-tract resonance signals. Speech Communication, Vol. 52, No. 4, pp. 301-313, Apr. 2010. [Link]
V.-A. Tran, G. Bailly, H. Loevenbruck, T. Toda. Improvement to a NAM-captured whisper-to-speech system. Speech Communication, Vol. 52, No. 4, pp. 314-326, Apr. 2010. [Link]
J. Yamagishi, T. Nose, H. Zen, Z.-H. Ling, T. Toda, K. Tokuda, S. King, S. Renals. Robust speaker-adaptive HMM-based text-to-speech synthesis. IEEE Transactions on Audio, Speech and Language Processing, Vol. 17, No. 6, pp. 1208-1230, Aug. 2009. [Link]
R. Gomez, T. Toda, H. Saruwatari, K. Shikano. Techniques in rapid unsupervised speaker adaptation based on HMM-sufficient statistics. Speech Communication, Vol. 51, No. 1, pp. 42-57, Jan. 2009. [Link]
H. Zen, T. Toda, K. Tokuda. The Nitech-NAIST HMM-based speech synthesis system for the Blizzard Challenge 2006. IEICE Transactions on Information and Systems, Vol. E91-D, No. 6, pp. 1764-1773, June 2008. [Paper]
Y. Ohtani, T. Toda, H. Saruwatari, K. Shikano. Maximum likelihood voice conversion based on Gaussian mixture model with STRAIGHT mixed excitation. IEICE Transactions on Information and Systems (Japanese Edition), Vol. J91-D, No. 4, pp. 1082-1091, Apr. 2008. [Paper]
T. Cincarek, T. Toda, H. Saruwatari, K. Shikano. Cost reduction of acoustic modeling for real-environment applications using unsupervised and selective training. IEICE Transactions on Information and Systems, Vol. E91-D, No. 3, pp. 499-507, Mar. 2008. [Paper]
G. Nagino, M. Shozakai, T. Toda, H. Saruwatari, K. Shikano. Building an effective speech corpus by utilizing statistical multidimensional scaling method. IEICE Transactions on Information and Systems, Vol. E91-D, No. 3, pp. 607-614, Mar. 2008. [Paper]
T. Toda, A.W. Black, K. Tokuda. Statistical mapping between articulatory movements and acoustic spectrum using a Gaussian mixture model. Speech Communication, Vol. 50, No. 3, pp. 215-227, Mar. 2008. <The 2013 Best Paper Award (Speech Communication Journal) from EURASIP-ISCA> [Paper]
T. Toda, A.W. Black, K. Tokuda. Voice conversion based on maximum likelihood estimation of spectral parameter trajectory. IEEE Transactions on Audio, Speech and Language Processing, Vol. 15, No. 8, pp. 2222-2235, Nov. 2007. <The 2009 Young Author Best Paper Award from the IEEE Signal Processing Society> [Paper]
T. Toda, K. Tokuda. A Speech parameter generation algorithm considering global variance for HMM-based speech synthesis. IEICE Transactions on Information and Systems, Vol. E90-D, No. 5, pp. 816-824, May 2007. <The 23rd TELECOM System Technology Award from the TAF> <The 2007 ISS Best Paper Award from the IEICE> [Paper]
K. Nakamura, T. Toda, H. Saruwatari, K. Shikano. A speaking communication aid system for total laryngectomees using voice conversion of body transmitted artificial speech. IEICE Transactions on Information and Systems (Japanese Edition), Vol. J90-D, No. 3, pp. 780-787, Mar. 2007. [Paper]
R. Gomez, T. Toda, H. Saruwatari, K. Shikano. Reducing computation time of the rapid unsupervised speaker adaptation based on HMM-sufficient statistics. IEICE Transactions on Information and Systems, Vol. E90-D, No. 2, pp. 554-561, Feb. 2007. [Paper]
H. Zen, T. Toda, M. Nakamura, K. Tokuda. Details of the Nitech HMM-based speech synthesis system for the Blizzard Challenge 2005. IEICE Transactions on Information and Systems, Vol. E90-D, No. 1, pp. 325-333, Jan. 2007. <The 23rd TELECOM System Technology Award from the TAF> <The 2007 ISS Best Paper Award from the IEICE> [Paper]
H. Kawai, T. Toda, J. Yamagishi, T. Hirai, J. Ni, N. Nishizawa, M. Tsuzaki, K. Tokuda. XIMERA: a concatenative speech synthesis system with large scale corpora. IEICE Transactions on Information and Systems (Japanese Edition), Vol. J89-D-II, No. 12, pp. 2688-2698, Dec. 2006. [Paper]
T. Hirai, H. Kawai, M. Tsuzaki, T. Toda. Analysis of naturalness degradation factors in speech synthesis system XIMERA for Japanese. Journal of the Acoustical Society of Japan (in Japanese), Vol. 62, No. 11, pp. 767-773, Nov. 2006. [Paper]
T. Cincarek, T. Toda, H. Saruwatari, K. Shikano. Utterance-based selective training for the automatic creation of task-dependent acoustic models. IEICE Transactions on Information and Systems, Vol. E89-D, No. 3, pp. 962-969, Mar. 2006. [Paper]
R. Gomez, A. Lee, T. Toda, H. Saruwatari, K. Shikano. Improving rapid unsupervised speaker adaptation based on HMM-sufficient statistics in noisy environments using multi-template models. IEICE Transactions on Information and Systems, Vol. E89-D, No. 3, pp. 998-1005, Mar. 2006. [Paper]
T. Toda, H. Kawai, M. Tsuzaki, K. Shikano. An evaluation of cost functions sensitively capturing local degradation of naturalness for segment selection in concatenative speech synthesis. Speech Communication, Vol. 48, No. 1, pp. 45-56, Jan. 2006. [Paper]
K. Adachi, T. Toda, H. Kawanami, H. Saruwatari, K. Shikano. Designing target cost function based on prosody of speech database. IEICE Transactions on Information and Systems, Vol. E88-D, No. 3, pp. 519-524, Mar. 2005. [Paper]
T. Masuda, T. Toda, H. Kawanami, H. Saruwatari, K. Shikano. Speech databases with various prosody and its evaluation on speech rate. IEICE Transactions on Information and Systems (Japanese Edition), Vol. J87-D-II, No. 2, pp. 447-455, Feb. 2004. [Paper]
T. Toda, H. Kawai, M. Tsuzaki, K. Shikano. A segment selection algorithm for Japanese concatenative speech synthesis based on both phoneme unit and diphone unit. IEICE Transactions on Information and Systems (Japanese Edition), Vol. J85-D-II, No. 12, pp. 1760-1770, Dec. 2002. [Paper]
M. Mashimo, T. Toda, H. Kawanami, K. Shikano, N. Campbell. Cross-language voice conversion evaluation using bilingual databases. IPSJ Journal, Vol. 43, No. 7, pp. 2177-2185, July 2002. [Paper]
T. Toda, J. Lu, H. Saruwatari, K. Shikano. Voice conversion algorithm based on Gaussian mixture model with dynamic frequency warping. IEICE Transactions on Information and Systems (Japanese Edition), Vol. J84-D-II, No. 10, pp. 2181-2189, Oct. 2001. <The 18th TELECOM System Technology Award for Student from TAF> [Paper]
T. Toda, H. Banno, S. Kajita, K. Takeda, F. Itakura, K. Shikano. Improvement of STRAIGHT method under noisy conditions based on lateral inhibitive weighting. IEICE Transactions on Information and Systems (Japanese Edition), Vol. J83-D-II, No. 11, pp. 2180-2189, Nov. 2000. [Paper]
Letters
K. Matsubara, T. Okamoto, R. Takashima, T. Takiguchi, T. Toda, H. Kawai. Comparison of real-time multi-speaker neural vocoders on CPUs. Acoustical Science and Technology, Acoustical Letter, Vol. 43, No. 2, pp. 121-124, Mar. 2022.
K. Matsubara, T. Okamoto, R. Takashima, T. Takiguchi, T. Toda, Y. Shiga, H. Kawai. Investigation of training data size for real-time neural vocoders on CPUs. Acoustical Science and Technology, Acoustical Letter, Vol. 42, No. 1, pp. 65-68, Jan. 2021.
T. Okamoto, K. Tachibana, T. Toda, Y. Shiga, H. Kawai. Deep neural network-based power spectrum reconstruction to improve quality of vocoded speech with limited acoustic parameters. Acoustical Science and Technology, Acoustical Letter, Vol. 39, No. 2, pp. 163-166, Mar. 2018.
H. Tanaka, S. Sakti, G. Neubig, T. Toda, S. Nakamura. NOCOA: A Computer-Based Training Tool for Social and Communication Skills That Exploits Non-verbal Behaviors. The Journal of Information and Systems in Education (Short Note), Vol. 12, No. 1, pp. 19-26, Apr. 2013.
International Conferences
D. Ma, Y. Choi, F. Li, C. Xie, K. Kobayashi, T. Toda. Robust sequence-to-sequence voice conversion for electrolaryngeal speech enhancement in noisy and reverberant conditions. Proc. IEEE EMBC, 4 pages, Orlando, USA, July 2024. [Paper]
F. Li, F. Shen, D. Ma, S. Zhang, J. Zhou, L. Wang, F. Fan, T. Liu, X. Chen, T. Toda, H. Niu. Mandarin speech reconstruction from tongue motion ultrasound images based on generative adversarial networks. Proc. IEEE EMBC, 4 pages, Orlando, USA, July 2024. [Paper]
T. Komatsu, Y. Fujita, K. Takeda, T. Toda. Audio difference learning for audio captioning. Proc. IEEE ICASSP, pp. 1456-1460, Seoul, Korea, Apr. 2024. [Paper]
Y. Ohtani, T. Okamoto, T. Toda, H. Kawai. FIRNET: fundamental frequency controllable fast neural vocoder with trainable finite impulse response filter. Proc. IEEE ICASSP, pp. 10871-10875, Seoul, Korea, Apr. 2024. [Paper]
L.P. Violeta, W.-C. Huang, D. Ma, R. Yamamoto, K. Kobayashi, T. Toda. Electrolaryngeal speech intelligibility enhancement through robust linguistic encoders. Proc. IEEE ICASSP, pp. 10961-10965, Seoul, Korea, Apr. 2024. [Paper]
J. He, X. Shi, X. Li, T. Toda. MF-AED-AEC: speech emotion recognition by leveraging multimodal fusion, ASR error detection, and ASR error correction. Proc. IEEE ICASSP, pp. 11066-11070, Seoul, Korea, Apr. 2024. [Paper]
T. Okamoto, Y. Ohtani, T. Toda, H. Kawai. ConvNeXt-TTS and ConvNeXt-VC: ConvNeXt-based fast end-to-end sequence-to-sequence text-to-speech and voice conversion. Proc. IEEE ICASSP, pp. 12456-12460, Seoul, Korea, Apr. 2024. [Paper]
W.-C. Huang, L.P. Violeta, S. Liu, J. Shi, T. Toda. The Singing Voice Conversion Challenge 2023. Proc. IEEE ASRU, 8 pages, Taipei, Taiwan, Dec. 2023. [Paper]【Selected as Top 3% Papers】
J. He, Z. Yang, T. Toda. ED-CEC: improving rare word recognition using ASR post-processing based on error detection and context-aware error correction. Proc. IEEE ASRU, 6 pages, Taipei, Taiwan, Dec. 2023. [Paper]
B. Halpern, W.-C. Huang, L.P. Violeta, R. van Son, T. Toda. Improving severity preservation of healthy-to-pathological voice conversion with global style tokens. Proc. IEEE ASRU, 7 pages, Taipei, Taiwan, Dec. 2023. [Paper]
R. Yamamoto, R. Yoneyama, L.P. Violeta, W.-C. Huang, T. Toda. A comparative study of voice conversion models with large-scale speech and singing data: the T13 systems for the Singing Voice Conversion Challenge 2023. Proc. IEEE ASRU, 6 pages, Taipei, Taiwan, Dec. 2023. [Paper]
E. Cooper, W.-C. Huang, Y. Tsao, H.-M. Wang, T. Toda, J. Yamagishi. The VoiceMOS Challenge 2023: zero-shot subjective speech quality prediction for multiple domains. Proc. IEEE ASRU, 7 pages, Taipei, Taiwan, Dec. 2023. [Paper]【Selected as Top 3% Papers】
T. Okamoto, H. Yamashita, Y. Ohtani, T. Toda, H. Kawai. WaveNeXt: ConvNeXt-based fast neural vocoder without iSTFT layer. Proc. IEEE ASRU, 8 pages, Taipei, Taiwan, Dec. 2023. [Paper]
S. Kim, K. Takeda, T. Toda. Sequence-to-sequence network training methods for automatic guitar transcription with tokenized outputs. Proc. ISMIR, pp. 524-531, Milan, Italy, Nov. 2023. [Paper]
W.-C. Huang, T. Toda. Evaluating methods for ground-truth-free foreign accent conversion. Proc. APSIPA ASC, pp. 1136-1141, Taipei, Taiwan, Nov. 2023. [Paper]
L.P. Violeta, T. Toda. An analysis of personalized speech recognition system development for the deaf and hard-of-hearing. Proc. APSIPA ASC, pp. 1851-1856, Taipei, Taiwan, Nov. 2023. [Paper]
J. Tian, D. Hu, X. Shi, J. He, X. Li, Y. Gao, T. Toda, X. Xu, X. Hu. Semi-supervised multimodal emotion recognition with consensus decision-making and label correction. Proc. 1st International Workshop on Multimodal and Responsible Affective Computing (MRAC), pp. 67-73, Ottawa, Canada, Oct. 2023. [Paper]
A. Miyashita, T. Toda. Differentiable representation of warping based on Lie group theory. Proc. IEEE WASPAA, 5 pages, New Paltz, USA, Oct. 2023. [Paper]【IEEE WASPAA 2023 Best Paper Award (Recipient: Atsushi Miyashita)】
R. Wang, T. Toda. Directional target speaker extraction under noisy underdetermined conditions through conditional variational autoencoder with global style tokens. Proc. IEEE WASPAA, 5 pages, New Paltz, USA, Oct. 2023. [Paper]
S. Luan, Y. Wakabayashi, T. Toda. Sound field interpolation with unsupervised calibration for freely spaced circular microphone array in rotation-robust beamforming. Proc. EUSIPCO, pp. 21-25, Sep. 2023. [Paper]
C.H. Hu, Y. Yasuda, T. Toda. Preference-based training framework for automatic speech quality assessment using deep neural network. Proc. INTERSPEECH, pp. 546-550, Dublin, Ireland, Aug. 2023. [Paper]
X. Shi, X. Li, T. Toda. Emotion awareness in multi-utterance turn for improving emotion prediction in multi-speaker conversation. Proc. INTERSPEECH, pp. 765-769, Dublin, Ireland, Aug. 2023. [Paper]
T. Okamoto, H. Yamashita, T. Toda, H. Kawai. E2E-S2S-VC: end-to-end sequence-to-sequence voice conversion. Proc. INTERSPEECH, pp. 2043-2047, Dublin, Ireland, Aug. 2023. [Paper]
Y. Choi, C. Xie, T. Toda. Reverberation-controllable voice conversion using reverberation time estimator. Proc. INTERSPEECH, pp. 2103-2107, Dublin, Ireland, Aug. 2023. [Paper]
Y. Yasuda, T. Toda. Analysis of mean opinion scores in subjective evaluation of synthetic speech based on tail probabilities. Proc. INTERSPEECH, pp. 5491-5495, Dublin, Ireland, Aug. 2023. [Paper]
Y. Yasuda, T. Toda. Text-to-speech synthesis based on latent variable conversion using diffusion probabilistic model and variational autoencoder. Proc. IEEE ICASSP, 5 pages, June 2023. [Paper]
K. Kobayashi, T. Hayashi, T. Toda. Low-latency electrolaryngeal speech enhancement based on FastSpeech2-based voice conversion and self-supervised speech representation. Proc. IEEE ICASSP, 5 pages, June 2023. [Paper]
R. Yamamoto, R. Yoneyama, T. Toda. NNSVS: a neural network based singing voice synthesis toolkit. Proc. IEEE ICASSP, 5 pages, June 2023. [Paper]
R. Yoneyama, Y.-C. Wu, T. Toda. Source-Filter HiFiGAN: fast and pitch controllable high-fidelity neural vocoder. Proc. IEEE ICASSP, 5 pages, June 2023. [Paper]
L.P. Violeta, D. Ma, W.-C. Huang, T. Toda. Intermediate fine-tuning using imperfect synthetic speech for improving electrolaryngeal speech recognition. Proc. IEEE ICASSP, 5 pages, June 2023. [Paper]
T. Fujimura, T. Toda. Analysis of Noisy-target Training for DNN-based speech enhancement. Proc. IEEE ICASSP, 5 pages, June 2023. [Paper]
A. Miyashita, T. Toda. Representation of vocal tract length transformation based on group theory. Proc. IEEE ICASSP, 5 pages, June 2023. [Paper]
D. Ma, L.P. Violeta, K. Kobayashi, T. Toda. Two-stage training method for Japanese electrolaryngeal speech enhancement based on sequence-to-sequence voice conversion. Proc. IEEE SLT, pp. 949-954, Doha, Qatar, Jan. 2023. [Paper]
Y. Hashizume, L. Li, T. Toda. Music similarity calculation of individual instrumental sounds using metric learning. Proc. APSIPA ASC, pp. 33-38, Chiang Mai, Thailand, Nov. 2022. [Paper]
J. Feng, T. Yoshikawa, T. Toda. Interpretable control for emotional text-to-speech system toward development of sympathetic educational-support robots. Proc. APSIPA ASC, pp. 342-346, Chiang Mai, Thailand, Nov. 2022. [Paper]
R. Wang, L. Li, T. Toda. Direction-aware target speaker extraction with a dual-channel system based on conditional variational autoencoders under underdetermined conditions. Proc. APSIPA ASC, pp. 347-353, Chiang Mai, Thailand, Nov. 2022. [Paper]
S. Chen, T. Toda. Sequence-wise optimization for quasi-harmonic speech waveform modeling. Proc. APSIPA ASC, pp. 1658-1663, Chiang Mai, Thailand, Nov. 2022. [Paper]
I. Kuroyanagi, T. Hayashi, K. Takeda, T. Toda. Improvement of anomalous sound detection method considering the distribution of embedding. Proc. ICA, ABS-0189, 5 pages, Gyeongju, Korea, Oct. 2022 (Invited in structured session "A13-02: Anomalous sound detection and classification for condition monitoring").
C. Xie, T. Toda. Noisy-to-noisy voice conversion with pre-training strategy. Proc. ICA, ABS-0801, 5 pages, Gyeongju, Korea, Oct. 2022 (Invited in structured session "A15-06: Voice conversion").
L.P. Violeta, W.-C. Huang, T. Toda. Investigating self-supervised pretraining frameworks for pathological speech recognition. Proc. INTERSPEECH, pp. 41-45, Incheon, Korea, Sep. 2022. [Paper]
R. Yoneyama, Y.-C. Wu, T. Toda. Unified source-filter GAN with harmonic-plus-noise source excitation generation. Proc. INTERSPEECH, pp. 848-852, Incheon, Korea, Sep. 2022. [Paper]
W.-C. Huang, E. Cooper, Y. Tsao, H.-M. Wang, T. Toda, J. Yamagishi. The VoiceMOS Challenge 2022. Proc. INTERSPEECH, pp. 4536-4540, Incheon, Korea, Sep. 2022. [Paper]
D. Yoshioka, Y. Yasuda, N. Matsunaga, Y. Ohtani, T. Toda. Spoken-text-style transfer with conditional variational autoencoder and content word storage. Proc. INTERSPEECH, pp. 4576-4580, Incheon, Korea, Sep. 2022. [Paper]
Y. Choi, C. Xie, T. Toda. An evaluation of three-stage voice conversion framework for noisy and reverberant conditions. Proc. INTERSPEECH, pp. 4910-4914, Incheon, Korea, Sep. 2022. [Paper]
S. Kim, T. Hayashi, T. Toda. Note-level automatic guitar transcription using attention mechanism. Proc. EUSIPCO, pp. 229-233, Belgrade, Serbia, Aug.-Sep. 2022. [Paper]
I. Kuroyanagi, T. Hayashi, K. Takeda, T. Toda. Improvement of serial approach to anomalous sound detection by incorporating two binary cross-entropies for outlier exposure. Proc. EUSIPCO, pp. 294-298, Belgrade, Serbia, Aug.-Sep. 2022. [Paper]
S. Luan, Y. Wakabayashi, T. Toda. Modified sound field interpolation method for rotation-robust beamforming with unequally spaced circular microphone array. Proc. EUSIPCO, pp. 344-348, Belgrade, Serbia, Aug.-Sep. 2022. [Paper]
W.-C. Huang, E. Cooper, J. Yamagishi, T. Toda. LDNet: unified listener dependent modeling in MOS prediction for synthetic speech. Proc. IEEE ICASSP, pp. 896-900, May 2022. [Paper]
W.-C. Huang, S.-W. Yang, T. Hayashi, H.-Y. Lee, S. Watanabe, T. Toda. S3PRL-VC: open-source voice conversion framework with self-supervised speech representations. Proc. IEEE ICASSP, pp. 6552-6556, May 2022. [Paper]
W.-C. Huang, B.M. Halpern, L.P. Violeta, O. Scharenborg, T. Toda. Towards identity preserving normal to dysarthric voice conversion. Proc. IEEE ICASSP, pp. 6672-6676, May 2022. [Paper]
C. Xie, Y.-C. Wu, P.L. Tobing, W.-C. Huang, T. Toda. Direct noisy speech modeling for noisy-to-noisy voice conversion. Proc. IEEE ICASSP, pp. 6787-6791, May 2022. [Paper]
T. Hayashi, K. Kobayashi, T. Toda. An investigation of streaming non-autoregressive sequence-to-sequence voice conversion. Proc. IEEE ICASSP, pp. 6802-6806, May 2022. [Paper]
E. Cooper, W.-C. Huang, T. Toda, J. Yamagishi. Generalization ability of MOS prediction networks. Proc. IEEE ICASSP, pp. 8442-8446, May 2022. [Paper]
W.-C. Huang, S.-W. Yang, T. Hayashi, H.-Y. Lee, S. Watanabe, T. Toda. S3PRL-VC: open-source voice conversion framework with self-supervised speech representations. Proc. AAAI-22 Workshop, W35: Self-Supervised Learning for Audio and Speech Processing, 5 pages, Feb. 2022. [Paper]
Z. Qian, H. Niu, L. Wang, K. Kobayashi, S. Zhang, T. Toda. Mandarin electro-laryngeal speech enhancement based on statistical voice conversion and manual tone control. Proc. APSIPA ASC, pp. 546-552, Dec. 2021. [Paper]
C. Xie, Y.-C. Wu, P.L. Tobing, W.-C. Huang, T. Toda. Noisy-to-noisy voice conversion framework with denoising model. Proc. APSIPA ASC, pp. 814-820, Dec. 2021. [Paper]
D. Ma, W.-C. Huang, T. Toda. Investigation of text-to-speech-based synthetic parallel data for sequence-to-sequence non-parallel voice conversion. Proc. APSIPA ASC, pp. 870-877, Dec. 2021. <APSIPA ASC 2021 The Best Paper Award> [Paper]
Y.-S. Liou, W.-C. Huang, M.-C. Yen, S.-W. Tsai, Y.-H. Peng, T. Toda, Y. Tsao, H.-M. Wang. Time alignment using lip images for frame-based electrolaryngeal voice conversion. Proc. APSIPA ASC, pp. 1234-1238, Dec. 2021. [Paper]
T. Okamoto, T. Toda, H. Kawai. Multi-stream HiFi-GAN with data-driven waveform decomposition. Proc. IEEE ASRU, pp. 610-617, Dec. 2021. [Paper]
W.-C. Huang, T. Hayashi, X. Li, S. Watanabe, T. Toda. On prosody modeling for ASR+TTS based voice conversion. Proc. IEEE ASRU, pp. 642-649, Dec. 2021. [Paper]
M.-C. Yen, W.-C. Huang, K. Kobayashi, Y.-H. Peng, S.-W. Tsai, Y. Tsao, T. Toda, J.-S. R. Jang, H.-M. Wang. Mandarin electrolaryngeal speech voice conversion with sequence-to-sequence modeling. Proc. IEEE ASRU, pp. 650-657, Dec. 2021. [Paper]
H.-T. Chiang, Y.-C. Wu, C. Yu, T. Toda, H.-M. Wang, Y.-C. Hu, Y. Tsao. HASA-Net: a non-intrusive hearing-aid speech assessment network. Proc. IEEE ASRU, pp. 907-913, Dec. 2021. [Paper]
I. Kuroyanagi, T. Hayashi, Y. Adachi, T. Yoshimura, K. Takeda, T. Toda. An ensemble approach to anomalous sound detection based on conformer-based autoencoder and binary classifier incorporated with metric learning. Proc. DCASE 2021 Workshop, pp. 110-114, Nov. 2021. [Paper]
S. Seki, H. Taga, T. Toda. Singing fundamental frequency contour generation using generalized command response model and score-conditional variational autoencoder. Proc. IEEE MLSP, 6 pages, Oct. 2021. [Paper]
W.-C. Huang, K. Kobayashi, Y.-H. Peng, C.-F. Liu, Y. Tsao, H.-M. Wang, T. Toda. A preliminary study of a two-stage paradigm for preserving speaker identity in dysarthric voice conversion. Proc. INTERSPEECH, pp. 1329-1333, Aug.-Sep. 2021. [Paper]
R. Yoneyama, Y.-C. Wu, T. Toda. Unified source-filter GAN: unified source-filter network based on factorization of quasi-periodic parallel WaveGAN. Proc. INTERSPEECH, pp. 2187-2191, Aug.-Sep. 2021. [Paper]
P.L. Tobing, T. Toda. High-fidelity and low-latency universal neural vocoder based on multiband WaveRNN with data-driven linear prediction for discrete waveform modeling. Proc. INTERSPEECH, pp. 2217-2221, Aug.-Sep. 2021. [Paper]
Y.-C. Wu, C.-H. Hu, H.-S. Lee, Y.-H. Peng, W.-C. Huang, Y. Tsao, H.-M. Wang, T. Toda. Relational data selection for data augmentation of speaker-dependent multi-band MelGAN vocoder. Proc. INTERSPEECH, pp. 3630-3634, Aug.-Sep. 2021. [Paper]
P.L. Tobing, T. Toda. Low-latency real-time non-parallel voice conversion based on cyclic variational autoencoder and multiband WaveRNN with data-driven linear prediction. Proc. 11th ISCA Speech Synthesis Workshop (SSW11), pp. 142-147, Aug. 2021. [Paper]
I. Kuroyanagi, T. Hayashi, K. Takeda, T. Toda. Anomalous sound detection using a binary classification model and class centroids. Proc. EUSIPCO, pp. 1995-1999, Aug. 2021. [Paper]
K. Kobayashi, W.-C. Huang, Y.-C. Wu, P.L. Tobing, T. Hayashi, T. Toda. Crank: an open-source software for nonparallel voice conversion based on vector-quantized variational autoencoder. Proc. IEEE ICASSP, pp. 5934-5938, June 2021. [Paper]
W.-C. Huang, Y.-C. Wu, T. Hayashi, T. Toda. Any-to-one sequence-to-sequence voice conversion using self-supervised discrete speech representations. Proc. IEEE ICASSP, pp. 5944-5948, June 2021. [Paper]
T. Okamoto, T. Toda, Y. Shiga, H. Kawai. Noise level limited sub-modeling for diffusion probabilistic vocoders. Proc. IEEE ICASSP, pp. 6029-6033, June 2021. [Paper]
A. Ando, R. Masumura, H. Sato, T. Moriya, T. Ashihara, Y. Ijima, T. Toda. Speech emotion recognition based on listener adaptive models. Proc. IEEE ICASSP, pp. 6274-6278, June 2021. [Paper]
K. Matsubara, T. Okamoto, R. Takashima, T. Takiguchi, T. Toda, Y. Shiga, H. Kawai. High-intelligibility speech synthesis for dysarthric speakers with LPCNet-based TTS and CycleVAE-based VC. Proc. IEEE ICASSP, pp. 7058-7062, June 2021. [Paper]
T. Hayashi, W.-C. Huang, K. Kobayashi, T. Toda. Non-autoregressive sequence-to-sequence voice conversion. Proc. IEEE ICASSP, pp. 7068-7072, June 2021. [Paper]
W.-C. Huang, C.-H. Wu, S.-B. Luo, K.-Y. Chen, H.-M. Wang, T. Toda. Speech recognition by simply fine-tuning BERT. Proc. IEEE ICASSP, pp. 7343-7347, June 2021. [Paper]
H. Nakatani, P.L. Tobing, K. Takeda, T. Toda. Cross-lingual voice conversion using cyclic variational auto-encoder and a WaveNet vocoder. Proc. APSIPA ASC, pp. 520-526, Dec. 2020. [Paper]
M. Eshghi, K. Kobayashi, K. Tanaka, H. Kameoka, T. Toda. Phoneme embeddings on predicting fundamental frequency pattern for electrolaryngeal speech. Proc. APSIPA ASC, pp. 572-577, Dec. 2020. [Paper]
K. Miyazaki, T. Komatsu, T. Hayashi, S. Watanabe, T. Toda, K. Takeda. Conformer-based sound event detection with semi-supervised learning and data augmentation. Proc. DCASE 2020 Workshop, pp. 100-104, Nov. 2020. [Paper]
Z. Yi, W.-C. Huang, X. Tian, J. Yamagishi, R.K. Das, T. Kinnunen, Z. Ling, T. Toda. Voice Conversion Challenge 2020 -- intra-lingual semi-parallel and cross-lingual voice conversion --. Proc. Joint workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, pp. 80-98, Oct. 2020. [Paper]
R.K. Das, T. Kinnunen, W.-C. Huang, Z. Ling, J. Yamagishi, Z. Yi, X. Tian, T. Toda. Predictions of subjective ratings and spoofing assessments of Voice Conversion Challenge 2020 submissions. Proc. Joint workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, pp. 99-120, Oct. 2020. [Paper]
P.L. Tobing, Y.-C. Wu, T. Toda. Baseline system of Voice Conversion Challenge 2020 with cyclic variational autoencoder and parallel WaveGAN. Proc. Joint workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, pp. 155-159, Oct. 2020. [Paper]
W.-C. Huang, T. Hayashi, S. Watanabe, T. Toda. The sequence-to-sequence baseline for the Voice Conversion Challenge 2020: cascading ASR and TTS. Proc. Joint workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, pp. 160-164, Oct. 2020. [Paper]
W.-C. Huang, P.L. Tobing, Y.-C. Wu, K. Kobayashi, T. Toda. The NU voice conversion system for the Voice Conversion Challenge 2020: on the effectiveness of sequence-to-sequence models and autoregressive neural vocoders. Proc. Joint workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, pp. 165-169, Oct. 2020. [Paper]
Y.-C. Wu, T. Hayashi, T. Okamoto, H. Kawai, T. Toda. Quasi-periodic parallel WaveGAN vocoder: a non-autoregressive pitch-dependent dilated convolution model for parametric speech generation. Proc. INTERSPEECH, pp. 3535-3539, Oct. 2020. [Paper]
Y.-C. Wu, P.L. Tobing, K. Yasuhara, N. Matsunaga, Y. Ohtani, T. Toda. A cyclical post-filtering approach to mismatch refinement of neural vocoder for text-to-speech systems. Proc. INTERSPEECH, pp. 3540-3544, Oct. 2020. [Paper]
S. Seki, M. Takada, T. Toda. Semi-supervised self-produced speech enhancement and suppression based on joint source modeling of air- and body-conducted signals using variational autoencoder. Proc. INTERSPEECH, pp. 4039-4043, Oct. 2020. [Paper]
S. Hikosaka, S. Seki, T. Hayashi, K. Kobayashi, K. Takeda, H. Banno, T. Toda. Intelligibility enhancement based on speech waveform modification using hearing impairment simulator. Proc. INTERSPEECH, pp. 4059-4063, Oct. 2020. [Paper]
W.-C. Huang, T. Hayashi, Y.-C. Wu, H. Kameoka, T. Toda. Voice transformer network: sequence-to-sequence voice conversion using transformer with text-to-speech pretraining. Proc. INTERSPEECH, pp. 4676-4680, Oct. 2020. [Paper]
P.L. Tobing, T. Hayashi, Y.-C. Wu, K. Kobayashi, T. Toda. Cyclic spectral modeling for unsupervised unit discovery into voice conversion with excitation and waveform modeling. Proc. INTERSPEECH, pp. 4861-4865, Oct. 2020. [Paper]
K. Kobayashi, T. Toda. Implementation of low-latency electrolaryngeal speech enhancement based on multi-task CLDNN. Proc. EUSIPCO, pp. 396-400, Aug. 2020. [Paper]
M. Takada, S. Seki, P.L. Tobing, T. Toda. Semi-supervised enhancement and suppression of self-produced speech using correspondence between air- and body-conducted signals. Proc. EUSIPCO, pp. 456-460, Aug. 2020. [Paper]
K. Miyazaki, T. Komatsu, T. Hayashi, S. Watanabe, T. Toda, K. Takeda. Weakly-supervised sound event detection with self-attention. Proc. IEEE ICASSP, pp. 66-70, Full virtual, May 2020. [Paper]
T. Okamoto, T. Toda, Y. Shiga, H. Kawai. Transformer-based text-to-speech with weighted forced attention. Proc. IEEE ICASSP, pp. 6729-6733, Full virtual, May 2020. [Paper]
P.L. Tobing, Y.-C. Wu, T. Hayashi, K. Kobayashi, T. Toda. Efficient shallow WaveNet vocoder using multiple samples output based on Laplacian distribution and linear prediction. Proc. IEEE ICASSP, pp. 7204-7208, Full virtual, May 2020. [Paper]
T. Hayashi, R. Yamamoto, K. Inoue, T. Yoshimura, S. Watanabe, T. Toda, K. Takeda, Y. Zhang, X. Tan. ESPnet-TTS: unified, reproducible, and integratable open source end-to-end text-to-speech toolkit. Proc. IEEE ICASSP, pp. 7654-7658, Full virtual, May 2020. [Paper]
P.L. Tobing, T. Hayashi, T. Toda. Investigation of shallow WaveNet vocoder with Laplacian distribution output. Proc. IEEE ASRU, pp. 176-183, Sentosa, Singapore, Dec. 2019. [Paper]
T. Okamoto, T. Toda, Y. Shiga, H. Kawai. Tacotron-based acoustic model using phoneme alignment for practical neural text-to-speech synthesis. Proc. IEEE ASRU, pp. 214-221, Sentosa, Singapore, Dec. 2019. [Paper]
L. Li, T. Toda, K. Morikawa, K. Kobayashi, S. Makino. Improving singing aid system for laryngectomees with statistical voice conversion and VAE-SPACE. Proc. ISMIR, pp. 784-790, Delft, the Netherlands, Nov. 2019. [Paper]
F. Ahmadi, K. Kobayashi, T. Toda. Development of a real-time bionic voice generation system based on statistical excitation prediction. Proc. ACM ASSETS, pp. 655-657, Posters and Demos, Pittsburgh, USA, Oct. 2019. [Paper]
W.-C. Huang, Y.-C. Wu, K. Kobayashi, Y.-H. Peng, H.-T. Hwang, P.L. Tobing, Y. Tsao, H.-M. Wang, T. Toda. Generalization of spectrum differential based direct waveform modification for voice conversion. Proc. 10th ISCA Speech Synthesis Workshop (SSW10), pp. 57-62, Vienna, Austria, Sep. 2019. [Paper]
Y.-C. Wu, P.L. Tobing, T. Hayashi, K. Kobayashi, T. Toda. Statistical voice conversion with quasi-periodic WaveNet vocoder. Proc. 10th ISCA Speech Synthesis Workshop (SSW10), pp. 63-68, Vienna, Austria, Sep. 2019. [Paper]
M. Eshghi, K. Tanaka, K. Kobayashi, H. Kameoka, T. Toda. An investigation of features for fundamental frequency pattern prediction in electrolaryngeal speech enhancement. Proc. 10th ISCA Speech Synthesis Workshop (SSW10), pp. 251-256, Vienna, Austria, Sep. 2019. [Paper]
Y.-C. Wu, T. Hayashi, P.L. Tobing, K. Kobayashi, T. Toda. Quasi-periodic WaveNet vocoder: a pitch dependent dilated convolution model for parametric speech generation. Proc. INTERSPEECH, pp. 196-200, Graz, Austria, Sep. 2019. [Paper]
P.L. Tobing, Y.-C. Wu, T. Hayashi, K. Kobayashi, T. Toda. Non-parallel voice conversion with cyclic variational autoencoder. Proc. INTERSPEECH, pp. 674-678, Graz, Austria, Sep. 2019. [Paper]
Y. Kurita, K. Kobayashi, K. Takeda, T. Toda. Robustness of statistical voice conversion based on direct waveform modification against background sounds. Proc. INTERSPEECH, pp. 684-688, Graz, Austria, Sep. 2019. [Paper]
W.-C. Huang, Y.-C. Wu, C.-C. Lo, P.L. Tobing, T. Hayashi, K. Kobayashi, T. Toda, Y. Tsao, H.-M. Wang. Investigation of F0 conditioning and fully convolutional networks in variational autoencoder based voice conversion. Proc. INTERSPEECH, pp. 709-713, Graz, Austria, Sep. 2019. [Paper]
T. Okamoto, T. Toda, Y. Shiga, H. Kawai. Real-time neural text-to-speech with sequence-to-sequence acoustic model and WaveGlow or single Gaussian WaveRNN vocoders. Proc. INTERSPEECH, pp. 1308-1312, Graz, Austria, Sep. 2019. [Paper]
T. Hayashi, S. Watanabe, T. Toda, K. Takeda, S. Toshniwal, K. Livescu. Pre-trained text embeddings for enhanced text-to-speech synthesis. Proc. INTERSPEECH, pp. 4430-4434, Graz, Austria, Sep. 2019. [Paper]
S. Seki, H. Kameoka, L. Li, T. Toda, K. Takeda. Generalized multichannel variational autoencoder for underdetermined source separation. Proc. EUSIPCO, 5 pages, A Coruna, Spain, Sep. 2019. [Paper]
W.-C. Huang, Y.-C. Wu, H.-T. Hwang, P.L. Tobing, T. Hayashi, K. Kobayashi, T. Toda, Y. Tsao, H.-M. Wang. Refined WaveNet vocoder for variational autoencoder based voice conversion. Proc. EUSIPCO, 5 pages, A Coruna, Spain, Sep. 2019. [Paper]
T. Komatsu, T. Hayashi, R. Kondo, T. Toda, K. Takeda. Scene-dependent anomalous acoustic-event detection based on conditional WaveNet and i-Vector. Proc. IEEE ICASSP, pp. 870-874, Brighton, UK, May 2019. [Paper]
P.L. Tobing, Y. Wu, T. Hayashi, K. Kobayashi, T. Toda. Voice conversion with cyclic recurrent neural network and fine-tuned WaveNet vocoder. Proc. IEEE ICASSP, pp. 6815-6819, Brighton, UK, May 2019. [Paper]
T. Okamoto, T. Toda, Y. Shiga, H. Kawai. Investigations of real-time Gaussian FFTNet and parallel WaveNet neural vocoders with simple acoustic features. Proc. IEEE ICASSP, pp. 7020-7024, Brighton, UK, May 2019. [Paper]
P.L. Tobing, T. Hayashi, Y. Wu, K. Kobayashi, T. Toda. An evaluation of deep spectral mappings and WaveNet vocoder for voice conversion. Proc. IEEE SLT, pp. 297-303, Athens, Greece, Dec. 2018. [Paper]
T. Okamoto, T. Toda, Y. Shiga, H. Kawai. Improving FFTNet vocoder with noise shaping and subband approaches. Proc. IEEE SLT, pp. 304-311, Athens, Greece, Dec. 2018. [Paper]
T. Hayashi, S. Watanabe, Y. Zhang, T. Toda, T. Hori, R. Astudillo, K. Takeda. Back-translation-style data augmentation for end-to-end ASR. Proc. IEEE SLT, pp. 426-433, Athens, Greece, Dec. 2018. [Paper]
M. Takada, S. Seki, T. Toda. Self-produced speech enhancement and suppression method using air- and body-conductive microphones. Proc. APSIPA ASC, pp. 1240-1245, Hawaii, USA, Nov. 2018. [Paper]
K. Miyazaki, T. Hayashi, T. Toda, K. Takeda. Connectionist temporal classification-based sound event encoder for converting sound events into onomatopoeia representations. Proc. EUSIPCO, pp. 857-861, Rome, Italy, Sep. 2018. [Paper]
K. Kobayashi, T. Toda. Electrolaryngeal speech enhancement with statistical voice conversion based on CLDNN. Proc. EUSIPCO, pp. 2129-2133, Rome, Italy, Sep. 2018. [Paper]
T. Hayashi, T. Komatsu, R. Kondo, T. Toda, K. Takeda. Anomalous sound event detection based on WaveNet. Proc. EUSIPCO, pp. 2508-2512, Rome, Italy, Sep. 2018. [Paper]
T. Hayashi, S. Watanabe, T. Toda, K. Takeda. Multi-Head Decoder for end-to-end speech recognition. Proc. INTERSPEECH, pp. 801-805, Hyderabad, India, Sep. 2018. [Paper]
Y. Wu, K. Kobayashi, T. Hayashi, P.L. Tobing, T. Toda. Collapsed segment detection and reduction for WaveNet vocoder. Proc. INTERSPEECH, pp. 1988-1992, Hyderabad, India, Sep. 2018. [Paper]
H. Kawahara, K. Sakakibara, M. Morise, H. Banno, T. Toda, T. Irino. Frequency domain variants of velvet noise and their application to speech processing and synthesis. Proc. INTERSPEECH, pp. 2027-2031, Hyderabad, India, Sep. 2018. [Paper]
S. Tamura, K. Horio, H. Endo, S. Hayamizu, T. Toda. Audio-visual voice conversion using deep canonical correlation analysis for deep bottleneck features. Proc. INTERSPEECH, pp. 2469-2473, Hyderabad, India, Sep. 2018. [Paper]
F. Ahmadi, T. Toda. Designing a pneumatic bionic voice prosthesis - statistical approach for source excitation generation. Proc. INTERSPEECH, pp. 3142-3146, Hyderabad, India, Sep. 2018. [Paper]
T. Kinnunen, J. Lorenzo-Trueba, J. Yamagishi, T. Toda, D. Saito, F. Villavicencio, Z. Ling. A spoofing benchmark for the 2018 voice conversion challenge: leveraging from spoofing countermeasures for speech artifact assessment. Proc. Odyssey 2018, pp. 187-194, Les Sables d'Olonne, France, June 2018. [Paper]
J. Lorenzo-Trueba, J. Yamagishi, T. Toda, D. Saito, F. Villavicencio, T. Kinnunen, Z. Ling. The voice conversion challenge 2018: promoting development of parallel and nonparallel methods. Proc. Odyssey 2018, pp. 195-202, Les Sables d'Olonne, France, June 2018. [Paper]
K. Kobayashi, T. Toda. sprocket: open-source voice conversion software. Proc. Odyssey 2018, pp. 203-210, Les Sables d'Olonne, France, June 2018. [Paper]
Y. Wu, P.L. Tobing, T. Hayashi, K. Kobayashi, T. Toda. The NU non-parallel voice conversion system for the voice conversion challenge 2018. Proc. Odyssey 2018, pp. 211-218, Les Sables d'Olonne, France, June 2018. [Paper]
P.L. Tobing, Y. Wu, T. Hayashi, K. Kobayashi, T. Toda. NU voice conversion system for the voice conversion challenge 2018. Proc. Odyssey 2018, pp. 219-226, Les Sables d'Olonne, France, June 2018. [Paper]
S. Seiya, R. Ito, K. Okamoto, U. Tanikawa, S. Ohira, D. Deguchi, T. Toda. Development of "KamiRepo" system with automatic student identification to handle handwritten assignments on LMS. Proc. EDUCON, pp. 841-848, Canary Islands, Spain, Apr. 2018. [Paper]
T. Okamoto, K. Tachibana, T. Toda, Y. Shiga, H. Kawai. An investigation of subband WaveNet vocoder covering entire audible frequency range with limited acoustic features. Proc. IEEE ICASSP, pp. 5654-5658, Calgary, Canada, Apr. 2018. [Paper]
K. Tachibana, T. Toda, Y. Shiga, H. Kawai. An investigation of noise shaping with perceptual weighting for WaveNet-based speech generation. Proc. IEEE ICASSP, pp. 5664-5668, Calgary, Canada, Apr. 2018. [Paper]
T. Okamoto, K. Tachibana, T. Toda, Y. Shiga, H. Kawai. Subband WaveNet with overlapped single-sideband filterbanks. Proc. IEEE ASRU, pp. 698-704, Okinawa, Japan, Dec. 2017. [Paper]
T. Hayashi, A. Tamamori, K. Kobayashi, K. Takeda, T. Toda. An investigation of multi-speaker training for WaveNet vocoder. Proc. IEEE ASRU, pp. 712-718, Okinawa, Japan, Dec. 2017. [Paper]
K. Morikawa, T. Toda. Electrolaryngeal speech modification towards singing aid system for laryngectomees. Proc. APSIPA, 4 pages, Kuala Lumpur, Malaysia, Dec. 2017. [Paper]
P.L. Tobing, H. Kameoka, T. Toda. Deep acoustic-to-articulatory inversion mapping with latent trajectory modeling. Proc. APSIPA, 4 pages, Kuala Lumpur, Malaysia, Dec. 2017. [Paper]
A. Tamamori, T. Hayashi, T. Toda, K. Takeda. Investigation of effectiveness on recurrent neural network for daily activity recognition using multi-modal signals. Proc. APSIPA, 7 pages, Kuala Lumpur, Malaysia, Dec. 2017 (Invited Talk in Special Session). [Paper]
K. Kubo, K. Kobayashi, T. Toda, G. Neubig, S. Sakti, S. Nakamura. An investigation of how to design control parameters for statistical voice timbre control. Proc. APSIPA, 4 pages, Kuala Lumpur, Malaysia, Dec. 2017. [Paper]
H. Kawahara, K. Sakakibara, M. Morise, H. Banno, T. Toda. Accurate estimation of fo and aperiodicity based on periodicity detector residuals and deviations of phase derivatives. Proc. APSIPA, 9 pages, Kuala Lumpur, Malaysia, Dec. 2017. [Paper]
S. Seki, H. Kameoka, T. Toda, K. Takeda. Missing component restoration for masked speech signals based on time-domain spectrogram factorization. Proc. MLSP, 6 pages, Tokyo, Japan, Sep. 2017. [Paper]
S. Seki, T. Toda, K. Takeda. Stereophonic music separation based on non-negative tensor factorization with cepstrum regularization. Proc. EUSIPCO, pp. 1011-1015, Kos Island, Greece, Aug. 2017. [Paper]
H. Kawahara, K. Sakakibara, M. Morise, H. Banno, T. Toda. A modulation property of time-frequency derivatives of filtered phase and its application to aperiodicity and FO estimation. Proc. INTERSPEECH, pp. 424-428, Stockholm, Sweden, Aug. 2017. [Paper]
K. Tanaka, H. Kameoka, T. Toda, S. Nakamura. Physically constrained statistical F0 prediction for electrolaryngeal speech enhancement. Proc. INTERSPEECH, pp. 1069-1073, Stockholm, Sweden, Aug. 2017. [Paper]
A. Tamamori, T. Hayashi, K. Kobayashi, K. Takeda, T. Toda. Speaker-dependent WaveNet vocoder. Proc. INTERSPEECH, pp. 1118-1122, Stockholm, Sweden, Aug. 2017. [Paper]
K. Kobayashi, T. Hayashi, A. Tamamori, T. Toda. Statistical voice conversion with WaveNet-based waveform generation. Proc. INTERSPEECH, pp. 1138-1142, Stockholm, Sweden, Aug. 2017. [Paper]
H. Kawahara, K. Sakakibara, H. Banno, M. Morise, T. Toda, T. Irino. A new cosine series antialiasing function and its application to aliasing-free glottal source models for speech and singing synthesis. Proc. INTERSPEECH, pp. 1358-1362, Stockholm, Sweden, Aug. 2017. [Paper]
L. Li, H. Kameoka, T. Toda, S. Makino. Speech enhancement using non-negative spectrogram models with mel-generalized cepstral regularization. Proc. INTERSPEECH, pp. 1998-2002, Stockholm, Sweden, Aug. 2017. [Paper]
T. Hayashi, S. Watanabe, T. Toda, T. Hori, J. Le Roux, K. Takeda. BLSTM-HMM hybrid system combined with sound activity detection network for polyphonic sound event detection. Proc. IEEE ICASSP, pp. 766-770, New Orleans, USA, Mar. 2017. [Paper]
Y. Tajiri, H. Kameoka, T. Toda. A noise suppression method for body-conducted soft speech based on non-negative tensor factorization of air- and body-conducted signals. Proc. IEEE ICASSP, pp. 4960-4964, New Orleans, USA, Mar. 2017. [Paper]
K. Kobayashi, T. Toda, S. Nakamura. F0 transformation techniques for statistical voice conversion with direct waveform modification with spectral differential. Proc. IEEE SLT, pp. 693-700, San Diego, USA, Dec. 2016. [Paper]
A. Tamamori, T. Hayashi, T. Toda, K. Takeda. Investigation on recurrent neural network architectures for daily activity recognition. Proc. UV2016, 4 pages, Aichi, Japan, Oct. 2016.
Y. Tajiri, T. Toda. Nonaudible murmur enhancement based on statistical voice conversion and noise suppression with external noise monitoring. Proc. 9th ISCA Speech Synthesis Workshop (SSW9), pp. 54-60, Sunnyvale, USA, Sep. 2016. [Paper]
P.L. Tobing, T. Toda, H. Kameoka, S. Nakamura. Acoustic-to-articulatory inversion mapping based on latent trajectory Gaussian mixture model. Proc. INTERSPEECH, pp. 953-957, San Francisco, USA, Sep. 2016. [Paper]
T. Toda, L.-H. Chen, D. Saito, F. Villavicencio, M. Wester, Z. Wu, J. Yamagishi. The Voice Conversion Challenge 2016. Proc. INTERSPEECH, pp. 1632-1636, San Francisco, USA, Sep. 2016. [Paper]
K. Kobayashi, S. Takamichi, S. Nakamura, T. Toda. The NU-NAIST voice conversion system for the Voice Conversion Challenge 2016. Proc. INTERSPEECH, pp. 1667-1671, San Francisco, USA, Sep. 2016. <2017 Outstanding Paper Award for Young C&C Researchers (recipient: Kazuhiro Kobayashi)> [Paper]
K. Tachibana, T. Toda, Y. Shiga, H. Kawai. Model integration for HMM- and DNN-based speech synthesis using Product-of-Experts framework. Proc. INTERSPEECH, pp. 2288-2292, San Francisco, USA, Sep. 2016. [Paper]
Q. Truong Do, T. Toda, G. Neubig, S. Sakti, S. Nakamura. A hybrid system for continuous word-level emphasis modeling based on HMM state clustering and adaptive training. Proc. INTERSPEECH, pp. 3196-3200, San Francisco, USA, Sep. 2016. [Paper]
T. Hayashi, S. Watanabe, T. Toda, T. Hori, J. Le Roux, K. Takeda. Bidirectional LSTM-HMM hybrid system for polyphonic sound event detection. Proc. DCASE2016 workshop, 5 pages, Budapest, Hungary, Sep. 2016. [Paper]
K. Tanaka, T. Toda, G. Neubig, S. Nakamura. Real-time vibration control of an electrolarynx based on statistical F0 contour prediction. Proc. EUSIPCO, pp. 1333-1337, Budapest, Hungary, Aug. 2016. [Paper]
H. Maki, T. Toda, S. Sakti, G. Neubig, S. Nakamura. Removing noise from event-related potentials using a probabilistic generative model with grouped covariance matrices. Proc. EMBC, 4 pages, Orlando, USA, Aug. 2016. [Paper]
S. Yamane, K. Kobayashi, T. Toda, T. Nakano, M. Goto, S. Nakamura. An estimation method of voice timbre evaluation values using feature extraction with Gaussian mixture model based on reference singer. Proc. IEEE ICASSP, pp. 5265-5269, Shanghai, China, Mar. 2016. [Paper]
K. Tanaka, H. Kameoka, T. Toda, S. Nakamura. Statistical F0 prediction for electrolaryngeal speech enhancement considering generative process of F0 contours within product of experts framework. Proc. IEEE ICASSP, pp. 5665-5669, Shanghai, China, Mar. 2016. [Paper]
K. Kobayashi, T. Toda, S. Nakamura. Implementation of F0 transformation for statistical singing voice conversion based on direct waveform modification. Proc. IEEE ICASSP, pp. 5670-5674, Shanghai, China, Mar. 2016. [Paper]
Y. Tajiri, T. Toda, S. Nakamura. Noise suppression method for body-conducted soft speech enhancement based on external noise monitoring. Proc. IEEE ICASSP, pp. 5935-5939, Shanghai, China, Mar. 2016. [Paper]
T. Hiraoka, G. Neubig, K. Yoshino, T. Toda, S. Nakamura. Active learning for example-based dialog systems. Proc. IWSDS, 11 pages, Saariselka, Finland, Jan. 2016. [Paper]
Y. Tsunomori, G. Neubig, T. Hiraoka, M. Mizukami, S. Sakti, T. Toda, S. Nakamura. A dialog system to detect deception. Proc. IWSDS, 6 pages, Saariselka, Finland, Jan. 2016. [Paper]
S. Sakti, F. Ilham, G. Neubig, T. Toda, A. Purwarianti, S. Nakamura. Incremental sentence compression using LSTM recurrent networks. Proc. IEEE ASRU, pp. 252-258, Scottsdale, USA, Dec. 2015. [Paper]
Q. Truong Do, M. Heck, S. Sakti, G. Neubig, T. Toda, S. Nakamura. The NAIST ASR system for the 2015 Multi-Genre Broadcast Challenge: on combination of deep learning systems using a rank-score function. Proc. IEEE ASRU, pp. 654-659, Scottsdale, USA, Dec. 2015. [Paper]
N. Lubis, S. Sakti, G. Neubig, K. Yoshino, T. Toda, S. Nakamura. A study of social-affective communication: automatic prediction of emotion triggers and responses in television talk shows. Proc. IEEE ASRU, pp. 777-783, Scottsdale, USA, Dec. 2015. [Paper]
M. Mizukami, H. Kizuki, T. Nomura, G. Neubig, K. Yoshino, S. Sakti, T. Toda, S. Nakamura. Adaptive selection from multiple response candidates in example-based dialogue. Proc. IEEE ASRU, pp. 784-790, Scottsdale, USA, Dec. 2015. [Paper]
H. Kawahara, K. Sakakibara, H. Banno, M. Morise, T. Toda, T. Irino. Aliasing-free implementation of discrete-time glottal source models and their applications to speech synthesis and F0 extractor evaluation. Proc. APSIPA ASC, pp. 520-529, Hong Kong, China, Dec. 2015. [Paper]
Q. Truong Do, S. Sakti, G. Neubig, T. Toda, S. Nakamura. Improving translation of emphasis with pause prediction in speech-to-speech translation systems. Proc. IWSLT, pp. 204-208, Da Nang, Vietnam, Dec. 2015. [Paper]
Y. Oda, H. Fudaba, G. Neubig, H. Hata, S. Sakti, T. Toda, S. Nakamura. Learning to generate pseudo-code from source code using statistical machine translation. Proc. ASE, pp. 574-584, Lincoln, USA, Nov. 2015. [Paper]
H. Fudaba, Y. Oda, K. Akabe, G. Neubig, H. Hata, S. Sakti, T. Toda, S. Nakamura. Pseudogen: a tool to automatically generate pseudo-code from source code. Proc. ASE, Tool Demos, pp. 824-829, Lincoln, USA, Nov. 2015. [Paper]
N. Lubis, S. Sakti, G. Neubig, T. Toda, S. Nakamura. Construction and analysis of social-affective interaction corpus in English and Indonesian. Proc. O-COCOSDA, pp. 202-206, Shanghai, China, Oct. 2015. [Paper]
K. Tanaka, T. Toda, G. Neubig, S. Sakti, S. Nakamura. An enhanced electrolarynx with automatic fundamental frequency control based on statistical prediction. Proc. ACM ASSETS, Demonstration paper, pp. 435-436, Lisbon, Portugal, Oct. 2015. [Paper]
K. Sugiyama, M. Mizukami, G. Neubig, K. Yoshino, S. Sakti, T. Toda, S. Nakamura. An investigation of machine translation evaluation metrics in cross-lingual question answering. Proc. 10th Workshop on Statistical Machine Translation (WMT), pp. 442-449, Lisbon, Portugal, Sep. 2015. [Paper]
Y. Nishigaki, S. Takamichi, T. Toda, G. Neubig, S. Sakti, S. Nakamura. Prosody-controllable HMM-based speech synthesis using speech input. Proc. MLSLP, 5 pages, Fukushima, Japan, Sep. 2015. [Paper]
S. Takamichi, K. Kobayashi, K. Tanaka, T. Toda, S. Nakamura. The NAIST text-to-speech system for the Blizzard Challenge 2015. Proc. Blizzard Challenge 2015 Workshop, 4 pages, Berlin, Germany, Sep. 2015. [Paper]
Y. Oshima, S. Takamichi, T. Toda, G. Neubig, S. Sakti, S. Nakamura. Non-native speech synthesis preserving speaker individuality based on partial correction of prosodic and phonetic characteristics. Proc. INTERSPEECH, pp. 299-303, Dresden, Germany, Sep. 2015. [Paper]
S. Takamichi, T. Toda, A.W. Black, S. Nakamura. Modulation spectrum-constrained trajectory training algorithm for HMM-based speech synthesis. Proc. INTERSPEECH, pp. 1206-1210, Dresden, Germany, Sep. 2015. [Paper]
T. Mieno, G. Neubig, S. Sakti, T. Toda, S. Nakamura. Speed or accuracy? A study in evaluation of simultaneous speech translation. Proc. INTERSPEECH, pp. 2267-2271, Dresden, Germany, Sep. 2015. [Paper]
T.T. Nguyen, G. Neubig, H. Shindo, S. Sakti, T. Toda, S. Nakamura. A latent variable model for joint pause prediction and dependency parsing. Proc. INTERSPEECH, pp. 2719-2723, Dresden, Germany, Sep. 2015. [Paper]
K. Kobayashi, T. Toda, G. Neubig, S. Sakti, S. Nakamura. Statistical singing voice conversion based on direct waveform modification with global variance. Proc. INTERSPEECH, pp. 2754-2758, Dresden, Germany, Sep. 2015. [Paper]
Y. Tajiri, K. Tanaka, T. Toda, G. Neubig, S. Sakti, S. Nakamura. Non-audible murmur enhancement based on statistical conversion using air- and body-conductive microphones in noisy environments. Proc. INTERSPEECH, pp. 2769-2773, Dresden, Germany, Sep. 2015. [Paper]
P.L. Tobing, K. Kobayashi, T. Toda, G. Neubig, S. Sakti, S. Nakamura. Articulatory controllable speech modification based on Gaussian mixture models with direct waveform modification using spectrum differential. Proc. INTERSPEECH, pp. 3350-3354, Dresden, Germany, Sep. 2015. [Paper]
D.Q. Truong, S. Takamichi, S. Sakti, G. Neubig, T. Toda, S. Nakamura. Preserving word-level emphasis in speech-to-speech translation using linear regression HSMMs. Proc. INTERSPEECH, pp. 3665-3669, Dresden, Germany, Sep. 2015. [Paper]
H. Maki, T. Toda, S. Sakti, G. Neubig, S. Nakamura. Evaluation of EEG ocular artifact removal with a multi-channel Wiener filter based on probabilistic generative model. Proc. EMBC, 4 pages, Milan, Italy, Aug. 2015. [Paper]
Y. Oda, G. Neubig, S. Sakti, T. Toda, S. Nakamura. Syntax-based simultaneous translation through prediction of unseen syntactic constituents. Proc. ACL, pp. 198-207, Beijing, China, July 2015. [Paper]
A. Miura, G. Neubig, S. Sakti, T. Toda, S. Nakamura. Improving pivot translation by remembering the pivot. Proc. ACL, pp. 573-577, Beijing, China, July 2015. [Paper]
Y. Oda, G. Neubig, S. Sakti, T. Toda, S. Nakamura. Ckylark: a more robust PCFG-LA parser. Proc. NAACL HLT, Demo Track, pp. 41-45, Denver, USA, June 2015. [Paper]
H. Maki, T. Toda, S. Sakti, G. Neubig, S. Nakamura. EEG signal enhancement using multichannel Wiener filter with a spatial correlation prior. Proc. IEEE ICASSP, pp. 2639-2643, Brisbane, Australia, Apr. 2015. [Paper]
S. Takamichi, T. Toda, A.W. Black, S. Nakamura. Parameter generation algorithm considering modulation spectrum for HMM-based speech synthesis. Proc. IEEE ICASSP, pp. 4210-4214, Brisbane, Australia, Apr. 2015. [Paper]
Z. Wu, A. Khodabakhsh, C. Demiroglu, J. Yamagishi, D. Saito, T. Toda, S. King. SAS: a speaker verification spoofing database containing diverse attacks. Proc. IEEE ICASSP, pp. 4440-4444, Brisbane, Australia, Apr. 2015. [Paper]
A. Tjandra, S. Sakti, G. Neubig, T. Toda, M. Adriani, S. Nakamura. Combination of two-dimensional cochleogram and spectrogram features for deep learning-based ASR. Proc. IEEE ICASSP, pp. 4525-4529, Brisbane, Australia, Apr. 2015. [Paper]
S. Takamichi, T. Toda, A.W. Black, S. Nakamura. Modulation spectrum-constrained trajectory training algorithm for GMM-based voice conversion. Proc. IEEE ICASSP, pp. 4859-4863, Brisbane, Australia, Apr. 2015. [Paper]
H. Tanaka, S. Sakti, G. Neubig, T. Toda, H. Negoro, H. Iwasaka, S. Nakamura. Automated social skills trainer. Proc. IUI, pp. 17-27, Atlanta, USA, Mar. 2015. [Paper]
M. Mizukami, G. Neubig, S. Sakti, T. Toda, S. Nakamura. Linguistic individuality transformation for spoken language. Proc. IWSDS, 12 pages, Busan, South Korea, Jan. 2015. [Paper]
F. Koto, S. Sakti, G. Neubig, T. Toda, M. Adriani, S. Nakamura. A study on natural expressive speech: automatic memorable spoken quote detection. Proc. IWSDS, 6 pages, Busan, South Korea, Jan. 2015. [Paper]
T. Hiraoka, G. Neubig, S. Sakti, T. Toda, S. Nakamura. Evaluation of a fully automatic cooperative persuasive dialogue system. Proc. IWSDS, 12 pages, Busan, South Korea, Jan. 2015. [Paper]
T. Sasakura, S. Sakti, G. Neubig, T. Toda, S. Nakamura. Unknown word detection based on event-related brain desynchronization responses. Proc. IWSDS, 6 pages, Busan, South Korea, Jan. 2015. [Paper]
Y. Tsunomori, G. Neubig, S. Sakti, T. Toda, S. Nakamura. An analysis towards dialogue-based deception detection. Proc. IWSDS, 11 pages, Busan, South Korea, Jan. 2015. [Paper]
H. Kawahara, M. Morise, T. Toda, H. Banno, R. Nisimura, T. Irino. Excitation source design for high-quality speech manipulation systems based on a temporally static group delay representation of periodic signals. Proc. APSIPA ASC, 10 pages, Siem Reap, Cambodia, Dec. 2014. [Paper]
S. Sakti, Y. Odagaki, T. Sasakura, G. Neubig, T. Toda, S. Nakamura. An event-related brain potential study on the impact of speech recognition errors. Proc. APSIPA ASC, 4 pages, Siem Reap, Cambodia, Dec. 2014. [Paper]
S. Tsuruta, K. Tanaka, T. Toda, G. Neubig, S. Sakti, S. Nakamura. An evaluation of target speech for a nonaudible murmur enhancement system in noisy environments. Proc. APSIPA ASC, 4 pages, Siem Reap, Cambodia, Dec. 2014. [Paper]
K. Kobayashi, T. Toda, T. Nakano, M. Goto, G. Neubig, S. Sakti, S. Nakamura. Gender-dependent spectrum differential models for perceived age control based on direct waveform modification in singing voice conversion. Proc. APSIPA ASC, 4 pages, Siem Reap, Cambodia, Dec. 2014. [Paper]
L. Nio, S. Sakti, G. Neubig, T. Toda, S. Nakamura. Recursive neural network paraphrase identification for example-based dialog retrieval. Proc. APSIPA ASC, 4 pages, Siem Reap, Cambodia, Dec. 2014. [Paper]
K. Tanaka, T. Toda, G. Neubig, S. Sakti, S. Nakamura. An inter-speaker evaluation through simulation of electrolarynx control based on statistical F0 prediction. Proc. APSIPA ASC, 4 pages, Siem Reap, Cambodia, Dec. 2014. [Paper]
R. Yoshida, T. Hiraoka, G. Neubig, S. Sakti, T. Toda, S. Nakamura. Unnecessary utterance detection for avoiding digressions in discussion. Proc. APSIPA ASC, 4 pages, Siem Reap, Cambodia, Dec. 2014. [Paper]
F. Koto, S. Sakti, G. Neubig, T. Toda, M. Adriani, S. Nakamura. The use of semantic and acoustic features for open-domain TED talk summarization. Proc. APSIPA ASC, 4 pages, Siem Reap, Cambodia, Dec. 2014. [Paper]
S. Takamichi, T. Toda, A.W. Black, S. Nakamura. Modulation spectrum-based post-filter for GMM-based voice conversion. Proc. APSIPA ASC, 4 pages, Siem Reap, Cambodia, Dec. 2014. <APSIPA ASC 2014 The Best Paper Award> [Paper]
L. Nio, S. Sakti, G. Neubig, T. Toda, S. Nakamura. Improving the robustness of example-based dialog retrieval using recursive neural network paraphrase identification. Proc. IEEE SLT, pp. 306-311, South Lake Tahoe, USA, Dec. 2014. [Paper]
S. Takamichi, T. Toda, A.W. Black, S. Nakamura. Modified post-filter to recover modulation spectrum for HMM-based speech synthesis. Proc. GlobalSIP, pp. 710-714, Atlanta, USA, Dec. 2014. [Paper]
T. Toda. Augmented speech production based on real-time statistical voice conversion. Proc. GlobalSIP, pp. 755-759, Atlanta, USA, Dec. 2014 (Invited Talk). [Paper]
Y. Hatakoshi, G. Neubig, S. Sakti, T. Toda, S. Nakamura. Rule-based syntactic preprocessing for syntax-based machine translation. Proc. 8th Workshop on Syntax, Semantics and Structure in Statistical Translation (SSST-8), pp. 34-42, Doha, Qatar, Oct. 2014. [Paper]
K. Tanaka, T. Toda, G. Neubig, S. Sakti, S. Nakamura. Direct F0 control of an electrolarynx based on statistical excitation feature prediction and its evaluation through simulation. Proc. INTERSPEECH, pp. 31-35, MAX Atria, Singapore, Sep. 2014. [Paper]
N. Jinbo, S. Takamichi, T. Toda, G. Neubig, S. Sakti, S. Nakamura. A hearing impairment simulation method using audiogram-based approximation of auditory characteristics. Proc. INTERSPEECH, pp. 490-494, MAX Atria, Singapore, Sep. 2014. [Paper]
K. Kubo, S. Sakti, G. Neubig, T. Toda, S. Nakamura. Structured soft margin confidence weighted learning for grapheme-to-phoneme conversion. Proc. INTERSPEECH, pp. 1263-1267, MAX Atria, Singapore, Sep. 2014. [Paper]
S. Matsumiya, S. Sakti, G. Neubig, T. Toda, S. Nakamura. Data-driven generation of text balloons based on linguistic and acoustic features of a comics-anime corpus. Proc. INTERSPEECH, pp. 1801-1805, MAX Atria, Singapore, Sep. 2014. [Paper]
H. Kawahara, M. Morise, T. Toda, H. Banno, R. Nisimura, T. Irino. Excitation source analysis for high-quality speech manipulation systems based on an interference-free representation of group delay with minimum phase response compensation. Proc. INTERSPEECH, pp. 2243-2247, MAX Atria, Singapore, Sep. 2014. [Paper]
P.L. Tobing, T. Toda, G. Neubig, S. Sakti, S. Nakamura, A. Purwarianti. Articulatory controllable speech modification based on statistical feature mapping with Gaussian mixture models. Proc. INTERSPEECH, pp. 2298-2302, MAX Atria, Singapore, Sep. 2014. [Paper]
K. Kobayashi, T. Toda, G. Neubig, S. Sakti, S. Nakamura. Statistical singing voice conversion with direct waveform modification based on the spectrum differential. Proc. INTERSPEECH, pp. 2514-2518, MAX Atria, Singapore, Sep. 2014. [Paper]
D.Q. Truong, S. Sakti, G. Neubig, T. Toda, S. Nakamura. Collection and analysis of a Japanese-English emphasized speech corpus. Proc. O-COCOSDA, pp. 77-82, Phuket, Thailand, Sep. 2014. [Paper]
M. Mizukami, G. Neubig, S. Sakti, T. Toda, S. Nakamura. Building a free, general-domain paraphrase database for Japanese. Proc. O-COCOSDA, pp. 129-133, Phuket, Thailand, Sep. 2014. [Paper]
F. Koto, S. Sakti, G. Neubig, T. Toda, S. Nakamura. Memorable spoken quote corpora of TED public speaking. Proc. O-COCOSDA, pp. 140-143, Phuket, Thailand, Sep. 2014. [Paper]
L. Nio, S. Sakti, G. Neubig, T. Toda, S. Nakamura. Conversation dialog corpora from drama television and movie scripts. Proc. O-COCOSDA, pp. 144-148, Phuket, Thailand, Sep. 2014. [Paper]
H. Maki, T. Toda, S. Sakti, G. Neubig, S. Nakamura. Probabilistic enhancement of EEG component using prior information of component-related spatial correlation. Proc. EMBC, 1 page, Late-Breaking Research Poster, Chicago, USA, Aug. 2014. [Paper]
K. Akabe, G. Neubig, S. Sakti, T. Toda, S. Nakamura. Discriminative language models as a tool for machine translation error analysis. Proc. COLING, pp. 1124-1132, Dublin, Ireland, Aug. 2014. [Paper]
T. Hiraoka, G. Neubig, S. Sakti, T. Toda, S. Nakamura. Reinforcement learning of cooperative persuasive dialogue policies using framing. Proc. COLING, pp. 1706-1717, Dublin, Ireland, Aug. 2014. [Paper]
Y. Oda, G. Neubig, S. Sakti, T. Toda, S. Nakamura. Optimizing segmentation strategies for simultaneous speech translation. Proc. ACL, pp. 551-556, Baltimore, USA, June 2014. [Paper]
H. Tanaka, S. Sakti, G. Neubig, T. Toda, S. Nakamura. Linguistic and acoustic features for automatic identification of autism spectrum disorders in children's narrative. Proc. Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality, pp. 88-96, Baltimore, USA, June 2014. [Paper]
H. Shimizu, G. Neubig, S. Sakti, T. Toda, S. Nakamura. Collection of a simultaneous translation corpus for comparative analysis. Proc. LREC, pp. 670-673, Reykjavik, Iceland, May 2014. [Paper]
S. Sakti, K. Kubo, S. Matsumiya, G. Neubig, T. Toda, S. Nakamura, F. Adachi, R. Isotani. Towards multilingual conversations in the medical domain: development of multilingual medical data and a network-based ASR system. Proc. LREC, pp. 2639-2643, Reykjavik, Iceland, May 2014. [Paper]
S. Takamichi, T. Toda, G. Neubig, S. Sakti, S. Nakamura. A postfilter to modify the modulation spectrum in HMM-based speech synthesis. Proc. IEEE ICASSP, pp. 290-294, Florence, Italy, May 2014. <IEEE Signal Processing Society Japan Outstanding Student Conference Paper Award (recipient: Shinnosuke Takamichi)> [Paper]
K. Kubo, S. Sakti, G. Neubig, T. Toda, S. Nakamura. NARROW adaptive regularization of weights for grapheme-to-phoneme conversion. Proc. IEEE ICASSP, pp. 2608-2612, Florence, Italy, May 2014. [Paper]
K. Tanaka, T. Toda, G. Neubig, S. Sakti, S. Nakamura. An evaluation of excitation feature prediction in a hybrid approach to electrolaryngeal speech enhancement. Proc. IEEE ICASSP, pp. 4521-4525, Florence, Italy, May 2014. [Paper]
K. Kobayashi, T. Toda, T. Nakano, M. Goto, G. Neubig, S. Sakti, S. Nakamura. Regression approaches to perceptual age control in singing voice conversion. Proc. IEEE ICASSP, pp. 7954-7958, Florence, Italy, May 2014. [Paper]
H.T. Vu, G. Neubig, S. Sakti, T. Toda, S. Nakamura. Acquiring a dictionary of emotion-provoking events. Proc. EACL, pp. 128-132, Gothenburg, Sweden, Apr. 2014. [Paper]
T. Hiraoka, G. Neubig, S. Sakti, T. Toda, S. Nakamura. Construction and analysis of a persuasive dialogue corpus. Proc. IWSDS, pp. 213-223, Napa, USA, Jan. 2014. [Paper]
N. Lubis, S. Sakti, G. Neubig, T. Toda, A. Purwarianti, S. Nakamura. Emotion and its triggers in human spoken dialogue: recognition and analysis. Proc. IWSDS, pp. 224-229, Napa, USA, Jan. 2014. [Paper]
H. Tanaka, S. Sakti, G. Neubig, T. Toda, S. Nakamura. Modality and contextual differences in computer based non-verbal communication training. Proc. CogInfoCom, pp. 127-132, Budapest, Hungary, Dec. 2013. [Paper]
H. Shimizu, G. Neubig, S. Sakti, T. Toda, S. Nakamura. Constructing a speech translation system using simultaneous interpretation data. Proc. IWSLT, 7 pages, Heidelberg, Germany, Dec. 2013. [Paper]
S. Sakti, K. Kubo, G. Neubig, T. Toda, S. Nakamura. The NAIST English speech recognition system for IWSLT 2013. Proc. IWSLT, 5 pages, Heidelberg, Germany, Dec. 2013. [Paper]
T. Hiraoka, Y. Yamauchi, G. Neubig, S. Sakti, T. Toda, S. Nakamura. Dialogue management for leading the conversation in persuasive dialogue systems. Proc. IEEE ASRU, pp. 114-119, Olomouc, Czech Republic, Dec. 2013. [Paper]
H. Tanaka, S. Sakti, G. Neubig, T. Toda, S. Nakamura. Non-verbal communication training with an interactive multimedia application. Proc. ACE, Osaka, Japan, Oct. 2013. [Paper]
Lasguido, S. Sakti, G. Neubig, T. Toda, S. Nakamura. Combination of example-based and SMT-based approaches in a chat-oriented dialog system. Proc. ICE-ID, 6 pages, Bali, Indonesia, Oct. 2013. [Paper]
G. Neubig, S. Sakti, T. Toda, S. Nakamura, Y. Matsumoto, R. Isotani, Y. Ikeda. Towards high-reliability speech translation in the medical domain. Proc. MedNLP-WS, 8 pages, Aichi, Japan, Oct. 2013. [Paper]
P. Arthur, G. Neubig, S. Sakti, T. Toda, S. Nakamura. Inter-sentence features and thresholded minimum error rate training: NAIST at CLEF 2013 QA4MRE. Proc. CLEF, 11 pages, Valencia, Spain, Sep. 2013. [Paper]
T. Toda, H. Doi. Statistical voice conversion techniques for alaryngeal speech enhancement. Proc. SICE 2013, pp. 1602-1603, Aichi, Japan, Sep. 2013 (Invited Talk in Special Session). [Paper]
T. Inukai, T. Toda, G. Neubig, S. Sakti, S. Nakamura. Investigation of intra-speaker spectral parameter variation and its prediction towards improvement of spectral conversion metric. Proc. 8th ISCA Speech Synthesis Workshop (SSW8), pp. 89-94, Barcelona, Spain, Aug. 2013. [Paper]
H. Kawahara, M. Morise, T. Toda, R. Nisimura, T. Irino. Beyond bandlimited sampling of speech spectral envelope imposed by the harmonic structure of voiced sounds. Proc. INTERSPEECH, pp. 34-38, Lyon, France, Aug. 2013. [Paper]
S. Takamichi, T. Toda, Y. Shiga, S. Sakti, G. Neubig, S. Nakamura. Improvements to HMM-based speech synthesis based on parameter generation with rich context models. Proc. INTERSPEECH, pp. 364-368, Lyon, France, Aug. 2013. [Paper]
K. Kobayashi, H. Doi, T. Toda, T. Nakano, M. Goto, G. Neubig, S. Sakti, S. Nakamura. An investigation of acoustic features for singing voice conversion based on perceptual age. Proc. INTERSPEECH, pp. 1057-1061, Lyon, France, Aug. 2013. [Paper]
H. Doi, T. Toda, T. Nakano, M. Goto, S. Nakamura. Evaluation of a singing voice conversion method based on many-to-many eigenvoice conversion. Proc. INTERSPEECH, pp. 1067-1071, Lyon, France, Aug. 2013. [Paper]
K. Kubo, S. Sakti, G. Neubig, T. Toda, S. Nakamura. Grapheme-to-phoneme conversion based on adaptive regularization of weight vectors. Proc. INTERSPEECH, pp. 1946-1950, Lyon, France, Aug. 2013. [Paper]
T. Kano, S. Takamichi, S. Sakti, G. Neubig, T. Toda, S. Nakamura. Generalizing continuous-space translation of paralinguistic information. Proc. INTERSPEECH, pp. 2614-2618, Lyon, France, Aug. 2013. [Paper]
M. Ohgushi, G. Neubig, S. Sakti, T. Toda, S. Nakamura. An empirical comparison of joint optimization techniques for speech translation. Proc. INTERSPEECH, pp. 2619-2623, Lyon, France, Aug. 2013. [Paper]
K. Tanaka, T. Toda, G. Neubig, S. Sakti, S. Nakamura. Hybrid approach to electrolaryngeal speech enhancement based on spectral subtraction and statistical voice conversion. Proc. INTERSPEECH, pp. 3067-3071, Lyon, France, Aug. 2013. [Paper]
T. Moriguchi, T. Toda, M. Sano, H. Sato, G. Neubig, S. Sakti, S. Nakamura. A digital signal processor implementation of silent/electrolaryngeal speech enhancement based on real-time statistical voice conversion. Proc. INTERSPEECH, pp. 3072-3076, Lyon, France, Aug. 2013. [Paper]
T. Fujita, G. Neubig, S. Sakti, T. Toda, S. Nakamura. Simple, lexicalized choice of translation timing for simultaneous speech translation. Proc. INTERSPEECH, pp. 3487-3491, Lyon, France, Aug. 2013. [Paper]
M. Itoi, R. Miyazaki, T. Toda, H. Saruwatari, K. Shikano. Blind speech extraction for non-audible murmur speech with speaker's movement noise. Proc. ISSPIT, 6 pages, Ho Chi Minh City, Vietnam, Dec. 2012. [Paper]
A. Sani, S. Sakti, G. Neubig, T. Toda, A. Mulyanto, S. Nakamura. Towards language preservation: preliminary collection and vowel analysis of Indonesian ethnic speech data. Proc. Oriental COCOSDA, pp. 118-122, Macau, China, Dec. 2012. <Best Student Paper Award (recipient: Auliya Sani)> [Paper]
G. Neubig, K. Duh, M. Ogushi, T. Kano, T. Kiso, S. Sakti, T. Toda, S. Nakamura. The NAIST machine translation system for IWSLT 2012. Proc. IWSLT, pp. 54-60, Hong Kong, China, Dec. 2012. [Paper]
C. Saam, C. Mohr, K. Kilgour, M. Heck, M. Sperber, K. Kubo, S. Stueker, S. Sakti, G. Neubig, T. Toda, S. Nakamura, A. Waibel. The 2012 KIT and KIT-NAIST English ASR systems for the IWSLT evaluation. Proc. IWSLT, pp. 87-90, Hong Kong, China, Dec. 2012. [Paper]
M. Heck, K. Kubo, M. Sperber, S. Sakti, S. Stueker, C. Saam, K. Kilgour, C. Mohr, G. Neubig, T. Toda, S. Nakamura, A. Waibel. The KIT-NAIST (contrastive) English ASR system for IWSLT 2012. Proc. IWSLT, pp. 91-95, Hong Kong, China, Dec. 2012. [Paper]
T. Kano, S. Sakti, S. Takamichi, G. Neubig, T. Toda, S. Nakamura. A method for translation of paralinguistic information. Proc. IWSLT, pp. 158-163, Hong Kong, China, Dec. 2012. [Paper]
H. Doi, T. Toda, T. Nakano, M. Goto, S. Nakamura. Singing voice conversion method based on many-to-many eigenvoice conversion and training data generation using a singing-to-singing synthesis system. Proc. APSIPA ASC, 6 pages, Hollywood, USA, Nov. 2012. <APSIPA ASC 2012 The Best Paper Award (Short Paper in Regular Session Category)> [Paper]
H. Tanaka, S. Sakti, G. Neubig, T. Toda, N. Campbell, S. Nakamura. Non-verbal cognitive skills and autistic conditions: an analysis and training tool. Proc. CogInfoCom, pp. 41-46, Kosice, Slovakia, Dec. 2012. [Paper]
Lasguido, S. Sakti, G. Neubig, T. Toda, M. Adriani, S. Nakamura. Developing non-goal dialog system based on examples of drama television. Proc. IWSDS, pp. 315-320, Paris, France, Nov. 2012. [Paper]
M. Kishimoto, T. Toda, H. Doi, S. Sakti, S. Nakamura. Model training using parallel data with mismatched pause positions in statistical esophageal speech enhancement. Proc. ICSP, pp. 590-594, Beijing, China, Oct. 2012 (Invited Talk in Special Session). [Paper]
T. Toda, T. Muramatsu, H. Banno. Implementation of computationally efficient real-time voice conversion. Proc. INTERSPEECH, 4 pages, Portland, USA, Sep. 2012. [Paper]
S. Takamichi, T. Toda, Y. Shiga, H. Kawai, S. Sakti, S. Nakamura. An evaluation of parameter generation methods with rich context models in HMM-based speech synthesis. Proc. INTERSPEECH, 4 pages, Portland, USA, Sep. 2012. [Paper]
T. Toda. Statistical approaches to enhancement of body-conducted speech detected with non-audible murmur microphone. Proc. ICME CME, pp. 623-628, Hyogo, Japan, July 2012 (Invited Poster in Special Session). [Paper]
K. Yamamoto, T. Toda, H. Doi, H. Saruwatari, K. Shikano. Statistical approach to voice quality control in esophageal speech enhancement. Proc. IEEE ICASSP, pp. 4497-4500, Kyoto, Japan, Mar. 2012. [Paper]
S. Ishii, T. Toda, H. Saruwatari, S. Sakti, S. Nakamura. Blind noise suppression for non-audible murmur recognition with stereo signal processing. Proc. IEEE ASRU, pp. 494-499, Hawaii, USA, Dec. 2011. <Elected as Panel Member in "New Applications in Speech Processing" Session> [Paper]
D. Deguchi, T. Toda, H. Doi, H. Saruwatari, K. Shikano. Computationally efficient body-conducted voice conversion with original excitation signals. Proc. APSIPA ASC, 4 pages, Xi'an, China, Oct. 2011. [Paper]
N. Hattori, T. Toda, H. Kawai, H. Saruwatari, K. Shikano. Speaker-adaptive speech synthesis based on eigenvoice conversion and language-dependent prosodic conversion in speech-to-speech translation. Proc. INTERSPEECH, pp. 2769-2772, Florence, Italy, Aug. 2011. [Paper]
H. Doi, K. Nakamura, T. Toda, H. Saruwatari, K. Shikano. An evaluation of alaryngeal speech enhancement methods based on voice conversion techniques. Proc. IEEE ICASSP, pp. 5136-5139, Prague, Czech Republic, May 2011. [Paper]
D. Babani, T. Toda, H. Saruwatari, K. Shikano. Acoustic model training for non-audible murmur recognition using transformed normal speech data. Proc. IEEE ICASSP, pp. 5224-5227, Prague, Czech Republic, May 2011. [Paper]
H. Doi, K. Nakamura, T. Toda, H. Saruwatari, K. Shikano. Speaking-aid systems based on one-to-many eigenvoice conversion for total laryngectomees. Proc. APSIPA ASC, pp. 498-501, Biopolis, Singapore, Dec. 2010. [Paper]
D. Deguchi, H. Doi, T. Toda, H. Saruwatari, K. Shikano. Acoustic compensation method for accepting different recording devices in body-conducted voice conversion. Proc. APSIPA ASC, pp. 502-505, Biopolis, Singapore, Dec. 2010. [Paper]
D. Babani, T. Toda, H. Saruwatari, K. Shikano. An evaluation of discriminative training for hidden Markov models in a real-environment speech-oriented guidance system. Proc. Student Symposium in APSIPA ASC, page 8, Biopolis, Singapore, Dec. 2010. [Paper]
Y. Shiga, T. Toda, S. Sakai, H. Kawai. Improved training of excitation for HMM-based parametric speech synthesis. Proc. INTERSPEECH, pp. 809-812, Chiba, Japan, Sep. 2010. [Paper]
K. Nakamura, T. Toda, H. Saruwatari, K. Shikano. The use of air-pressure sensor in electrolaryngeal speech enhancement based on statistical voice conversion. Proc. INTERSPEECH, pp. 1628-1631, Chiba, Japan, Sep. 2010. [Paper]
K. Ohta, T. Toda, Y. Ohtani, H. Saruwatari, K. Shikano. Adaptive voice-quality control based on one-to-many eigenvoice conversion. Proc. INTERSPEECH, pp. 2158-2161, Chiba, Japan, Sep. 2010. [Paper]
Y. Shiga, T. Toda, S. Sakai, H. Kawai, K. Tokuda, M. Tsuzaki, S. Nakamura. The NICT Blizzard Challenge 2010 entry. Proc. Blizzard Challenge 2010 Workshop, 6 pages, Kyoto, Japan, Sep. 2010. [Paper]
C. Hayashida, T. Toda, Y. Ohtani, H. Saruwatari, K. Shikano. Linear transformation approaches to many-to-one voice conversion. Proc. 7th ISCA Speech Synthesis Workshop (SSW7), pp. 74-79, Kyoto, Japan, Sep. 2010. [Paper]
H. Doi, K. Nakamura, T. Toda, H. Saruwatari, K. Shikano. Statistical approach to enhancing esophageal speech based on Gaussian mixture models. Proc. IEEE ICASSP, pp. 4250-4253, Dallas, USA, Mar. 2010. <Best Student Paper Award (1st Place) (recipients: Hironori Doi and Keigo Nakamura)> [Paper]
Y. Ohtani, T. Toda, H. Saruwatari, K. Shikano. Non-parallel training for many-to-many eigenvoice conversion. Proc. IEEE ICASSP, pp. 4822-4825, Dallas, USA, Mar. 2010. [Paper]
H. Zen, K. Oura, T. Nose, J. Yamagishi, S. Sako, T. Toda, T. Masuko, A.W. Black, K. Tokuda. Recent development of the HMM-based speech synthesis system (HTS). Proc. APSIPA ASC, pp. 121-130, Sapporo, Japan, Oct. 2009 (Invited Talk in Special Session). [Paper]
H. Doi, K. Nakamura, T. Toda, H. Saruwatari, K. Shikano. Enhancement of esophageal speech using statistical voice conversion. Proc. APSIPA ASC, pp. 805-808, Sapporo, Japan, Oct. 2009. [Paper]
T. Toda, K. Nakamura, T. Nagai, T. Kaino, Y. Nakajima, K. Shikano. Technologies for processing body-conducted speech detected with non-audible murmur microphone. Proc. INTERSPEECH, pp. 632-635, Brighton, UK, Sep. 2009 (Keynote in Special Session). [Paper]
V.-A. Tran, G. Bailly, H. Loevenbruck, T. Toda. Multimodal HMM-based NAM-to-speech conversion. Proc. INTERSPEECH, pp. 656-659, Brighton, UK, Sep. 2009. [Paper]
K. Nakamura, T. Toda, H. Saruwatari, K. Shikano. Electrolaryngeal speech enhancement based on statistical voice conversion. Proc. INTERSPEECH, pp. 1431-1434, Brighton, UK, Sep. 2009. [Paper]
Y. Ohtani, T. Toda, H. Saruwatari, K. Shikano. Many-to-many eigenvoice conversion with reference voice. Proc. INTERSPEECH, pp. 1623-1626, Brighton, UK, Sep. 2009. [Paper]
M. Charlier, Y. Ohtani, T. Toda, A. Moinet, T. Dutoit. Cross-language voice conversion based on eigenvoices. Proc. INTERSPEECH, pp. 1635-1638, Brighton, UK, Sep. 2009. [Paper]
R. Maia, T. Toda, K. Tokuda, S. Sakai, S. Nakamura. A decision tree-based clustering approach to state definition in an excitation modeling framework for HMM-based speech synthesis. Proc. INTERSPEECH, pp. 1783-1786, Brighton, UK, Sep. 2009. [Paper]
R. Maia, T. Toda, S. Sakai, Y. Shiga, J. Ni, H. Kawai, K. Tokuda, M. Tsuzaki, S. Nakamura. The NICT entry for the Blizzard Challenge 2009: an enhanced HMM-based speech synthesis system with trajectory training considering global variance and state-dependent mixed excitation. Proc. Blizzard Challenge 2009 Workshop, 6 pages, Edinburgh, UK, Sep. 2009. [Paper]
T. Toda. Eigenvoice-based approach to voice conversion and voice quality control. Proc. NCMMSC, International Symposium, pp. 492-497, Lanzhou, China, Aug. 2009 (Invited Talk in Special Session). [Paper]
K. Morizane, K. Nakamura, T. Toda, H. Saruwatari, K. Shikano. Emphasized speech synthesis based on hidden Markov models. Proc. Oriental COCOSDA, 6 pages, O2-4, Beijing, China, Aug. 2009. [Paper]
T. Toda, K. Nakamura, H. Sekimoto, K. Shikano. Voice conversion for various types of body transmitted speech. Proc. IEEE ICASSP, pp. 3601-3604, Taipei, Taiwan, Apr. 2009 (Invited Talk in Special Session). [Paper]
K. Yu, T. Toda, M. Gasic, S. Keizer, F. Mairesse, B. Thomson, S. Young. Probabilistic modelling of F0 in unvoiced regions in HMM based speech synthesis. Proc. IEEE ICASSP, pp. 3773-3776, Taipei, Taiwan, Apr. 2009. [Paper]
D. Miyamoto, K. Nakamura, T. Toda, H. Saruwatari, K. Shikano. Acoustic compensation methods for body transmitted speech conversion. Proc. IEEE ICASSP, pp. 3901-3904, Taipei, Taiwan, Apr. 2009. [Paper]
T. Toda, S. Young. Trajectory training considering global variance for HMM-based speech synthesis. Proc. IEEE ICASSP, pp. 4025-4028, Taipei, Taiwan, Apr. 2009. [Paper]
K. Oura, Y. Nankaku, T. Toda, K. Tokuda, R. Maia, S. Sakai, S. Nakamura. Simultaneous phrasing, prosody, and acoustic model training for Text-to-Speech conversion. Proc. ISCSLP, pp. 1-4, Kunming, China, Dec. 2008. <Best Student Paper Award (recipient: Keiichiro Oura)> [Paper]
K. Yutani, Y. Uto, Y. Nankaku, T. Toda, K. Tokuda. Simultaneous conversion of duration and spectrum based on statistical models including time-sequence matching. Proc. INTERSPEECH, pp. 1072-1075, Brisbane, Australia, Sep. 2008. [Paper]
T. Muramatsu, Y. Ohtani, T. Toda, H. Saruwatari, K. Shikano. Low-delay voice conversion based on maximum likelihood estimation of spectral parameter trajectory. Proc. INTERSPEECH, pp. 1076-1079, Brisbane, Australia, Sep. 2008. [Paper]
Y. Ohtani, T. Toda, H. Saruwatari, K. Shikano. An improved one-to-many eigenvoice conversion system. Proc. INTERSPEECH, pp. 1080-1083, Brisbane, Australia, Sep. 2008. [Paper]
D. Tani, T. Toda, Y. Ohtani, H. Saruwatari, K. Shikano. Maximum a posteriori adaptation for many-to-one eigenvoice conversion. Proc. INTERSPEECH, pp. 1461-1464, Brisbane, Australia, Sep. 2008. [Paper]
K. Nakamura, T. Toda, Y. Nakajima, H. Saruwatari, K. Shikano. Evaluation of speaking-aid system with voice conversion for laryngectomees toward its use in practical environments. Proc. INTERSPEECH, pp. 2209-2212, Brisbane, Australia, Sep. 2008. [Paper]
R. Maia, J. Ni, S. Sakai, T. Toda, K. Tokuda, T. Shimizu, S. Nakamura. The NICT/ATR speech synthesis system for the Blizzard Challenge 2008. Proc. Blizzard Challenge 2008 Workshop, 6 pages, Brisbane, Australia, Sep. 2008. [Paper]
J. Yamagishi, H. Zen, Y.-J. Wu, T. Toda, K. Tokuda. The HTS-2008 system: yet another evaluation of the speaker-adaptive HMM-based speech synthesis system in the 2008 Blizzard Challenge. Proc. Blizzard Challenge 2008 Workshop, 6 pages, Brisbane, Australia, Sep. 2008. [Paper]
V.-A. Tran, G. Bailly, H. Loevenbruck, T. Toda. Predicting F0 and voicing from NAM-captured whispered speech. Proc. Speech Prosody, 4 pages, Campinas, Brazil, May 2008. [Paper]
T. Toda, K. Tokuda. Statistical approach to vocal tract transfer function estimation based on factor analyzed trajectory HMM. Proc. IEEE ICASSP, pp. 3925-3928, Las Vegas, USA, Apr. 2008. [Paper]
J. Yamagishi, T. Nose, H. Zen, T. Toda, K. Tokuda. Performance evaluation of the speaker-independent HMM-based speech synthesis system "HTS-2007" for the Blizzard Challenge 2007. Proc. IEEE ICASSP, pp. 3957-3960, Las Vegas, USA, Apr. 2008. [Paper]
R. Maia, T. Toda, K. Tokuda, S. Sakai, S. Nakamura. On the state definition for a trainable excitation model in HMM-based speech synthesis. Proc. IEEE ICASSP, pp. 3965-3968, Las Vegas, USA, Apr. 2008. [Paper]
W. Fujitsuru, H. Sekimoto, T. Toda, H. Saruwatari, K. Shikano. Bandwidth extension of cellular phone speech based on maximum likelihood estimation with GMM. Proc. NCSP, pp. 283-286, Gold Coast, Australia, Mar. 2008. [Paper]
R. Gomez, T. Toda, H. Saruwatari, K. Shikano. Rapid unsupervised speaker adaptation using single utterance based on MLLR and speaker selection. Proc. INTERSPEECH, pp. 262-265, Antwerp, Belgium, Aug. 2007. [Paper]
T. Cincarek, I. Shindo, T. Toda, H. Saruwatari, K. Shikano. Development of preschool children subsystem for ASR and Q&A in a real-environment speech-oriented guidance task. Proc. INTERSPEECH, pp. 1469-1472, Antwerp, Belgium, Aug. 2007. [Paper]
R. Maia, T. Toda, H. Zen, Y. Nankaku, K. Tokuda. A trainable excitation model for HMM-based speech synthesis. Proc. INTERSPEECH, pp. 1909-1912, Antwerp, Belgium, Aug. 2007. [Paper]
Y. Ohtani, T. Toda, H. Saruwatari, K. Shikano. Speaker adaptive training for one-to-many eigenvoice conversion based on Gaussian mixture model. Proc. INTERSPEECH, pp. 1981-1984, Antwerp, Belgium, Aug. 2007. [Paper]
K. Nakamura, T. Toda, H. Saruwatari, K. Shikano. Impact of various small sound source signals on voice conversion accuracy in speech communication aid for laryngectomees. Proc. INTERSPEECH, pp. 2517-2520, Antwerp, Belgium, Aug. 2007. [Paper]
J. Ni, T. Hirai, H. Kawai, T. Toda, K. Tokuda, M. Tsuzaki, S. Sakai, R. Maia, S. Nakamura. ATRECSS - ATR English speech corpus for speech synthesis. Proc. Blizzard Challenge 2007 Workshop, 4 pages, Bonn, Germany, Aug. 2007. [Paper]
J. Yamagishi, H. Zen, T. Toda, K. Tokuda. Speaker-independent HMM-based speech synthesis system - HTS-2007 system for the Blizzard Challenge 2007. Proc. Blizzard Challenge 2007 Workshop, 6 pages, Bonn, Germany, Aug. 2007. [Paper]
S. Sakai, J. Ni, R. Maia, K. Tokuda, M. Tsuzaki, T. Toda, H. Kawai, S. Nakamura. Communicative speech synthesis with XIMERA. Proc. 6th ISCA Speech Synthesis Workshop (SSW6), pp. 28-33, Bonn, Germany, Aug. 2007. [Paper]
K. Ohta, Y. Ohtani, T. Toda, H. Saruwatari, K. Shikano. Regression approaches to voice quality control based on one-to-many eigenvoice conversion. Proc. 6th ISCA Speech Synthesis Workshop (SSW6), pp. 101-106, Bonn, Germany, Aug. 2007. [Paper]
D. Tani, Y. Ohtani, T. Toda, H. Saruwatari, K. Shikano. An evaluation of many-to-one voice conversion algorithms with pre-stored speaker data sets. Proc. 6th ISCA Speech Synthesis Workshop (SSW6), pp. 107-112, Bonn, Germany, Aug. 2007. [Paper]
J. Yamagishi, T. Kobayashi, S. Renals, S. King, H. Zen, T. Toda, K. Tokuda. Improved average-voice-based speech synthesis using gender-mixed modeling and a parameter generation algorithm considering GV. Proc. 6th ISCA Speech Synthesis Workshop (SSW6), pp. 125-130, Bonn, Germany, Aug. 2007. [Paper]
R. Maia, T. Toda, H. Zen, Y. Nankaku, K. Tokuda. Excitation model for HMM-based speech synthesis based on residual modeling. Proc. 6th ISCA Speech Synthesis Workshop (SSW6), pp. 131-136, Bonn, Germany, Aug. 2007. [Paper]
Y. Nankaku, K. Nakamura, T. Toda, K. Tokuda. Spectral conversion based on statistical models including time-sequence matching. Proc. 6th ISCA Speech Synthesis Workshop (SSW6), pp. 333-338, Bonn, Germany, Aug. 2007. [Paper]
T. Toda, Y. Ohtani, K. Shikano. One-to-many and many-to-one voice conversion based on eigenvoices. Proc. IEEE ICASSP, pp. 1249-1252, Hawaii, USA, Apr. 2007 (Invited Talk in Special Session). [Paper]
K. Nakamura, T. Toda, H. Saruwatari, K. Shikano. Speaking aid system for total laryngectomees using voice conversion of body transmitted artificial speech. Proc. INTERSPEECH, pp. 1395-1398, Pittsburgh, USA, Sep. 2006. [Paper]
T. Cincarek, T. Toda, H. Saruwatari, K. Shikano. Acoustic modeling for spoken dialogue systems based on unsupervised utterance-based selective training. Proc. INTERSPEECH, pp. 1722-1725, Pittsburgh, USA, Sep. 2006. [Paper]
Y. Ohtani, T. Toda, H. Saruwatari, K. Shikano. Maximum likelihood voice conversion based on GMM with STRAIGHT mixed excitation. Proc. INTERSPEECH, pp. 2266-2269, Pittsburgh, USA, Sep. 2006. [Paper]
M. Nakagiri, T. Toda, H. Kashioka, K. Shikano. Improving body transmitted unvoiced speech with statistical voice conversion. Proc. INTERSPEECH, pp. 2270-2273, Pittsburgh, USA, Sep. 2006. [Paper]
Y. Uto, Y. Nankaku, T. Toda, A. Lee, K. Tokuda. Voice conversion based on mixtures of factor analyzers. Proc. INTERSPEECH, pp. 2278-2281, Pittsburgh, USA, Sep. 2006. [Paper]
T. Toda, Y. Ohtani, K. Shikano. Eigenvoice conversion based on Gaussian mixture model. Proc. INTERSPEECH, pp. 2446-2449, Pittsburgh, USA, Sep. 2006. [Paper]
H. Zen, T. Toda, K. Tokuda. The Nitech-NAIST HMM-based speech synthesis system for the Blizzard Challenge 2006. Proc. Blizzard Challenge 2006 Workshop, 4 pages, Pittsburgh, USA, Sep. 2006. [Paper]
T. Toda, H. Kawai, T. Hirai, J. Ni, N. Nishizawa, J. Yamagishi, M. Tsuzaki, K. Tokuda, S. Nakamura. Developing a test bed of English Text-to-Speech system XIMERA for the Blizzard Challenge 2006. Proc. Blizzard Challenge 2006 Workshop, 4 pages, Pittsburgh, USA, Sep. 2006. [Paper]
T. Kato, T. Toda, H. Saruwatari, K. Shikano. Transcription cost reduction for constructing acoustic models using acoustic likelihood selection criteria. Proc. LREC2006, pp. 789-792, Genoa, Italy, May 2006. [Paper]
T. Cincarek, T. Toda, H. Saruwatari, K. Shikano. Utterance-based selective training for cost-effective task-adaptation of acoustic models. Proc. SRIV2006, pp. 71-76, Toulouse, France, May 2006. [Paper]
K. Nakamura, T. Toda, Y. Nankaku, K. Tokuda. On the use of phonetic information for mapping from articulatory movements to vocal tract spectrum. Proc. IEEE ICASSP, pp. 93-96, Toulouse, France, May 2006. [Paper]
R. Gomez, T. Toda, H. Saruwatari, K. Shikano. Improving rapid unsupervised speaker adaptation based on HMM sufficient statistics. Proc. IEEE ICASSP, pp. 1001-1004, Toulouse, France, May 2006. [Paper]
T. Cincarek, T. Toda, H. Saruwatari, K. Shikano. Selective EM training of acoustic models based on sufficient statistics of single utterances. Proc. IEEE ASRU, pp. 168-173, San Juan, Puerto Rico, Nov. 2005. [Paper]
H. Zen, T. Toda. An overview of Nitech HMM-Based speech synthesis system for Blizzard Challenge 2005. Proc. INTERSPEECH, pp. 93-96, Lisbon, Portugal, Sep. 2005. [Paper]
T. Toda, K. Shikano. NAM-to-speech conversion with Gaussian mixture models. Proc. INTERSPEECH, pp. 1957-1960, Lisbon, Portugal, Sep. 2005. [Paper]
T. Toda, K. Tokuda. Speech parameter generation algorithm considering global variance for HMM-based speech synthesis. Proc. INTERSPEECH, pp. 2801-2804, Lisbon, Portugal, Sep. 2005. [Paper]
T. Toda, A.W. Black, K. Tokuda. Spectral conversion based on maximum likelihood estimation considering global variance of converted parameter. Proc. IEEE ICASSP, Vol. 1, pp. 9-12, Philadelphia, USA, Mar. 2005. [Paper]
T. Toda, A.W. Black, K. Tokuda. Acoustic-to-articulatory inversion mapping with Gaussian mixture model. Proc. INTERSPEECH, pp. 1129-1132, Jeju, Korea, Oct. 2004. [Paper]
T. Toda, A.W. Black, K. Tokuda. Mapping from articulatory movements to vocal tract spectrum with Gaussian mixture model for articulatory speech synthesis. Proc. 5th ISCA Speech Synthesis Workshop (SSW5), pp. 31-36, Pittsburgh, USA, June 2004. [Paper]
H. Kawai, T. Toda, J. Ni, M. Tsuzaki, K. Tokuda. XIMERA: a new TTS from ATR based on corpus-based technologies. Proc. 5th ISCA Speech Synthesis Workshop (SSW5), pp. 179-184, Pittsburgh, USA, June 2004. [Paper]
K. Adachi, T. Toda, H. Kawanami, H. Saruwatari, K. Shikano. Perceptual evaluation of quality deterioration owing to prosody modification. Proc. LREC2004, pp. 2159-2162, Lisbon, Portugal, May 2004. [Paper]
T. Toda, H. Kawai, M. Tsuzaki. Optimizing sub-cost functions for segment selection based on perceptual evaluations in concatenative speech synthesis. Proc. IEEE ICASSP, pp. 657-660, Montreal, Canada, May 2004. [Paper]
H. Kawai, T. Toda. An evaluation of automatic phone segmentation for concatenative speech synthesis. Proc. IEEE ICASSP, pp. 677-680, Montreal, Canada, May 2004. [Paper]
T. Toda, H. Kawai, M. Tsuzaki. Optimizing integrated cost function for segment selection in concatenative speech synthesis based on perceptual evaluations. Proc. INTERSPEECH, pp. 297-300, Geneva, Switzerland, Sep. 2003. [Paper]
T. Shiraishi, T. Toda, H. Kawanami, H. Saruwatari, K. Shikano. Simple designing methods of corpus-based visual speech synthesis. Proc. INTERSPEECH, pp. 2241-2244, Geneva, Switzerland, Sep. 2003. [Paper]
H. Kawanami, Y. Iwami, T. Toda, H. Saruwatari, K. Shikano. GMM-based voice conversion applied to emotional speech synthesis. Proc. INTERSPEECH, pp. 2401-2404, Geneva, Switzerland, Sep. 2003. [Paper]
T. Toda, H. Kawai, M. Tsuzaki, K. Shikano. Segment selection considering local degradation of naturalness in concatenative speech synthesis. Proc. IEEE ICASSP, pp. 696-699, Hong Kong, China, Apr. 2003. [Paper]
M. Mashimo, T. Toda, H. Kawanami, H. Kashioka, K. Shikano, N. Campbell. Evaluation of cross-language voice conversion using bilingual and non-bilingual databases. Proc. INTERSPEECH, pp. 293-296, Denver, USA, Sep. 2002. [Paper]
H. Kawanami, T. Masuda, T. Toda, K. Shikano. Designing Japanese speech database covering wide range in prosody. Proc. INTERSPEECH, pp. 2425-2428, Denver, USA, Sep. 2002. [Paper]
T. Toda, H. Kawai, M. Tsuzaki, K. Shikano. Perceptual evaluation of cost for segment selection in concatenative speech synthesis. Proc. IEEE 2002 Workshop on Speech Synthesis, 4 pages, Santa Monica, USA, Sep. 2002. [Paper]
H. Kawanami, T. Masuda, T. Toda, K. Shikano. Designing speech database with prosodic variety for expressive TTS system. Proc. LREC2002, pp. 2039-2042, Las Palmas, Spain, May 2002. [Paper]
T. Toda, H. Kawai, M. Tsuzaki, K. Shikano. Unit selection algorithm for Japanese speech synthesis based on both phoneme unit and diphone unit. Proc. IEEE ICASSP, pp. 465-468, Orlando, USA, May 2002. [Paper]
T. Toda, H. Saruwatari, K. Shikano. High quality voice conversion based on Gaussian mixture model with dynamic frequency warping. Proc. INTERSPEECH, pp. 349-352, Aalborg, Denmark, Sep. 2001. [Paper]
M. Mashimo, T. Toda, K. Shikano, N. Campbell. Evaluation of cross-language voice conversion based on GMM and STRAIGHT. Proc. INTERSPEECH, pp. 361-364, Aalborg, Denmark, Sep. 2001. [Paper]
T. Toda, H. Saruwatari, K. Shikano. Voice conversion algorithm based on Gaussian mixture model with dynamic frequency warping of STRAIGHT spectrum. Proc. IEEE ICASSP, pp. 841-844, Salt Lake City, USA, May 2001. [Paper]
T. Toda, J. Lu, H. Saruwatari, K. Shikano. STRAIGHT-based voice conversion algorithm based on Gaussian mixture model. Proc. INTERSPEECH, pp. 279-282, Beijing, China, Oct. 2000. [Paper]
T. Toda, J. Lu, S. Nakamura, K. Shikano. Voice conversion algorithm based on Gaussian mixture model applied to STRAIGHT. Proc. WESTPRAC VII, pp. 169-172, Kumamoto, Japan, Oct. 2000. [Paper]
Review Papers or Book Chapters
E. Cooper, W.-C. Huang, Y. Tsao, H.-M. Wang, T. Toda, J. Yamagishi. A review on subjective and objective evaluation of synthetic speech. Acoustical Science and Technology, Vol. 45, No. 4, pp. 161-183, July 2024. [Paper]
K. Miyazaki, T. Toda, T. Hayashi, K. Takeda. Environmental sound processing and its applications. IEEJ Transactions on Electronics, Information and Systems, Vol. 14, No. 3, pp. 340-351, Mar. 2019. [Paper]
K. Vijayan, H. Li, T. Toda. Speech-to-singing voice conversion: the challenges and strategies for improving vocal conversion processes. IEEE Signal Processing Magazine, Vol. 36, No. 1, pp. 95-102, Jan. 2019. [Link]
K. Tokuda, Y. Nankaku, T. Toda, H. Zen, J. Yamagishi, K. Oura. Speech synthesis based on hidden Markov models. Proceedings of the IEEE, Vol. 101, No. 5, pp. 1234-1252, May 2013. [Link]
T. Toda. Modeling of speech parameter sequence considering global variance for HMM-based speech synthesis. Hidden Markov Models, Theory and Applications, Editor: Przemyslaw Dymarski, InTech, pp. 131-150, Apr. 2011 (ISBN 978-953-307-208-1). [Paper]
Invited Talks and Tutorials
T. Toda. Voice conversion techniques to separately control static and dynamic speech characteristics. Frontier Forum on Intelligent Speech Analysis and Generation, University of Science and Technology of China, Hefei, China, July 2024 (Invited Talk).
T. Toda. Challenges in leveraging large models for augmented speech production. RASDAP, TCSDAP, Suzhou, China, Apr. 2024 (Invited Talk).
T. Toda. Interactive voice conversion for augmented speech production. SNL, July 2021 (Invited Talk).
T. Toda. Recent progress on voice conversion: what is next? IEEE SLT, Jan. 2021 (Invited Talk).
T. Toda. Recent trend of voice conversion research and its possible future direction. APSIPA Distinguished Lecture in ROCLING (32nd Annual Conference on Computational Linguistics and Speech Processing), Taipei, Taiwan, Sep. 2020 (Keynote).
T. Toda. Speech waveform modeling for advanced voice conversion. APSIPA Distinguished Lecture in Winter Seminar Series on Human Language Technology, National University of Singapore, Singapore, Dec. 2019.
T. Toda. Speech waveform modeling for advanced voice conversion. APSIPA Distinguished Lecture, Carnegie Mellon University, Pittsburgh, USA, Oct. 2019.
T. Toda, K. Kobayashi, T. Hayashi. Statistical voice conversion with direct waveform modeling. INTERSPEECH 2019, Graz, Austria, Sep. 2019 (Tutorial).
T. Toda. Advanced voice conversion. Speech Processing Courses in Crete (SPCC), University of Crete, Heraklion, Greece, July 2019 (Invited Lecture).
T. Toda. Hands on voice conversion. Speech Processing Courses in Crete (SPCC), University of Crete, Heraklion, Greece, July 2019 (Invited Lecture).
T. Toda. Augmented vocal production towards new singing style development. Dagstuhl Seminar, Stimulus Talk at Seminar 19052: computational methods for melody and voice processing in music recordings, Wadern, Germany, Jan. 2019 (Invited Talk).
T. Toda. Advanced voice conversion. Speech Processing Courses in Crete (SPCC), University of Crete, Heraklion, Greece, July 2018 (Invited Lecture).
T. Toda. Hands on voice conversion. Speech Processing Courses in Crete (SPCC), University of Crete, Heraklion, Greece, July 2018 (Invited Lecture).
T. Toda. Statistical voice conversion and its application to augmented speech production. Talk at FRIIS Seminar, Frontier Research Institute for Information Science, Nagoya Institute of Technology, Aichi, Japan, Nov. 2016 (Invited Talk).
T. Toda. Voice conversion. Winter School on Speech and Audio Processing (WiSSAP 2013), IIT Madras, Chennai, India, Feb. 2013 (Invited Lecture).
T. Toda. Statistical voice conversion and its real-time applications. Workshop on Frontiers in Speech and Language Technologies and Their Applications, University of Science and Technology of China, Hefei, China, Dec. 2012 (Invited Talk).
T. Toda. Statistical approach to voice conversion and its applications for augmented human communication. The 8th International Symposium on Chinese Spoken Language Processing (ISCSLP-2012), Hong Kong, China, Dec. 2012 (Tutorial).
T. Toda. General concepts and framework of HMM-based speech synthesis. Tutorial on HMM-based statistical speech synthesis in Workshop, Shanghai Jiao Tong University, Shanghai, China, Oct. 2012 (Tutorial).
T. Toda. Voice conversion for enhancing various types of body-conducted speech detected with non-audible murmur microphone. Joint Meeting: 159th Meeting of the ASA and NOISE-CON 2010, Baltimore, USA, Apr. 2010 (Invited Talk).
T. Toda. Statistical conversion of speech parameter trajectory for mapping between features of different modalities. Acoustics'08 Paris (the 2nd ASA-EAA joint conference), Paris, France, July 2008 (Invited Talk).
T. Toda. Overview of voice conversion. 5th ISCA Speech Synthesis Workshop (SSW5), Pittsburgh, U.S.A., June 2004 (Tutorial).
Others
T. Fujimura, I. Kuroyanagi, T. Toda. The NU systems for DCASE 2024 Challenge Task 2. Technical report, DCASE Task 2, 5 pages, July 2024.
R. Yoneyama, Y.-C. Wu, T. Toda. High-fidelity and pitch-controllable neural vocoder based on unified source-filter networks. IEEE ICASSP, Presentation of an SPS Journal Paper, Apr. 2024.
T. Fujimura, I. Kuroyanagi, T. Hayashi, T. Toda. Anomalous sound detection by end-to-end training of outlier exposure and normalizing flow with domain generalization techniques. Technical report, DCASE Task 2, 5 pages, July 2023.
W.-C. Huang, S.-W. Yang, T. Hayashi, T. Toda. A comparative study of self-supervised speech representation based voice conversion. IEEE ICASSP, Presentation of an SPS Journal Paper, Rhodes Island, Greece, June 2023.
Y. Yasuda, T. Toda. Investigation of Japanese PnG BERT language model in text-to-speech synthesis for pitch accent language. IEEE ICASSP, Presentation of an SPS Journal Paper, Rhodes Island, Greece, June 2023.
I. Kuroyanagi, T. Hayashi, K. Takeda, T. Toda. Two-stage anomalous sound detection systems using domain generalization and specialization techniques. Technical report, DCASE Task 2, 5 pages, July 2022. <DCASE 2022 Challenge Task 2 Judges' Award>
I. Kuroyanagi, T. Hayashi, Y. Adachi, T. Yoshimura, K. Takeda, T. Toda. Anomalous sound detection with ensemble of autoencoder and binary classification approaches. Technical report, DCASE Task 2, 4 pages, July 2021.
C.-H. Hu, Y.-C. Wu, W.-C. Huang, Y.-H. Peng, Y.-W. Chen, P.-J. Ku, T. Toda, Y. Tsao, H.-M. Wang. The AS-NU system for the M2VoC challenge. Technical report, arXiv:2104.03009, 5 pages, Apr. 2021.
K. Miyazaki, T. Komatsu, T. Hayashi, S. Watanabe, T. Toda, K. Takeda. Convolution-augmented Transformer for semi-supervised sound event detection. Technical report, DCASE Task 4, 4 pages, June 2020.
S. Takamichi, T. Toda, A.W. Black, S. Nakamura. Post-filter using modulation spectrum as a metric to quantify over-smoothing effects in statistical parametric speech synthesis. APSIPA newsletter, No. 9, pp. 14-16, Apr. 2015.
Y. Stylianou, T. Toda, C.-H. Wu, A. Kain, O. Rosec. Introduction to the special section on voice transformation. IEEE Transactions on Audio, Speech and Language Processing, Vol. 18, No. 5, pp. 909-911, July 2010.
T. Toda. Voice conversion and its application in speech-to-speech translation. Asian Forum on Information and Communications Technology (AFICT), Kuala Lumpur, Malaysia, Dec. 2009.
T. Toda. Voice conversion (spectral conversion). Lecture, Speech: Phonetics, prosody, perception and synthesis, Language Technologies Institute, Carnegie Mellon University, Pittsburgh, U.S.A., Apr. 2004.
T. Toda, H. Kawai, M. Tsuzaki, K. Shikano. Optimizing segment selection for high-quality Text-to-Speech. ATR Technical Report, TR-SLT-0033, Unpublished report, Mar. 2003.
H. Kawanami, Y. Iwami, T. Toda, K. Shikano. Synthesizing emotional speech using voice conversion technique based on GMM with DFW and its evaluation. Demo presentation, IEEE 2002 Workshop on Speech Synthesis, Santa Monica, U.S.A., Sep. 2002.