J. He, N. Sawada, K. Miyazaki, T. Toda. PARCO: phoneme-augmented robust contextual ASR via contrastive entity disambiguation. Proc. IEEE ASRU, pp. ***-***, Honolulu, USA, Dec. 2025 (Accepted).
E. Cooper, T. Okamoto, Y. Ohtani, T. Toda, H. Kawai. Layer-wise analysis for quality of multilingual synthesized speech. Proc. IEEE ASRU, pp. ***-***, Honolulu, USA, Dec. 2025 (Accepted).
W.-C. Huang, H. Wang, C. Liu, Y.-C. Wu, A. Tjandra, W.-N. Hsu, E. Cooper, Y. Qin, T. Toda. The AudioMOS Challenge 2025. Proc. IEEE ASRU, Challenge Paper, pp. ***-***, Honolulu, USA, Dec. 2025 (Accepted).
Y. Ohtani, T. Okamoto, T. Toda, H. Kawai. Voice factor control using FIR-based fast neural vocoder for speech generation applications. Proc. IEEE ASRU, Demo Paper, pp. ***-***, Honolulu, USA, Dec. 2025 (Accepted).
K. Mizukami, D. Deguchi, T. Toda, H. Murase, H. Kyutoku, T. Minematsu. Study on automatic generation of lecture videos based on content analysis of lecture slides. Proc. CELDA, pp. ***-***, Porto, Portugal, Nov. 2025 (Accepted).
L.P. Violeta, W.-C. Huang, T. Toda. Serenade: a singing style conversion framework based on audio infilling. Proc. EUSIPCO, pp. ***-***, Palermo, Italy, Sep. 2025 (Accepted).
K. Ogita, R. Yoneyama, W.-C. Huang, T. Toda. VAE-SiFiGAN: source-filter HiFi-GAN based on variational autoencoder representations with enhanced pitch controllability. Proc. EUSIPCO, pp. ***-***, Palermo, Italy, Sep. 2025 (Accepted).
Y. Yasuda, J. Yamagishi, T. Toda. Continual subjective evaluation method of speech by merging sort-based preference tests towards ever-expanding corpus of human ratings. Proc. SSW, pp. ***-***, Leeuwarden, the Netherlands, Aug. 2025 (Accepted).
T. Ogura, T. Okamoto, Y. Ohtani, E. Cooper, T. Toda, H. Kawai. GST-BERT-TTS: prosody prediction without accentual labels for multi-speaker TTS using BERT with global style tokens. Proc. INTERSPEECH, pp. 444-448, Rotterdam, the Netherlands, Aug. 2025.
X. Shi, X. Li, T. Toda. Who, When, and What: leveraging the "Three Ws" concept for emotion recognition in conversation. Proc. INTERSPEECH, pp. 1763-1767, Rotterdam, the Netherlands, Aug. 2025.
W.-C. Huang, E. Cooper, T. Toda. SHEET: a multi-purpose open-source speech human evaluation estimation toolkit. Proc. INTERSPEECH, pp. 2355-2359, Rotterdam, the Netherlands, Aug. 2025.
J. He, N. Sawada, K. Miyazaki, T. Toda. CMT-LLM: context-aware multi-talker ASR utilizing large language models. Proc. INTERSPEECH, pp. 2575-2579, Rotterdam, the Netherlands, Aug. 2025.
J. He, J. Mi, T. Toda. GIA-MIC: multimodal emotion recognition with gated interactive attention and modality-invariant learning constraints. Proc. INTERSPEECH, pp. 2695-2699, Rotterdam, the Netherlands, Aug. 2025.
B. Halpern, T. Tienkamp, T. Rebernik, R. van Son, M. Wieling, D. Abur, T. Toda. Relationship between objective and subjective perceptual measures of speech in individuals with head and neck cancer. Proc. INTERSPEECH, pp. 3733-3737, Rotterdam, the Netherlands, Aug. 2025.
X. Shi, X. Li, T. Toda. Speaker-aware multi-task learning for speech emotion recognition. Proc. INTERSPEECH, pp. 4333-4337, Rotterdam, the Netherlands, Aug. 2025.
X. Shi, J. Mi, X. Li, T. Toda. Advancing emotion recognition via ensemble learning: integrating speech, context, and text representations. Proc. INTERSPEECH, pp. 4693-4697, Rotterdam, the Netherlands, Aug. 2025.
R. Yoneyama, M. Kawamura, R. Terashima, R. Yamamoto, T. Toda. Comparative analysis of fast and high-fidelity neural vocoders for low-latency streaming synthesis in resource-constrained environments. Proc. INTERSPEECH, pp. 4888-4892, Rotterdam, the Netherlands, Aug. 2025.
C.-H. Hu, Y. Yasuda, A. Yoshimoto, T. Toda. Unifying listener scoring scales: comparison learning framework for speech quality assessment and continuous speech emotion recognition. Proc. INTERSPEECH, pp. 5428-5432, Rotterdam, the Netherlands, Aug. 2025.
M. Murata, K. Miyazaki, T. Koriyama, T. Toda. Eigenvoice synthesis based on model editing for speaker generation. Proc. INTERSPEECH, pp. 5523-5527, Rotterdam, the Netherlands, Aug. 2025.
D. Ma, J. Mi, F. Li, L.P. Violeta, K. Kobayashi, T. Toda. Improving electrolaryngeal speech enhancement via a representation learning method based on integrated text and speech representations. Proc. IEEE EMBC, 6 pages, Copenhagen, Denmark, July 2025.【3rd Place Award in EMBC 2025 Student Paper Competition (recipient: Ding Ma)】
Y. Hashizume, T. Toda. Investigation of perceptual music similarity focusing on each instrumental part. Proc. IEEE ICASSP, 5 pages, Hyderabad, India, Apr. 2025.
T. Fujimura, I. Kuroyanagi, T. Toda. Improvements of discriminative feature space training for anomalous sound detection in unlabeled conditions. Proc. IEEE ICASSP, 5 pages, Hyderabad, India, Apr. 2025.
K. Nishizawa, R. Yamamoto, W.-C. Huang, T. Toda. Investigating factors related to the naturalness of synthesized unison singing. Proc. IEEE ICASSP, 5 pages, Hyderabad, India, Apr. 2025.
T. Ogura, T. Okamoto, Y. Ohtani, E. Cooper, T. Toda, H. Kawai. Mora-level prosody prediction for text-to-speech using Japanese BERT without accentual labels. Proc. IEEE ICASSP, 5 pages, Hyderabad, India, Apr. 2025.
X. Shi, Y. Gao, J. He, J. Mi, X. Li, T. Toda. A study on multimodal fusion and layer adapter in emotion recognition. Proc. APSIPA ASC, 6 pages, Macau, China, Dec. 2024.
T. Imamura, Y. Hashizume, T. Toda. Multi-task learning approaches for music similarity representation learning based on individual instrument sounds. Proc. APSIPA ASC, 6 pages, Macau, China, Dec. 2024.
Z. Yang, J. He, T. Toda. Multi-modal video summarization based on two-stage fusion of audio, visual, and recognized text information. Proc. APSIPA ASC, 6 pages, Macau, China, Dec. 2024.
J. Mi, S. Kim, T. Toda. Improved architecture for high-resolution piano transcription to efficiently capture acoustic characteristics of music signals. Proc. APSIPA ASC, 6 pages, Macau, China, Dec. 2024.
J. Mi, X. Shi, D. Ma, J. He, T. Fujimura, T. Toda. Two-stage framework for robust speech emotion recognition using target speaker extraction in human speech noise conditions. Proc. APSIPA ASC, 6 pages, Macau, China, Dec. 2024.
B. Halpern, T. Toda. Reference-free automatic speech severity evaluation using acoustic unit language modelling. Proc. SpandLDeteriorate Workshop of ACM Multimedia Asia (Workshop on Multi-Biological Sensing Data for Speech and Language Deterioration Prediction), 5 pages, Auckland, New Zealand, Dec. 2024.【Best Paper Award】
Y. Zhang, Y. Zang, J. Shi, R. Yamamoto, T. Toda, Z. Duan. SVDD 2024: The Inaugural Singing Voice Deepfake Detection Challenge. Proc. IEEE SLT, pp. 792-797, Macau, China, Dec. 2024.
W.-C. Huang, S.-W. Fu, E. Cooper, R. Zezario, T. Toda, H.-M. Wang, J. Yamagishi, Y. Tsao. The VoiceMOS Challenge 2024: beyond speech quality prediction. Proc. IEEE SLT, pp. 813-820, Macau, China, Dec. 2024.
T. Okamoto, Y. Ohtani, S. Shimizu, T. Toda, H. Kawai. Challenge of singing voice synthesis using only text-to-speech corpus with FIRNet Source-Filter Neural Vocoder. Proc. INTERSPEECH, pp. 1870-1874, Kos Island, Greece, Sep. 2024.
C.-H. Hu, Y. Yasuda, T. Toda. Embedding learning for preference-based speech quality assessment. Proc. INTERSPEECH, pp. 2685-2689, Kos Island, Greece, Sep. 2024.
B. Halpern, T. Tienkamp, W.-C. Huang, L.P. Violeta, T. Rebernik, S. de Visscher, M.J.H. Witjes, M. Wieling, D. Abur, T. Toda. Quantifying the effect of speech pathology on automatic and human speaker verification. Proc. INTERSPEECH, pp. 3015-3019, Kos Island, Greece, Sep. 2024.
X. Shi, X. Li, T. Toda. Multimodal fusion of music theory-inspired and self-supervised representations for improved emotion recognition. Proc. INTERSPEECH, pp. 3724-3728, Kos Island, Greece, Sep. 2024.
S. Chen, T. Toda. QHM-GAN: neural vocoder based on quasi-harmonic modeling. Proc. INTERSPEECH, pp. 3889-3893, Kos Island, Greece, Sep. 2024.
J. Feng, Y. Yasuda, T. Toda. Exploring the robustness of text-to-speech synthesis based on diffusion probabilistic models to heavily noisy transcriptions. Proc. INTERSPEECH, pp. 4408-4412, Kos Island, Greece, Sep. 2024.
Y. Zang, J. Shi, Y. Zhang, R. Yamamoto, J. Han, Y. Tang, S. Xu, W. Zhao, J. Guo, T. Toda, Z. Duan. CtrSVDD: a benchmark dataset and baseline analysis for controlled singing voice deepfake detection. Proc. INTERSPEECH, pp. 4783-4787, Kos Island, Greece, Sep. 2024.
J. He, T. Toda. 2DP-2MRC: 2-dimensional pointer-based machine reading comprehension method for multimodal moment retrieval. Proc. INTERSPEECH, pp. 5073-5077, Kos Island, Greece, Sep. 2024.
J. Wang, T. Toda. Unsupervised training of neural network-based virtual microphone estimator. Proc. EUSIPCO, pp. 256-260, Lyon, France, Aug. 2024.
T. Fujimura, K. Imoto, T. Toda. Discriminative neighborhood smoothing for generative anomalous sound detection. Proc. EUSIPCO, pp. 156-160, Lyon, France, Aug. 2024.
D. Ma, Y. Choi, F. Li, C. Xie, K. Kobayashi, T. Toda. Robust sequence-to-sequence voice conversion for electrolaryngeal speech enhancement in noisy and reverberant conditions. Proc. IEEE EMBC, 4 pages, Orlando, USA, July 2024.
F. Li, F. Shen, D. Ma, S. Zhang, J. Zhou, L. Wang, F. Fan, T. Liu, X. Chen, T. Toda, H. Niu. Mandarin speech reconstruction from tongue motion ultrasound images based on generative adversarial networks. Proc. IEEE EMBC, 4 pages, Orlando, USA, July 2024.
T. Komatsu, Y. Fujita, K. Takeda, T. Toda. Audio difference learning for audio captioning. Proc. IEEE ICASSP, pp. 1456-1460, Seoul, Korea, Apr. 2024.
Y. Ohtani, T. Okamoto, T. Toda, H. Kawai. FIRNet: fundamental frequency controllable fast neural vocoder with trainable finite impulse response filter. Proc. IEEE ICASSP, pp. 10871-10875, Seoul, Korea, Apr. 2024.
L.P. Violeta, W.-C. Huang, D. Ma, R. Yamamoto, K. Kobayashi, T. Toda. Electrolaryngeal speech intelligibility enhancement through robust linguistic encoders. Proc. IEEE ICASSP, pp. 10961-10965, Seoul, Korea, Apr. 2024.
J. He, X. Shi, X. Li, T. Toda. MF-AED-AEC: speech emotion recognition by leveraging multimodal fusion, ASR error detection, and ASR error correction. Proc. IEEE ICASSP, pp. 11066-11070, Seoul, Korea, Apr. 2024.
T. Okamoto, Y. Ohtani, T. Toda, H. Kawai. ConvNeXt-TTS and ConvNeXt-VC: ConvNeXt-based fast end-to-end sequence-to-sequence text-to-speech and voice conversion. Proc. IEEE ICASSP, pp. 12456-12460, Seoul, Korea, Apr. 2024.
W.-C. Huang, L.P. Violeta, S. Liu, J. Shi, T. Toda. The Singing Voice Conversion Challenge 2023. Proc. IEEE ASRU, 8 pages, Taipei, Taiwan, Dec. 2023.
J. He, Z. Yang, T. Toda. ED-CEC: improving rare word recognition using ASR post-processing based on error detection and context-aware error correction. Proc. IEEE ASRU, 6 pages, Taipei, Taiwan, Dec. 2023.【IEEE Nagoya Section International Conference Research Presentation Award (recipient: Jiajun He)】
B. Halpern, W.-C. Huang, L.P. Violeta, R. van Son, T. Toda. Improving severity preservation of healthy-to-pathological voice conversion with global style tokens. Proc. IEEE ASRU, 7 pages, Taipei, Taiwan, Dec. 2023.
R. Yamamoto, R. Yoneyama, L.P. Violeta, W.-C. Huang, T. Toda. A comparative study of voice conversion models with large-scale speech and singing data: the T13 systems for the Singing Voice Conversion Challenge 2023. Proc. IEEE ASRU, 6 pages, Taipei, Taiwan, Dec. 2023.
E. Cooper, W.-C. Huang, Y. Tsao, H.-M. Wang, T. Toda, J. Yamagishi. The VoiceMOS Challenge 2023: zero-shot subjective speech quality prediction for multiple domains. Proc. IEEE ASRU, 7 pages, Taipei, Taiwan, Dec. 2023.
T. Okamoto, H. Yamashita, Y. Ohtani, T. Toda, H. Kawai. WaveNeXt: ConvNeXt-based fast neural vocoder without iSTFT layer. Proc. IEEE ASRU, 8 pages, Taipei, Taiwan, Dec. 2023.
S. Kim, K. Takeda, T. Toda. Sequence-to-sequence network training methods for automatic guitar transcription with tokenized outputs. Proc. ISMIR, pp. 524-531, Milan, Italy, Nov. 2023.
W.-C. Huang, T. Toda. Evaluating methods for ground-truth-free foreign accent conversion. Proc. APSIPA ASC, pp. 1136-1141, Taipei, Taiwan, Nov. 2023.
L.P. Violeta, T. Toda. An analysis of personalized speech recognition system development for the deaf and hard-of-hearing. Proc. APSIPA ASC, pp. 1851-1856, Taipei, Taiwan, Nov. 2023.
J. Tian, D. Hu, X. Shi, J. He, X. Li, Y. Gao, T. Toda, X. Xu, X. Hu. Semi-supervised multimodal emotion recognition with consensus decision-making and label correction. Proc. 1st International Workshop on Multimodal and Responsible Affective Computing (MRAC), pp. 67-73, Ottawa, Canada, Oct. 2023.
A. Miyashita, T. Toda. Differentiable representation of warping based on Lie group theory. Proc. IEEE WASPAA, 5 pages, New Paltz, USA, Oct. 2023.【IEEE WASPAA 2023 Best Student Paper Award (recipient: Atsushi Miyashita)】
R. Wang, T. Toda. Directional target speaker extraction under noisy underdetermined conditions through conditional variational autoencoder with global style tokens. Proc. IEEE WASPAA, 5 pages, New Paltz, USA, Oct. 2023.
S. Luan, Y. Wakabayashi, T. Toda. Sound field interpolation with unsupervised calibration for freely spaced circular microphone array in rotation-robust beamforming. Proc. EUSIPCO, pp. 21-25, Helsinki, Finland, Sep. 2023.
C.-H. Hu, Y. Yasuda, T. Toda. Preference-based training framework for automatic speech quality assessment using deep neural network. Proc. INTERSPEECH, pp. 546-550, Dublin, Ireland, Aug. 2023.
X. Shi, X. Li, T. Toda. Emotion awareness in multi-utterance turn for improving emotion prediction in multi-speaker conversation. Proc. INTERSPEECH, pp. 765-769, Dublin, Ireland, Aug. 2023.
T. Okamoto, H. Yamashita, T. Toda, H. Kawai. E2E-S2S-VC: end-to-end sequence-to-sequence voice conversion. Proc. INTERSPEECH, pp. 2043-2047, Dublin, Ireland, Aug. 2023.
Y. Choi, C. Xie, T. Toda. Reverberation-controllable voice conversion using reverberation time estimator. Proc. INTERSPEECH, pp. 2103-2107, Dublin, Ireland, Aug. 2023.
Y. Yasuda, T. Toda. Analysis of mean opinion scores in subjective evaluation of synthetic speech based on tail probabilities. Proc. INTERSPEECH, pp. 5491-5495, Dublin, Ireland, Aug. 2023.
Y. Yasuda, T. Toda. Text-to-speech synthesis based on latent variable conversion using diffusion probabilistic model and variational autoencoder. Proc. IEEE ICASSP, 5 pages, Rhodes Island, Greece, June 2023.
K. Kobayashi, T. Hayashi, T. Toda. Low-latency electrolaryngeal speech enhancement based on FastSpeech2-based voice conversion and self-supervised speech representation. Proc. IEEE ICASSP, 5 pages, Rhodes Island, Greece, June 2023.
R. Yamamoto, R. Yoneyama, T. Toda. NNSVS: a neural network based singing voice synthesis toolkit. Proc. IEEE ICASSP, 5 pages, Rhodes Island, Greece, June 2023.
R. Yoneyama, Y.-C. Wu, T. Toda. Source-Filter HiFiGAN: fast and pitch controllable high-fidelity neural vocoder. Proc. IEEE ICASSP, 5 pages, Rhodes Island, Greece, June 2023.【IEEE Signal Processing Society Japan Student Conference Paper Award (recipient: Reo Yoneyama)】
L.P. Violeta, D. Ma, W.-C. Huang, T. Toda. Intermediate fine-tuning using imperfect synthetic speech for improving electrolaryngeal speech recognition. Proc. IEEE ICASSP, 5 pages, Rhodes Island, Greece, June 2023.
T. Fujimura, T. Toda. Analysis of noisy-target training for DNN-based speech enhancement. Proc. IEEE ICASSP, 5 pages, Rhodes Island, Greece, June 2023.
A. Miyashita, T. Toda. Representation of vocal tract length transformation based on group theory. Proc. IEEE ICASSP, 5 pages, Rhodes Island, Greece, June 2023.
D. Ma, L.P. Violeta, K. Kobayashi, T. Toda. Two-stage training method for Japanese electrolaryngeal speech enhancement based on sequence-to-sequence voice conversion. Proc. IEEE SLT, pp. 949-954, Doha, Qatar, Jan. 2023.
Y. Hashizume, L. Li, T. Toda. Music similarity calculation of individual instrumental sounds using metric learning. Proc. APSIPA ASC, pp. 33-38, Chiang Mai, Thailand, Nov. 2022.
J. Feng, T. Yoshikawa, T. Toda. Interpretable control for emotional text-to-speech system toward development of sympathetic educational-support robots. Proc. APSIPA ASC, pp. 342-346, Chiang Mai, Thailand, Nov. 2022.
R. Wang, L. Li, T. Toda. Direction-aware target speaker extraction with a dual-channel system based on conditional variational autoencoders under underdetermined conditions. Proc. APSIPA ASC, pp. 347-353, Chiang Mai, Thailand, Nov. 2022.
S. Chen, T. Toda. Sequence-wise optimization for quasi-harmonic speech waveform modeling. Proc. APSIPA ASC, pp. 1658-1663, Chiang Mai, Thailand, Nov. 2022.
I. Kuroyanagi, T. Hayashi, K. Takeda, T. Toda. Improvement of anomalous sound detection method considering the distribution of embedding. Proc. ICA, ABS-0189, 5 pages, Gyeongju, Korea, Oct. 2022 (Invited Talk in Structured Session).
C. Xie, T. Toda. Noisy-to-noisy voice conversion with pre-training strategy. Proc. ICA, ABS-0801, 5 pages, Gyeongju, Korea, Oct. 2022 (Invited Talk in Structured Session).
L.P. Violeta, W.-C. Huang, T. Toda. Investigating self-supervised pretraining frameworks for pathological speech recognition. Proc. INTERSPEECH, pp. 41-45, Incheon, Korea, Sep. 2022.
R. Yoneyama, Y.-C. Wu, T. Toda. Unified source-filter GAN with harmonic-plus-noise source excitation generation. Proc. INTERSPEECH, pp. 848-852, Incheon, Korea, Sep. 2022.
W.-C. Huang, E. Cooper, Y. Tsao, H.-M. Wang, T. Toda, J. Yamagishi. The VoiceMOS Challenge 2022. Proc. INTERSPEECH, pp. 4536-4540, Incheon, Korea, Sep. 2022.
D. Yoshioka, Y. Yasuda, N. Matsunaga, Y. Ohtani, T. Toda. Spoken-text-style transfer with conditional variational autoencoder and content word storage. Proc. INTERSPEECH, pp. 4576-4580, Incheon, Korea, Sep. 2022.
Y. Choi, C. Xie, T. Toda. An evaluation of three-stage voice conversion framework for noisy and reverberant conditions. Proc. INTERSPEECH, pp. 4910-4914, Incheon, Korea, Sep. 2022.
S. Kim, T. Hayashi, T. Toda. Note-level automatic guitar transcription using attention mechanism. Proc. EUSIPCO, pp. 229-233, Belgrade, Serbia, Aug.-Sep. 2022.
I. Kuroyanagi, T. Hayashi, K. Takeda, T. Toda. Improvement of serial approach to anomalous sound detection by incorporating two binary cross-entropies for outlier exposure. Proc. EUSIPCO, pp. 294-298, Belgrade, Serbia, Aug.-Sep. 2022.
S. Luan, Y. Wakabayashi, T. Toda. Modified sound field interpolation method for rotation-robust beamforming with unequally spaced circular microphone array. Proc. EUSIPCO, pp. 344-348, Belgrade, Serbia, Aug.-Sep. 2022.
W.-C. Huang, E. Cooper, J. Yamagishi, T. Toda. LDNet: unified listener dependent modeling in MOS prediction for synthetic speech. Proc. IEEE ICASSP, pp. 896-900, May 2022.
W.-C. Huang, S.-W. Yang, T. Hayashi, H.-Y. Lee, S. Watanabe, T. Toda. S3PRL-VC: open-source voice conversion framework with self-supervised speech representations. Proc. IEEE ICASSP, pp. 6552-6556, May 2022.
W.-C. Huang, B.M. Halpern, L.P. Violeta, O. Scharenborg, T. Toda. Towards identity preserving normal to dysarthric voice conversion. Proc. IEEE ICASSP, pp. 6672-6676, May 2022.
C. Xie, Y.-C. Wu, P.L. Tobing, W.-C. Huang, T. Toda. Direct noisy speech modeling for noisy-to-noisy voice conversion. Proc. IEEE ICASSP, pp. 6787-6791, May 2022.
T. Hayashi, K. Kobayashi, T. Toda. An investigation of streaming non-autoregressive sequence-to-sequence voice conversion. Proc. IEEE ICASSP, pp. 6802-6806, May 2022.
E. Cooper, W.-C. Huang, T. Toda, J. Yamagishi. Generalization ability of MOS prediction networks. Proc. IEEE ICASSP, pp. 8442-8446, May 2022.
W.-C. Huang, S.-W. Yang, T. Hayashi, H.-Y. Lee, S. Watanabe, T. Toda. S3PRL-VC: open-source voice conversion framework with self-supervised speech representations. Proc. AAAI-22 Workshop, W35: Self-Supervised Learning for Audio and Speech Processing, 5 pages, Feb. 2022.
Z. Qian, H. Niu, L. Wang, K. Kobayashi, S. Zhang, T. Toda. Mandarin electro-laryngeal speech enhancement based on statistical voice conversion and manual tone control. Proc. APSIPA ASC, pp. 546-552, Dec. 2021.
C. Xie, Y.-C. Wu, P.L. Tobing, W.-C. Huang, T. Toda. Noisy-to-noisy voice conversion framework with denoising model. Proc. APSIPA ASC, pp. 814-820, Dec. 2021.
D. Ma, W.-C. Huang, T. Toda. Investigation of text-to-speech-based synthetic parallel data for sequence-to-sequence non-parallel voice conversion. Proc. APSIPA ASC, pp. 870-877, Dec. 2021.【APSIPA ASC 2021 The Best Paper Award】
Y.-S. Liou, W.-C. Huang, M.-C. Yen, S.-W. Tsai, Y.-H. Peng, T. Toda, Y. Tsao, H.-M. Wang. Time alignment using lip images for frame-based electrolaryngeal voice conversion. Proc. APSIPA ASC, pp. 1234-1238, Dec. 2021.
T. Okamoto, T. Toda, H. Kawai. Multi-stream HiFi-GAN with data-driven waveform decomposition. Proc. IEEE ASRU, pp. 610-617, Dec. 2021.
W.-C. Huang, T. Hayashi, X. Li, S. Watanabe, T. Toda. On prosody modeling for ASR+TTS based voice conversion. Proc. IEEE ASRU, pp. 642-649, Dec. 2021.
M.-C. Yen, W.-C. Huang, K. Kobayashi, Y.-H. Peng, S.-W. Tsai, Y. Tsao, T. Toda, J.-S. R. Jang, H.-M. Wang. Mandarin electrolaryngeal speech voice conversion with sequence-to-sequence modeling. Proc. IEEE ASRU, pp. 650-657, Dec. 2021.
H.-T. Chiang, Y.-C. Wu, C. Yu, T. Toda, H.-M. Wang, Y.-C. Hu, Y. Tsao. HASA-Net: a non-intrusive hearing-aid speech assessment network. Proc. IEEE ASRU, pp. 907-913, Dec. 2021.
I. Kuroyanagi, T. Hayashi, Y. Adachi, T. Yoshimura, K. Takeda, T. Toda. An ensemble approach to anomalous sound detection based on conformer-based autoencoder and binary classifier incorporated with metric learning. Proc. DCASE 2021 Workshop, pp. 110-114, Nov. 2021.
S. Seki, H. Taga, T. Toda. Singing fundamental frequency contour generation using generalized command response model and score-conditional variational autoencoder. Proc. IEEE MLSP, 6 pages, Oct. 2021.
W.-C. Huang, K. Kobayashi, Y.-H. Peng, C.-F. Liu, Y. Tsao, H.-M. Wang, T. Toda. A preliminary study of a two-stage paradigm for preserving speaker identity in dysarthric voice conversion. Proc. INTERSPEECH, pp. 1329-1333, Aug.-Sep. 2021.
R. Yoneyama, Y.-C. Wu, T. Toda. Unified source-filter GAN: unified source-filter network based on factorization of quasi-periodic parallel WaveGAN. Proc. INTERSPEECH, pp. 2187-2191, Aug.-Sep. 2021.
P.L. Tobing, T. Toda. High-fidelity and low-latency universal neural vocoder based on multiband WaveRNN with data-driven linear prediction for discrete waveform modeling. Proc. INTERSPEECH, pp. 2217-2221, Aug.-Sep. 2021.
Y.-C. Wu, C.-H. Hu, H.-S. Lee, Y.-H. Peng, W.-C. Huang, Y. Tsao, H.-M. Wang, T. Toda. Relational data selection for data augmentation of speaker-dependent multi-band MelGAN vocoder. Proc. INTERSPEECH, pp. 3630-3634, Aug.-Sep. 2021.
P.L. Tobing, T. Toda. Low-latency real-time non-parallel voice conversion based on cyclic variational autoencoder and multiband WaveRNN with data-driven linear prediction. Proc. 11th ISCA Speech Synthesis Workshop (SSW11), pp. 142-147, Aug. 2021.
I. Kuroyanagi, T. Hayashi, K. Takeda, T. Toda. Anomalous sound detection using a binary classification model and class centroids. Proc. EUSIPCO, pp. 1995-1999, Aug. 2021.
K. Kobayashi, W.-C. Huang, Y.-C. Wu, P.L. Tobing, T. Hayashi, T. Toda. Crank: an open-source software for nonparallel voice conversion based on vector-quantized variational autoencoder. Proc. IEEE ICASSP, pp. 5934-5938, June 2021.
W.-C. Huang, Y.-C. Wu, T. Hayashi, T. Toda. Any-to-one sequence-to-sequence voice conversion using self-supervised discrete speech representations. Proc. IEEE ICASSP, pp. 5944-5948, June 2021.
T. Okamoto, T. Toda, Y. Shiga, H. Kawai. Noise level limited sub-modeling for diffusion probabilistic vocoders. Proc. IEEE ICASSP, pp. 6029-6033, June 2021.
A. Ando, R. Masumura, H. Sato, T. Moriya, T. Ashihara, Y. Ijima, T. Toda. Speech emotion recognition based on listener adaptive models. Proc. IEEE ICASSP, pp. 6274-6278, June 2021.
K. Matsubara, T. Okamoto, R. Takashima, T. Takiguchi, T. Toda, Y. Shiga, H. Kawai. High-intelligibility speech synthesis for dysarthric speakers with LPCNet-based TTS and CycleVAE-based VC. Proc. IEEE ICASSP, pp. 7058-7062, June 2021.
T. Hayashi, W.-C. Huang, K. Kobayashi, T. Toda. Non-autoregressive sequence-to-sequence voice conversion. Proc. IEEE ICASSP, pp. 7068-7072, June 2021.
W.-C. Huang, C.-H. Wu, S.-B. Luo, K.-Y. Chen, H.-M. Wang, T. Toda. Speech recognition by simply fine-tuning BERT. Proc. IEEE ICASSP, pp. 7343-7347, June 2021.
H. Nakatani, P.L. Tobing, K. Takeda, T. Toda. Cross-lingual voice conversion using cyclic variational auto-encoder and a WaveNet vocoder. Proc. APSIPA ASC, pp. 520-526, Dec. 2020.
M. Eshghi, K. Kobayashi, K. Tanaka, H. Kameoka, T. Toda. Phoneme embeddings on predicting fundamental frequency pattern for electrolaryngeal speech. Proc. APSIPA ASC, pp. 572-577, Dec. 2020.
K. Miyazaki, T. Komatsu, T. Hayashi, S. Watanabe, T. Toda, K. Takeda. Conformer-based sound event detection with semi-supervised learning and data augmentation. Proc. DCASE 2020 Workshop, pp. 100-104, Nov. 2020.
Z. Yi, W.-C. Huang, X. Tian, J. Yamagishi, R.K. Das, T. Kinnunen, Z. Ling, T. Toda. Voice Conversion Challenge 2020: intra-lingual semi-parallel and cross-lingual voice conversion. Proc. Joint workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, pp. 80-98, Oct. 2020.
R.K. Das, T. Kinnunen, W.-C. Huang, Z. Ling, J. Yamagishi, Z. Yi, X. Tian, T. Toda. Predictions of subjective ratings and spoofing assessments of Voice Conversion Challenge 2020 submissions. Proc. Joint workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, pp. 99-120, Oct. 2020.
P.L. Tobing, Y.-C. Wu, T. Toda. Baseline system of Voice Conversion Challenge 2020 with cyclic variational autoencoder and parallel WaveGAN. Proc. Joint workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, pp. 155-159, Oct. 2020.
W.-C. Huang, T. Hayashi, S. Watanabe, T. Toda. The sequence-to-sequence baseline for the Voice Conversion Challenge 2020: cascading ASR and TTS. Proc. Joint workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, pp. 160-164, Oct. 2020.
W.-C. Huang, P.L. Tobing, Y.-C. Wu, K. Kobayashi, T. Toda. The NU voice conversion system for the Voice Conversion Challenge 2020: on the effectiveness of sequence-to-sequence models and autoregressive neural vocoders. Proc. Joint workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, pp. 165-169, Oct. 2020.
Y.-C. Wu, T. Hayashi, T. Okamoto, H. Kawai, T. Toda. Quasi-periodic parallel WaveGAN vocoder: a non-autoregressive pitch-dependent dilated convolution model for parametric speech generation. Proc. INTERSPEECH, pp. 3535-3539, Oct. 2020.
Y.-C. Wu, P.L. Tobing, K. Yasuhara, N. Matsunaga, Y. Ohtani, T. Toda. A cyclical post-filtering approach to mismatch refinement of neural vocoder for text-to-speech systems. Proc. INTERSPEECH, pp. 3540-3544, Oct. 2020.
S. Seki, M. Takada, T. Toda. Semi-supervised self-produced speech enhancement and suppression based on joint source modeling of air- and body-conducted signals using variational autoencoder. Proc. INTERSPEECH, pp. 4039-4043, Oct. 2020.
S. Hikosaka, S. Seki, T. Hayashi, K. Kobayashi, K. Takeda, H. Banno, T. Toda. Intelligibility enhancement based on speech waveform modification using hearing impairment simulator. Proc. INTERSPEECH, pp. 4059-4063, Oct. 2020.
W.-C. Huang, T. Hayashi, Y.-C. Wu, H. Kameoka, T. Toda. Voice transformer network: sequence-to-sequence voice conversion using transformer with text-to-speech pretraining. Proc. INTERSPEECH, pp. 4676-4680, Oct. 2020.
P.L. Tobing, T. Hayashi, Y.-C. Wu, K. Kobayashi, T. Toda. Cyclic spectral modeling for unsupervised unit discovery into voice conversion with excitation and waveform modeling. Proc. INTERSPEECH, pp. 4861-4865, Oct. 2020.
K. Kobayashi, T. Toda. Implementation of low-latency electrolaryngeal speech enhancement based on multi-task CLDNN. Proc. EUSIPCO, pp. 396-400, Aug. 2020.
M. Takada, S. Seki, P.L. Tobing, T. Toda. Semi-supervised enhancement and suppression of self-produced speech using correspondence between air- and body-conducted signals. Proc. EUSIPCO, pp. 456-460, Aug. 2020.
K. Miyazaki, T. Komatsu, T. Hayashi, S. Watanabe, T. Toda, K. Takeda. Weakly-supervised sound event detection with self-attention. Proc. IEEE ICASSP, pp. 66-70, May 2020.
T. Okamoto, T. Toda, Y. Shiga, H. Kawai. Transformer-based text-to-speech with weighted forced attention. Proc. IEEE ICASSP, pp. 6729-6733, May 2020.
P.L. Tobing, Y.-C. Wu, T. Hayashi, K. Kobayashi, T. Toda. Efficient shallow WaveNet vocoder using multiple samples output based on Laplacian distribution and linear prediction. Proc. IEEE ICASSP, pp. 7204-7208, May 2020.
T. Hayashi, R. Yamamoto, K. Inoue, T. Yoshimura, S. Watanabe, T. Toda, K. Takeda, Y. Zhang, X. Tan. ESPnet-TTS: unified, reproducible, and integratable open source end-to-end text-to-speech toolkit. Proc. IEEE ICASSP, pp. 7654-7658, May 2020.
P.L. Tobing, T. Hayashi, T. Toda. Investigation of shallow WaveNet vocoder with Laplacian distribution output. Proc. IEEE ASRU, pp. 176-183, Sentosa, Singapore, Dec. 2019.
T. Okamoto, T. Toda, Y. Shiga, H. Kawai. Tacotron-based acoustic model using phoneme alignment for practical neural text-to-speech synthesis. Proc. IEEE ASRU, pp. 214-221, Sentosa, Singapore, Dec. 2019.
L. Li, T. Toda, K. Morikawa, K. Kobayashi, S. Makino. Improving singing aid system for laryngectomees with statistical voice conversion and VAE-SPACE. Proc. ISMIR, pp. 784-790, Delft, the Netherlands, Nov. 2019.
F. Ahmadi, K. Kobayashi, T. Toda. Development of a real-time bionic voice generation system based on statistical excitation prediction. Proc. ACM ASSETS, pp. 655-657, Posters and Demos, Pittsburgh, USA, Oct. 2019.
W.-C. Huang, Y.-C. Wu, K. Kobayashi, Y.-H. Peng, H.-T. Hwang, P.L. Tobing, Y. Tsao, H.-M. Wang, T. Toda. Generalization of spectrum differential based direct waveform modification for voice conversion. Proc. 10th ISCA Speech Synthesis Workshop (SSW10), pp. 57-62, Vienna, Austria, Sep. 2019.
Y.-C. Wu, P.L. Tobing, T. Hayashi, K. Kobayashi, T. Toda. Statistical voice conversion with quasi-periodic WaveNet vocoder. Proc. 10th ISCA Speech Synthesis Workshop (SSW10), pp. 63-68, Vienna, Austria, Sep. 2019.
M. Eshghi, K. Tanaka, K. Kobayashi, H. Kameoka, T. Toda. An investigation of features for fundamental frequency pattern prediction in electrolaryngeal speech enhancement. Proc. 10th ISCA Speech Synthesis Workshop (SSW10), pp. 251-256, Vienna, Austria, Sep. 2019.
Y.-C. Wu, T. Hayashi, P.L. Tobing, K. Kobayashi, T. Toda. Quasi-periodic WaveNet vocoder: a pitch dependent dilated convolution model for parametric speech generation. Proc. INTERSPEECH, pp. 196-200, Graz, Austria, Sep. 2019.
P.L. Tobing, Y.-C. Wu, T. Hayashi, K. Kobayashi, T. Toda. Non-parallel voice conversion with cyclic variational autoencoder. Proc. INTERSPEECH, pp. 674-678, Graz, Austria, Sep. 2019.
Y. Kurita, K. Kobayashi, K. Takeda, T. Toda. Robustness of statistical voice conversion based on direct waveform modification against background sounds. Proc. INTERSPEECH, pp. 684-688, Graz, Austria, Sep. 2019.
W.-C. Huang, Y.-C. Wu, C.-C. Lo, P.L. Tobing, T. Hayashi, K. Kobayashi, T. Toda, Y. Tsao, H.-M. Wang. Investigation of F0 conditioning and fully convolutional networks in variational autoencoder based voice conversion. Proc. INTERSPEECH, pp. 709-713, Graz, Austria, Sep. 2019.
T. Okamoto, T. Toda, Y. Shiga, H. Kawai. Real-time neural text-to-speech with sequence-to-sequence acoustic model and WaveGlow or single Gaussian WaveRNN vocoders. Proc. INTERSPEECH, pp. 1308-1312, Graz, Austria, Sep. 2019.
T. Hayashi, S. Watanabe, T. Toda, K. Takeda, S. Toshniwal, K. Livescu. Pre-trained text embeddings for enhanced text-to-speech synthesis. Proc. INTERSPEECH, pp. 4430-4434, Graz, Austria, Sep. 2019.
S. Seki, H. Kameoka, L. Li, T. Toda, K. Takeda. Generalized multichannel variational autoencoder for underdetermined source separation. Proc. EUSIPCO, 5 pages, A Coruna, Spain, Sep. 2019.
W.-C. Huang, Y.-C. Wu, H.-T. Hwang, P.L. Tobing, T. Hayashi, K. Kobayashi, T. Toda, Y. Tsao, H.-M. Wang. Refined WaveNet vocoder for variational autoencoder based voice conversion. Proc. EUSIPCO, 5 pages, A Coruna, Spain, Sep. 2019.
T. Komatsu, T. Hayashi, R. Kondo, T. Toda, K. Takeda. Scene-dependent anomalous acoustic-event detection based on conditional WaveNet and i-Vector. Proc. IEEE ICASSP, pp. 870-874, Brighton, UK, May 2019.
P.L. Tobing, Y. Wu, T. Hayashi, K. Kobayashi, T. Toda. Voice conversion with cyclic recurrent neural network and fine-tuned WaveNet vocoder. Proc. IEEE ICASSP, pp. 6815-6819, Brighton, UK, May 2019.
T. Okamoto, T. Toda, Y. Shiga, H. Kawai. Investigations of real-time Gaussian FFTNet and parallel WaveNet neural vocoders with simple acoustic features. Proc. IEEE ICASSP, pp. 7020-7024, Brighton, UK, May 2019.
P.L. Tobing, T. Hayashi, Y. Wu, K. Kobayashi, T. Toda. An evaluation of deep spectral mappings and WaveNet vocoder for voice conversion. Proc. IEEE SLT, pp. 297-303, Athens, Greece, Dec. 2018.
T. Okamoto, T. Toda, Y. Shiga, H. Kawai. Improving FFTNet vocoder with noise shaping and subband approaches. Proc. IEEE SLT, pp. 304-311, Athens, Greece, Dec. 2018.
T. Hayashi, S. Watanabe, Y. Zhang, T. Toda, T. Hori, R. Astudillo, K. Takeda. Back-translation-style data augmentation for end-to-end ASR. Proc. IEEE SLT, pp. 426-433, Athens, Greece, Dec. 2018.
M. Takada, S. Seki, T. Toda. Self-produced speech enhancement and suppression method using air- and body-conductive microphones. Proc. APSIPA ASC, pp. 1240-1245, Hawaii, USA, Nov. 2018.
K. Miyazaki, T. Hayashi, T. Toda, K. Takeda. Connectionist temporal classification-based sound event encoder for converting sound events into onomatopoeia representations. Proc. EUSIPCO, pp. 857-861, Rome, Italy, Sep. 2018.
K. Kobayashi, T. Toda. Electrolaryngeal speech enhancement with statistical voice conversion based on CLDNN. Proc. EUSIPCO, pp. 2129-2133, Rome, Italy, Sep. 2018.
T. Hayashi, T. Komatsu, R. Kondo, T. Toda, K. Takeda. Anomalous sound event detection based on WaveNet. Proc. EUSIPCO, pp. 2508-2512, Rome, Italy, Sep. 2018.
T. Hayashi, S. Watanabe, T. Toda, K. Takeda. Multi-Head Decoder for end-to-end speech recognition. Proc. INTERSPEECH, pp. 801-805, Hyderabad, India, Sep. 2018.
Y. Wu, K. Kobayashi, T. Hayashi, P.L. Tobing, T. Toda. Collapsed segment detection and reduction for WaveNet vocoder. Proc. INTERSPEECH, pp. 1988-1992, Hyderabad, India, Sep. 2018.
H. Kawahara, K. Sakakibara, M. Morise, H. Banno, T. Toda, T. Irino. Frequency domain variants of velvet noise and their application to speech processing and synthesis. Proc. INTERSPEECH, pp. 2027-2031, Hyderabad, India, Sep. 2018.
S. Tamura, K. Horio, H. Endo, S. Hayamizu, T. Toda. Audio-visual voice conversion using deep canonical correlation analysis for deep bottleneck features. Proc. INTERSPEECH, pp. 2469-2473, Hyderabad, India, Sep. 2018.
F. Ahmadi, T. Toda. Designing a pneumatic bionic voice prosthesis - statistical approach for source excitation generation. Proc. INTERSPEECH, pp. 3142-3146, Hyderabad, India, Sep. 2018.
T. Kinnunen, J. Lorenzo-Trueba, J. Yamagishi, T. Toda, D. Saito, F. Villavicencio, Z. Ling. A spoofing benchmark for the 2018 voice conversion challenge: leveraging from spoofing countermeasures for speech artifact assessment. Proc. Odyssey 2018, pp. 187-194, Les Sables d'Olonne, France, June 2018.
J. Lorenzo-Trueba, J. Yamagishi, T. Toda, D. Saito, F. Villavicencio, T. Kinnunen, Z. Ling. The voice conversion challenge 2018: promoting development of parallel and nonparallel methods. Proc. Odyssey 2018, pp. 195-202, Les Sables d'Olonne, France, June 2018.
K. Kobayashi, T. Toda. sprocket: open-source voice conversion software. Proc. Odyssey 2018, pp. 203-210, Les Sables d'Olonne, France, June 2018.
Y. Wu, P.L. Tobing, T. Hayashi, K. Kobayashi, T. Toda. The NU non-parallel voice conversion system for the voice conversion challenge 2018. Proc. Odyssey 2018, pp. 211-218, Les Sables d'Olonne, France, June 2018.
P.L. Tobing, Y. Wu, T. Hayashi, K. Kobayashi, T. Toda. NU voice conversion system for the voice conversion challenge 2018. Proc. Odyssey 2018, pp. 219-226, Les Sables d'Olonne, France, June 2018.
S. Seiya, R. Ito, K. Okamoto, U. Tanikawa, S. Ohira, D. Deguchi, T. Toda. Development of "KamiRepo" system with automatic student identification to handle handwritten assignments on LMS. Proc. IEEE EDUCON, pp. 841-848, Canary Islands, Spain, Apr. 2018.
T. Okamoto, K. Tachibana, T. Toda, Y. Shiga, H. Kawai. An investigation of subband WaveNet vocoder covering entire audible frequency range with limited acoustic features. Proc. IEEE ICASSP, pp. 5654-5658, Calgary, Canada, Apr. 2018.
K. Tachibana, T. Toda, Y. Shiga, H. Kawai. An investigation of noise shaping with perceptual weighting for WaveNet-based speech generation. Proc. IEEE ICASSP, pp. 5664-5668, Calgary, Canada, Apr. 2018.
T. Okamoto, K. Tachibana, T. Toda, Y. Shiga, H. Kawai. Subband WaveNet with overlapped single-sideband filterbanks. Proc. IEEE ASRU, pp. 698-704, Okinawa, Japan, Dec. 2017.
T. Hayashi, A. Tamamori, K. Kobayashi, K. Takeda, T. Toda. An investigation of multi-speaker training for WaveNet vocoder. Proc. IEEE ASRU, pp. 712-718, Okinawa, Japan, Dec. 2017.
K. Morikawa, T. Toda. Electrolaryngeal speech modification towards singing aid system for laryngectomees. Proc. APSIPA ASC, 4 pages, Kuala Lumpur, Malaysia, Dec. 2017.
P.L. Tobing, H. Kameoka, T. Toda. Deep acoustic-to-articulatory inversion mapping with latent trajectory modeling. Proc. APSIPA ASC, 4 pages, Kuala Lumpur, Malaysia, Dec. 2017.
A. Tamamori, T. Hayashi, T. Toda, K. Takeda. Investigation of effectiveness on recurrent neural network for daily activity recognition using multi-modal signals. Proc. APSIPA ASC, 7 pages, Kuala Lumpur, Malaysia, Dec. 2017 (Invited Talk in Special Session).
K. Kubo, K. Kobayashi, T. Toda, G. Neubig, S. Sakti, S. Nakamura. An investigation of how to design control parameters for statistical voice timbre control. Proc. APSIPA ASC, 4 pages, Kuala Lumpur, Malaysia, Dec. 2017.
H. Kawahara, K. Sakakibara, M. Morise, H. Banno, T. Toda. Accurate estimation of fo and aperiodicity based on periodicity detector residuals and deviations of phase derivatives. Proc. APSIPA ASC, 9 pages, Kuala Lumpur, Malaysia, Dec. 2017.
S. Seki, H. Kameoka, T. Toda, K. Takeda. Missing component restoration for masked speech signals based on time-domain spectrogram factorization. Proc. IEEE MLSP, 6 pages, Tokyo, Japan, Sep. 2017.
S. Seki, T. Toda, K. Takeda. Stereophonic music separation based on non-negative tensor factorization with cepstrum regularization. Proc. EUSIPCO, pp. 1011-1015, Kos Island, Greece, Aug. 2017.
H. Kawahara, K. Sakakibara, M. Morise, H. Banno, T. Toda. A modulation property of time-frequency derivatives of filtered phase and its application to aperiodicity and fo estimation. Proc. INTERSPEECH, pp. 424-428, Stockholm, Sweden, Aug. 2017.
K. Tanaka, H. Kameoka, T. Toda, S. Nakamura. Physically constrained statistical F0 prediction for electrolaryngeal speech enhancement. Proc. INTERSPEECH, pp. 1069-1073, Stockholm, Sweden, Aug. 2017.
A. Tamamori, T. Hayashi, K. Kobayashi, K. Takeda, T. Toda. Speaker-dependent WaveNet vocoder. Proc. INTERSPEECH, pp. 1118-1122, Stockholm, Sweden, Aug. 2017.
K. Kobayashi, T. Hayashi, A. Tamamori, T. Toda. Statistical voice conversion with WaveNet-based waveform generation. Proc. INTERSPEECH, pp. 1138-1142, Stockholm, Sweden, Aug. 2017.
H. Kawahara, K. Sakakibara, H. Banno, M. Morise, T. Toda, T. Irino. A new cosine series antialiasing function and its application to aliasing-free glottal source models for speech and singing synthesis. Proc. INTERSPEECH, pp. 1358-1362, Stockholm, Sweden, Aug. 2017.
L. Li, H. Kameoka, T. Toda, S. Makino. Speech enhancement using non-negative spectrogram models with mel-generalized cepstral regularization. Proc. INTERSPEECH, pp. 1998-2002, Stockholm, Sweden, Aug. 2017.
T. Hayashi, S. Watanabe, T. Toda, T. Hori, J. Le Roux, K. Takeda. BLSTM-HMM hybrid system combined with sound activity detection network for polyphonic sound event detection. Proc. IEEE ICASSP, pp. 766-770, New Orleans, USA, Mar. 2017.
Y. Tajiri, H. Kameoka, T. Toda. A noise suppression method for body-conducted soft speech based on non-negative tensor factorization of air- and body-conducted signals. Proc. IEEE ICASSP, pp. 4960-4964, New Orleans, USA, Mar. 2017.
K. Kobayashi, T. Toda, S. Nakamura. F0 transformation techniques for statistical voice conversion with direct waveform modification with spectral differential. Proc. IEEE SLT, pp. 693-700, San Diego, USA, Dec. 2016.
A. Tamamori, T. Hayashi, T. Toda, K. Takeda. Investigation on recurrent neural network architectures for daily activity recognition. Proc. UV2016, 4 pages, Aichi, Japan, Oct. 2016.
Y. Tajiri, T. Toda. Nonaudible murmur enhancement based on statistical voice conversion and noise suppression with external noise monitoring. Proc. 9th ISCA Speech Synthesis Workshop (SSW9), pp. 54-60, Sunnyvale, USA, Sep. 2016.
P.L. Tobing, T. Toda, H. Kameoka, S. Nakamura. Acoustic-to-articulatory inversion mapping based on latent trajectory Gaussian mixture model. Proc. INTERSPEECH, pp. 953-957, San Francisco, USA, Sep. 2016.
T. Toda, L.-H. Chen, D. Saito, F. Villavicencio, M. Wester, Z. Wu, J. Yamagishi. The Voice Conversion Challenge 2016. Proc. INTERSPEECH, pp. 1632-1636, San Francisco, USA, Sep. 2016.
K. Kobayashi, S. Takamichi, S. Nakamura, T. Toda. The NU-NAIST voice conversion system for the Voice Conversion Challenge 2016. Proc. INTERSPEECH, pp. 1667-1671, San Francisco, USA, Sep. 2016.【FY2017 C&C Young Researcher Outstanding Paper Award (Recipient: Kazuhiro Kobayashi)】
K. Tachibana, T. Toda, Y. Shiga, H. Kawai. Model integration for HMM- and DNN-based speech synthesis using Product-of-Experts framework. Proc. INTERSPEECH, pp. 2288-2292, San Francisco, USA, Sep. 2016.
Q. Truong Do, T. Toda, G. Neubig, S. Sakti, S. Nakamura. A hybrid system for continuous word-level emphasis modeling based on HMM state clustering and adaptive training. Proc. INTERSPEECH, pp. 3196-3200, San Francisco, USA, Sep. 2016.
T. Hayashi, S. Watanabe, T. Toda, T. Hori, J. Le Roux, K. Takeda. Bidirectional LSTM-HMM hybrid system for polyphonic sound event detection. Proc. DCASE2016 workshop, 5 pages, Budapest, Hungary, Sep. 2016.
K. Tanaka, T. Toda, G. Neubig, S. Nakamura. Real-time vibration control of an electrolarynx based on statistical F0 contour prediction. Proc. EUSIPCO, pp. 1333-1337, Budapest, Hungary, Aug. 2016.
H. Maki, T. Toda, S. Sakti, G. Neubig, S. Nakamura. Removing noise from event-related potentials using a probabilistic generative model with grouped covariance matrices. Proc. IEEE EMBC, 4 pages, Orlando, USA, Aug. 2016.
S. Yamane, K. Kobayashi, T. Toda, T. Nakano, M. Goto, S. Nakamura. An estimation method of voice timbre evaluation values using feature extraction with Gaussian mixture model based on reference singer. Proc. IEEE ICASSP, pp. 5265-5269, Shanghai, China, Mar. 2016.
K. Tanaka, H. Kameoka, T. Toda, S. Nakamura. Statistical F0 prediction for electrolaryngeal speech enhancement considering generative process of F0 contours within product of experts framework. Proc. IEEE ICASSP, pp. 5665-5669, Shanghai, China, Mar. 2016.
K. Kobayashi, T. Toda, S. Nakamura. Implementation of F0 transformation for statistical singing voice conversion based on direct waveform modification. Proc. IEEE ICASSP, pp. 5670-5674, Shanghai, China, Mar. 2016.
Y. Tajiri, T. Toda, S. Nakamura. Noise suppression method for body-conducted soft speech enhancement based on external noise monitoring. Proc. IEEE ICASSP, pp. 5935-5939, Shanghai, China, Mar. 2016.
T. Hiraoka, G. Neubig, K. Yoshino, T. Toda, S. Nakamura. Active learning for example-based dialog systems. Proc. IWSDS, 11 pages, Saariselka, Finland, Jan. 2016.
Y. Tsunomori, G. Neubig, T. Hiraoka, M. Mizukami, S. Sakti, T. Toda, S. Nakamura. A dialog system to detect deception. Proc. IWSDS, 6 pages, Saariselka, Finland, Jan. 2016.
S. Sakti, F. Ilham, G. Neubig, T. Toda, A. Purwarianti, S. Nakamura. Incremental sentence compression using LSTM recurrent networks. Proc. IEEE ASRU, pp. 252-258, Scottsdale, USA, Dec. 2015.
Q. Truong Do, M. Heck, S. Sakti, G. Neubig, T. Toda, S. Nakamura. The NAIST ASR system for the 2015 Multi-Genre Broadcast Challenge: on combination of deep learning systems using a rank-score function. Proc. IEEE ASRU, pp. 654-659, Scottsdale, USA, Dec. 2015.
N. Lubis, S. Sakti, G. Neubig, K. Yoshino, T. Toda, S. Nakamura. A study of social-affective communication: automatic prediction of emotion triggers and responses in television talk shows. Proc. IEEE ASRU, pp. 777-783, Scottsdale, USA, Dec. 2015.
M. Mizukami, H. Kizuki, T. Nomura, G. Neubig, K. Yoshino, S. Sakti, T. Toda, S. Nakamura. Adaptive selection from multiple response candidates in example-based dialogue. Proc. IEEE ASRU, pp. 784-790, Scottsdale, USA, Dec. 2015.
H. Kawahara, K. Sakakibara, H. Banno, M. Morise, T. Toda, T. Irino. Aliasing-free implementation of discrete-time glottal source models and their applications to speech synthesis and F0 extractor evaluation. Proc. APSIPA ASC, pp. 520-529, Hong Kong, Dec. 2015.
Q. Truong Do, S. Sakti, G. Neubig, T. Toda, S. Nakamura. Improving translation of emphasis with pause prediction in speech-to-speech translation systems. Proc. IWSLT, pp. 204-208, Da Nang, Vietnam, Dec. 2015.
Y. Oda, H. Fudaba, G. Neubig, H. Hata, S. Sakti, T. Toda, S. Nakamura. Learning to generate pseudo-code from source code using statistical machine translation. Proc. ASE, pp. 574-584, Lincoln, USA, Nov. 2015.
H. Fudaba, Y. Oda, K. Akabe, G. Neubig, H. Hata, S. Sakti, T. Toda, S. Nakamura. Pseudogen: a tool to automatically generate pseudo-code from source code. Proc. ASE, Tool Demos, pp. 824-829, Lincoln, USA, Nov. 2015.
N. Lubis, S. Sakti, G. Neubig, T. Toda, S. Nakamura. Construction and analysis of social-affective interaction corpus in English and Indonesian. Proc. O-COCOSDA, pp. 202-206, Shanghai, China, Oct. 2015.
K. Tanaka, T. Toda, G. Neubig, S. Sakti, S. Nakamura. An enhanced electrolarynx with automatic fundamental frequency control based on statistical prediction. Proc. ACM ASSETS, Demonstration paper, pp. 435-436, Lisbon, Portugal, Sep. 2015.
K. Sugiyama, M. Mizukami, G. Neubig, K. Yoshino, S. Sakti, T. Toda, S. Nakamura. An investigation of machine translation evaluation metrics in cross-lingual question answering. Proc. 10th Workshop on Statistical Machine Translation (WMT), pp. 442-449, Lisbon, Portugal, Sep. 2015.
Y. Nishigaki, S. Takamichi, T. Toda, G. Neubig, S. Sakti, S. Nakamura. Prosody-controllable HMM-based speech synthesis using speech input. Proc. MLSLP, 5 pages, Fukushima, Japan, Sep. 2015.
S. Takamichi, K. Kobayashi, K. Tanaka, T. Toda, S. Nakamura. The NAIST text-to-speech system for the Blizzard Challenge 2015. Proc. Blizzard Challenge Workshop, 4 pages, Berlin, Germany, Sep. 2015.
Y. Oshima, S. Takamichi, T. Toda, G. Neubig, S. Sakti, S. Nakamura. Non-native speech synthesis preserving speaker individuality based on partial correction of prosodic and phonetic characteristics. Proc. INTERSPEECH, pp. 299-303, Dresden, Germany, Sep. 2015.
S. Takamichi, T. Toda, A.W. Black, S. Nakamura. Modulation spectrum-constrained trajectory training algorithm for HMM-based speech synthesis. Proc. INTERSPEECH, pp. 1206-1210, Dresden, Germany, Sep. 2015.
T. Mieno, G. Neubig, S. Sakti, T. Toda, S. Nakamura. Speed or accuracy? A study in evaluation of simultaneous speech translation. Proc. INTERSPEECH, pp. 2267-2271, Dresden, Germany, Sep. 2015.
T.T. Nguyen, G. Neubig, H. Shindo, S. Sakti, T. Toda, S. Nakamura. A latent variable model for joint pause prediction and dependency parsing. Proc. INTERSPEECH, pp. 2719-2723, Dresden, Germany, Sep. 2015.
K. Kobayashi, T. Toda, G. Neubig, S. Sakti, S. Nakamura. Statistical singing voice conversion based on direct waveform modification with global variance. Proc. INTERSPEECH, pp. 2754-2758, Dresden, Germany, Sep. 2015.
Y. Tajiri, K. Tanaka, T. Toda, G. Neubig, S. Sakti, S. Nakamura. Non-audible murmur enhancement based on statistical conversion using air- and body-conductive microphones in noisy environments. Proc. INTERSPEECH, pp. 2769-2773, Dresden, Germany, Sep. 2015.
P.L. Tobing, K. Kobayashi, T. Toda, G. Neubig, S. Sakti, S. Nakamura. Articulatory controllable speech modification based on Gaussian mixture models with direct waveform modification using spectrum differential. Proc. INTERSPEECH, pp. 3350-3354, Dresden, Germany, Sep. 2015.
D.Q. Truong, S. Takamichi, S. Sakti, G. Neubig, T. Toda, S. Nakamura. Preserving word-level emphasis in speech-to-speech translation using linear regression HSMMs. Proc. INTERSPEECH, pp. 3665-3669, Dresden, Germany, Sep. 2015.
H. Maki, T. Toda, S. Sakti, G. Neubig, S. Nakamura. Evaluation of EEG ocular artifact removal with a multi-channel Wiener filter based on probabilistic generative model. Proc. IEEE EMBC, 4 pages, Milan, Italy, Aug. 2015.
Y. Oda, G. Neubig, S. Sakti, T. Toda, S. Nakamura. Syntax-based simultaneous translation through prediction of unseen syntactic constituents. Proc. ACL, pp. 198-207, Beijing, China, July 2015.
A. Miura, G. Neubig, S. Sakti, T. Toda, S. Nakamura. Improving pivot translation by remembering the pivot. Proc. ACL, pp. 573-577, Beijing, China, July 2015.
Y. Oda, G. Neubig, S. Sakti, T. Toda, S. Nakamura. Ckylark: a more robust PCFG-LA parser. Proc. NAACL HLT, Demo Track, pp. 41-45, Denver, USA, June 2015.
H. Maki, T. Toda, S. Sakti, G. Neubig, S. Nakamura. EEG signal enhancement using multichannel Wiener filter with a spatial correlation prior. Proc. IEEE ICASSP, pp. 2639-2643, Brisbane, Australia, Apr. 2015.
S. Takamichi, T. Toda, A.W. Black, S. Nakamura. Parameter generation algorithm considering modulation spectrum for HMM-based speech synthesis. Proc. IEEE ICASSP, pp. 4210-4214, Brisbane, Australia, Apr. 2015.
Z. Wu, A. Khodabakhsh, C. Demiroglu, J. Yamagishi, D. Saito, T. Toda, S. King. SAS: a speaker verification spoofing database containing diverse attacks. Proc. IEEE ICASSP, pp. 4440-4444, Brisbane, Australia, Apr. 2015.
A. Tjandra, S. Sakti, G. Neubig, T. Toda, M. Adriani, S. Nakamura. Combination of two-dimensional cochleogram and spectrogram features for deep learning-based ASR. Proc. IEEE ICASSP, pp. 4525-4529, Brisbane, Australia, Apr. 2015.
S. Takamichi, T. Toda, A.W. Black, S. Nakamura. Modulation spectrum-constrained trajectory training algorithm for GMM-based voice conversion. Proc. IEEE ICASSP, pp. 4859-4863, Brisbane, Australia, Apr. 2015.
H. Tanaka, S. Sakti, G. Neubig, T. Toda, H. Negoro, H. Iwasaka, S. Nakamura. Automated social skills trainer. Proc. IUI, pp. 17-27, Atlanta, USA, Mar. 2015.
M. Mizukami, G. Neubig, S. Sakti, T. Toda, S. Nakamura. Linguistic individuality transformation for spoken language. Proc. IWSDS, 12 pages, Busan, South Korea, Jan. 2015.
F. Koto, S. Sakti, G. Neubig, T. Toda, M. Adriani, S. Nakamura. A study on natural expressive speech: automatic memorable spoken quote detection. Proc. IWSDS, 6 pages, Busan, South Korea, Jan. 2015.
T. Hiraoka, G. Neubig, S. Sakti, T. Toda, S. Nakamura. Evaluation of a fully automatic cooperative persuasive dialogue system. Proc. IWSDS, 12 pages, Busan, South Korea, Jan. 2015.
T. Sasakura, S. Sakti, G. Neubig, T. Toda, S. Nakamura. Unknown word detection based on event-related brain desynchronization responses. Proc. IWSDS, 6 pages, Busan, South Korea, Jan. 2015.
Y. Tsunomori, G. Neubig, S. Sakti, T. Toda, S. Nakamura. An analysis towards dialogue-based deception detection. Proc. IWSDS, 11 pages, Busan, South Korea, Jan. 2015.
H. Kawahara, M. Morise, T. Toda, H. Banno, R. Nisimura, T. Irino. Excitation source design for high-quality speech manipulation systems based on a temporally static group delay representation of periodic signals. Proc. APSIPA ASC, 10 pages, Siem Reap, Cambodia, Dec. 2014.
S. Sakti, Y. Odagaki, T. Sasakura, G. Neubig, T. Toda, S. Nakamura. An event-related brain potential study on the impact of speech recognition errors. Proc. APSIPA ASC, 4 pages, Siem Reap, Cambodia, Dec. 2014.
S. Tsuruta, K. Tanaka, T. Toda, G. Neubig, S. Sakti, S. Nakamura. An evaluation of target speech for a nonaudible murmur enhancement system in noisy environments. Proc. APSIPA ASC, 4 pages, Siem Reap, Cambodia, Dec. 2014.
K. Kobayashi, T. Toda, T. Nakano, M. Goto, G. Neubig, S. Sakti, S. Nakamura. Gender-dependent spectrum differential models for perceived age control based on direct waveform modification in singing voice conversion. Proc. APSIPA ASC, 4 pages, Siem Reap, Cambodia, Dec. 2014.
L. Nio, S. Sakti, G. Neubig, T. Toda, S. Nakamura. Recursive neural network paraphrase identification for example-based dialog retrieval. Proc. APSIPA ASC, 4 pages, Siem Reap, Cambodia, Dec. 2014.
K. Tanaka, T. Toda, G. Neubig, S. Sakti, S. Nakamura. An inter-speaker evaluation through simulation of electrolarynx control based on statistical F0 prediction. Proc. APSIPA ASC, 4 pages, Siem Reap, Cambodia, Dec. 2014.
R. Yoshida, T. Hiraoka, G. Neubig, S. Sakti, T. Toda, S. Nakamura. Unnecessary utterance detection for avoiding digressions in discussion. Proc. APSIPA ASC, 4 pages, Siem Reap, Cambodia, Dec. 2014.
F. Koto, S. Sakti, G. Neubig, T. Toda, M. Adriani, S. Nakamura. The use of semantic and acoustic features for open-domain TED talk summarization. Proc. APSIPA ASC, 4 pages, Siem Reap, Cambodia, Dec. 2014.
S. Takamichi, T. Toda, A.W. Black, S. Nakamura. Modulation spectrum-based post-filter for GMM-based voice conversion. Proc. APSIPA ASC, 4 pages, Siem Reap, Cambodia, Dec. 2014.【APSIPA ASC 2014 The Best Paper Award】
L. Nio, S. Sakti, G. Neubig, T. Toda, S. Nakamura. Improving the robustness of example-based dialog retrieval using recursive neural network paraphrase identification. Proc. IEEE SLT, pp. 306-311, South Lake Tahoe, USA, Dec. 2014.
S. Takamichi, T. Toda, A.W. Black, S. Nakamura. Modified post-filter to recover modulation spectrum for HMM-based speech synthesis. Proc. IEEE GlobalSIP, pp. 710-714, Atlanta, USA, Dec. 2014.
T. Toda. Augmented speech production based on real-time statistical voice conversion. Proc. IEEE GlobalSIP, pp. 755-759, Atlanta, USA, Dec. 2014 (Invited Talk).
Y. Hatakoshi, G. Neubig, S. Sakti, T. Toda, S. Nakamura. Rule-based syntactic preprocessing for syntax-based machine translation. Proc. 8th Workshop on Syntax, Semantics and Structure in Statistical Translation (SSST-8), pp. 34-42, Doha, Qatar, Oct. 2014.
K. Tanaka, T. Toda, G. Neubig, S. Sakti, S. Nakamura. Direct F0 control of an electrolarynx based on statistical excitation feature prediction and its evaluation through simulation. Proc. INTERSPEECH, pp. 31-35, Singapore, Sep. 2014.
N. Jinbo, S. Takamichi, T. Toda, G. Neubig, S. Sakti, S. Nakamura. A hearing impairment simulation method using audiogram-based approximation of auditory characteristics. Proc. INTERSPEECH, pp. 490-494, Singapore, Sep. 2014.
K. Kubo, S. Sakti, G. Neubig, T. Toda, S. Nakamura. Structured soft margin confidence weighted learning for grapheme-to-phoneme conversion. Proc. INTERSPEECH, pp. 1263-1267, Singapore, Sep. 2014.
S. Matsumiya, S. Sakti, G. Neubig, T. Toda, S. Nakamura. Data-driven generation of text balloons based on linguistic and acoustic features of a comics-anime corpus. Proc. INTERSPEECH, pp. 1801-1805, Singapore, Sep. 2014.
H. Kawahara, M. Morise, T. Toda, H. Banno, R. Nisimura, T. Irino. Excitation source analysis for high-quality speech manipulation systems based on an interference-free representation of group delay with minimum phase response compensation. Proc. INTERSPEECH, pp. 2243-2247, Singapore, Sep. 2014.
P.L. Tobing, T. Toda, G. Neubig, S. Sakti, S. Nakamura, A. Purwarianti. Articulatory controllable speech modification based on statistical feature mapping with Gaussian mixture models. Proc. INTERSPEECH, pp. 2298-2302, Singapore, Sep. 2014.
K. Kobayashi, T. Toda, G. Neubig, S. Sakti, S. Nakamura. Statistical singing voice conversion with direct waveform modification based on the spectrum differential. Proc. INTERSPEECH, pp. 2514-2518, Singapore, Sep. 2014.
D.Q. Truong, S. Sakti, G. Neubig, T. Toda, S. Nakamura. Collection and analysis of a Japanese-English emphasized speech corpus. Proc. O-COCOSDA, pp. 77-82, Phuket, Thailand, Sep. 2014.
M. Mizukami, G. Neubig, S. Sakti, T. Toda, S. Nakamura. Building a free, general-domain paraphrase database for Japanese. Proc. O-COCOSDA, pp. 129-133, Phuket, Thailand, Sep. 2014.
F. Koto, S. Sakti, G. Neubig, T. Toda, S. Nakamura. Memorable spoken quote corpora of TED public speaking. Proc. O-COCOSDA, pp. 140-143, Phuket, Thailand, Sep. 2014.
L. Nio, S. Sakti, G. Neubig, T. Toda, S. Nakamura. Conversation dialog corpora from drama television and movie scripts. Proc. O-COCOSDA, pp. 144-148, Phuket, Thailand, Sep. 2014.
K. Akabe, G. Neubig, S. Sakti, T. Toda, S. Nakamura. Discriminative language models as a tool for machine translation error analysis. Proc. COLING, pp. 1124-1132, Dublin, Ireland, Aug. 2014.
T. Hiraoka, G. Neubig, S. Sakti, T. Toda, S. Nakamura. Reinforcement learning of cooperative persuasive dialogue policies using framing. Proc. COLING, pp. 1706-1717, Dublin, Ireland, Aug. 2014.
Y. Oda, G. Neubig, S. Sakti, T. Toda, S. Nakamura. Optimizing segmentation strategies for simultaneous speech translation. Proc. ACL, pp. 551-556, Baltimore, USA, June 2014.
H. Tanaka, S. Sakti, G. Neubig, T. Toda, S. Nakamura. Linguistic and acoustic features for automatic identification of autism spectrum disorders in children's narrative. Proc. Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality, pp. 88-96, Baltimore, USA, June 2014.
H. Shimizu, G. Neubig, S. Sakti, T. Toda, S. Nakamura. Collection of a simultaneous translation corpus for comparative analysis. Proc. LREC, pp. 670-673, Reykjavik, Iceland, May 2014.
S. Sakti, K. Kubo, S. Matsumiya, G. Neubig, T. Toda, S. Nakamura, F. Adachi, R. Isotani. Towards multilingual conversations in the medical domain: development of multilingual medical data and a network-based ASR system. Proc. LREC, pp. 2639-2643, Reykjavik, Iceland, May 2014.
S. Takamichi, T. Toda, G. Neubig, S. Sakti, S. Nakamura. A postfilter to modify the modulation spectrum in HMM-based speech synthesis. Proc. IEEE ICASSP, pp. 290-294, Florence, Italy, May 2014.【IEEE Signal Processing Society Japan Outstanding Student Conference Paper Award (Recipient: Shinnosuke Takamichi)】
K. Kubo, S. Sakti, G. Neubig, T. Toda, S. Nakamura. NARROW adaptive regularization of weights for grapheme-to-phoneme conversion. Proc. IEEE ICASSP, pp. 2608-2612, Florence, Italy, May 2014.
K. Tanaka, T. Toda, G. Neubig, S. Sakti, S. Nakamura. An evaluation of excitation feature prediction in a hybrid approach to electrolaryngeal speech enhancement. Proc. IEEE ICASSP, pp. 4521-4525, Florence, Italy, May 2014.
K. Kobayashi, T. Toda, T. Nakano, M. Goto, G. Neubig, S. Sakti, S. Nakamura. Regression approaches to perceptual age control in singing voice conversion. Proc. IEEE ICASSP, pp. 7954-7958, Florence, Italy, May 2014.
H.T. Vu, G. Neubig, S. Sakti, T. Toda, S. Nakamura. Acquiring a dictionary of emotion-provoking events. Proc. EACL, pp. 128-132, Gothenburg, Sweden, Apr. 2014.
T. Hiraoka, G. Neubig, S. Sakti, T. Toda, S. Nakamura. Construction and analysis of a persuasive dialogue corpus. Proc. IWSDS, pp. 213-223, Napa, USA, Jan. 2014.
N. Lubis, S. Sakti, G. Neubig, T. Toda, A. Purwarianti, S. Nakamura. Emotion and its triggers in human spoken dialogue: recognition and analysis. Proc. IWSDS, pp. 224-229, Napa, USA, Jan. 2014.
H. Tanaka, S. Sakti, G. Neubig, T. Toda, S. Nakamura. Modality and contextual differences in computer based non-verbal communication training. Proc. CogInfoCom, pp. 127-132, Budapest, Hungary, Dec. 2013.
H. Shimizu, G. Neubig, S. Sakti, T. Toda, S. Nakamura. Constructing a speech translation system using simultaneous interpretation data. Proc. IWSLT, 7 pages, Heidelberg, Germany, Dec. 2013.
S. Sakti, K. Kubo, G. Neubig, T. Toda, S. Nakamura. The NAIST English speech recognition system for IWSLT 2013. Proc. IWSLT, 5 pages, Heidelberg, Germany, Dec. 2013.
T. Hiraoka, Y. Yamauchi, G. Neubig, S. Sakti, T. Toda, S. Nakamura. Dialogue management for leading the conversation in persuasive dialogue systems. Proc. IEEE ASRU, pp. 114-119, Olomouc, Czech Republic, Dec. 2013.
H. Tanaka, S. Sakti, G. Neubig, T. Toda, S. Nakamura. Non-verbal communication training with an interactive multimedia application. Proc. IEEE ACE, pp. 392-402, Osaka, Japan, Oct. 2013.
Lasguido, S. Sakti, G. Neubig, T. Toda, S. Nakamura. Combination of example-based and SMT-based approaches in a chat-oriented dialog system. Proc. ICE-ID, 6 pages, Bali, Indonesia, Oct. 2013.
G. Neubig, S. Sakti, T. Toda, S. Nakamura, Y. Matsumoto, R. Isotani, Y. Ikeda. Towards high-reliability speech translation in the medical domain. Proc. MedNLP-WS, 8 pages, Aichi, Japan, Oct. 2013.
P. Arthur, G. Neubig, S. Sakti, T. Toda, S. Nakamura. Inter-sentence features and thresholded minimum error rate training: NAIST at CLEF 2013 QA4MRE. Proc. CLEF, 11 pages, Valencia, Spain, Sep. 2013.
T. Toda, H. Doi. Statistical voice conversion techniques for alaryngeal speech enhancement. Proc. SICE 2013, pp. 1602-1603, Aichi, Japan, Sep. 2013 (Invited Talk in Special Session).
T. Inukai, T. Toda, G. Neubig, S. Sakti, S. Nakamura. Investigation of intra-speaker spectral parameter variation and its prediction towards improvement of spectral conversion metric. Proc. 8th ISCA Speech Synthesis Workshop (SSW8), pp. 89-94, Barcelona, Spain, Aug. 2013.
H. Kawahara, M. Morise, T. Toda, R. Nisimura, T. Irino. Beyond bandlimited sampling of speech spectral envelope imposed by the harmonic structure of voiced sounds. Proc. INTERSPEECH, pp. 34-38, Lyon, France, Aug. 2013.
S. Takamichi, T. Toda, Y. Shiga, S. Sakti, G. Neubig, S. Nakamura. Improvements to HMM-based speech synthesis based on parameter generation with rich context models. Proc. INTERSPEECH, pp. 364-368, Lyon, France, Aug. 2013.
K. Kobayashi, H. Doi, T. Toda, T. Nakano, M. Goto, G. Neubig, S. Sakti, S. Nakamura. An investigation of acoustic features for singing voice conversion based on perceptual age. Proc. INTERSPEECH, pp. 1057-1061, Lyon, France, Aug. 2013.
H. Doi, T. Toda, T. Nakano, M. Goto, S. Nakamura. Evaluation of a singing voice conversion method based on many-to-many eigenvoice conversion. Proc. INTERSPEECH, pp. 1067-1071, Lyon, France, Aug. 2013.
K. Kubo, S. Sakti, G. Neubig, T. Toda, S. Nakamura. Grapheme-to-phoneme conversion based on adaptive regularization of weight vectors. Proc. INTERSPEECH, pp. 1946-1950, Lyon, France, Aug. 2013.
T. Kano, S. Takamichi, S. Sakti, G. Neubig, T. Toda, S. Nakamura. Generalizing continuous-space translation of paralinguistic information. Proc. INTERSPEECH, pp. 2614-2618, Lyon, France, Aug. 2013.
M. Ohgushi, G. Neubig, S. Sakti, T. Toda, S. Nakamura. An empirical comparison of joint optimization techniques for speech translation. Proc. INTERSPEECH, pp. 2619-2623, Lyon, France, Aug. 2013.
K. Tanaka, T. Toda, G. Neubig, S. Sakti, S. Nakamura. Hybrid approach to electrolaryngeal speech enhancement based on spectral subtraction and statistical voice conversion. Proc. INTERSPEECH, pp. 3067-3071, Lyon, France, Aug. 2013.
T. Moriguchi, T. Toda, M. Sano, H. Sato, G. Neubig, S. Sakti, S. Nakamura. A digital signal processor implementation of silent/electrolaryngeal speech enhancement based on real-time statistical voice conversion. Proc. INTERSPEECH, pp. 3072-3076, Lyon, France, Aug. 2013.
T. Fujita, G. Neubig, S. Sakti, T. Toda, S. Nakamura. Simple, lexicalized choice of translation timing for simultaneous speech translation. Proc. INTERSPEECH, pp. 3487-3491, Lyon, France, Aug. 2013.
M. Itoi, R. Miyazaki, T. Toda, H. Saruwatari, K. Shikano. Blind speech extraction for non-audible murmur speech with speaker's movement noise. Proc. ISSPIT, 6 pages, Ho Chi Minh City, Vietnam, Dec. 2012.
A. Sani, S. Sakti, G. Neubig, T. Toda, A. Mulyanto, S. Nakamura. Towards language preservation: preliminary collection and vowel analysis of Indonesian ethnic speech data. Proc. Oriental COCOSDA, pp. 118-122, Macau, China, Dec. 2012.【Best Student Paper Award (Recipient: Auliya Sani)】
G. Neubig, K. Duh, M. Ogushi, T. Kano, T. Kiso, S. Sakti, T. Toda, S. Nakamura. The NAIST machine translation system for IWSLT 2012. Proc. IWSLT, pp. 54-60, Hong Kong, Dec. 2012.
C. Saam, C. Mohr, K. Kilgour, M. Heck, M. Sperber, K. Kubo, S. Stueker, S. Sakti, G. Neubig, T. Toda, S. Nakamura, A. Waibel. The 2012 KIT and KIT-NAIST English ASR systems for the IWSLT evaluation. Proc. IWSLT, pp. 87-90, Hong Kong, Dec. 2012.
M. Heck, K. Kubo, M. Sperber, S. Sakti, S. Stueker, C. Saam, K. Kilgour, C. Mohr, G. Neubig, T. Toda, S. Nakamura, A. Waibel. The KIT-NAIST (contrastive) English ASR system for IWSLT 2012. Proc. IWSLT, pp. 91-95, Hong Kong, Dec. 2012.
T. Kano, S. Sakti, S. Takamichi, G. Neubig, T. Toda, S. Nakamura. A method for translation of paralinguistic information. Proc. IWSLT, pp. 158-163, Hong Kong, Dec. 2012.
H. Doi, T. Toda, T. Nakano, M. Goto, S. Nakamura. Singing voice conversion method based on many-to-many eigenvoice conversion and training data generation using a singing-to-singing synthesis system. Proc. APSIPA ASC, 6 pages, Hollywood, USA, Nov. 2012.【APSIPA ASC 2012 The Best Paper Award (Short Paper in Regular Session Category)】
H. Tanaka, S. Sakti, G. Neubig, T. Toda, N. Campbell, S. Nakamura. Non-verbal cognitive skills and autistic conditions: an analysis and training tool. Proc. CogInfoCom, pp. 41-46, Kosice, Slovakia, Dec. 2012.
Lasguido, S. Sakti, G. Neubig, T. Toda, M. Adriani, S. Nakamura. Developing non-goal dialog system based on examples of drama television. Proc. IWSDS, pp. 315-320, Paris, France, Nov. 2012.
M. Kishimoto, T. Toda, H. Doi, S. Sakti, S. Nakamura. Model training using parallel data with mismatched pause positions in statistical esophageal speech enhancement. Proc. ICSP, pp. 590-594, Beijing, China, Oct. 2012 (Invited Talk in Special Session).
T. Toda, T. Muramatsu, H. Banno. Implementation of computationally efficient real-time voice conversion. Proc. INTERSPEECH, 4 pages, Portland, USA, Sep. 2012.
S. Takamichi, T. Toda, Y. Shiga, H. Kawai, S. Sakti, S. Nakamura. An evaluation of parameter generation methods with rich context models in HMM-based speech synthesis. Proc. INTERSPEECH, 4 pages, Portland, USA, Sep. 2012.
T. Toda. Statistical approaches to enhancement of body-conducted speech detected with non-audible murmur microphone. Proc. ICME CME, pp. 623-628, Hyogo, Japan, July 2012 (Invited Poster in Special Session).
K. Yamamoto, T. Toda, H. Doi, H. Saruwatari, K. Shikano. Statistical approach to voice quality control in esophageal speech enhancement. Proc. IEEE ICASSP, pp. 4497-4500, Kyoto, Japan, Mar. 2012.
S. Ishii, T. Toda, H. Saruwatari, S. Sakti, S. Nakamura. Blind noise suppression for non-audible murmur recognition with stereo signal processing. Proc. IEEE ASRU, pp. 494-499, Hawaii, USA, Dec. 2011.
D. Deguchi, T. Toda, H. Doi, H. Saruwatari, K. Shikano. Computationally efficient body-conducted voice conversion with original excitation signals. Proc. APSIPA ASC, 4 pages, Xi'an, China, Oct. 2011.
N. Hattori, T. Toda, H. Kawai, H. Saruwatari, K. Shikano. Speaker-adaptive speech synthesis based on eigenvoice conversion and language-dependent prosodic conversion in speech-to-speech translation. Proc. INTERSPEECH, pp. 2769-2772, Florence, Italy, Aug. 2011.
H. Doi, K. Nakamura, T. Toda, H. Saruwatari, K. Shikano. An evaluation of alaryngeal speech enhancement methods based on voice conversion techniques. Proc. IEEE ICASSP, pp. 5136-5139, Prague, Czech Republic, May 2011.
D. Babani, T. Toda, H. Saruwatari, K. Shikano. Acoustic model training for non-audible murmur recognition using transformed normal speech data. Proc. IEEE ICASSP, pp. 5224-5227, Prague, Czech Republic, May 2011.
H. Doi, K. Nakamura, T. Toda, H. Saruwatari, K. Shikano. Speaking-aid systems based on one-to-many eigenvoice conversion for total laryngectomees. Proc. APSIPA ASC, pp. 498-501, Singapore, Dec. 2010.
D. Deguchi, H. Doi, T. Toda, H. Saruwatari, K. Shikano. Acoustic compensation method for accepting different recording devices in body-conducted voice conversion. Proc. APSIPA ASC, pp. 502-505, Singapore, Dec. 2010.
Y. Shiga, T. Toda, S. Sakai, H. Kawai. Improved training of excitation for HMM-based parametric speech synthesis. Proc. INTERSPEECH, pp. 809-812, Chiba, Japan, Sep. 2010.
K. Nakamura, T. Toda, H. Saruwatari, K. Shikano. The use of air-pressure sensor in electrolaryngeal speech enhancement based on statistical voice conversion. Proc. INTERSPEECH, pp. 1628-1631, Chiba, Japan, Sep. 2010.
K. Ohta, T. Toda, Y. Ohtani, H. Saruwatari, K. Shikano. Adaptive voice-quality control based on one-to-many eigenvoice conversion. Proc. INTERSPEECH, pp. 2158-2161, Chiba, Japan, Sep. 2010.
Y. Shiga, T. Toda, S. Sakai, H. Kawai, K. Tokuda, M. Tsuzaki, S. Nakamura. The NICT Blizzard Challenge 2010 entry. Proc. Blizzard Challenge 2010 Workshop, 6 pages, Kyoto, Japan, Sep. 2010.
C. Hayashida, T. Toda, Y. Ohtani, H. Saruwatari, K. Shikano. Linear transformation approaches to many-to-one voice conversion. Proc. 7th ISCA Speech Synthesis Workshop (SSW7), pp. 74-79, Kyoto, Japan, Sep. 2010.
H. Doi, K. Nakamura, T. Toda, H. Saruwatari, K. Shikano. Statistical approach to enhancing esophageal speech based on Gaussian mixture models. Proc. IEEE ICASSP, pp. 4250-4253, Dallas, USA, Mar. 2010.【Best Student Paper Award (1st Place) (Awardees: Hironori Doi, Keigo Nakamura)】
Y. Ohtani, T. Toda, H. Saruwatari, K. Shikano. Non-parallel training for many-to-many eigenvoice conversion. Proc. IEEE ICASSP, pp. 4822-4825, Dallas, USA, Mar. 2010.
H. Zen, K. Oura, T. Nose, J. Yamagishi, S. Sako, T. Toda, T. Masuko, A.W. Black, K. Tokuda. Recent development of the HMM-based speech synthesis system (HTS). Proc. APSIPA ASC, pp. 121-130, Sapporo, Japan, Oct. 2009 (Invited Talk in Special Session).
H. Doi, K. Nakamura, T. Toda, H. Saruwatari, K. Shikano. Enhancement of esophageal speech using statistical voice conversion. Proc. APSIPA ASC, pp. 805-808, Sapporo, Japan, Oct. 2009.
T. Toda, K. Nakamura, T. Nagai, T. Kaino, Y. Nakajima, K. Shikano. Technologies for processing body-conducted speech detected with non-audible murmur microphone. Proc. INTERSPEECH, pp. 632-635, Brighton, UK, Sep. 2009 (Keynote in Special Session).
V.-A. Tran, G. Bailly, H. Loevenbruck, T. Toda. Multimodal HMM-based NAM-to-speech conversion. Proc. INTERSPEECH, pp. 656-659, Brighton, UK, Sep. 2009.
K. Nakamura, T. Toda, H. Saruwatari, K. Shikano. Electrolaryngeal speech enhancement based on statistical voice conversion. Proc. INTERSPEECH, pp. 1431-1434, Brighton, UK, Sep. 2009.
Y. Ohtani, T. Toda, H. Saruwatari, K. Shikano. Many-to-many eigenvoice conversion with reference voice. Proc. INTERSPEECH, pp. 1623-1626, Brighton, UK, Sep. 2009.
M. Charlier, Y. Ohtani, T. Toda, A. Moinet, T. Dutoit. Cross-language voice conversion based on eigenvoices. Proc. INTERSPEECH, pp. 1635-1638, Brighton, UK, Sep. 2009.
R. Maia, T. Toda, K. Tokuda, S. Sakai, S. Nakamura. A decision tree-based clustering approach to state definition in an excitation modeling framework for HMM-based speech synthesis. Proc. INTERSPEECH, pp. 1783-1786, Brighton, UK, Sep. 2009.
R. Maia, T. Toda, S. Sakai, Y. Shiga, J. Ni, H. Kawai, K. Tokuda, M. Tsuzaki, S. Nakamura. The NICT entry for the Blizzard Challenge 2009: an enhanced HMM-based speech synthesis system with trajectory training considering global variance and state-dependent mixed excitation. Proc. Blizzard Challenge 2009 Workshop, 6 pages, Edinburgh, UK, Sep. 2009.
T. Toda. Eigenvoice-based approach to voice conversion and voice quality control. Proc. NCMMSC, International Symposium, pp. 492-497, Lanzhou, China, Aug. 2009 (Invited Talk in Special Session).
K. Morizane, K. Nakamura, T. Toda, H. Saruwatari, K. Shikano. Emphasized speech synthesis based on hidden Markov models. Proc. Oriental COCOSDA, 6 pages, O2-4, Beijing, China, Aug. 2009.
T. Toda, K. Nakamura, H. Sekimoto, K. Shikano. Voice conversion for various types of body transmitted speech. Proc. IEEE ICASSP, pp. 3601-3604, Taipei, Taiwan, Apr. 2009 (Invited Talk in Special Session).
K. Yu, T. Toda, M. Gasic, S. Keizer, F. Mairesse, B. Thomson, S. Young. Probabilistic modelling of F0 in unvoiced regions in HMM based speech synthesis. Proc. IEEE ICASSP, pp. 3773-3776, Taipei, Taiwan, Apr. 2009.
D. Miyamoto, K. Nakamura, T. Toda, H. Saruwatari, K. Shikano. Acoustic compensation methods for body transmitted speech conversion. Proc. IEEE ICASSP, pp. 3901-3904, Taipei, Taiwan, Apr. 2009.
T. Toda, S. Young. Trajectory training considering global variance for HMM-based speech synthesis. Proc. IEEE ICASSP, pp. 4025-4028, Taipei, Taiwan, Apr. 2009.
K. Oura, Y. Nankaku, T. Toda, K. Tokuda, R. Maia, S. Sakai, S. Nakamura. Simultaneous phrasing, prosody, and acoustic model training for Text-to-Speech conversion. Proc. ISCSLP, pp. 1-4, Kunming, China, Dec. 2008.【Best Student Paper Award (Awardee: Keiichiro Oura)】
K. Yutani, Y. Uto, Y. Nankaku, T. Toda, K. Tokuda. Simultaneous conversion of duration and spectrum based on statistical models including time-sequence matching. Proc. INTERSPEECH, pp. 1072-1075, Brisbane, Australia, Sep. 2008.
T. Muramatsu, Y. Ohtani, T. Toda, H. Saruwatari, K. Shikano. Low-delay voice conversion based on maximum likelihood estimation of spectral parameter trajectory. Proc. INTERSPEECH, pp. 1076-1079, Brisbane, Australia, Sep. 2008.
Y. Ohtani, T. Toda, H. Saruwatari, K. Shikano. An improved one-to-many eigenvoice conversion system. Proc. INTERSPEECH, pp. 1080-1083, Brisbane, Australia, Sep. 2008.
D. Tani, T. Toda, Y. Ohtani, H. Saruwatari, K. Shikano. Maximum a posteriori adaptation for many-to-one eigenvoice conversion. Proc. INTERSPEECH, pp. 1461-1464, Brisbane, Australia, Sep. 2008.
K. Nakamura, T. Toda, Y. Nakajima, H. Saruwatari, K. Shikano. Evaluation of speaking-aid system with voice conversion for laryngectomees toward its use in practical environments. Proc. INTERSPEECH, pp. 2209-2212, Brisbane, Australia, Sep. 2008.
R. Maia, J. Ni, S. Sakai, T. Toda, K. Tokuda, T. Shimizu, S. Nakamura. The NICT/ATR speech synthesis system for the Blizzard Challenge 2008. Proc. Blizzard Challenge 2008 Workshop, 6 pages, Brisbane, Australia, Sep. 2008.
J. Yamagishi, H. Zen, Y.-J. Wu, T. Toda, K. Tokuda. The HTS-2008 system: yet another evaluation of the speaker-adaptive HMM-based speech synthesis system in the 2008 Blizzard Challenge. Proc. Blizzard Challenge 2008 Workshop, 6 pages, Brisbane, Australia, Sep. 2008.
V.-A. Tran, G. Bailly, H. Loevenbruck, T. Toda. Predicting F0 and voicing from NAM-captured whispered speech. Proc. Speech Prosody, 4 pages, Campinas, Brazil, May 2008.
T. Toda, K. Tokuda. Statistical approach to vocal tract transfer function estimation based on factor analyzed trajectory HMM. Proc. IEEE ICASSP, pp. 3925-3928, Las Vegas, USA, Apr. 2008.
J. Yamagishi, T. Nose, H. Zen, T. Toda, K. Tokuda. Performance evaluation of the speaker-independent HMM-based speech synthesis system "HTS-2007" for the Blizzard Challenge 2007. Proc. IEEE ICASSP, pp. 3957-3960, Las Vegas, USA, Apr. 2008.
R. Maia, T. Toda, K. Tokuda, S. Sakai, S. Nakamura. On the state definition for a trainable excitation model in HMM-based speech synthesis. Proc. IEEE ICASSP, pp. 3965-3968, Las Vegas, USA, Apr. 2008.
W. Fujitsuru, H. Sekimoto, T. Toda, H. Saruwatari, K. Shikano. Bandwidth extension of cellular phone speech based on maximum likelihood estimation with GMM. Proc. NCSP, pp. 283-286, Gold Coast, Australia, Mar. 2008.
R. Gomez, T. Toda, H. Saruwatari, K. Shikano. Rapid unsupervised speaker adaptation using single utterance based on MLLR and speaker selection. Proc. INTERSPEECH, pp. 262-265, Antwerp, Belgium, Aug. 2007.
T. Cincarek, I. Shindo, T. Toda, H. Saruwatari, K. Shikano. Development of preschool children subsystem for ASR and Q&A in a real-environment speech-oriented guidance task. Proc. INTERSPEECH, pp. 1469-1472, Antwerp, Belgium, Aug. 2007.
R. Maia, T. Toda, H. Zen, Y. Nankaku, K. Tokuda. A trainable excitation model for HMM-based speech synthesis. Proc. INTERSPEECH, pp. 1909-1912, Antwerp, Belgium, Aug. 2007.
Y. Ohtani, T. Toda, H. Saruwatari, K. Shikano. Speaker adaptive training for one-to-many eigenvoice conversion based on Gaussian mixture model. Proc. INTERSPEECH, pp. 1981-1984, Antwerp, Belgium, Aug. 2007.
K. Nakamura, T. Toda, H. Saruwatari, K. Shikano. Impact of various small sound source signals on voice conversion accuracy in speech communication aid for laryngectomees. Proc. INTERSPEECH, pp. 2517-2520, Antwerp, Belgium, Aug. 2007.
J. Ni, T. Hirai, H. Kawai, T. Toda, K. Tokuda, M. Tsuzaki, S. Sakai, R. Maia, S. Nakamura. ATRECSS - ATR English speech corpus for speech synthesis. Proc. Blizzard Challenge 2007 Workshop, 4 pages, Bonn, Germany, Aug. 2007.
J. Yamagishi, H. Zen, T. Toda, K. Tokuda. Speaker-independent HMM-based speech synthesis system - HTS-2007 system for the Blizzard Challenge 2007. Proc. Blizzard Challenge 2007 Workshop, 6 pages, Bonn, Germany, Aug. 2007.
S. Sakai, J. Ni, R. Maia, K. Tokuda, M. Tsuzaki, T. Toda, H. Kawai, S. Nakamura. Communicative speech synthesis with XIMERA. Proc. 6th ISCA Speech Synthesis Workshop (SSW6), pp. 28-33, Bonn, Germany, Aug. 2007.
K. Ohta, Y. Ohtani, T. Toda, H. Saruwatari, K. Shikano. Regression approaches to voice quality control based on one-to-many eigenvoice conversion. Proc. 6th ISCA Speech Synthesis Workshop (SSW6), pp. 101-106, Bonn, Germany, Aug. 2007.
D. Tani, Y. Ohtani, T. Toda, H. Saruwatari, K. Shikano. An evaluation of many-to-one voice conversion algorithms with pre-stored speaker data sets. Proc. 6th ISCA Speech Synthesis Workshop (SSW6), pp. 107-112, Bonn, Germany, Aug. 2007.
J. Yamagishi, T. Kobayashi, S. Renals, S. King, H. Zen, T. Toda, K. Tokuda. Improved average-voice-based speech synthesis using gender-mixed modeling and a parameter generation algorithm considering GV. Proc. 6th ISCA Speech Synthesis Workshop (SSW6), pp. 125-130, Bonn, Germany, Aug. 2007.
R. Maia, T. Toda, H. Zen, Y. Nankaku, K. Tokuda. Excitation model for HMM-based speech synthesis based on residual modeling. Proc. 6th ISCA Speech Synthesis Workshop (SSW6), pp. 131-136, Bonn, Germany, Aug. 2007.
Y. Nankaku, K. Nakamura, T. Toda, K. Tokuda. Spectral conversion based on statistical models including time-sequence matching. Proc. 6th ISCA Speech Synthesis Workshop (SSW6), pp. 333-338, Bonn, Germany, Aug. 2007.
T. Toda, Y. Ohtani, K. Shikano. One-to-many and many-to-one voice conversion based on eigenvoices. Proc. IEEE ICASSP, pp. 1249-1252, Hawaii, USA, Apr. 2007 (Invited Talk in Special Session).
K. Nakamura, T. Toda, H. Saruwatari, K. Shikano. Speaking aid system for total laryngectomees using voice conversion of body transmitted artificial speech. Proc. INTERSPEECH, pp. 1395-1398, Pittsburgh, USA, Sep. 2006.
T. Cincarek, T. Toda, H. Saruwatari, K. Shikano. Acoustic modeling for spoken dialogue systems based on unsupervised utterance-based selective training. Proc. INTERSPEECH, pp. 1722-1725, Pittsburgh, USA, Sep. 2006.
Y. Ohtani, T. Toda, H. Saruwatari, K. Shikano. Maximum likelihood voice conversion based on GMM with STRAIGHT mixed excitation. Proc. INTERSPEECH, pp. 2266-2269, Pittsburgh, USA, Sep. 2006.
M. Nakagiri, T. Toda, H. Kashioka, K. Shikano. Improving body transmitted unvoiced speech with statistical voice conversion. Proc. INTERSPEECH, pp. 2270-2273, Pittsburgh, USA, Sep. 2006.
Y. Uto, Y. Nankaku, T. Toda, A. Lee, K. Tokuda. Voice conversion based on mixtures of factor analyzers. Proc. INTERSPEECH, pp. 2278-2281, Pittsburgh, USA, Sep. 2006.
T. Toda, Y. Ohtani, K. Shikano. Eigenvoice conversion based on Gaussian mixture model. Proc. INTERSPEECH, pp. 2446-2449, Pittsburgh, USA, Sep. 2006.
H. Zen, T. Toda, K. Tokuda. The Nitech-NAIST HMM-based speech synthesis system for the Blizzard Challenge 2006. Proc. Blizzard Challenge 2006 Workshop, 4 pages, Pittsburgh, USA, Sep. 2006.
T. Toda, H. Kawai, T. Hirai, J. Ni, N. Nishizawa, J. Yamagishi, M. Tsuzaki, K. Tokuda, S. Nakamura. Developing a test bed of English Text-to-Speech system XIMERA for the Blizzard Challenge 2006. Proc. Blizzard Challenge 2006 Workshop, 4 pages, Pittsburgh, USA, Sep. 2006.
T. Kato, T. Toda, H. Saruwatari, K. Shikano. Transcription cost reduction for constructing acoustic models using acoustic likelihood selection criteria. Proc. LREC, pp. 789-792, Genoa, Italy, May 2006.
T. Cincarek, T. Toda, H. Saruwatari, K. Shikano. Utterance-based selective training for cost-effective task-adaptation of acoustic models. Proc. SRIV2006, pp. 71-76, Toulouse, France, May 2006.
K. Nakamura, T. Toda, Y. Nankaku, K. Tokuda. On the use of phonetic information for mapping from articulatory movements to vocal tract spectrum. Proc. IEEE ICASSP, pp. 93-96, Toulouse, France, May 2006.
R. Gomez, T. Toda, H. Saruwatari, K. Shikano. Improving rapid unsupervised speaker adaptation based on HMM sufficient statistics. Proc. IEEE ICASSP, pp. 1001-1004, Toulouse, France, May 2006.
T. Cincarek, T. Toda, H. Saruwatari, K. Shikano. Selective EM training of acoustic models based on sufficient statistics of single utterances. Proc. IEEE ASRU, pp. 168-173, San Juan, Puerto Rico, Nov. 2005.
H. Zen, T. Toda. An overview of Nitech HMM-based speech synthesis system for Blizzard Challenge 2005. Proc. INTERSPEECH, pp. 93-96, Lisbon, Portugal, Sep. 2005.
T. Toda, K. Shikano. NAM-to-speech conversion with Gaussian mixture models. Proc. INTERSPEECH, pp. 1957-1960, Lisbon, Portugal, Sep. 2005.
T. Toda, K. Tokuda. Speech parameter generation algorithm considering global variance for HMM-based speech synthesis. Proc. INTERSPEECH, pp. 2801-2804, Lisbon, Portugal, Sep. 2005.
T. Toda, A.W. Black, K. Tokuda. Spectral conversion based on maximum likelihood estimation considering global variance of converted parameter. Proc. IEEE ICASSP, Vol. 1, pp. 9-12, Philadelphia, USA, Mar. 2005.
T. Toda, A.W. Black, K. Tokuda. Acoustic-to-articulatory inversion mapping with Gaussian mixture model. Proc. INTERSPEECH, pp. 1129-1132, Jeju, Korea, Oct. 2004.
T. Toda, A.W. Black, K. Tokuda. Mapping from articulatory movements to vocal tract spectrum with Gaussian mixture model for articulatory speech synthesis. Proc. 5th ISCA Speech Synthesis Workshop (SSW5), pp. 31-36, Pittsburgh, USA, June 2004.
H. Kawai, T. Toda, J. Ni, M. Tsuzaki, K. Tokuda. XIMERA: a new TTS from ATR based on corpus-based technologies. Proc. 5th ISCA Speech Synthesis Workshop (SSW5), pp. 179-184, Pittsburgh, USA, June 2004.
K. Adachi, T. Toda, H. Kawanami, H. Saruwatari, K. Shikano. Perceptual evaluation of quality deterioration owing to prosody modification. Proc. LREC, pp. 2159-2162, Lisbon, Portugal, May 2004.
T. Toda, H. Kawai, M. Tsuzaki. Optimizing sub-cost functions for segment selection based on perceptual evaluations in concatenative speech synthesis. Proc. IEEE ICASSP, pp. 657-660, Montreal, Canada, May 2004.
H. Kawai, T. Toda. An evaluation of automatic phone segmentation for concatenative speech synthesis. Proc. IEEE ICASSP, pp. 677-680, Montreal, Canada, May 2004.
T. Toda, H. Kawai, M. Tsuzaki. Optimizing integrated cost function for segment selection in concatenative speech synthesis based on perceptual evaluations. Proc. INTERSPEECH, pp. 297-300, Geneva, Switzerland, Sep. 2003.
T. Shiraishi, T. Toda, H. Kawanami, H. Saruwatari, K. Shikano. Simple designing methods of corpus-based visual speech synthesis. Proc. INTERSPEECH, pp. 2241-2244, Geneva, Switzerland, Sep. 2003.
H. Kawanami, Y. Iwami, T. Toda, H. Saruwatari, K. Shikano. GMM-based voice conversion applied to emotional speech synthesis. Proc. INTERSPEECH, pp. 2401-2404, Geneva, Switzerland, Sep. 2003.
T. Toda, H. Kawai, M. Tsuzaki, K. Shikano. Segment selection considering local degradation of naturalness in concatenative speech synthesis. Proc. IEEE ICASSP, pp. 696-699, Hong Kong, Apr. 2003.
M. Mashimo, T. Toda, H. Kawanami, H. Kashioka, K. Shikano, N. Campbell. Evaluation of cross-language voice conversion using bilingual and non-bilingual databases. Proc. INTERSPEECH, pp. 293-296, Denver, USA, Sep. 2002.
H. Kawanami, T. Masuda, T. Toda, K. Shikano. Designing Japanese speech database covering wide range in prosody. Proc. INTERSPEECH, pp. 2425-2428, Denver, USA, Sep. 2002.
T. Toda, H. Kawai, M. Tsuzaki, K. Shikano. Perceptual evaluation of cost for segment selection in concatenative speech synthesis. Proc. IEEE 2002 Workshop on Speech Synthesis, 4 pages, Santa Monica, USA, Sep. 2002.
H. Kawanami, T. Masuda, T. Toda, K. Shikano. Designing speech database with prosodic variety for expressive TTS system. Proc. LREC, pp. 2039-2042, Las Palmas, Spain, May 2002.
T. Toda, H. Kawai, M. Tsuzaki, K. Shikano. Unit selection algorithm for Japanese speech synthesis based on both phoneme unit and diphone unit. Proc. IEEE ICASSP, pp. 465-468, Orlando, USA, May 2002.
T. Toda, H. Saruwatari, K. Shikano. High quality voice conversion based on Gaussian mixture model with dynamic frequency warping. Proc. INTERSPEECH, pp. 349-352, Aalborg, Denmark, Sep. 2001.
M. Mashimo, T. Toda, K. Shikano, N. Campbell. Evaluation of cross-language voice conversion based on GMM and STRAIGHT. Proc. INTERSPEECH, pp. 361-364, Aalborg, Denmark, Sep. 2001.
T. Toda, H. Saruwatari, K. Shikano. Voice conversion algorithm based on Gaussian mixture model with dynamic frequency warping of STRAIGHT spectrum. Proc. IEEE ICASSP, pp. 841-844, Salt Lake City, USA, May 2001.
T. Toda, J. Lu, H. Saruwatari, K. Shikano. STRAIGHT-based voice conversion algorithm based on Gaussian mixture model. Proc. INTERSPEECH, pp. 279-282, Beijing, China, Oct. 2000.
T. Toda, J. Lu, S. Nakamura, K. Shikano. Voice conversion algorithm based on Gaussian mixture model applied to STRAIGHT. Proc. WESTPRAC VII, pp. 169-172, Kumamoto, Japan, Oct. 2000.