Speech signal processing, Speech waveform generation, Voice conversion, Text-to-speech, Spoken language processing
Speech recognition, Speech emotion recognition, Spoken dialogue, Speech translation, Video retrieval
Body conducted speech processing, Articulatory-acoustic mapping, Alaryngeal speech processing, Dysarthric speech processing
Speech quality assessment, Spoofing detection, Security and privacy protection
Singing voice conversion, Singing voice synthesis, Singing-aid, Automatic music transcription, Music analysis
Source separation and extraction, Multichannel signal processing, Sound event detection, Audio captioning and sound event descriptions, Anomalous sound detection
Speech signal processing
S. Chen, T. Toda. QHARMA-GAN: quasi-harmonic neural vocoder based on autoregressive moving average model. IEEE Transactions on Audio, Speech and Language Processing, Vol. 33, pp. 3703-3719, Sep. 2025. [Open Access]
S. Chen, T. Toda. Sequence-wise speech waveform modeling via gradient descent optimization of quasi-harmonic parameters. IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 33, pp. 319-332, Jan. 2025. [Open Access]
A. Miyashita, T. Toda. Differentiable representation of warping based on Lie group theory. Proc. IEEE WASPAA, 5 pages, Oct. 2023. [Preprint]
A. Miyashita, T. Toda. Representation of vocal tract length transformation based on group theory. Proc. IEEE ICASSP, 5 pages, June 2023. [Preprint]
T. Toda, K. Tokuda. Statistical approach to vocal tract transfer function estimation based on factor analyzed trajectory HMM. Proc. IEEE ICASSP, pp. 3925-3928, Apr. 2008. [Preprint]
Speech waveform generation
R. Yoneyama, A. Miyashita, R. Yamamoto, T. Toda. Wavehax: aliasing-free neural waveform synthesis based on 2D convolution and harmonic prior for reliable complex spectrogram estimation. IEEE Transactions on Audio, Speech and Language Processing, Vol. 33, pp. 4454-4470, Oct. 2025. [Open Access]
R. Yoneyama, T. Toda. SiFi-GAN: combining source-filter modeling and upsampling-based high-fidelity neural vocoder for fast and pitch-controllable speech synthesis. IEICE Transactions on Information and Systems, Vol. E109-D, No. 6, pp. ***-***, June 2026. (Accepted) [Preprint]
R. Yoneyama, Y.-C. Wu, T. Toda. High-fidelity and pitch-controllable neural vocoder based on unified source-filter networks. IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 31, pp. 3717-3729, Oct. 2023. [Open Access]
Y.-C. Wu, T. Hayashi, T. Okamoto, H. Kawai, T. Toda. Quasi-periodic parallel WaveGAN: a non-autoregressive raw waveform generative model with pitch-dependent dilated convolution neural network. IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 29, pp. 792-806, Feb. 2021. [Open Access]
Voice conversion
C. Xie, T. Toda. Noisy-to-noisy voice conversion under variations of noisy condition. IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 31, pp. 3871-3882, Oct. 2023. [Open Access]
W.-C. Huang, S.-W. Yang, T. Hayashi, T. Toda. A comparative study of self-supervised speech representation based voice conversion. IEEE Journal of Selected Topics in Signal Processing, Vol. 16, No. 6, pp. 1308-1318, Oct. 2022. [Preprint]
W.-C. Huang, T. Hayashi, Y.-C. Wu, H. Kameoka, T. Toda. Pretraining techniques for sequence-to-sequence voice conversion. IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 29, pp. 745-755, Feb. 2021. [Open Access]
T. Toda, L.-H. Chen, D. Saito, F. Villavicencio, M. Wester, Z. Wu, J. Yamagishi. The Voice Conversion Challenge 2016. Proc. INTERSPEECH, pp. 1632-1636, Sep. 2016. [Open Access]
T. Toda, A.W. Black, K. Tokuda. Voice conversion based on maximum likelihood estimation of spectral parameter trajectory. IEEE Transactions on Audio, Speech and Language Processing, Vol. 15, No. 8, pp. 2222-2235, Nov. 2007. [Preprint]
Text-to-speech
D. Yoshioka, Y. Nakata, Y. Yasuda, T. Toda. Text- and speech-style control for lecture speech generation focusing on disfluency. APSIPA Transactions on Signal and Information Processing, Vol. 14, No. 1, e26, pp. 1-31, Sep. 2025. [Open Access]
T. Hayashi, R. Yamamoto, K. Inoue, T. Yoshimura, S. Watanabe, T. Toda, K. Takeda, Y. Zhang, X. Tan. ESPNET-TTS: unified, reproducible, and integratable open source end-to-end text-to-speech toolkit. Proc. IEEE ICASSP, pp. 7654-7658, May 2020. [Preprint]
S. Takamichi, T. Toda, A.W. Black, G. Neubig, S. Sakti, S. Nakamura. Post-filters to modify the modulation spectrum for statistical parametric speech synthesis. IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 24, No. 4, pp. 755-767, Apr. 2016. [Preprint]
K. Tokuda, Y. Nankaku, T. Toda, H. Zen, J. Yamagishi, K. Oura. Speech synthesis based on hidden Markov models. Proceedings of the IEEE, Vol. 101, No. 5, pp. 1234-1252, May 2013. [Preprint]
T. Toda, K. Tokuda. A speech parameter generation algorithm considering global variance for HMM-based speech synthesis. IEICE Transactions on Information and Systems, Vol. E90-D, No. 5, pp. 816-824, May 2007. [Preprint]
H. Kawai, T. Toda, J. Ni, M. Tsuzaki, K. Tokuda. XIMERA: a new TTS from ATR based on corpus-based technologies. Proc. 5th ISCA Speech Synthesis Workshop (SSW5), pp. 179-184, Pittsburgh, USA, June 2004. [Open Access]
Spoken language processing
D. Yoshioka, Y. Yasuda, T. Toda. Nonparallel spoken-text-style transfer for linguistic expression control in speech generation. IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 33, pp. 333-346, Jan. 2025. [Open Access]
Y. Yasuda, T. Toda. Investigation of Japanese PnG BERT language model in text-to-speech synthesis for pitch accent language. IEEE Journal of Selected Topics in Signal Processing, Vol. 16, No. 6, pp. 1319-1328, Oct. 2022. [Open Access]
T. Hayashi, S. Watanabe, T. Toda, K. Takeda, S. Toshniwal, K. Livescu. Pre-trained text embeddings for enhanced text-to-speech synthesis. Proc. INTERSPEECH, pp. 4430-4434, Sep. 2019. [Open Access]
Speech recognition
J. He, N. Sawada, K. Miyazaki, T. Toda. CMT-LLM: context-aware multi-talker ASR utilizing large language models. Proc. INTERSPEECH, pp. 2575-2579, Rotterdam, the Netherlands, Aug. 2025. [Open Access]
J. He, N. Sawada, K. Miyazaki, T. Toda. PARCO: phoneme-augmented robust contextual ASR via contrastive entity disambiguation. Proc. IEEE ASRU, 7 pages, Dec. 2025. [Preprint]
J. He, T. Toda. PMF-CEC: phoneme-augmented multimodal fusion for context-aware ASR error correction with error-specific selective decoding. IEEE Transactions on Audio, Speech and Language Processing, Vol. 33, pp. 2402-2417, June 2025. [Open Access]
T. Hayashi, S. Watanabe, Y. Zhang, T. Toda, T. Hori, R. Astudillo, K. Takeda. Back-translation-style data augmentation for end-to-end ASR. Proc. IEEE SLT, pp. 426-433, Dec. 2018. [Preprint]
Speech emotion recognition
J. He, X. Shi, C.-H. Hu, J. Mi, X. Li, T. Toda. M4SER: multimodal, multirepresentation, multitask, and multistrategy learning for speech emotion recognition. IEEE Transactions on Audio, Speech and Language Processing, Vol. 33, pp. 4055-4070, Sep. 2025. [Open Access]
J. He, J. Mi, T. Toda. GIA-MIC: multimodal emotion recognition with gated interactive attention and modality-invariant learning constraints. Proc. INTERSPEECH, pp. 2695-2699, Rotterdam, the Netherlands, Aug. 2025. [Open Access]
X. Shi, J. He, X. Li, T. Toda. A comprehensive study on the effectiveness of ASR representations for noise-robust speech emotion recognition. IEEE Transactions on Audio, Speech and Language Processing, Vol. **, pp. ***-***, ***. 2026. (Accepted) [Preprint]
X. Shi, X. Li, T. Toda. Who, When, and What: leveraging the "Three Ws" concept for emotion recognition in conversation. Proc. INTERSPEECH, pp. 1763-1767, Rotterdam, the Netherlands, Aug. 2025. [Open Access]
X. Shi, X. Li, T. Toda. Emotion awareness in multi-utterance turn for improving emotion prediction in multi-speaker conversation. Proc. INTERSPEECH, pp. 765-769, Aug. 2023. [Open Access]
A. Ando, T. Mori, S. Kobashikawa, T. Toda. Speech emotion recognition based on listener-dependent emotion perception models. APSIPA Transactions on Signal and Information Processing, Vol. 10, e6, pp. 1-11, Apr. 2021. [Open Access]
Spoken dialogue
T. Hiraoka, G. Neubig, S. Sakti, T. Toda, S. Nakamura. Learning cooperative persuasive dialogue policies using framing. Speech Communication, Vol. 84, pp. 83-96, Nov. 2016. [Preprint]
Speech translation
Q. Truong Do, T. Toda, G. Neubig, S. Sakti, S. Nakamura. Preserving word-level emphasis in speech-to-speech translation. IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 25, No. 3, pp. 544-556, Mar. 2017. [Preprint]
Y. Oda, G. Neubig, S. Sakti, T. Toda, S. Nakamura. Optimizing segmentation strategies for simultaneous speech translation. Proc. ACL, pp. 551-556, June 2014. [Open Access]
Video retrieval
J. He, T. Toda. 2DP-2MRC: 2-dimensional pointer-based machine reading comprehension method for multimodal moment retrieval. Proc. INTERSPEECH, pp. 5073-5077, Kos Island, Greece, Sep. 2024. [Open Access]
Body conducted speech processing
S. Seki, M. Takada, T. Toda. Semi-supervised self-produced speech enhancement and suppression based on joint source modeling of air- and body-conducted signals using variational autoencoder. Proc. INTERSPEECH, pp. 4039-4043, Oct. 2020. [Open Access]
Y. Tajiri, T. Toda. Nonaudible murmur enhancement based on statistical voice conversion and noise suppression with external noise monitoring. Proc. 9th ISCA Speech Synthesis Workshop (SSW9), pp. 54-60, Sep. 2016. [Open Access]
T. Toda, M. Nakagiri, K. Shikano. Statistical voice conversion techniques for body-conducted unvoiced speech enhancement. IEEE Transactions on Audio, Speech and Language Processing, Vol. 20, No. 9, pp. 2505-2517, Sep. 2012. [Preprint]
T. Toda, K. Nakamura, T. Nagai, T. Kaino, Y. Nakajima, K. Shikano. Technologies for processing body-conducted speech detected with non-audible murmur microphone. Proc. INTERSPEECH, pp. 632-635, Sep. 2009. [Open Access]
Articulatory-acoustic mapping
P.L. Tobing, K. Kobayashi, T. Toda. Articulatory controllable speech modification based on statistical inversion and production mappings. IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 25, No. 12, pp. 2337-2350, Dec. 2017. [Preprint]
T. Toda, A.W. Black, K. Tokuda. Statistical mapping between articulatory movements and acoustic spectrum using a Gaussian mixture model. Speech Communication, Vol. 50, No. 3, pp. 215-227, Mar. 2008. [Preprint]
Alaryngeal speech processing
L.P. Violeta, W.-C. Huang, D. Ma, R. Yamamoto, K. Kobayashi, T. Toda. Resolving domain mismatches in electrolaryngeal speech enhancement with linguistic intermediates. IEEE Journal of Selected Topics in Signal Processing, Vol. 19, No. 5, pp. 827-839, June 2025. [Open Access]
D. Ma, L.P. Violeta, K. Kobayashi, T. Toda. Pretraining and fine-tuning techniques for electrolaryngeal speech enhancement based on sequence-to-sequence voice conversion. IEEE Transactions on Audio, Speech and Language Processing, Vol. 33, pp. 3189-3201, July 2025. [Open Access]
L.P. Violeta, D. Ma, W.-C. Huang, T. Toda. Pretraining and adaptation techniques for electrolaryngeal speech recognition. IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 32, pp. 2777-2789, May 2024. [Open Access]
K. Kobayashi, T. Toda. Implementation of low-latency electrolaryngeal speech enhancement based on multi-task CLDNN. Proc. EUSIPCO, pp. 396-400, Aug. 2020. [Open Access]
H. Doi, T. Toda, K. Nakamura, H. Saruwatari, K. Shikano. Alaryngeal speech enhancement based on one-to-many eigenvoice conversion. IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 22, No. 1, pp. 172-183, Jan. 2014. [Preprint]
K. Nakamura, T. Toda, H. Saruwatari, K. Shikano. Speaking-aid systems using GMM-based voice conversion for electrolaryngeal speech. Speech Communication, Vol. 54, No. 1, pp. 134-146, Jan. 2012. [Link]
Dysarthric speech processing
B.M. Halpern, T.B. Tienkamp, T. Rebernik, R.J.J.H. van Son, S.A.H.J. de Visscher, M.J.H. Witjes, D. Abur, T. Toda. XPPG-PCA: reference-free automatic speech severity evaluation with principal components. IEEE Journal of Selected Topics in Signal Processing, Vol. 19, No. 5, pp. 783-795, Oct. 2025. [Preprint]
B.M. Halpern, W.-C. Huang, L.P. Violeta, T. Toda. Severity-controllable pathological text-to-speech synthesis for clinical applications. IEEE Transactions on Neural Systems and Rehabilitation Engineering, Vol. 34, pp. 573-582, Jan. 2026. [Open Access]
L.P. Violeta, W.-C. Huang, T. Toda. Investigating self-supervised pretraining frameworks for pathological speech recognition. Proc. INTERSPEECH, pp. 41-45, Incheon, Korea, Sep. 2022. [Open Access]
W.-C. Huang, K. Kobayashi, Y.-H. Peng, C.-F. Liu, Y. Tsao, H.-M. Wang, T. Toda. A preliminary study of a two-stage paradigm for preserving speaker identity in dysarthric voice conversion. Proc. INTERSPEECH, pp. 1329-1333, Aug.-Sep. 2021. [Open Access]
Speech quality assessment
Y. Yasuda, T. Toda. Automatic design optimization of preference-based subjective evaluation with online learning in crowdsourcing environment. Computer Speech and Language, Vol. 96, Article 101888, pp. 1-16, Sep. 2025. [Open Access]
C.-H. Hu, Y. Yasuda, T. Toda. E2EPref: an end-to-end preference-based framework for speech quality assessment to alleviate bias in direct assessment scores. Computer Speech and Language, Vol. 93, Article 101799, pp. 1-17, Mar. 2025. [Open Access]
E. Cooper, W.-C. Huang, Y. Tsao, H.-M. Wang, T. Toda, J. Yamagishi. A review on subjective and objective evaluation of synthetic speech. Acoustical Science and Technology, Vol. 45, No. 4, pp. 161-183, July 2024. [Open Access]
W.-C. Huang, E. Cooper, Y. Tsao, H.-M. Wang, T. Toda, J. Yamagishi. The VoiceMOS Challenge 2022. Proc. INTERSPEECH, pp. 4536-4540, Incheon, Korea, Sep. 2022. [Open Access]
W.-C. Huang, E. Cooper, J. Yamagishi, T. Toda. LDNet: unified listener dependent modeling in MOS prediction for synthetic speech. Proc. IEEE ICASSP, pp. 896-900, May 2022. [Preprint]
E. Cooper, W.-C. Huang, T. Toda, J. Yamagishi. Generalization ability of MOS prediction networks. Proc. IEEE ICASSP, pp. 8442-8446, May 2022. [Preprint]
Spoofing detection
Y. Zang, J. Shi, Y. Zhang, R. Yamamoto, J. Han, Y. Tang, S. Xu, W. Zhao, J. Guo, T. Toda, Z. Duan. CtrSVDD: a benchmark dataset and baseline analysis for controlled singing voice deepfake detection. Proc. INTERSPEECH, pp. 4783-4787, Kos Island, Greece, Sep. 2024. [Open Access]
X. Wang, J. Yamagishi, M. Todisco, H. Delgado, A. Nautsch, N. Evans, M. Sahidullah, V. Vestman, T. Kinnunen, K.A. Lee, L. Juvela, P. Alku, Y.-H. Peng, H.-T. Hwang, Y. Tsao, H.-M. Wang, S. Le Maguer, M. Becker, F. Henderson, R. Clark, Y. Zhang, Q. Wang, Y. Jia, K. Onuma, K. Mushika, T. Kaneda, Y. Jiang, L.-J. Liu, Y.-C. Wu, W.-C. Huang, T. Toda, K. Tanaka, H. Kameoka, I. Steiner, D. Matrouf, J.-F. Bonastre, A. Govender, S. Ronanki, J.-X. Zhang, Z.-H. Ling. ASVspoof 2019: a large-scale public database of synthetic, converted and replayed speech. Computer Speech and Language, Vol. 64, Article 101114, 25 pages, Nov. 2020. [Preprint]
T. Kinnunen, J. Lorenzo-Trueba, J. Yamagishi, T. Toda, D. Saito, F. Villavicencio, Z. Ling. A spoofing benchmark for the 2018 voice conversion challenge: leveraging from spoofing countermeasures for speech artifact assessment. Proc. Odyssey 2018, pp. 187-194, June 2018. [Open Access]
Z. Wu, P. De Leon, C. Demiroglu, A. Khodabakhsh, S. King, Z.-H. Ling, D. Saito, B. Stewart, T. Toda, M. Wester, J. Yamagishi. Anti-spoofing for text-independent speaker verification: an initial database, comparison of countermeasures, and human performance. IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 24, No. 4, pp. 768-783, Apr. 2016. [Preprint]
Security and privacy protection
L. Chen, K.A. Lee, Z.-H. Ling, X. Wang, R.K. Das, T. Toda, H. Li. Speaker privacy and security in the big data era: protection and defense against deepfake. Proc. APSIPA ASC, Perspective paper, pp. 2570-2575, Singapore, Oct. 2025. [Open Access]
S. Tang, Z. Liu, L. Chen, K. Lee, T. Toda, Z.-H. Ling. A preliminary study on sectional voice anonymization and detection. Proc. APSIPA ASC, pp. 2229-2234, Singapore, Oct. 2025. [Open Access]
D. Yoon, T. Toda. Neural semi-fragile watermarking for proactive deepfake speech detection. Proc. APSIPA ASC, pp. 2396-2401, Singapore, Oct. 2025. [Open Access]
W.-C. Huang, Y.-C. Wu, T. Toda. Multi-speaker text-to-speech training with speaker anonymized data. IEEE Signal Processing Letters, Vol. 31, pp. 2995-2999, Oct. 2024. [Open Access]
Singing voice conversion
W.-C. Huang, L.P. Violeta, S. Liu, J. Shi, T. Toda. The Singing Voice Conversion Challenge 2023. Proc. IEEE ASRU, 8 pages, Dec. 2023. [Preprint]
R. Yamamoto, R. Yoneyama, L.P. Violeta, W.-C. Huang, T. Toda. A comparative study of voice conversion models with large-scale speech and singing data: the T13 systems for the Singing Voice Conversion Challenge 2023. Proc. IEEE ASRU, 6 pages, Taipei, Taiwan, Dec. 2023. [Preprint]
K. Kobayashi, T. Toda, S. Nakamura. Intra-gender statistical singing voice conversion with direct waveform modification using log-spectral differential. Speech Communication, Vol. 99, pp. 211-220, May 2018. [Open Access]
K. Kobayashi, T. Toda, H. Doi, T. Nakano, M. Goto, G. Neubig, S. Sakti, S. Nakamura. Voice timbre control based on perceived age in singing voice conversion. IEICE Transactions on Information and Systems, Vol. E97-D, No. 6, pp. 1419-1428, June 2014. [Open Access]
H. Doi, T. Toda, T. Nakano, M. Goto, S. Nakamura. Singing voice conversion method based on many-to-many eigenvoice conversion and training data generation using a singing-to-singing synthesis system. Proc. APSIPA ASC, Nov. 2012. [Open Access]
Singing voice synthesis
K. Nishizawa, R. Yamamoto, W.-C. Huang, T. Toda. Investigating factors related to the naturalness of synthesized unison singing. Proc. IEEE ICASSP, 5 pages, Hyderabad, India, Apr. 2025. [Preprint]
R. Yamamoto, R. Yoneyama, T. Toda. NNSVS: a neural network based singing voice synthesis toolkit. Proc. IEEE ICASSP, 5 pages, June 2023. [Preprint]
Singing-aid
L. Li, T. Toda, K. Morikawa, K. Kobayashi, S. Makino. Improving singing aid system for laryngectomees with statistical voice conversion and VAE-SPACE. Proc. ISMIR, pp. 784-790, Delft, the Netherlands, Nov. 2019. [Open Access]
K. Morikawa, T. Toda. Electrolaryngeal speech modification towards singing aid system for laryngectomees. Proc. APSIPA, 4 pages, Dec. 2017. [Open Access]
Automatic music transcription
J. Mi, S. Kim, T. Toda. Improved architecture for high-resolution piano transcription to efficiently capture acoustic characteristics of music signals. Proc. APSIPA ASC, 6 pages, Macau, China, Dec. 2024. [Preprint]
S. Kim, K. Takeda, T. Toda. Sequence-to-sequence network training methods for automatic guitar transcription with tokenized outputs. Proc. ISMIR, pp. 524-531, Nov. 2023. [Open Access]
S. Kim, T. Hayashi, T. Toda. Note-level automatic guitar transcription using attention mechanism. Proc. EUSIPCO, pp. 229-233, Aug.-Sep. 2022. [Open Access]
Music analysis
T. Imamura, Y. Hashizume, W.-C. Huang, T. Toda. Music similarity representation learning focusing on individual instruments with source separation and human preference. APSIPA Transactions on Signal and Information Processing, Vol. 14, No. 4, e305, pp. 1-29, Oct. 2025. [Open Access]
Y. Hashizume, L. Li, A. Miyashita, T. Toda. Learning separated representations for instrument-based music similarity. APSIPA Transactions on Signal and Information Processing, Vol. 14, No. 1, e16, pp. 1-32, July 2025. [Open Access]
S. Seki, T. Toda, K. Takeda. Stereophonic music separation based on non-negative tensor factorization with cepstral distance regularization. IEICE Transactions on Fundamentals, Vol. E101-A, No. 7, pp. 1057-1064, July 2018. [Link]
Source separation and extraction
T. Fujimura, T. Toda. Analysis and extension of noisy-target training for unsupervised target signal enhancement. APSIPA Transactions on Signal and Information Processing, Vol. 14, No. 1, e12, pp. 1-27, June 2025. [Open Access]
R. Wang, T. Fujimura, T. Toda. Target speaker extraction under noisy underdetermined conditions using conditional variational autoencoder, global style token, and neural postfilter. APSIPA Transactions on Signal and Information Processing, Vol. 14, No. 1, e2, pp. 1-26, Jan. 2025. [Open Access]
R. Wang, L. Li, T. Toda. Dual-channel target speaker extraction based on conditional variational autoencoder and directional information. IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 32, pp. 1968-1979, Mar. 2024. [Open Access]
S. Seki, H. Kameoka, L. Li, T. Toda, K. Takeda. Underdetermined source separation based on generalized multichannel variational autoencoder. IEEE Access, Vol. 7, No. 1, pp. 168104-168115, Nov. 2019. [Open Access]
Multichannel signal processing
S. Luan, Y. Wakabayashi, T. Toda. Generalized sound field interpolation for freely spaced microphone arrays in rotation-robust beamforming. Applied Acoustics, Vol. 236, Article 110706, pp. 1-15, Apr. 2025. [Open Access]
S. Luan, Y. Wakabayashi, T. Toda. Unequally spaced sound field interpolation for rotation-robust beamforming. IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 32, pp. 3185-3199, June 2024. [Open Access]
H. Maki, T. Toda, S. Sakti, G. Neubig, S. Nakamura. Enhancing event-related potentials based on maximum a posteriori estimation with a spatial correlation prior. IEICE Transactions on Information and Systems, Vol. E99-D, No. 6, pp. 1410-1419, June 2016. [Open Access]
Sound event detection
K. Miyazaki, T. Komatsu, T. Hayashi, S. Watanabe, T. Toda, K. Takeda. Weakly-supervised sound event detection with self-attention. Proc. IEEE ICASSP, pp. 66-70, May 2020. [Link]
T. Hayashi, S. Watanabe, T. Toda, T. Hori, J. Le Roux, K. Takeda. Duration-controlled LSTM for polyphonic sound event detection. IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 25, No. 11, pp. 2059-2070, Nov. 2017. [Open Access]
Audio captioning and sound event descriptions
T. Komatsu, K. Takeda, T. Toda. Audio difference learning framework for audio captioning. APSIPA Transactions on Signal and Information Processing, Vol. 14, No. 1, e34, pp. 1-18, Nov. 2025. [Open Access]
K. Miyazaki, T. Hayashi, T. Toda, K. Takeda. Connectionist temporal classification-based sound event encoder for converting sound events into onomatopoeia representations. Proc. EUSIPCO, pp. 857-861, Sep. 2018. [Open Access]
Anomalous sound detection
K. Wilkinghoff, T. Fujimura, K. Imoto, J. Le Roux, Z.-H. Tan, T. Toda. Handling domain shifts for anomalous sound detection: a review of DCASE-related work. Proc. DCASE Workshop, pp. 20-24, Barcelona, Spain, Oct. 2025. [Open Access]
T. Fujimura, K. Wilkinghoff, K. Imoto, T. Toda. ASDKit: a toolkit for comprehensive evaluation of anomalous sound detection methods. Proc. DCASE Workshop, pp. 40-44, Barcelona, Spain, Oct. 2025. [Open Access]
I. Kuroyanagi, T. Fujimura, K. Takeda, T. Toda. Improving anomalous sound detection through pseudo-anomalous set selection and pseudo-label utilization under unlabeled conditions. APSIPA Transactions on Signal and Information Processing, Vol. 14, No. 1, e13, pp. 1-28, June 2025. [Open Access]
I. Kuroyanagi, T. Hayashi, K. Takeda, T. Toda. Serial-OE: anomalous sound detection based on serial method with outlier exposure capable of using small amounts of anomalous data for training. APSIPA Transactions on Signal and Information Processing, Vol. 14, No. 1, e1, pp. 1-32, Jan. 2025. [Open Access]
T. Hayashi, T. Komatsu, R. Kondo, T. Toda, K. Takeda. Anomalous sound event detection based on WaveNet. Proc. EUSIPCO, pp. 2508-2512, Sep. 2018. [Open Access]
Graduation Thesis [Itakura Lab. (Acoustic & Speech Processing Group)]
"Improvement of STRAIGHT Analysis-Synthesis Method under Noisy Conditions" (in Japanese)
Master's Thesis [Shikano Lab. (Speech and Acoustics Laboratory)]
"High Quality Voice Conversion Based on STRAIGHT Analysis-Synthesis Method" (in Japanese)
Doctoral Thesis [Shikano Lab.]