Yusuke Fujita
Manager at Music Processing Team, LY Corporation (formerly LINE), Tokyo, Japan
Research Scientist at SB Intuitions, Tokyo, Japan
yusuke.fujita _at_ ieee.org
yusuke.fujita _at_ lycorp.co.jp
yusuke.fujita _at_ sbintuitions.co.jp
Ph.D. thesis
Yusuke Fujita, "A Study on Speaker Diarization based on End-to-end Optimization," Waseda University, 2024 [https://waseda.repo.nii.ac.jp/records/2002444]
Tutorial talk
Keisuke Kinoshita, Yusuke Fujita, Naoyuki Kanda, Shinji Watanabe, "T-9 - Distant conversational speech recognition and analysis: Recent advances, and trends towards end-to-end optimization," ICASSP, 2021
Publications
Yusuke Fujita, Tatsuya Komatsu, "Audio Fingerprinting with Holographic Reduced Representations," Proc. Interspeech 2024
Robin Scheibler, Yusuke Fujita, Yuma Shirahata, Tatsuya Komatsu, "Universal Score-based Speech Enhancement with High Content Preservation," Proc. Interspeech 2024
Hokuto Munakata, Ryo Terashima, Yusuke Fujita, "Song Data Cleansing for End-to-End Neural Singer Diarization Using Neural Analysis and Synthesis Framework," Proc. Interspeech 2024
Tatsuya Komatsu, Yusuke Fujita, Kazuya Takeda, Tomoki Toda, "Audio Difference Learning for Audio Captioning," Proc. ICASSP 2024
Michael Hentschel, Yuta Nishikawa, Tatsuya Komatsu, Yusuke Fujita, "Keep Decoding Parallel with Effective Knowledge Distillation from Language Models to End-to-end Speech Recognisers," Proc. ICASSP 2024
Yusuke Fujita, Tetsuji Ogawa, Tetsunori Kobayashi, "Self-Conditioning via Intermediate Predictions for End-to-End Neural Speaker Diarization," IEEE Access, 2023
Robin Scheibler, Takuya Hasumi, Yusuke Fujita, Tatsuya Komatsu, Ryuichi Yamamoto, Kentaro Tachibana, "Foley Sound Synthesis with a Class-Conditioned Latent Diffusion Model," Proc. DCASE 2023 Workshop
Aoi Ito, Tatsuya Komatsu, Yusuke Fujita, Yusuke Kida, "Target Vocabulary Recognition Based on Multi-Task Learning with Decomposed Teacher Sequences," Proc. Interspeech 2023
Yusuke Fujita, Tatsuya Komatsu, Robin Scheibler, Yusuke Kida, Tetsuji Ogawa, "Neural Diarization with Non-Autoregressive Intermediate Attractors," Proc. ICASSP 2023
Yusuke Fujita, Tatsuya Komatsu, Yusuke Kida, "Alternate Intermediate Conditioning with Syllable-Level and Character-Level Targets for Japanese ASR," Proc. SLT 2022
Tatsuya Komatsu, Yusuke Fujita, "Interdecoder: using Attention Decoders as Intermediate Regularization for CTC-Based Speech Recognition," Proc. SLT 2022
Robin Scheibler, Tatsuya Komatsu, Yusuke Fujita, Michael Hentschel, "On Sorting and Padding Multiple Targets for Sound Event Localization and Detection with Permutation Invariant and Location-based Training," Proc. APSIPA ASC 2022
Robin Scheibler, Tatsuya Komatsu, Yusuke Fujita, Michael Hentschel, "Sound event localization and detection with pre-trained audio spectrogram transformer and multichannel separation network," Proc. DCASE 2022 Workshop
Tatsuya Komatsu, Yusuke Fujita, Jaesong Lee, Lukas Lee, Shinji Watanabe, Yusuke Kida, "Better Intermediates Improve CTC Inference," Proc. Interspeech 2022
Yu Nakagome, Tatsuya Komatsu, Yusuke Fujita, Shuta Ichimura, Yusuke Kida, "InterAug: Augmenting Noisy Intermediate Predictions for CTC-based ASR," Proc. Interspeech 2022
Shota Horiguchi, Yusuke Fujita, Shinji Watanabe, Yawen Xue, Paola Garcia, "Encoder-decoder based attractors for end-to-end neural diarization," IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2022
Yawen Xue, Shota Horiguchi, Yusuke Fujita, Yuki Takashima, Shinji Watanabe, Leibny Paola Garcia Perera, Kenji Nagamatsu, "Online Streaming End-to-End Neural Diarization Handling Overlapping Speech and Flexible Numbers of Speakers," Proc. Interspeech 2021
Yuki Takashima, Yusuke Fujita, Shota Horiguchi, Shinji Watanabe, Leibny Paola Garcia Perera and Kenji Nagamatsu, "Semi-Supervised Training with Pseudo-Labeling for End-to-End Neural Diarization," Proc. Interspeech 2021
Shota Horiguchi, Paola Garcia, Yusuke Fujita, Shinji Watanabe, Kenji Nagamatsu, "End-to-End Speaker Diarization as Post-Processing," Proc. ICASSP 2021
Shota Horiguchi, Nelson Yalta, Paola Garcia, Yuki Takashima, Yawen Xue, Desh Raj, Zili Huang, Yusuke Fujita, Shinji Watanabe, Sanjeev Khudanpur, "The Hitachi-JHU DIHARD III System: Competitive End-to-End Neural Diarization and X-Vector Clustering Systems Combined by DOVER-Lap," DIHARD III workshop.
Yuki Takashima, Yusuke Fujita, Shinji Watanabe, Shota Horiguchi, Paola Garcia, Kenji Nagamatsu, “End-to-End Speaker Diarization Conditioned on Speech Activity and Overlap Detection,” Proc. SLT 2021
Yawen Xue, Shota Horiguchi, Yusuke Fujita, Shinji Watanabe, Paola Garcia, Kenji Nagamatsu, “Online End-to-End Neural Diarization with Speaker-Tracing Buffer,” Proc. SLT 2021
Shota Horiguchi, Yusuke Fujita, Kenji Nagamatsu, “Block-Online Guided Source Separation,” Proc. SLT 2021
Jing Shi, Xuankai Chang, Pengcheng Guo, Shinji Watanabe, Yusuke Fujita, Jiaming Xu, Bo Xu, Lei Xie, “Sequence to Multi-Sequence Learning via Conditional Chain Mapping for Mixture Signals,” Proc. NeurIPS 2020
Shota Horiguchi, Yusuke Fujita, Shinji Watanabe, Yawen Xue, Kenji Nagamatsu, “End-to-End Speaker Diarization for an Unknown Number of Speakers with Encoder-Decoder Based Attractors,” Proc. Interspeech 2020
Shota Horiguchi, Yusuke Fujita, Kenji Nagamatsu, “Utterance-Wise Meeting Transcription System Using Asynchronous Distributed Microphones,” Proc. Interspeech 2020
Jing Shi, Jiaming Xu, Yusuke Fujita, Shinji Watanabe, Bo Xu, “Speaker-Conditional Chain Model for Speech Separation and Extraction,” Proc. Interspeech 2020
Zili Huang, Shinji Watanabe, Yusuke Fujita, Paola Garcia, Yiwen Shao, Daniel Povey, Sanjeev Khudanpur, "Speaker Diarization with Region Proposal Network," Proc. ICASSP, 2020
Yusuke Fujita, Naoyuki Kanda, Shota Horiguchi, Yawen Xue, Kenji Nagamatsu, Shinji Watanabe, "End-to-End Neural Speaker Diarization with Self-attention," Proc. ASRU, pp. 296-303, 2019 (Best Paper Finalist)
Naoyuki Kanda, Shota Horiguchi, Yusuke Fujita, Yawen Xue, Kenji Nagamatsu, Shinji Watanabe, "Simultaneous speech recognition and speaker diarization for monaural dialogue recordings with target-speaker acoustic models," Proc. ASRU, pp. 31-38, 2019
Matthew Maciejewski, Gregory Sell, Yusuke Fujita, Leibny Paola Garcia-Perera, Shinji Watanabe, Sanjeev Khudanpur, "Analysis of robustness of deep single-channel speech separation using corpora constructed from multiple domains," Proc. WASPAA, pp. 165-169, 2019
Yusuke Fujita, Naoyuki Kanda, Shota Horiguchi, Kenji Nagamatsu and Shinji Watanabe, "End-to-End Neural Speaker Diarization with Permutation-Free Objectives," Proc. Interspeech, pp. 4300-4304, 2019
Naoyuki Kanda, Shota Horiguchi, Ryoichi Takashima, Yusuke Fujita, Kenji Nagamatsu and Shinji Watanabe, "Auxiliary Interference Speaker Loss for Target-Speaker Speech Recognition," Proc. Interspeech, pp. 236-240, 2019
Naoyuki Kanda, Christoph Boeddeker, Jens Heitkaemper, Yusuke Fujita, Shota Horiguchi, Kenji Nagamatsu, Reinhold Haeb-Umbach, "Guided Source Separation Meets a Strong ASR Backend: Hitachi/Paderborn University Joint Investigation for Dinner Party ASR," Proc. Interspeech, pp. 1248-1252, 2019
Naoyuki Kanda, Yusuke Fujita, Shota Horiguchi, Rintaro Ikeshita, Kenji Nagamatsu, Shinji Watanabe, "Acoustic Modeling for Distant Multi-talker Speech Recognition with Single- and Multi-channel Branches", Proc. ICASSP, pp. 6630-6634, 2019
Vimal Manohar, Szu-Jui Chen, Zhiqi Wang, Yusuke Fujita, Shinji Watanabe, Sanjeev Khudanpur, "Acoustic Modeling for Overlapping Speech Recognition: JHU CHiME-5 Challenge System", Proc. ICASSP, pp. 6665-6669, 2019
Naoyuki Kanda, Rintaro Ikeshita, Shota Horiguchi, Yusuke Fujita, Kenji Nagamatsu, Xiaofei Wang, Vimal Manohar, Nelson Enrique Yalta Soplin, Matthew Maciejewski, Szu-Jui Chen, Aswin Shanmugam Subramanian, Ruizhi Li, Zhiqi Wang, Jason Naradowsky, L. Paola Garcia-Perera, Gregory Sell, "The Hitachi/JHU CHiME-5 system: Advances in speech recognition for everyday home environments using multiple microphone arrays", Proc. CHiME 2018
Naoyuki Kanda, Yusuke Fujita, Kenji Nagamatsu, "Lattice-free State-level Minimum Bayes Risk Training of Acoustic Models", Proc. INTERSPEECH, 2018
Naoyuki Kanda, Yusuke Fujita, Kenji Nagamatsu, "Sequence distillation for purely sequence trained acoustic models", Proc. ICASSP, pp. 5964-5968, 2018
Naoyuki Kanda, Yusuke Fujita, Kenji Nagamatsu, "Investigation of lattice-free maximum mutual information-based acoustic models with sequence-level Kullback-Leibler divergence", Proc. ASRU 2017, pp. 69-76, 2017
Rintaro Ikeshita, Yohei Kawaguchi, Masahito Togami, Yusuke Fujita, Kenji Nagamatsu, "Independent vector analysis with frequency range division and prior switching", Proc. EUSIPCO 2017, pp. 2329-2333, 2017
Rintaro Ikeshita, Masahito Togami, Yohei Kawaguchi, Yusuke Fujita, Kenji Nagamatsu, "Local Gaussian model with source-set constraints in audio source separation", Proc. MLSP 2017, pp. 1-6, 2017
Yusuke Fujita, Takeshi Homma, Masahito Togami, "Unsupervised network adaptation and phonetically-oriented system combination for the CHiME-4 challenge," Proc. CHiME 2016, pp. 49–51, 2016
Yusuke Fujita, Ryoichi Takashima, Takeshi Homma, Masahito Togami, "Data Augmentation Using Multi-Input Multi-Output Source Separation for Deep Neural Network Based Acoustic Modeling," Proc. Interspeech 2016, pp. 3818-3822, 2016
Masahito Togami, Ryoichi Takashima, Yusuke Fujita, "Solving permutation problem with a cascade combination of phase difference entropy and power spectral correlation," IWAENC2016, 2016
Yusuke Fujita, Ryoichi Takashima, Takeshi Homma, Rintaro Ikeshita, Yohei Kawaguchi, Takashi Sumiyoshi, Takashi Endo, Masahito Togami, "Unified ASR system using LGM-based source separation, noise-robust feature extraction, and word hypothesis selection," Proc. ASRU 2015, pp. 416-422, 2015
Yasushi Yamazaki, Yusuke Fujita, Naohisa Komatsu, "CELP-based speaker verification: an evaluation under noisy conditions," Proc. ICARCV 2004, pp. 408-412, vol. 1, 2004
Session chair
Interspeech 2024
CHiME5: The 5th International Workshop on Speech Processing in Everyday Environments, 2018
Reviewer
Journal
IEEE Transactions on Audio, Speech and Language Processing
Computer Speech and Language
Speech Communication
Digital Signal Processing
EURASIP Journal on Audio, Speech, and Music Processing
Conference
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
ISCA Interspeech
IEEE Spoken Language Technology Workshop (SLT)
IEEE International Workshop on Machine Learning for Signal Processing (MLSP)
International Joint Conferences on Artificial Intelligence (IJCAI)
Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)
IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
AAAI Conference on Artificial Intelligence (AAAI)
IEEE International Workshop on Multimedia Signal Processing (MMSP)
European Signal Processing Conference (EUSIPCO)
Education
Doctor of Engineering from Waseda University, March 2024
Thesis: A Study on Speaker Diarization based on End-to-end Optimization
Supervisor: Prof. Tetsunori Kobayashi
Master of Engineering, Computer Science from Waseda University, Tokyo, Japan, March 2005
Thesis: Improvement of CELP-based Speaker Recognition
Supervisor: Prof. Naohisa Komatsu
Bachelor of Engineering, Electronics/Information/Communication from Waseda University, Tokyo, Japan, March 2003
Thesis: CELP-based Speaker Recognition
Supervisor: Prof. Naohisa Komatsu