Publications
Y. Sudo, M. Shakeel, Y. Fukumoto, B. Yan, J. Shi, Y. Peng, and S. Watanabe, “Joint Beam Search Integrating CTC, Attention, and Transducer Decoders”, IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2025, 33, pp. 598-612. [arXiv]
Y. Sudo, M. Shakeel, and N. Sumida, “Research on Customizable End-to-End Speech Recognition without Retraining”, Honda R&D Technical Review, 2025. (in Japanese)
Y. Sudo, M. Takigahira, H. Tsuru, K. Nakadai, and H. Nakajima, “Online Adaptation of Fourier Series-Based Acoustic Transfer Function Model and Its Application to Sound Source Localization and Separation”, Journal of Advanced Robotics, 2024, 38 (19-20), pp. 1351–1363. [Link]
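T. Osaki, Y. Sudo, K. Itoyama, K. Nishida, and K. Nakadai, “Improving the Noise Robustness of Speech Recognition Using a Speech Enhancement Network and Adapters”, Journal of the Robotics Society of Japan, 2024, 42 (9), pp. 920-923. (in Japanese)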
Y. Sudo, K. Itoyama, K. Nishida, and K. Nakadai, "Multichannel Environmental Sound Segmentation with Separately Trained Spectral and Spatial Features", Journal of Applied Intelligence, 2021, 51, pp. 8245–8259. [Link]
Y. Sudo, K. Itoyama, K. Nishida, and K. Nakadai, "Sound event aware environmental sound segmentation with Mask U-Net", Journal of Advanced Robotics, 2020, 34 (20), pp. 1280–1290. [Link]
Y. Kakinuma, Y. Sudo, and T. Aoyama, “Detection of Chatter Vibration in End Milling applying Disturbance Observer”, CIRP Annals-Manufacturing Technology, 60, 1, 2011, pp. 109-112. [Link]
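Y. Sudo, Y. Kakinuma, K. Ohnishi, and T. Aoyama, "Development of Sensorless Chatter Vibration Detection Technology Using a Disturbance Observer in End Milling (1st Report): High-Precision Process Monitoring Using the Averaged Time-Measurement Method", Journal of the Japan Society for Precision Engineering, 77, 7, 2011, pp. 707-712. (in Japanese) (🏆Japan Society for Precision Engineering Research Encouragement Award🏆) [J-STAGE]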
T. Mizumoto, Y. Fujita, H. Shi, L. Liu, A. Kojima, and Y. Sudo, “Evaluating Japanese Dialect Robustness across Speech and Text-based Large Language Models”, in Proc. ASRU, 2025.
H. Shi, Y. Fujita, T. Mizumoto, L. Liu, A. Kojima, and Y. Sudo, “Serialized Output Prompting for Large Language Model-based Multi-Talker Speech Recognition”, in Proc. ASRU, 2025.
M. Shakeel, Y. Sudo, Y. Peng, C. J. Lin, and S. Watanabe, “Unifying Diarization, Separation, and ASR with Multi-Speaker Encoder”, in Proc. ASRU, 2025.
A. Kojima, Y. Fujita, H. Shi, T. Mizumoto, M. Zhao, and Y. Sudo, “Conversation Context-aware Direct Preference Optimization for Style-Controlled Speech Synthesis”, in Proc. APSIPA, 2025.
Y. Sudo, Y. Fujita, A. Kojima, T. Mizumoto, and L. Liu, "OWSM-Biasing: Contextualizing Open Whisper-Style Speech Models for Automatic Speech Recognition with Dynamic Vocabulary", in Proc. INTERSPEECH, 2025. [arXiv]
Y. Fujita, T. Mizumoto, A. Kojima, L. Liu, and Y. Sudo, "AC/DC: LLM-based Audio Comprehension via Dialogue Continuation", in Proc. INTERSPEECH, 2025.
T. Mizumoto, A. Kojima, Y. Fujita, L. Liu, and Y. Sudo, "Is Synthetic Data Truly Effective for Training Speech Language Models?", in Proc. INTERSPEECH, 2025.
Y. Peng, S. Muhammad, Y. Sudo, W. Chen, J. Tian, J. Lin, and S. Watanabe, "OWSM v4: Improving Open Whisper-Style Speech Models via Data Scaling and Cleaning", in Proc. INTERSPEECH, 2025. (🏆ISCA Best Student Paper Award 2025🏆️)
Y. Sudo, Y. Fukumoto, M. Shakeel, Y. Peng, C. J. Lin, and S. Watanabe, "DYNAC: Dynamic Vocabulary based Non-Autoregressive Contextualization for Speech Recognition", in Proc. INTERSPEECH, 2025. [arXiv]
C. Maeda, S. Muhammad, and Y. Sudo, "Joint Target-Speaker ASR and Activity Detection", in Proc. INTERSPEECH, 2025.
Y. Sudo, Y. Fukumoto, M. Shakeel, Y. Peng, and S. Watanabe, “Contextualized Automatic Speech Recognition with Dynamic Vocabulary”, in Proc. SLT, 2024. (🏆IEEE SLT 2024 Best Paper Award🏆) [arXiv]
Y. Peng, Y. Sudo, M. Shakeel, and S. Watanabe, “OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification”, in Proc. ACL, 2024, pp. 10192-10209. [arXiv]
Y. Peng, J. Tian, W. Chen, S. Arora, B. Yan, Y. Sudo, S. Muhammad, K. Choi, J. Shi, X. Chang, J. Jung, and S. Watanabe, “OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer”, in Proc. INTERSPEECH, 2024, pp. 352-356.
M. Shakeel, Y. Sudo, Y. Peng, and S. Watanabe, “Contextualized End-to-End Automatic Speech Recognition with Intermediate Biasing Loss”, in Proc. INTERSPEECH, 2024, pp. 3909-3913.
T. Osaki, Y. Sudo, K. Itoyama, K. Nishida, and K. Nakadai, "Improving Noise Robustness of Automatic Speech Recognition based on a Parallel Adapter Model with Near-Identity Initialization", in Proc. IEA/AIE, 2024.
Y. Sudo, M. Shakeel, Y. Fukumoto, Y. Peng, and S. Watanabe, “Contextualized Automatic Speech Recognition with Attention-Based Bias Phrase Boosted Beam Search”, in Proc. ICASSP, 2024, pp. 10896-10900. [arXiv]
M. Shakeel, Y. Sudo, Y. Peng, and S. Watanabe, “Joint Optimization of Streaming and Non-streaming Automatic Speech Recognition with Multi-decoder and Knowledge Distillation”, in Proc. ICASSP Satellite Workshop HSCMA, 2024.
Y. Peng, J. Tian, B. Yan, D. Berrebbi, X. Chang, X. Li, J. Shi, S. Arora, W. Chen, R. Sharma, W. Zhang, Y. Sudo, M. Shakeel, J. Jung, S. Maiti, and S. Watanabe, “Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data”, in Proc. ASRU, 2023.
R. Takeda, Y. Sudo, and T. Komatani, “Flexible Evidence Model to Reduce Uncertainty Mismatch Between Speech Enhancement and ASR Based on Encoder-Decoder Architecture”, in Proc. APSIPA, 2023.
Y. Long, R. Yasuda, Y. Sudo, K. Itoyama, K. Nakadai, H. Amano, and K. Nishida, “Sound event localization and detection utilizing overlapping end-to-end learning”, in Proc. Asia Pacific Conference on Robot IoT System Development and Platform (APRIS), 2023.
Y. Sudo, M. Takigahira, H. Tsuru, K. Nakadai, and H. Nakajima, “Online Adaptation of Fourier Series Based Acoustic Transfer Function Model to Improve Sound Source Localization and Separation”, in Proc. RO-MAN, 2023.
Y. Sudo, M. Shakeel, B. Yan, J. Shi, and S. Watanabe, “4D ASR: Joint modeling of CTC, Attention, Transducer, and Mask-Predict decoders”, in Proc. INTERSPEECH, 2023, pp. 3312-3316. [arXiv]
Y. Sudo, M. Shakeel, Y. Peng, and S. Watanabe, “Time-synchronous one-pass Beam Search for Parallel Online and Offline Transducers with Dynamic Block Training”, in Proc. INTERSPEECH, 2023, pp. 4479-4483. [ISCA Archive]
Y. Sudo, K. Hata, and K. Nakadai, “Retraining-free Customized ASR for Enharmonic Words Based on a Named-Entity-Aware Model and Phoneme Similarity Estimation”, in Proc. INTERSPEECH, 2023, pp. 491-495. [arXiv]
Y. Peng, Y. Sudo, M. Shakeel, and S. Watanabe, “DPHuBERT: Joint Distillation and Pruning of Self-Supervised Speech Models”, in Proc. INTERSPEECH, 2023, pp. 62-66.
Y. Sudo, M. Shakeel, K. Nakadai, J. Shi and S. Watanabe, “Streaming Automatic Speech Recognition with Re-blocking Processing Based on Integrated Voice Activity Detection”, in Proc. INTERSPEECH, 2022, pp. 4641-4645. [ISCA Archive]
R. Takeda, Y. Sudo, K. Nakadai, and T. Komatani, “Empirical Sampling from Latent Utterance-wise Evidence Model for Missing Data ASR based on Neural Encoder-Decoder Model”, in Proc. INTERSPEECH, 2022, pp. 3789-3793.
Y. Sudo, K. Itoyama, K. Nishida, and K. Nakadai, “Multi-channel Environmental sound segmentation utilizing Sound Source Localization and Separation U-Net”, in Proc. SII, 2021, pp. 382-387.
Y. Sudo, K. Itoyama, K. Nishida, and K. Nakadai, “Multi-channel Environmental sound segmentation”, in Proc. SII, 2020, pp. 820-825.
Y. Sudo, K. Itoyama, K. Nishida, and K. Nakadai, “Environmental sound segmentation utilizing Mask U-Net”, in Proc. IROS, 2019, pp. 5340-5345.
Y. Sudo, K. Itoyama, K. Nishida, and K. Nakadai, “Improvement of DOA estimation by using quaternion output in sound event localization and detection”, in Proc. DCASE, 2019, pp. 244-247.
M. Iwatsuki, Y. Sudo, K. Itoyama, K. Nishida, and K. Nakadai, "Listen and Tell: Acoustic Scene Caption Generation using Deep Learning", in The Third International Workshop on Symbolic-Neural Learning, 2019.
Y. Sudo, Y. Kakinuma, K. Ohnishi, and T. Aoyama, "Development of Chatter Vibration Detecting System utilizing Sensor-less Process Monitoring", in Proc. of 43rd CIRP International Conference on Manufacturing Systems, Vienna, Austria, May 26-28, 2010, pp. 551-554.
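Y. Sudo, M. Shakeel, Y. Peng, and S. Watanabe, “Contextual Adaptation of End-to-End Speech Recognition Using Dynamic Vocabulary Expansion”, Proceedings of the 66th Meeting of the JSAI Special Interest Group on AI Challenges, 2024, pp. 24-31. (in Japanese) [Link]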
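T. Osaki, Y. Sudo, and K. Nakadai, “Improving the Noise Robustness of Speech Recognition Using Speech Enhancement and Noise Features”, Proceedings of the 66th Meeting of the JSAI Special Interest Group on AI Challenges, 2024, pp. 1-7. (in Japanese)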
M. Ngai, C. Maeda, M. Shakeel, and Y. Sudo, “Speech Separation with Auxiliary Signal-to-Artifact Ratio Loss for Improving Multi-Talker ASR”, Proceedings of the 66th Meeting of the JSAI Special Interest Group on AI Challenges, 2024, pp. 8-15.
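T. Osaki, Y. Sudo, K. Itoyama, and K. Nakadai, “Improving the Noise Robustness of Speech Recognition Using a Biasing Network”, Proceedings of the 42nd Annual Conference of the Robotics Society of Japan, 2024. (in Japanese)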
Y. Long, Y. Sudo, M. Shakeel, K. Itoyama, and K. Nakadai, “A Multi-Form Language Speech Translation Model Based on ESPnet”, Proceedings of the 42nd Annual Conference of the Robotics Society of Japan, 2024.
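T. Osaki, Y. Sudo, K. Itoyama, K. Nishida, and K. Nakadai, “Improving the Noise Robustness of Speech Recognition Using a Parallel Adapter Model and Near-Identity Initialization”, Proceedings of the 63rd Meeting of the JSAI Special Interest Group on AI Challenges, 2023, pp. 2-8. (in Japanese) (🏆JSAI SIG Excellence Award🏆)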
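Y. Sudo, M. Takigahira, K. Nakadai, and H. Nakajima, “Improving Sound Source Localization and Separation through Online Adaptation of a Lightweight Transfer Function Based on Fourier Series Expansion”, Proceedings of the 63rd Meeting of the JSAI Special Interest Group on AI Challenges, 2023, pp. 39-46. (in Japanese) [Link]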
M. Shakeel, Y. Sudo, Y. Peng, and S. Watanabe, “End-to-end integration of online and offline encoders using auxiliary losses for automatic speech recognition”, Proceedings of the 62nd Meeting of the JSAI Special Interest Group on AI Challenges, 2023, pp. 9-14.
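T. Osaki, Y. Sudo, K. Itoyama, K. Nishida, and K. Nakadai, “Improving the Noise Robustness of Speech Recognition Models Using a Speech Enhancement Network and Adapter Layers”, Proceedings of the 41st Annual Conference of the Robotics Society of Japan, 2023. (in Japanese)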
Y. Long, R. Yasuda, Y. Sudo, K. Itoyama, K. Nakadai, H. Amano, and K. Nishida, “An efficient end-to-end learning method for sound event localization and detection”, Proceedings of the 41st Annual Conference of the Robotics Society of Japan, 2023.
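Y. Sudo, M. Shakeel, K. Nakadai, J. Shi, and S. Watanabe, “Integration of Blockwise Streaming Speech Recognition and Voice Activity Detection”, Proceedings of the 61st Meeting of the JSAI Special Interest Group on AI Challenges, 2022, pp. 51-56. (in Japanese) [Link]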
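M. Iwatsuki, Y. Sudo, K. Itoyama, K. Nishida, and K. Nakadai, “Construction of an Environmental Sound Caption Corpus toward a Robot That Describes Its Acoustic Environment”, Proceedings of the 37th Annual Conference of the Robotics Society of Japan, 2019. (in Japanese)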
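M. Iwatsuki, Y. Sudo, K. Itoyama, K. Nishida, and K. Nakadai, “Listen and Tell: Acoustic Scene Caption Generation Using Deep Learning”, Proceedings of the 81st National Convention of the Information Processing Society of Japan, 2019, pp. 407-408. (in Japanese)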
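Y. Sudo, K. Itoyama, K. Nishida, and K. Nakadai, “Proposal of Environmental Sound Segmentation Using Mask U-Net”, Proceedings of the 52nd Meeting of the JSAI Special Interest Group on AI Challenges, 2018, pp. 21-26. (in Japanese)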
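Y. Sudo, Y. Kakinuma, K. Ohnishi, and T. Aoyama, “Development of Sensorless Chatter Vibration Detection Technology Using a Disturbance Observer in End Milling”, Japan Society for Precision Engineering Spring Meeting, 2011, pp. 333-334. (in Japanese)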
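三又 昭範, Y. Sudo, Y. Kakinuma, T. Aoyama, and 柳川 章全, "Numerical Analysis of the Oil Droplet Intrusion Mechanism in Labyrinth Seals for Machine Tool Spindles", Proceedings of the Japan Society for Precision Engineering Meeting, 2009, pp. 1049-1050. (in Japanese)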
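“Evolving Speech Communication between Humans and Machines, Vol. 2”, “Part 2, Chapter 1, Section 3: Development of a Speech Recognition System for Deaf and Hard-of-Hearing People”, NTS Inc., 2025. (in Japanese)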
Deep-Learning-Based Environmental Sound Segmentation - Integration of Sound Source Localization, Separation, and Classification -, Ph.D. Dissertation, Tokyo Institute of Technology, Japan, March 2021.
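Development of Sensorless Chatter Vibration Detection Technology Using a Disturbance Observer in End Milling and Its Suppression, Master's Thesis, Graduate School of Science and Technology, Keio University, Japan, March 2011. (in Japanese)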
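Numerical Analysis of the Oil Droplet Penetration Mechanism in Labyrinth Seals for Machine Tool Spindles, Bachelor's Thesis, Keio University, Japan, March 2009. (in Japanese)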