Yui Sudo - Publication-ja

Yui Sudo

Publications


Journal Papers

Y. Sudo, M. Shakeel, Y. Fukumoto, B. Yan, J. Shi, Y. Peng, and S. Watanabe, “Joint Beam Search Integrating CTC, Attention, and Transducer Decoders”, IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2025, 33, pp. 598-612. [arXiv]
周藤唯, Muhammad Shakeel, 住田直亮, “再学習を必要としないカスタマイズ可能なEnd-to-end音声認識の研究”, Honda R&D Technical Review, 2025.
Y. Sudo, M. Takigahira, H. Tsuru, K. Nakadai, and H. Nakajima, “Online Adaptation of Fourier Series-Based Acoustic Transfer Function Model and Its Application to Sound Source Localization and Separation”, Journal of Advanced Robotics, 2024, 38 (19-20), pp. 1351–1363. [Link]
大崎崇博, 周藤唯, 糸山克寿, 西田健次, 中臺一博, “音声強調ネットワークとアダプターを用いた音声認識の耐雑音ロバスト性向上”, ロボット学会誌, 2024, 42, 9, pp. 920-923.
Y. Sudo, K. Itoyama, K. Nishida, and K. Nakadai, "Multichannel Environmental Sound Segmentation with Separately Trained Spectral and Spatial Features", Journal of Applied Intelligence, 2021, 51, pp. 8245–8259. [Link]
Y. Sudo, K. Itoyama, K. Nishida, and K. Nakadai, "Sound event aware environmental sound segmentation with Mask U-Net", Journal of Advanced Robotics, 2020, 34 (20), pp. 1280–1290. [Link]
Y. Kakinuma, Y. Sudo, and T. Aoyama, “Detection of Chatter Vibration in End Milling applying Disturbance Observer”, CIRP Annals-Manufacturing Technology, 60, 1, 2011, pp. 109-112. [Link]
周藤唯, 柿沼康弘, 大西公平, 青山藤詞郎, "エンドミル加工における外乱オブザーバを用いたセンサレスびびり振動検出技術の開発(第1報): 平均計時法を用いた高精度プロセスモニタリング", 精密工学会誌, 77, 7, 2011, pp. 707-712. (🏆精密工学会研究奨励賞🏆) [J-STAGE]

Peer-Reviewed International Conferences

T. Mizumoto, Y. Fujita, H. Shi, L. Liu, A. Kojima, and Y. Sudo, “Evaluating Japanese Dialect Robustness across Speech and Text-based Large Language Models”, in Proc. ASRU, 2025.
H. Shi, Y. Fujita, T. Mizumoto, L. Liu, A. Kojima, and Y. Sudo, “Serialized Output Prompting for Large Language Model-based Multi-Talker Speech Recognition”, in Proc. ASRU, 2025.
M. Shakeel, Y. Sudo, Y. Peng, C. J. Lin, and S. Watanabe, “Unifying Diarization, Separation, and ASR with Multi-Speaker Encoder”, in Proc. ASRU, 2025.
A. Kojima, Y. Fujita, H. Shi, T. Mizumoto, M. Zhao, and Y. Sudo, “Conversation Context-aware Direct Preference Optimization for Style-Controlled Speech Synthesis”, in Proc. APSIPA, 2025.
Y. Sudo, Y. Fujita, A. Kojima, T. Mizumoto, and L. Liu, "OWSM-Biasing: Contextualizing Open Whisper-Style Speech Models for Automatic Speech Recognition with Dynamic Vocabulary", in Proc. INTERSPEECH, 2025. [arXiv]
Y. Fujita, T. Mizumoto, A. Kojima, L. Liu, and Y. Sudo, "AC/DC: LLM-based Audio Comprehension via Dialogue Continuation", in Proc. INTERSPEECH, 2025.
T. Mizumoto, A. Kojima, Y. Fujita, L. Liu, and Y. Sudo, "Is Synthetic Data Truly Effective for Training Speech Language Models?", in Proc. INTERSPEECH, 2025.
Y. Peng, S. Muhammad, Y. Sudo, W. Chen, J. Tian, J. Lin, and S. Watanabe, "OWSM v4: Improving Open Whisper-Style Speech Models via Data Scaling and Cleaning", in Proc. INTERSPEECH, 2025. (🏆ISCA Best Student Paper Award 2025🏆)
Y. Sudo, Y. Fukumoto, M. Shakeel, Y. Peng, C. J. Lin, and S. Watanabe, "DYNAC: Dynamic Vocabulary based Non-Autoregressive Contextualization for Speech Recognition", in Proc. INTERSPEECH, 2025. [arXiv]
C. Maeda, S. Muhammad, and Y. Sudo, "Joint Target-Speaker ASR and Activity Detection", in Proc. INTERSPEECH, 2025.
Y. Sudo, Y. Fukumoto, M. Shakeel, Y. Peng, and S. Watanabe, “Contextualized Automatic Speech Recognition with Dynamic Vocabulary”, in Proc. SLT, 2024. (🏆IEEE SLT 2024 Best Paper Award🏆) [arXiv]
Y. Peng, Y. Sudo, M. Shakeel, and S. Watanabe, “OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification”, in Proc. ACL, 2024, pp. 10192-10209. [arXiv]
Y. Peng, J. Tian, W. Chen, S. Arora, B. Yan, Y. Sudo, S. Muhammad, K. Choi, J. Shi, X. Chang, J. Jung, and S. Watanabe, “OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer”, in Proc. INTERSPEECH, 2024, pp. 352-356.
M. Shakeel, Y. Sudo, Y. Peng, and S. Watanabe, “Contextualized End-to-End Automatic Speech Recognition with Intermediate Biasing Loss”, in Proc. INTERSPEECH, 2024, pp. 3909-3913.
T. Osaki, Y. Sudo, K. Itoyama, K. Nishida, and K. Nakadai, "Improving Noise Robustness of Automatic Speech Recognition based on a Parallel Adapter Model with Near-Identity Initialization", in Proc. IEA/AIE, 2024.
Y. Sudo, M. Shakeel, Y. Fukumoto, Y. Peng, and S. Watanabe, “Contextualized Automatic Speech Recognition with Attention-Based Bias Phrase Boosted Beam Search”, in Proc. ICASSP, 2024, pp. 10896-10900. [arXiv]
M. Shakeel, Y. Sudo, Y. Peng, and S. Watanabe, “Joint Optimization of Streaming and Non-streaming Automatic Speech Recognition with Multi-decoder and Knowledge Distillation”, satellite workshop HSCMA in ICASSP, 2024.
Y. Peng, J. Tian, B. Yan, D. Berrebbi, X. Chang, X. Li, J. Shi, S. Arora, W. Chen, R. Sharma, W. Zhang, Y. Sudo, M. Shakeel, J. Jung, S. Maiti, and S. Watanabe, “Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data”, in Proc. ASRU, 2023.
R. Takeda, Y. Sudo, and T. Komatani, “Flexible Evidence Model to Reduce Uncertainty Mismatch Between Speech Enhancement and ASR Based on Encoder-Decoder Architecture”, in Proc. APSIPA, 2023.
Y. Long, R. Yasuda, Y. Sudo, K. Itoyama, K. Nakadai, H. Amano, and K. Nishida, “Sound event localization and detection utilizing overlapping end-to-end learning”, Proceedings of Asia Pacific Conference on Robot IoT System Development and Platform (APRIS), 2023.
Y. Sudo, M. Takigahira, H. Tsuru, K. Nakadai, and H. Nakajima, “Online Adaptation of Fourier Series Based Acoustic Transfer Function Model to Improve Sound Source Localization and Separation”, in Proc. RO-MAN, 2023.
Y. Sudo, M. Shakeel, B. Yan, J. Shi, and S. Watanabe, “4D ASR: Joint modeling of CTC, Attention, Transducer, and Mask-Predict decoders”, in Proc. INTERSPEECH, 2023, pp. 3312-3316. [arXiv]
Y. Sudo, M. Shakeel, Y. Peng, and S. Watanabe, “Time-synchronous one-pass Beam Search for Parallel Online and Offline Transducers with Dynamic Block Training”, in Proc. INTERSPEECH, 2023, pp. 4479-4483. [ISCA Archive]
Y. Sudo, K. Hata, and K. Nakadai, “Retraining-free Customized ASR for Enharmonic Words Based on a Named-Entity-Aware Model and Phoneme Similarity Estimation”, in Proc. INTERSPEECH, 2023, pp. 491-495. [arXiv]
Y. Peng, Y. Sudo, M. Shakeel, and S. Watanabe, “DPHuBERT: Joint Distillation and Pruning of Self-Supervised Speech Models”, in Proc. INTERSPEECH, 2023, pp. 62-66.
Y. Sudo, M. Shakeel, K. Nakadai, J. Shi, and S. Watanabe, “Streaming Automatic Speech Recognition with Re-blocking Processing Based on Integrated Voice Activity Detection”, in Proc. INTERSPEECH, 2022, pp. 4641-4645. [ISCA Archive]
R. Takeda, Y. Sudo, K. Nakadai, and T. Komatani, “Empirical Sampling from Latent Utterance-wise Evidence Model for Missing Data ASR based on Neural Encoder-Decoder Model”, in Proc. INTERSPEECH, 2022, pp. 3789-3793.
Y. Sudo, K. Itoyama, K. Nishida, and K. Nakadai, “Multi-channel Environmental sound segmentation utilizing Sound Source Localization and Separation U-Net”, in Proc. SII, 2021, pp. 382-387.
Y. Sudo, K. Itoyama, K. Nishida, and K. Nakadai, “Multi-channel Environmental sound segmentation”, in Proc. SII, 2020, pp. 820-825.
Y. Sudo, K. Itoyama, K. Nishida, and K. Nakadai, “Environmental sound segmentation utilizing Mask U-Net”, in Proc. IROS, 2019, pp. 5340-5345.
Y. Sudo, K. Itoyama, K. Nishida, and K. Nakadai, “Improvement of DOA estimation by using quaternion output in sound event localization and detection”, in Proc. DCASE, 2019, pp. 244-247.
M. Iwatsuki, Y. Sudo, K. Itoyama, K. Nishida, and K. Nakadai, "Listen and Tell: Acoustic Scene Caption Generation using Deep Learning", in The Third International Workshop on Symbolic-Neural Learning, 2019.
Y. Sudo, Y. Kakinuma, K. Ohnishi, and T. Aoyama, "Development of Chatter Vibration Detecting System utilizing Sensor-less Process Monitoring", in Proc. of 43rd CIRP International Conference on Manufacturing Systems, Vienna, Austria, May 26-28, 2010, pp. 551-554.

Domestic Conferences

高城巽成, Jeon Haesung, 藤田雄介, 周藤唯, "BEST-RQ に基づく日本語音声基盤モデルの構築", 日本音響学会第155回(2026年春季)研究発表会, 2026.
渡邉一功, 水本智也, 周藤唯, 河原大輔, "応答内容・順序に着目した音声対話ベンチマークの構築", 言語処理学会第32回年次大会(NLP2026), 2026.
水本智也, 小島淳嗣, 藤田雄介, 周藤唯, "DiaFill: 短い発話でフィラーが豊富な音声対話台本の自動生成ツールキット", 言語処理学会第32回年次大会(NLP2026), 2026.
水本智也, 藤田雄介, Hao Shi, Lianbo Liu, 小島淳嗣, 周藤唯, "音声・テキスト大規模言語モデルにおける日本語方言の頑健性の調査", 言語理解とコミュニケーション研究会 (NLC), 2026.
周藤唯, Muhammad Shakeel, Peng Yifan, 渡部晋治, “動的な語彙拡張を用いたEnd-to-end音声認識の文脈適応”, 第66回人工知能学会 AIチャレンジ研究会予稿集, 2024, pp. 24-31. [Link]
大崎崇博, 周藤唯, 中臺一博, “音声強調と雑音特徴量を用いた音声認識の雑音耐性向上”, 第66回人工知能学会 AIチャレンジ研究会予稿集, 2024, pp. 1-7.
M. Ngai, C. Maeda, M. Shakeel, and Y. Sudo, “Speech Separation with Auxiliary Signal-to-Artifact Ratio Loss for Improving Multi-Talker ASR”, 第66回人工知能学会 AIチャレンジ研究会予稿集, 2024, pp. 8-15.
大崎崇博, 周藤唯, 糸山克寿, 中臺一博, “Biasing networkを用いた音声認識の雑音耐性向上”, 第42回日本ロボット学会学術講演会予稿集, 2024.
Y. Long, Y. Sudo, M. Shakeel, K. Itoyama, and K. Nakadai, “A Multi-Form Language Speech Translation Model Based on ESPnet”, 第42回日本ロボット学会学術講演会予稿集, 2024.
大崎崇博, 周藤唯, 糸山克寿, 西田健次, 中臺一博, “Parallel Adapter ModelとNear-Identity初期化を用いた音声認識の雑音耐性向上”, 第63回人工知能学会 AIチャレンジ研究会予稿集, 2023, pp. 2-8 (🏆人工知能学会研究会優秀賞🏆).
周藤唯, 瀧ケ平将行, 中臺一博, 中島弘史, “フーリエ級数展開を用いた軽量伝達関数のオンライン適応による音源定位・分離の向上”, 第63回人工知能学会 AIチャレンジ研究会予稿集, 2023, pp. 39-46. [Link]
M. Shakeel, Y. Sudo, Y. Peng, S. Watanabe, “End-to-end integration of online and offline encoders using auxiliary losses for automatic speech recognition”, 第62回人工知能学会 AIチャレンジ研究会予稿集, 2023, pp. 9-14.
大崎崇博, 周藤唯, 糸山克寿, 西田健次, 中臺一博, “音声強調ネットワークとアダプター層を用いた音声認識モデルの耐ノイズ性向上”, 第41回日本ロボット学会学術講演会予稿集, 2023.
Y. Long, R. Yasuda, Y. Sudo, K. Itoyama, K. Nakadai, H. Amano, and K. Nishida, “An efficient end-to-end learning method for sound event localization and detection”, 第41回日本ロボット学会学術講演会予稿集, 2023.
周藤唯, Muhammad Shakeel, 中臺一博, 史嘉彤, 渡部晋治, “Blockwiseストリーミング音声認識と発話区間検出の統合”, 第61回人工知能学会 AIチャレンジ研究会予稿集, 2022, pp. 51-56. [Link]
岩月道生, 周藤唯, 糸山克寿, 西田健次, 中臺一博, “音環境説明ロボットの実現に向けた環境音キャプションコーパスの構築”, 第37回日本ロボット学会学術講演会予稿集, 2019.
岩月道生, 周藤唯, 糸山克寿, 西田健次, 中臺一博, “Listen and Tell: 深層学習を用いた音響シーンのキャプション生成”, 第81回情報処理学会全国大会講演論文集, 2019, pp. 407-408.
周藤唯, 糸山克寿, 西田健次, 中臺一博, “Mask U-Net を用いた環境音セグメンテーションの提案”, 第52回人工知能学会 AIチャレンジ研究会予稿集, 2018, pp. 21-26.
周藤唯, 柿沼康弘, 大西公平, 青山藤詞郎, “エンドミル加工における外乱オブザーバを用いたセンサレスびびり振動検出技術の開発”, 精密工学会春季大会学術講演会, 2011, pp. 333-334.
三又昭範, 周藤唯, 柿沼康弘, 青山藤詞郎, 柳川章全, "工作機械主軸用ラビリンスシールへの油滴の浸入メカニズムの数値解析", 精密工学会学術講演会講演論文集, 2009, pp. 1049-1050.

Books (Contributed Chapters)

“進化するヒトと機械の音声コミュニケーション Vol.2”, “2編1章3節　聴覚障がい者向け音声認識システムの開発”, エヌ・ティー・エス, 2025.

Theses

Deep-Learning-Based Environmental Sound Segmentation - Integration of Sound Source Localization, Separation, and Classification -, Ph.D. Dissertation, Tokyo Institute of Technology, Japan, March 2021.
エンドミル加工における外乱オブザーバを用いたセンサレスびびり振動検出技術の開発とその抑制, Master's Thesis, Graduate School of Science and Technology, Keio University, Japan, March 2011.
工作機械主軸用ラビリンスシールの油滴浸透メカニズムの数値解析, Bachelor's Thesis, Keio University, Japan, March 2009.

Invited Talks

"End-to-end音声認識の課題とDeep Biasingによるカスタマイズ", 電気音響研究会/応用音響研究会, 日本音響学会/電気情報通信学会, 2024. [Link]
"ホンダにおける音声認識の研究と聴覚障がい者向けシステムへの応用", Graduate School of Science and Technology, Keio University, 先進システムデザイン工学 (lecture of June 27, 2024).
"深層学習を用いた音源定位、音源分離、クラス分類の統合　〜環境音セグメンテーション手法の紹介〜", Tokyo BISH Bash #05, 2021. [Link]
