Conference Papers (Reviewed)
- Atsushi Kojima, Yusuke Fujita, Hao Shi, Tomoya Mizumoto, Mengjie Zhao, and Yui Sudo, “Conversation Context-aware Direct Preference Optimization for Style-Controlled Speech Synthesis,” in Proc. APSIPA, 2025 (Accepted).
- Hao Shi, Yusuke Fujita, Tomoya Mizumoto, Lianbo Liu, Atsushi Kojima, and Yui Sudo, “Serialized Output Prompting for Large Language Model-based Multi-Talker Speech Recognition,” in Proc. IEEE-ASRU, 2025 (Accepted). [PDF] [BibTeX]
- Tomoya Mizumoto, Yusuke Fujita, Hao Shi, Lianbo Liu, Atsushi Kojima, and Yui Sudo, “Evaluating Japanese Dialect Robustness across Speech and Text-based Large Language Models,” in Proc. IEEE-ASRU, 2025 (Accepted). [BibTeX]
- Jiahui Zhao, *Hao Shi, Tianrui Wang, Hexin Liu, Zhaoheng Ni, Lingxuan Ye, and Longbiao Wang, “Adapting Pretrained Speech Recognition Models for Code-Switching through Encoding Refining and Language-Aware Attention-based Decoding,” in Proc. IEEE-ICASSP, 2025. [PDF] [BibTeX]
- Zhongjian Cui, Chenrui Cui, Tianrui Wang, Mengnan He, Hao Shi, Meng Ge, Caixia Gong, Longbiao Wang, and Jianwu Dang, “Reducing the Gap between Pretrained Speech Enhancement and Recognition Models Using a Real Speech-Trained Bridging Module,” in Proc. IEEE-ICASSP, 2025. [PDF] [BibTeX]
- Hao Shi, Yuan Gao, Zhaoheng Ni, and Tatsuya Kawahara, “Serialized Speech Information Guidance with Overlapped Encoding Separation for Multi-Speaker Automatic Speech Recognition,” in Proc. IEEE-SLT, 2024, pp.198–204. [PDF] [BibTeX]
- Hao Shi and Tatsuya Kawahara, “Dual-path Adaptation of Pretrained Feature Extraction Module for Robust Automatic Speech Recognition,” in Proc. INTERSPEECH, 2024, pp.2850–2854. [PDF] [BibTeX]
- Yuan Gao, Hao Shi, Chenhui Chu, and Tatsuya Kawahara, “Speech Emotion Recognition with Multi-level Acoustic and Semantic Information Extraction and Interaction,” in Proc. INTERSPEECH, 2024, pp.1060–1064. [PDF] [BibTeX]
- Yuchun Shu, Bo Hu, Yifeng He, Hao Shi, Longbiao Wang, and Jianwu Dang, “Error Correction by Paying Attention to Both Acoustic and Confidence References for Automatic Speech Recognition,” in Proc. INTERSPEECH, 2024, pp.3500–3504. [PDF] [BibTeX]
- Hao Shi, Naoyuki Kamo, Marc Delcroix, Tomohiro Nakatani, and Shoko Araki, “Ensemble Inference for Diffusion Model-based Speech Enhancement,” in Proc. IEEE-ICASSPW, 2024, pp.735–739. [PDF] [BibTeX]
- Hao Shi, Kazuki Shimada, Masato Hirano, Takashi Shibuya, Yuichiro Koyama, Zhi Zhong, Shusuke Takahashi, Tatsuya Kawahara, and Yuki Mitsufuji, “Diffusion-Based Speech Enhancement with Joint Generative and Predictive Decoders,” in Proc. IEEE-ICASSP, 2024, pp.12951–12955. [PDF] [BibTeX]
- Yuan Gao, Hao Shi, Chenhui Chu, and Tatsuya Kawahara, “Enhancing Two-stage Finetuning for Speech Emotion Recognition Using Adapters,” in Proc. IEEE-ICASSP, 2024, pp.11316–11320. [PDF] [BibTeX]
- Zhi Zhong, Hao Shi, Masato Hirano, Kazuki Shimada, Kazuya Tateishi, Takashi Shibuya, Shusuke Takahashi, and Yuki Mitsufuji, “Extending Audio Masked Autoencoders Toward Audio Restoration,” in Proc. WASPAA, 2023, pp.1–5. [PDF] [BibTeX]
- Hao Shi, Masato Mimura, Longbiao Wang, Jianwu Dang, and Tatsuya Kawahara, “Time-domain Speech Enhancement Assisted by Multi-resolution Frequency Encoder and Decoder,” in Proc. IEEE-ICASSP, 2023, pp.1–5. [PDF] [BibTeX]
- Yanbing Yang, Hao Shi, Yuqin Lin, Meng Ge, Longbiao Wang, Qingzhi Hou, and Jianwu Dang, “Adaptive Attention Network with Domain Adversarial Training for Multi-Accent Speech Recognition,” in Proc. ISCSLP, 2022, pp.6–10. [PDF] [BibTeX]
- Hao Shi, Yuchun Shu, Longbiao Wang, Jianwu Dang, and Tatsuya Kawahara, “Fusing Multiple Bandwidth Spectrograms for Improving Speech Enhancement,” in Proc. APSIPA ASC, 2022, pp.1935–1940. [PDF] [BibTeX]
- Hao Shi, Longbiao Wang, Sheng Li, Jianwu Dang, and Tatsuya Kawahara, “Subband-Based Spectrogram Fusion for Speech Enhancement by Combining Mapping and Masking Approaches,” in Proc. APSIPA ASC, 2022, pp.286–292. [PDF] [BibTeX]
- Hao Shi, Longbiao Wang, Sheng Li, Jianwu Dang, and Tatsuya Kawahara, “Monaural Speech Enhancement Based on Spectrogram Decomposition for Convolutional Neural Network-sensitive Feature Extraction,” in Proc. INTERSPEECH, 2022, pp.221–225. [PDF] [BibTeX]
- Tongtong Song, Qiang Xu, Meng Ge, Longbiao Wang, *Hao Shi, Yongjie Lv, Yuqin Lin, and Jianwu Dang, “Language-specific Characteristic Assistance for Code-switching Speech Recognition,” in Proc. INTERSPEECH, 2022, pp.3924–3928. [PDF] [BibTeX]
- Qiang Xu, Tongtong Song, Longbiao Wang, *Hao Shi, Yuqin Lin, Yongjie Lv, Meng Ge, Qiang Yu, and Jianwu Dang, “Self-Distillation Based on High-level Information Supervision for Compressing End-to-End ASR Model,” in Proc. INTERSPEECH, 2022, pp.1716–1720. [PDF] [BibTeX]
- Hao Shi, Longbiao Wang, Sheng Li, Cunhang Fan, Jianwu Dang, and Tatsuya Kawahara, “Spectrograms Fusion-based End-to-end Robust Automatic Speech Recognition,” in Proc. APSIPA ASC, 2021, pp.438–442. [PDF] [BibTeX]
- Luya Qiang, Hao Shi, Meng Ge, Haoran Yin, Nan Li, Longbiao Wang, Sheng Li, and Jianwu Dang, “Speech Dereverberation Based on Scale-aware Mean Square Error Loss,” in Proc. ICONIP, 2021, pp.55–63. [PDF] [BibTeX]
- Haoran Yin, Hao Shi, Longbiao Wang, Luya Qiang, Sheng Li, Meng Ge, Gaoyan Zhang, and Jianwu Dang, “Simultaneous Progressive Filtering-based Monaural Speech Enhancement,” in Proc. ICONIP, 2021, pp.213–221. [PDF] [BibTeX]
- Hao Shi, Longbiao Wang, Meng Ge, Sheng Li, and Jianwu Dang, “Spectrograms Fusion with Minimum Difference Masks Estimation for Monaural Speech Dereverberation,” in Proc. IEEE-ICASSP, 2020, pp.7544–7548. [PDF] [BibTeX]
- Hao Shi, Longbiao Wang, Sheng Li, Chenchen Ding, Meng Ge, Nan Li, Jianwu Dang, and Hiroshi Seki, “Singing Voice Extraction with Attention based Spectrograms Fusion,” in Proc. INTERSPEECH, 2020, pp.2412–2416. [PDF] [BibTeX]
- Meng Ge, Longbiao Wang, Nan Li, Hao Shi, Jianwu Dang, and Xiangang Li, “Environment-dependent Attention-driven Recurrent Convolutional Neural Network for Robust Speech Enhancement,” in Proc. INTERSPEECH, 2019, pp.3153–3157. [PDF] [BibTeX]