Research Scientist at SB Intuitions Corp.
My CV (short bio) is available for download
Find me on LinkedIn and GitHub
Email: hshi@ieee.org / hao.shi@sbintuitions.co.jp
I am a Research Scientist at SB Intuitions, working on speech and language AI.
My research focuses on multi-talker ASR, speech enhancement, speech separation, speech-to-speech systems, and text-to-speech.
I am interested in building robust, practical spoken language systems for real-world environments.
Automatic Speech Recognition
Speech-LLMs, noise-robust ASR, multi-talker ASR, adaptation, and multilingual ASR
Speech Separation
Target speaker extraction, blind source separation, overlap-aware processing
Text-to-Speech
LLM-based generative modeling, expressive speech synthesis, and spoken language generation
Speech Enhancement
Speech enhancement for robust ASR, front-end modeling, and generative approaches
Speech-to-Speech
End-to-end spoken language systems and interactive speech generation
Multi-Talker ASR
Serialized Output Prompting for Large Language Model-based Multi-Talker Speech Recognition
IEEE ASRU, 2025
Proposes serialized output prompting to improve large language model-based recognition of overlapping multi-talker speech.
Speech Enhancement
Combining Deterministic Enhanced Conditions with Dual-Streaming Encoding for Diffusion-Based Speech Enhancement
IEEE/ACM TASLP, 2025
Introduces a diffusion-based speech enhancement framework that combines deterministic enhanced conditions with dual-stream encoding.
Robust Speech Processing
Waveform-domain Speech Enhancement Using Spectrogram Encoding for Robust Speech Recognition
IEEE/ACM TASLP, 2024
Presents a waveform-domain speech enhancement method with spectrogram encoding for robust speech recognition.