Yoshiki Masuyama

Publications

Journal

Y. Masuyama, G. Wichern, F. G. Germain, C. Ick, J. Le Roux, "RANF: Neural Field-Based HRTF Spatial Upsampling with Retrieval Augmentation and Parameter Efficient Fine-Tuning," IEEE Open J. Signal Process., 2025. [Code]
Y. Masuyama, G. Wichern, F. G. Germain, C. Ick, J. Le Roux, "SuDaField: Subject- and Dataset-Aware Neural Field for HRTF Modeling," IEEE Open J. Signal Process., 2025. [Code]
S. Cornell, C. Boeddeker, T. Park, H. Huang, D. Raj, M. Wiesner, Y. Masuyama, X. Chang, Z.-Q. Wang, S. Squartini, P. Garcia, S. Watanabe, "Recent trends in distant conversational speech recognition: A review of CHiME-7 and 8 DASR challenges," Comput. Speech Lang., 2025.
Y. Masuyama, X. Chang, W. Zhang, S. Cornell, Z.-Q. Wang, N. Ono, Y. Qian, S. Watanabe, "An End-to-End Integration of Speech Separation and Recognition with Self-Supervised Learning Representation," Comput. Speech Lang., 2025.
Y. Masuyama, K. Yamaoka, T. Kawamura, N. Ono, "Efficient joint optimization of sampling rate offsets using entire multichannel signal," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 32, pp. 1816-1828, 2024.
Y. Masuyama, K. Yamaoka, Y. Kinoshita, T. Nakashima, N. Ono, "Causal and relaxed-distortionless response beamforming for online target source extraction," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 32, pp. 310-324, 2024. [Project page]
Y.-J. Lu, X. Chang, C. Li, W. Zhang, S. Cornell, Z. Ni, Y. Masuyama, B. Yan, R. Scheibler, Z.-Q. Wang, Y. Tsao, Y. Qian, S. Watanabe, "Software Design and User Interface of ESPnet-SE++: Speech Enhancement for Robust Speech Processing," J. Open Source Softw., 2023.
Y. Masuyama, K. Yatabe, K. Nagatomo, Y. Oikawa, "Online phase reconstruction via DNN-based phase differences estimation," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 31, pp. 163-176, 2023.

K. Kobayashi, Y. Masuyama, K. Yatabe, Y. Oikawa, "Phase-recovery algorithm for harmonic/percussive source separation based on observed phase information and analytic computation," Acoust. Sci. & Tech., vol. 42, no. 5, pp. 261-269, 2021.
Y. Bando, Y. Masuyama, Y. Sasaki, M. Onishi, "Robust auditory functions based on probabilistic integration of MUSIC and CGMM," IEEE Access, vol. 9, pp. 38718-38730, 2021.
Y. Masuyama, K. Yatabe, Y. Koizumi, Y. Oikawa, N. Harada, "Deep Griffin-Lim iteration: Trainable iterative phase reconstruction using neural network," IEEE J. Sel. Top. Signal Process., vol. 15, no. 1, pp. 37-50, 2021. (IEEE SPS Tokyo Joint Chapter Student Journal Paper Award) [Project page]
Y. Masuyama, T. Kusano, K. Yatabe, Y. Oikawa, "Modal decomposition of musical instrument sounds via optimization-based non-linear filtering," Acoust. Sci. & Tech., vol. 40, no. 3, pp. 186-197, 2019.

Letters

Y. Bando, K. Sekiguchi, Y. Masuyama, A. A. Nugraha, M. Fontaine and K. Yoshii, "Neural full-rank spatial covariance analysis for blind source separation," IEEE Signal Process. Lett., vol. 28, pp. 1670-1674, Aug. 2021. [Project page]
Y. Masuyama, K. Yatabe, K. Nagatomo, Y. Oikawa, "Joint amplitude and phase refinement for monaural source separation," IEEE Signal Process. Lett., vol. 27, pp. 1939-1943, Oct. 2020. [MATLAB CODE]
Y. Masuyama, K. Yatabe, Y. Oikawa, "Griffin-Lim like phase recovery via alternating direction method of multipliers," IEEE Signal Process. Lett., vol. 26, no. 1, pp. 184-188, Jan. 2019. [Project page] [MATLAB CODE]

Tutorial Paper

K. Yatabe, Y. Masuyama, T. Kusano, Y. Oikawa, "Representation of complex spectrogram via phase conversion," Acoust. Sci. & Tech., vol. 40, no. 3, pp. 170-177, May 2019. [MATLAB CODE]
矢田部浩平, 升山義紀, 草野翼, 及川靖広, "位相変換による複素スペクトログラムの表現," 日本音響学会誌, vol. 75, no. 3, pp. 147-155, Mar. 2019.

International Conference and Workshop

Five papers have been accepted to Interspeech 2026.
Y. Masuyama, F. G. Germain, G. Wichern, C. Hori, J. Le Roux, "Velocity Potential Neural Field for Efficient Ambisonics Impulse Response Modeling," Proc. ICASSP 2026.
Y. Masuyama, K. Saijo, F. Paissan, J. Han, M. Delcroix, R. Aihara, F. G. Germain, G. Wichern, J. Le Roux, "FlexIO: Flexible Single- and Multi-Channel Speech Separation and Enhancement," Proc. ICASSP 2026.
R. Aihara, Y. Masuyama, F. Paissan, F. G. Germain, G. Wichern, J. Le Roux, "SUNAC: Source-aware Unified Neural Audio Codec," Proc. ICASSP 2026.
J. Han, R. Wang, Y. Masuyama, M. Delcroix, J. Rohdin, J. Du, L. Burget, "Spatially Aware Self-Supervised Models for Multi-Channel Neural Speaker Diarization," Proc. ICASSP 2026.
R. Aihara, Y. Masuyama, G. Wichern, F. G. Germain, J. Le Roux, "Exploring disentangled neural speech codecs from self-supervised representations," Proc. ICASSPW LRAC 2026 workshop.
C. Hori, Y. Masuyama, S. Jain, R. Corcodel, D. Jha, D. Romeres, J. Le Roux, "Robot Confirmation Generation and Action Planning Using Long-context Q-Former Integrated with Multimodal LLM," Proc. ASRU 2025 (Best Paper Award Candidate).
Y. Masuyama, F. G. Germain, G. Wichern, C. Ick, J. Le Roux, "Physics-Informed Direction-Aware Neural Acoustic Fields," Proc. WASPAA 2025 (Best Paper Award Candidate).
F. Paissan, G. Wichern, Y. Masuyama, R. Aihara, F. G. Germain, K. Saijo, and J. Le Roux, "FasTUSS: Faster Task-Aware Unified Source Separation," Proc. WASPAA 2025.
T. Kawamura, Y. Masuyama, N. Ono, "Domain Adaptation for Multi-Channel Acoustic Scene Classification to Different Array Positions," Proc. EUSIPCO 2025.
C. Ick, G. Wichern, Y. Masuyama, F. Germain and J. Le Roux, "Direction-Aware Neural Acoustic Fields for Few-Shot Interpolation of Ambisonic Impulse Responses," Proc. Interspeech 2025.
H. Yang, G. Wichern, R. Aihara, Y. Masuyama, S. Khurana, F. Germain, J. Le Roux, "Investigating Continuous Autoregressive Generative Speech Enhancement," Proc. Interspeech 2025.
S. Khurana, D. Klement, A. Laurent, D. Bobos, J. Novosad, P. Gazdik, E. Zhang, Z. Huang, A. Hussein, R. Marxer, Y. Masuyama, R. Aihara, C. Hori, F. Germain, G. Wichern, J. Le Roux, "Factorized RVQ-GAN For Disentangled Speech Tokenization," Proc. Interspeech 2025.
J. Tian, J. Shi, W. Chen, S. Arora, Y. Masuyama, T. Maekaku, Y. Wu, J. Peng, S. Bharadwaj, Y. Zhao, S. Cornell, Y. Peng, X. Yue, C. H. Huck Yang, G. Neubig, and S. Watanabe, "ESPnet-SpeechLM: An Open Speech Language Model Toolkit," Proc. NAACL Demo Track, Apr. 2025.
C. Ick, G. Wichern, Y. Masuyama, F. Germain and J. Le Roux, "Data Augmentation Using Neural Acoustic Fields With Retrieval-Augmented Pre-training," Proc. IEEE ICASSP Satellite Workshop on GenDA 2025.
Y. Masuyama, G. Wichern, F. G. Germain, C. Ick and J. Le Roux, "Retrieval-Augmented Neural Field for HRTF Upsampling and Personalization," Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), Apr. 2025. [Code]
Y. Masuyama, N. Ueno and N. Ono, "Mel-Spectrogram Inversion via Alternating Direction Method of Multipliers," Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), Apr. 2025. [Demo]
C. Ick, G. Wichern, Y. Masuyama, F. Germain and J. Le Roux, "Spatially-Aware Losses for Enhanced Neural Acoustic Fields," Proc. NeurIPS Workshop Audio Imagin., Dec. 2024.
Y. Masuyama, K. Miyazaki and M. Murata, "Mamba-based Decoder-Only Approach with Bidirectional Speech Modeling for Speech Recognition," Proc. IEEE Spok. Lang. Tech. Workshop (SLT), Dec. 2024. [Code]
J. Shi, J. Tian, Y. Wu, J. Jung, J. Q. Yip, Y. Masuyama, W. Chen, Y. Wu, Y. Tang, M. Baali, D. Alharthi, D. Zhang, R. Deng, T. Srivastava, H. Wu, A. Liu, B. Raj, Q. Jin, R. Song, and S. Watanabe, "ESPnet-Codec: Comprehensive Training and Evaluation of Neural Codecs for Audio, Music and Speech," Proc. IEEE Spok. Lang. Tech. Workshop (SLT), Dec. 2024.
K. Miyazaki, Y. Masuyama, M. Murata, "Exploring the Capability of Mamba in Speech Applications," Proc. ISCA Interspeech, Sep. 2024.
Y. Masuyama, G. Wichern, F. G. Germain, Z. Pan, S. Khurana, C. Hori, J. Le Roux, "NIIRF: Neural IIR Filter Field for HRTF Upsampling and Personalization," Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), Apr. 2024. [Code]
Z. Pan, G. Wichern, Y. Masuyama, F. Germain, S. Khurana, C. Hori and J. Le Roux, "Scenario-Aware Audio-Visual TF-GridNet for Target Speech Extraction," Proc. IEEE Autom. Speech Recognit. and Underst. Workshop (ASRU), Dec. 2023.
K. Yamada, Y. Masuyama, K. Yamaoka and N. Ono, "Fundamental Frequency Estimation Based on Finite-Order Harmonic Constraint Differential Equation," Proc. Asia-Pacific Signal Inf. Process. Assoc. Annual Summit Conf. (APSIPA ASC), Oct. 2023.
Y. Masuyama*, X. Chang*, W. Zhang, S. Cornell, Z.-Q. Wang, N. Ono, Y. Qian and S. Watanabe, "Exploring the Integration of Speech Separation and Recognition with Self-Supervised Learning Representation," IEEE Workshop Appl. of Signal Process. Audio, Acoust. (WASPAA), Oct. 2023. [Demo]
Y. Masuyama, N. Ueno and N. Ono, "Signal Reconstruction from Mel-spectrogram Based on Bi-level Consistency of Full-band Magnitude and Phase," IEEE Workshop Appl. of Signal Process. Audio, Acoust. (WASPAA), Oct. 2023. [Demo]
Y. Bando, Y. Masuyama, A. A. Nugraha and K. Yoshii, "Neural Fast Full-Rank Spatial Covariance Analysis for Blind Source Separation," Proc. Eur. Signal Process. Conf. (EUSIPCO), Sep. 2023.
S. Cornell, Z.-Q. Wang, Y. Masuyama, S. Watanabe, M. Pariente, N. Ono and S. Squartini, "Multi-Channel Speaker Extraction with Adversarial Training: The Wavlab Submission to The Clarity ICASSP 2023 Grand Challenge," IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), June 2023.
Y. Masuyama, X. Chang, S. Cornell, S. Watanabe and N. Ono, "End-to-End Integration of Speech Recognition, Dereverberation, Beamforming and Self-Supervised Learning Representation," Proc. IEEE Spok. Lang. Tech. Workshop (SLT), Jan. 2023. (Best Student Paper Award) [Demo]
S. Cornell, Z.-Q. Wang, Y. Masuyama, S. Watanabe, M. Pariente and N. Ono, "Multi-Channel Target Speaker Extraction with Refinement: The WavLab Submission to The Second Clarity Enhancement Challenge," Proc. Clarity Challenge, Dec. 2022.
K. Yamada, Y. Masuyama, Y. Wakabayashi and N. Ono, "Simultaneous frequency estimation for three or more sinusoids based on sinusoidal constraint differential equation," Proc. Asia-Pacific Signal Inf. Process. Assoc. Annual Summit Conf. (APSIPA ASC), Nov. 2022.
Y. Masuyama, K. Yamaoka and N. Ono, "Joint optimization of sampling rate offsets based on entire signal relationship among distributed microphones," Proc. ISCA Interspeech, Aug. 2022.
Y.-J. Lu, X. Chang, C. Li, W. Zhang, S. Cornell, Z. Ni, Y. Masuyama, B. Yan, R. Scheibler, Z.-Q. Wang, Y. Tsao, Y. Qian and S. Watanabe, "ESPnet-SE++: Speech enhancement for robust speech recognition, translation and understanding," Proc. ISCA Interspeech, Aug. 2022.
Y. Masuyama, K. Yamaoka, Y. Kinoshita and N. Ono, "Causal distortionless response beamforming by alternating direction method of multipliers," Proc. Asia-Pacific Signal Inf. Process. Assoc. Annual Summit Conf. (APSIPA ASC), Dec. 2021.
Y. Masuyama, T. Tanaka, K. Yatabe, T. Kusano and Y. Oikawa, "Simultaneous declipping and beamforming via alternating direction method of multipliers," Proc. Eur. Signal Process. Conf. (EUSIPCO), Aug. 2021.
M. Togami, Y. Masuyama, T. Komatsu, K. Yoshii and T. Kawahara, "Computer-resource-aware deep speech separation with a run-time-specified number of BLSTM layers," Proc. Asia-Pacific Signal Inf. Process. Assoc. Annual Summit Conf. (APSIPA ASC), Dec. 2020.
Y. Masuyama, Y. Bando, K. Yatabe, Y. Sasaki, M. Onishi and Y. Oikawa, "Self-supervised neural audio-visual sound source localization via probabilistic spatial modeling," Proc. IEEE/RSJ Int. Conf. Intell. Robot Syst. (IROS), Oct. 2020. (IEEE RAS Japan Chapter Young Award)
Y. Masuyama, M. Togami and T. Komatsu, "Consistency-aware multi-channel speech enhancement using deep neural networks," Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), May 2020.
Y. Masuyama, K. Yatabe, Y. Koizumi, Y. Oikawa and N. Harada, "Phase reconstruction based on recurrent phase unwrapping with deep neural networks," Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), May 2020. [Project page]
Y. Koizumi, K. Yatabe, M. Delcroix, Y. Masuyama and D. Takeuchi, "Speech enhancement using self-adaptation and multi-head self-attention," Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), May 2020.
M. Togami, Y. Masuyama, T. Komatsu and Y. Nakagome, "Unsupervised training for deep speech source separation with Kullback-Leibler divergence based probabilistic loss function," Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), May 2020.
Y. Masuyama, M. Togami and T. Komatsu, "Multichannel loss function for supervised speech source separation by mask-based beamforming," Proc. ISCA Interspeech, Sep. 2019.
T. Kusano, Y. Masuyama, K. Yatabe and Y. Oikawa, "Designing nearly tight window for improving time-frequency masking," Proc. Int. Congr. Acoust. (ICA), Sep. 2019.
Y. Masuyama, K. Yatabe, Y. Koizumi, Y. Oikawa and N. Harada, "Deep Griffin-Lim iteration," Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), May 2019. (IEEE SPS Tokyo Joint Chapter Student Conference Paper Award)
Y. Masuyama, K. Yatabe and Y. Oikawa, "Low-rankness of complex-valued spectrogram and its application to phase-aware audio processing," Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), May 2019.
Y. Masuyama, K. Yatabe and Y. Oikawa, "Phase-aware harmonic/percussive source separation via convex optimization," Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), May 2019.
Y. Masuyama, K. Yatabe and Y. Oikawa, "Model-based phase recovery of spectrograms via optimization on Riemannian manifolds," Proc. Int. Workshop Acoust. Signal Enhanc. (IWAENC), Sep. 2018.
K. Yatabe, Y. Masuyama and Y. Oikawa, "Rectified linear unit can assist Griffin-Lim phase recovery," Proc. Int. Workshop Acoust. Signal Enhanc. (IWAENC), Sep. 2018.
Y. Masuyama, T. Kusano, K. Yatabe and Y. Oikawa, "Modal decomposition of musical instrument sound via alternating direction method of multipliers," Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), Apr. 2018.

Domestic Conferences (in Japanese)

相原龍, 升山義紀, ウィシャーン・ゴードン, ジェルマン・フランソワ, ルルー・ジョナトン, "話者・言語情報の構造的分離を可能にするニューラル音声コーデックの検討," 日本音響学会講演論文集, Mar. 2026.
河村隆生, 升山義紀, 小野順貴, "音響シーン分類における室内の異なる位置のマイクロホンアレイに対するドメイン適応," 信号処理シンポジウム予稿集, Dec. 2024.
升山義紀, 高道慎之介, 谷中瞳, 柿沼太一, "次世代の音声言語情報処理技術のためのAI研究開発," パネル討論, 電子情報通信学会音声研究会, Oct. 2024.
升山義紀, "ICASSP2024における音源分離・音声強調の動向," 電子情報通信学会信号処理研究会, May 2024.
升山義紀, 山岡洸瑛, 木下裕磨, 小野順貴, "因果的MPDRビームフォーマのオンライン化およびタップ長の影響評価," 日本音響学会講演論文集, Sep. 2022.
升山義紀, 山岡洸瑛, 小野順貴, "尤度計算に用いる周波数帯域の逐次増大による初期値に頑健なブラインド同期," 電子情報通信学会応用音響研究会, Aug. 2022.
升山義紀, 山岡洸瑛, 小野順貴, "補助関数法による複数の非同期録音信号のブラインド同期," 日本音響学会講演論文集, Mar. 2022.
山田健太, 升山義紀, 若林佑幸, 小野順貴, "微分方程式に基づく複数の正弦波の周波数同時推定," 日本音響学会講演論文集, Mar. 2022.
坂東宜昭, 升山義紀, 佐々木洋子, 大西正輝, "雑踏環境における音源地図の生成," 第58回人工知能学会 AI チャレンジ研究会, Nov. 2021. （人工知能学会研究会優秀賞）
升山義紀, 山岡洸瑛, 木下裕磨, 小野順貴, "因果的MPDRビームフォーマの近接分離最適化による設計," 日本音響学会講演論文集, Sep. 2021.
升山義紀, 坂東宜昭, 佐々木洋子, 大西正輝, 矢田部浩平, 及川靖広, "視聴覚統合に基づく音源定位と音区間検出の自己教師あり学習," 情報処理学会第83回全国大会, Mar. 2021. (学生奨励賞)
升山義紀, 矢田部浩平, 長友健人, 及川靖広, "モノラル音源分離のための一般化KLダイバージェンスに基づいた位相復元," 日本音響学会講演論文集, Mar. 2021.
坂東宜昭, 工藤一輝, 升山義紀, 佐々木洋子, 大西正輝, "MUSIC法と混合複素ガウスモデルに基づくロボット聴覚," 日本ロボット学会第38回学術講演会, Oct. 2020.
升山義紀, 坂東宜昭, 佐々木洋子, 大西正輝, 矢田部浩平, 及川靖広, "音源数と音源位置を同時推定する視聴覚統合 DNN の自己教師あり学習," 日本音響学会講演論文集, Sep. 2020.
升山義紀, 矢田部浩平, 小泉悠馬, 原田登, 及川靖広, "複素DNN を用いた深層Griffin-Lim位相復元," 日本音響学会講演論文集, Sep. 2020.
長友健人, 升山義紀, 矢田部浩平, 及川靖広, 竹内大起, 小泉悠馬, "複数解像度のスペクトログラムを用いた DNN音声強調," 日本音響学会講演論文集, Sep. 2020.
升山義紀, 坂東宜昭, 大西正輝, 矢田部浩平, 及川靖広, "全方位画像と多チャネル音響信号を用いた自己教師あり深層音源定位," 日本音響学会講演論文集, Mar. 2020.
長友健人, 升山義紀, 竹内大起, 矢田部浩平, 及川靖広, "複数解像度のスペクトログラムを用いたDNN歌声分離," 日本音響学会講演論文集, Mar. 2020.
升山義紀, 矢田部浩平, 小泉悠馬, 原田登, 及川靖広, "位相の微分値に基づいた DNN 位相復元," 日本音響学会講演論文集, Sep. 2019. [原稿訂正版]
升山義紀, 矢田部浩平, 及川靖広, "複素スペクトログラムのスパース・低ランクモデリング," 日本音響学会講演論文集, Sep. 2019.
矢田部浩平, 升山義紀, 草野翼, 及川靖広, "MATLAB 瞬時周波数Toolbox," 日本音響学会講演論文集, Sep. 2019.
長友健人, 升山義紀, 竹内大起, 矢田部浩平, 及川靖広, "位相を考慮した調波音・打楽器音分離," 日本音響学会講演論文集, Sep. 2019.
升山義紀, 矢田部浩平, 小泉悠馬, 原田登, 及川靖広, "DeGLI: 深層Griffin-Lim位相復元," 日本音響学会講演論文集, Mar. 2019.
升山義紀, 矢田部浩平, 及川靖広, "瞬時周波数に基づいた複素スペクトログラムの低ランクモデリング," 日本音響学会講演論文集, Mar. 2019.
草野翼, 升山義紀, 矢田部浩平, 及川靖広, "時間周波数マスキング性能を向上させる窓関数," 日本音響学会講演論文集, Mar. 2019.
升山義紀, 矢田部浩平, 及川靖広, "ADMMを用いたGriffin-Lim型位相復元," 日本音響学会講演論文集, Sep. 2018.
升山義紀, 矢田部浩平, 及川靖広, "正弦波モデルと多様体上の最適化による位相復元," 日本音響学会講演論文集, Sep. 2018.
草野翼, 升山義紀, 矢田部浩平, 及川靖広, "音響信号処理に対する逆短時間Fourier変換の合成窓関数の影響," 日本音響学会講演論文集, Sep. 2018.
矢田部浩平, 升山義紀, 及川靖広, "ReLUはGriffin-Limアルゴリズムの一助となるか," 日本音響学会講演論文集, Sep. 2018.
升山義紀, 草野翼, 矢田部浩平, 及川靖広, 大石耕史, 宮城雄介, 高橋健, "交互方向乗数法を用いたモード分解による楽器音の解析," 日本音響学会講演論文集, Mar. 2018.
升山義紀, 草野翼, 矢田部浩平, 及川靖広, 宮城雄介, 大石耕史, "データ忠実性を制約とした最適化による楽器音のモード分解," 日本音響学会音楽音響研究会資料, MA2017-36, Oct. 2017.
升山義紀, 草野翼, 矢田部浩平, 及川靖広, 宮城雄介, 大石耕史, "制約付き最適化を用いた楽器音のモード分解," 日本音響学会講演論文集, Sep. 2017.（学生優秀発表賞受賞）
