[1] K. Li, K. Zaman, X. Li, M. Akagi, J. Dang and M. Unoki, "Machine Anomalous Sound Detection Using Spectral-Temporal Modulation Representations Derived From Machine-Specific Filterbanks," in IEEE Transactions on Audio, Speech and Language Processing, vol. 33, pp. 2059-2073, 2025, doi: 10.1109/TASLPRO.2025.3570956.
[2] Y. Liu, X. Chen, Z. Peng, Y. Li, X. Li, P. Song, M. Unoki and Z. Zhao, "Enhancing Speech Emotion Recognition With Conditional Emotion Feature Diffusion and Progressive Interleaved Learning Strategy," in IEEE Transactions on Audio, Speech and Language Processing, vol. 33, pp. 1787-1800, 2025, doi: 10.1109/TASLPRO.2025.3561606.
[3] J. He, X. Shi, C. H. Hu, J. Mi, X. Li and T. Toda, "M4SER: Multimodal, Multirepresentation, Multitask, and Multistrategy Learning for Speech Emotion Recognition," in IEEE Transactions on Audio, Speech and Language Processing, vol. 33, pp. 4055-4070, 2025, doi: 10.1109/TASLPRO.2025.3614428.
[4] Y. Liu, X. Chen, Y. Li, J. Ma, X. Yang, Y. Song, X. Meng, Y. Li, X. Li and Z. Zhao, "Enhanced Speech Emotion Recognition in Noisy Environments: Adaptive Emotion Denoising Diffusion Approach With Iterative Confidence Learning Strategy," in IEEE Internet of Things Journal, vol. 12, no. 20, pp. 43241-43254, 15 Oct. 2025, doi: 10.1109/JIOT.2025.3595096.