This page presents audio clips for a comparative listening test of four systems.
Two are the newly proposed neural audio coders (NACs), one trained with the MS-STFTD discriminator and one with the MS-PAMD discriminator; the third is our previous NAC from [2].
The fourth is AAC-LC, encoded with the Fraunhofer FDK AAC encoder. For a fair comparison with the NACs, the AAC cutoff frequency was set to 20 kHz.
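The AAC-LC settings described above can be illustrated with a small sketch. This page does not say which front end was used to drive the FDK AAC encoder, so the example assumes ffmpeg built with `libfdk_aac` (a non-default build option); the function name and file names are hypothetical. It only assembles and prints the command lines for the two bitrates on this page, it does not run the encoder.

```python
import shlex


def build_fdk_aac_command(input_wav, output_path, bitrate_kbps,
                          cutoff_hz=20000, sample_rate_hz=44100):
    """Return an ffmpeg argument list for AAC-LC encoding via libfdk_aac.

    Assumption: ffmpeg is the front end; the page itself does not specify it.
    """
    return [
        "ffmpeg",
        "-i", input_wav,
        "-c:a", "libfdk_aac",        # Fraunhofer FDK AAC encoder
        "-profile:a", "aac_low",     # AAC-LC profile
        "-ar", str(sample_rate_hz),  # fs = 44.1 kHz, as stated on this page
        "-b:a", f"{bitrate_kbps}k",  # target bitrate in kbps
        "-cutoff", str(cutoff_hz),   # encoder bandwidth limit in Hz
        output_path,
    ]


# Print the commands for the two bitrates used on this page.
# "sample.wav" and the output names are placeholder file names.
for rate in (56, 64):
    cmd = build_fdk_aac_command("sample.wav", f"sample_{rate}kbps.m4a", rate)
    print(shlex.join(cmd))
```

Setting the cutoff explicitly matters because the encoder otherwise chooses its own bandwidth based on the bitrate, which could leave the AAC clips with a narrower audio bandwidth than the NAC outputs.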
References
[2] Joon Byun, Seungmin Shin, Youngcheol Park, Jongmo Sung, and Seungkwon Beack, “A perceptual neural audio coder with a mean-scale hyperprior,” ICASSP 2023.
[4] Joon Byun, Seungmin Shin, Youngcheol Park, Jongmo Sung, and Seungkwon Beack, “Development of a psychoacoustic loss function for the deep neural network (DNN)-based speech coder,” Interspeech 2021.
[5] Seungmin Shin, Joon Byun, Youngcheol Park, Jongmo Sung, and Seungkwon Beack, “Deep Neural Network (DNN) Audio Coder Using A Perceptually Improved Training Method,” ICASSP 2022.
[16] Joon Byun, Seungmin Shin, Youngcheol Park, Jongmo Sung, and Seungkwon Beack, “Optimization of Deep Neural Network (DNN) Speech Coder Using a Multi Time Scale Perceptual Loss Function,” Interspeech 2022.
[17] Joon Byun, Seungmin Shin, Youngcheol Park, Jongmo Sung, and Seungkwon Beack, “Perceptual improvement of deep neural network (DNN) speech coder using parametric and non-parametric density models,” Interspeech 2023.
Encoded at 56 kbps, fs = 44.1 kHz
<Sample-1> <Sample-2>
Original
Original
A. FDK AAC
A. FDK AAC
B. Prev in [2]
B. Prev in [2]
C. Base w MS-STFTD
C. Base w MS-STFTD
D. Base w MS-PAMD
D. Base w MS-PAMD
Encoded at 64 kbps, fs = 44.1 kHz
<Sample-3> <Sample-4>
Original
Original
A. FDK AAC
A. FDK AAC
B. Prev in [2]
B. Prev in [2]
C. Base w MS-STFTD
C. Base w MS-STFTD
D. Base w MS-PAMD
D. Base w MS-PAMD