Listening Samples

This page presents audio clips for comparative listening of four systems.

Two are the newly proposed neural audio coders (NACs), trained with the MS-STFTD and MS-PAMD discriminators, respectively, and the third is our previous NAC from [2].

The last one is AAC-LC, encoded with the Fraunhofer FDK AAC encoder. For a fair comparison with the NACs, the AAC cutoff frequency was set to 20 kHz.
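For readers who want to produce a comparable AAC-LC anchor, the following is a minimal sketch that builds an ffmpeg command line using the libfdk_aac encoder. It is an assumption that ffmpeg is the front end: this page does not say which tool was used to drive the FDK AAC encoder, and an ffmpeg build with libfdk_aac enabled is required. The function names and file names are hypothetical; the bitrate, sampling rate, and 20 kHz cutoff are the values stated on this page.

```python
import subprocess


def fdk_aac_command(src, dst, bitrate_kbps, cutoff_hz=20000, sample_rate_hz=44100):
    """Build an ffmpeg argument list for AAC-LC encoding via libfdk_aac.

    Assumption: ffmpeg was compiled with libfdk_aac support.
    """
    return [
        "ffmpeg", "-y",
        "-i", src,
        "-c:a", "libfdk_aac",         # Fraunhofer FDK AAC encoder
        "-profile:a", "aac_low",      # AAC-LC profile
        "-ar", str(sample_rate_hz),   # output sampling rate in Hz
        "-b:a", f"{bitrate_kbps}k",   # target bitrate
        "-cutoff", str(cutoff_hz),    # encoder bandwidth (cutoff) in Hz
        dst,
    ]


def encode(src, dst, bitrate_kbps):
    """Run the encoder; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(fdk_aac_command(src, dst, bitrate_kbps), check=True)


if __name__ == "__main__":
    # Hypothetical file names; only the command is printed here.
    print(" ".join(fdk_aac_command("sample.wav", "sample_56k.m4a", 56)))
```

Calling `encode("sample.wav", "sample_64k.m4a", 64)` would run the same command at the higher of the two bitrates used for the samples on this page.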

References

[2] Joon Byun, Seungmin Shin, Youngcheol Park, Jongmo Sung, and Seungkwon Beack, “A perceptual neural audio coder with a mean-scale hyperprior,” ICASSP 2023.

[4] Joon Byun, Seungmin Shin, Youngcheol Park, Jongmo Sung, and Seungkwon Beack, “Development of a psychoacoustic loss function for the deep neural network (DNN)-based speech coder,” Interspeech 2021.

[5] Seungmin Shin, Joon Byun, Youngcheol Park, Jongmo Sung, and Seungkwon Beack, “Deep Neural Network (DNN) Audio Coder Using A Perceptually Improved Training Method,” IEEE ICASSP 2022.

[16] Joon Byun, Seungmin Shin, Youngcheol Park, Jongmo Sung, and Seungkwon Beack, “Optimization of Deep Neural Network (DNN) Speech Coder Using a Multi Time Scale Perceptual Loss Function,” Interspeech 2022.

[17] Joon Byun, Seungmin Shin, Youngcheol Park, Jongmo Sung, and Seungkwon Beack, “Perceptual improvement of deep neural network (DNN) speech coder using parametric and non-parametric density models,” Interspeech 2023.

Encoded at 56 kbps, fs = 44.1 kHz

<Sample-1> <Sample-2>

Original

A. FDK AAC

B. Prev in [2]

C. Base w MS-STFTD

D. Base w MS-PAMD

Encoded at 64 kbps, fs = 44.1 kHz

<Sample-3> <Sample-4>

Original

A. FDK AAC

B. Prev in [2]

C. Base w MS-STFTD

D. Base w MS-PAMD