Phase reconstruction, which estimates phase from a given amplitude spectrogram, is an active research field in acoustical signal processing with many applications, including audio synthesis. To take advantage of rich knowledge learned from data, several studies have presented deep neural network (DNN)-based phase reconstruction methods. However, training a DNN for phase reconstruction is not an easy task because of the periodic nature of phase and its sensitivity to waveform shifts. To overcome this problem, we propose a DNN-based two-stage phase reconstruction method. In the proposed method, phase derivatives are estimated by DNNs instead of phase itself, which allows us to avoid the sensitivity problem. Then, phase is recursively estimated from its derivatives, a procedure we name recurrent phase unwrapping (RPU). The experimental results confirm that the proposed method outperformed direct phase estimation by a DNN.
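To illustrate the second stage, here is a minimal sketch of recursive phase estimation from estimated time derivatives of phase (instantaneous-frequency-like quantities). This is a simplified illustration, not the authors' implementation: RPU in [5] combines both time and frequency derivatives, whereas this sketch only accumulates the frame-to-frame phase differences per frequency bin. All array shapes and the function name are assumptions for the example.

```python
import numpy as np

def reconstruct_phase(dphase_dt, init_phase):
    """Recursively estimate phase from its (estimated) time derivative.

    dphase_dt:  (T, F) array of estimated frame-to-frame phase differences,
                e.g. produced by a DNN; row 0 is unused.
    init_phase: (F,) phase of the first frame.
    Returns a (T, F) phase estimate wrapped to (-pi, pi].
    """
    T, F = dphase_dt.shape
    phase = np.empty((T, F))
    phase[0] = init_phase
    for t in range(1, T):
        # accumulate the estimated derivative frame by frame
        phase[t] = phase[t - 1] + dphase_dt[t]
    # wrap the accumulated phase back onto the principal interval
    return np.angle(np.exp(1j * phase))
```

For a signal whose true phase grows linearly in time (e.g. a steady sinusoid per bin), feeding the exact frame-to-frame differences into this recursion recovers the wrapped true phase; with DNN-estimated derivatives, the accumulated error is what motivates the least-squares formulation of RPU in [5].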
[Audio samples: Original | Amplitude (zero phase) | Direct phase estimation [3] | Instantaneous freq. integration [4] | Proposed method [5]]
[1] R. Sonobe and S. Takamichi, “JSUT corpus: Free large-scale Japanese speech corpus for end-to-end speech synthesis,” arXiv:1711.00354, 2017.
[2] D. Griffin and J. Lim, “Signal estimation from modified short-time Fourier transform,” IEEE Trans. Acoust., Speech, Signal Process., vol. 32, no. 2, pp. 236–243, Apr. 1984.
[3] S. Takamichi, Y. Saito, N. Takamune, D. Kitamura, and H. Saruwatari, “Phase reconstruction from amplitude spectrograms based on von-Mises-distribution deep neural network,” in Int. Workshop Acoust. Signal Enhance. (IWAENC), Sept. 2018, pp. 286–290.
[4] J. Engel, K. K. Agrawal, S. Chen, I. Gulrajani, C. Donahue, and A. Roberts, “GANSynth: Adversarial neural audio synthesis,” in Int. Conf. Learn. Represent. (ICLR), 2019.
[5] Y. Masuyama, K. Yatabe, Y. Koizumi, Y. Oikawa, and N. Harada, “Phase reconstruction based on recurrent phase unwrapping with deep neural networks,” submitted to ICASSP 2020.