Revisiting phase-based attributes for representing speech sounds

Emeritus Prof. Hideki Kawahara (河原 英紀)

2019/05/23, Seminar Room A/D, 3F, Faculty of Engineering Bldg. 6, Hongo Campus, The University of Tokyo

Abstract & slides

We discuss a possible revival of the parametric VOCODER based on several new representations of phase-based attributes and on an excitation source model called frequency domain velvet noise (FVN). The introduction of WaveNet and the end-to-end approaches that followed made VOCODERs obsolete. Many articles attribute the poor quality of VOCODERs to model mismatch, specifically to the minimum-phase response and to errors in voicing detection. Instead of discarding the VOCODER as out of date, this talk discusses how to reformulate it and presents a personal view on the possible roles of VOCODER parameters in future end-to-end systems. It starts with a brief review of the perception of phase-based attributes and of voice production, followed by simple, real-time procedures for calculating the fundamental frequency and events based on instantaneous frequency and group delay. These procedures use an analytic signal whose envelope is a six-term cosine series function, originally designed for closed-form, anti-aliased glottal source models. This envelope function also plays an essential role in implementing FVN. The talk includes demonstrations using interactive (some of them real-time) tools implemented in MATLAB.

Note: The MATLAB code and scripts are available in a GitHub repository (https://github.com/HidekiKawahara/YANGstraight_source).
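The abstract names instantaneous frequency and group delay as the basis of the F0 and event procedures. As a rough, non-authoritative illustration of those two quantities (not the code in the YANGstraight_source repository), the Python/NumPy sketch below estimates instantaneous frequency from the per-sample phase advance of a complex (analytic) filter output, and group delay through the identity tau(w) = Re{FFT(n*x[n]) / FFT(x[n])}. Assumptions: the six-term cosine series coefficients used in the talk are not reproduced here, so a four-term Blackman-Harris window stands in for the envelope; all function names and parameters are hypothetical.

```python
import numpy as np

# ASSUMPTION: stand-in envelope = 4-term Blackman-Harris cosine series.
# The six-term coefficients from the talk are NOT reproduced here.
BH4 = (0.35875, 0.48829, 0.14128, 0.01168)


def cosine_series_window(length, coeffs=BH4):
    """Symmetric window w[n] = sum_k (-1)^k a_k cos(2*pi*k*n/(length-1))."""
    n = np.arange(length)
    w = np.zeros(length)
    for k, a in enumerate(coeffs):
        w += (-1) ** k * a * np.cos(2 * np.pi * k * n / (length - 1))
    return w


def analytic_filter(fc, fs, length, coeffs=BH4):
    """Complex band-pass filter: the window modulated to center frequency fc (Hz)."""
    n = np.arange(length)
    center = (length - 1) / 2
    return cosine_series_window(length, coeffs) * np.exp(
        2j * np.pi * fc * (n - center) / fs
    )


def instantaneous_frequency(x, fs, fc, length):
    """Instantaneous frequency (Hz) from the phase advance between adjacent samples."""
    y = np.convolve(x, analytic_filter(fc, fs, length), mode="same")
    dphi = np.angle(y[1:] * np.conj(y[:-1]))  # phase advance per sample, radians
    return dphi * fs / (2 * np.pi)


def group_delay(frame, eps=1e-12):
    """Group delay in samples for each rfft bin: Re{FFT(n*x) * conj(FFT(x))} / |FFT(x)|^2."""
    n = np.arange(len(frame))
    X = np.fft.rfft(frame)
    Y = np.fft.rfft(n * frame)
    return np.real(Y * np.conj(X)) / (np.abs(X) ** 2 + eps)


if __name__ == "__main__":
    fs = 16000
    t = np.arange(8000) / fs
    x = np.cos(2 * np.pi * 200 * t)  # 200 Hz test tone
    f_inst = instantaneous_frequency(x, fs, fc=210.0, length=801)
    print("median IF (Hz):", np.median(f_inst[1000:7000]))

    frame = np.zeros(256)
    frame[37] = 1.0  # a single impulse acts as an "event" at sample 37
    print("mean group delay (samples):", group_delay(frame).mean())
```

With a filter centered at 210 Hz, the instantaneous frequency of a 200 Hz tone settles near 200 Hz rather than the filter center, which is what makes filter-bank IF usable for F0 estimation. For a single impulse, the group delay is flat and equal to the impulse position, which is why group delay can mark excitation events within an analysis frame.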

kawaharaPhaser.pdf