On the time-frequency (TF) resolution of the STFT

Post date: Mar 15, 2017 6:17:32 PM

The STFT is parameterized by four knobs: block length, overlap length, FFT length, and window type.
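To make the four knobs concrete, here is a minimal NumPy sketch. The function name `stft_sketch`, its argument names, and the default values are assumptions made for illustration; they do not come from any particular library.

```python
import numpy as np

def stft_sketch(x, block_len=256, overlap_len=128, fft_len=256, window="hann"):
    """Hypothetical helper: one-sided STFT with shape (num_frames, fft_len // 2 + 1)."""
    hop = block_len - overlap_len  # increment between consecutive frames
    if hop <= 0:
        raise ValueError("overlap_len must be smaller than block_len")
    if fft_len < block_len:
        raise ValueError("fft_len must be at least block_len")
    if len(x) < block_len:
        raise ValueError("signal is shorter than one block")

    # Window type: a small illustrative selection of NumPy windows.
    windows = {"hann": np.hanning, "hamming": np.hamming, "rect": np.ones}
    w = windows[window](block_len)

    # Slice the signal into overlapping blocks and apply the window.
    num_frames = 1 + (len(x) - block_len) // hop
    frames = np.stack([x[i * hop : i * hop + block_len] * w
                       for i in range(num_frames)])

    # rfft zero-pads each frame up to fft_len when fft_len > block_len.
    return np.fft.rfft(frames, n=fft_len, axis=1)

x = np.sin(2 * np.pi * 0.1 * np.arange(1000))
print(stft_sketch(x).shape)                   # default knobs
print(stft_sketch(x, overlap_len=192).shape)  # more overlap, more frames
print(stft_sketch(x, fft_len=512).shape)      # longer FFT, more bins
```

Raising the overlap only changes the number of frames, and raising the FFT length only changes the number of bins per frame.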

The time-frequency (TF)-resolution of STFT can be discussed in two different contexts:

+ The across-frame TF resolution, which involves the block-length and increment/overlap-length parameters of the STFT. Finer resolution in time is obtained by increasing the overlap percentage (a smaller hop between frames), and finer resolution in frequency by increasing the block length (narrower bin spacing) [1], both at the expense of more computation.

The FFT length is usually set equal to the block length, implying no zero padding/interpolation. Increasing the FFT length beyond this (i.e. frequency interpolation) only helps with the interpretation of the representation (usually by a human); it adds no new information and does not change the reconstruction.
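The zero-padding point can be checked numerically. The sketch below uses an arbitrary test frame (block length 64 and a padding factor of 4, both chosen only for illustration). It shows that the padded spectrum passes through the unpadded one at every 4th bin, and that inverting it returns the same frame followed by zeros.

```python
import numpy as np

block_len = 64
pad_factor = 4  # illustrative choice: FFT length = 4 * block length
n = np.arange(block_len)
frame = np.hanning(block_len) * np.sin(2 * np.pi * 0.2 * n)

spec = np.fft.rfft(frame)                                  # fft_len == block_len
spec_padded = np.fft.rfft(frame, n=pad_factor * block_len)  # zero-padded FFT

# Every pad_factor-th bin of the padded spectrum equals an unpadded bin:
# the extra bins only interpolate between the original ones.
same_bins = np.allclose(spec_padded[::pad_factor], spec)

# Inverting the padded spectrum gives back the frame plus trailing zeros,
# so the reconstruction is unchanged.
recovered = np.fft.irfft(spec_padded, n=pad_factor * block_len)
same_frame = np.allclose(recovered[:block_len], frame)
tail_is_zero = np.allclose(recovered[block_len:], 0.0)

print(same_bins, same_frame, tail_is_zero)
```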

+ The per-frame TF resolution, which involves the type of windowing function in the STFT [2]. For a given window, its TF resolution is dictated by the uncertainty principle/properties of the Fourier transform [3], and quantified by the product of the dispersions around 0 in both domains, i.e.

$$D_0(w)\,D_0(\hat{w}) \ge \frac{1}{16\pi^2}, \qquad D_0(g) = \int_{-\infty}^{\infty} x^2\,|g(x)|^2\,dx,$$

where $\hat{w}$ is the Fourier transform of a unit-energy window $w$ (e.g. sinc <=> rect, delta <=> 1). The optimal TF resolution (the lower bound on the product) is achieved by a Gaussian window, in which case the STFT is known as the Gabor transform [4].
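As a rough numerical check of the dispersion product, the sketch below assumes the convention used in [3]: a unit-energy window, dispersion defined as the integral of x^2 |g(x)|^2, and ordinary frequency in Hz, which gives a lower bound of 1/(16 pi^2). The helper name `dispersion_product`, the grid, and the window widths are illustrative assumptions.

```python
import numpy as np

def dispersion_product(w, dt):
    """Hypothetical helper: D_0(w) * D_0(w_hat) for a window sampled every dt."""
    n = len(w)
    w = w / np.sqrt(np.sum(np.abs(w) ** 2) * dt)  # normalize to unit energy
    t = (np.arange(n) - n // 2) * dt              # time axis centered on 0
    W = np.fft.fft(w) * dt                        # approximates the continuous FT
    f = np.fft.fftfreq(n, d=dt)                   # ordinary frequency axis
    df = 1.0 / (n * dt)
    d_time = np.sum(t ** 2 * np.abs(w) ** 2) * dt
    d_freq = np.sum(f ** 2 * np.abs(W) ** 2) * df
    return d_time * d_freq

n, dt = 4096, 16.0 / 4096
t = (np.arange(n) - n // 2) * dt

gaussian = np.exp(-t ** 2 / 2.0)                  # sigma = 1 (arbitrary)
T = 4.0                                           # Hann duration (arbitrary)
hann = np.where(np.abs(t) <= T / 2, np.cos(np.pi * t / T) ** 2, 0.0)

bound = 1.0 / (16 * np.pi ** 2)
print("bound   :", bound)
print("gaussian:", dispersion_product(gaussian, dt))
print("hann    :", dispersion_product(hann, dt))
```

Under these assumptions the Gaussian should sit on the bound, while the Hann window comes out only a few percent above it. A rectangular window is left out on purpose: its frequency dispersion diverges, so the numerical value would just grow with the grid.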

However, this does not mean that a Gaussian window should always be chosen. The correct choice of window depends on the application. For instance, the Hann window offers gradual and strong side-lobe suppression, which makes noise-floor tracking for (acoustic) event detection smoother.
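The side-lobe behaviour can be inspected with a short sketch that compares a rectangular and a Hann window of the same length. The helper name `sidelobe_levels`, the window length of 64, and the choice of "upper half of the band" as the far-out region are assumptions for illustration.

```python
import numpy as np

def sidelobe_levels(window, pad_factor=64):
    """Hypothetical helper: (peak side-lobe, far-out side-lobe) in dB re. main lobe."""
    n = len(window)
    # Heavy zero padding gives a finely sampled view of the window spectrum.
    spectrum = np.abs(np.fft.rfft(window, n=pad_factor * n))
    level_db = 20 * np.log10(spectrum / spectrum[0] + 1e-12)

    # The main lobe ends at the first bin where the level starts rising again.
    first_null = np.nonzero(np.diff(level_db) > 0)[0][0]
    peak = level_db[first_null:].max()

    # Far-out level: highest side lobe in the upper half of the band.
    far = level_db[len(level_db) // 2:].max()
    return peak, far

block_len = 64
rect_peak, rect_far = sidelobe_levels(np.ones(block_len))
hann_peak, hann_far = sidelobe_levels(np.hanning(block_len))

print("rect: peak %.1f dB, far %.1f dB" % (rect_peak, rect_far))
print("hann: peak %.1f dB, far %.1f dB" % (hann_peak, hann_far))
```

The highest side lobe is roughly -13 dB for the rectangular window and roughly -31 dB for the Hann window, and the Hann side lobes keep falling much faster away from the main lobe. Less leakage from strong components into neighbouring bins is consistent with the smoother noise-floor estimate mentioned above.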

[1] https://www.spectraplus.com/DT_help/overlap_percentage.htm

[2] https://en.wikipedia.org/wiki/Window_function

[3] https://en.wikipedia.org/wiki/Fourier_transform#Uncertainty_principle

[4] https://en.wikipedia.org/wiki/Short-time_Fourier_transform#Resolution_issues