Formant Estimation Using the LPC Method
Linear Predictive Coding (LPC) models the spectral envelope of a speech signal with an all-pole filter. The formants of a speech signal are the peaks of that envelope, that is, the frequency regions of maximum energy, and they are closely related to the resonant frequencies of the vocal tract.
First, the speech signal is windowed and the LPC coefficients are estimated. This is usually done by computing the autocorrelation of the windowed signal and then solving the resulting Yule-Walker equations, typically with the Levinson-Durbin recursion.
Next, the LPC coefficients are used to find the roots of the LPC polynomial. These roots represent the poles of the LPC model, and they are closely related to the formants of the speech signal.
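The poles of the LPC model are then converted from the z-plane to the complex frequency plane (i.e., the s-plane). This is done by taking the natural logarithm of each pole and multiplying by the sampling rate, s = Fs*ln(z), which follows from z = exp(s/Fs). The real part of s gives the damping factor of the formant, and the imaginary part gives its angular frequency.
The formant frequencies are therefore the imaginary parts of the s-plane poles, Im(s) = Fs*angle(z), in radians per second. Only poles with a non-negative imaginary part are kept, because complex poles occur in conjugate pairs that describe the same resonance.
Finally, the formant frequencies are converted from radians per second to hertz by dividing by 2*pi, which gives F = angle(z)*Fs/(2*pi).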
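The steps above can be sketched end to end on a synthetic signal whose resonance is known in advance. This is a minimal illustration, not the script used for the results in this report: the sampling rate, resonance frequency, pole radius, and model order are all assumed values, and the test signal is the impulse response of a two-pole resonator rather than real speech, so no window is applied. Only base MATLAB functions (filter, toeplitz, roots) are used.

```matlab
% Sketch: LPC formant estimation on a synthetic signal (base MATLAB only).
% Fs, f0, r, and p are illustrative assumptions, not values from the report.
Fs = 8000;                 % sampling rate in Hz
f0 = 700;                  % true resonance frequency in Hz
r  = 0.97;                 % pole radius; closer to 1 means less damping
p  = 2;                    % model order: one conjugate pole pair

% Test signal: impulse response of a two-pole resonator (decays to ~0).
a_true = [1, -2*r*cos(2*pi*f0/Fs), r^2];
x = filter(1, a_true, [1; zeros(1999, 1)]);

% Autocorrelation at lags 0..p.
N = length(x);
R = zeros(p+1, 1);
for k = 0:p
    R(k+1) = sum(x(1:N-k) .* x(1+k:N)) / N;
end

% Yule-Walker equations: toeplitz(R(1:p)) * a = -R(2:p+1).
a = toeplitz(R(1:p)) \ (-R(2:p+1));
A = [1; a].';              % LPC polynomial coefficients [1 a1 ... ap]

% Poles of the model, keeping one pole of each conjugate pair.
z = roots(A);
z = z(imag(z) > 0);

% z-plane to s-plane: s = Fs*log(z).
s = Fs * log(z);
damping = -real(s);        % damping factor in 1/s
f_est   = imag(s) / (2*pi);% formant frequency in Hz

fprintf('Estimated resonance: %.1f Hz (true value %.1f Hz)\n', f_est, f0);
```

Because the test signal is an exact all-pole response, the estimate should land essentially on the frequency that was built in; on real speech the match is only approximate and depends on the window and the model order.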
Formant Estimation of the "sarigamapa" Song
Roots of the LPC polynomial (non-negative imaginary part)
-0.5793 + 0.3562i
-0.1117 + 0.8102i
0.8231 + 0.3814i
0.9630 + 0.1123i
Formants of the audio signal
815 Hz
3046 Hz
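As a consistency check, the two formants listed above can be recomputed from the four listed roots. The sketch below assumes a sampling rate of 44100 Hz, which the report does not state; under that assumption, the selection rule from Activity 8 (frequency above 90 Hz, bandwidth below 400 Hz) gives the same two values to the nearest hertz.

```matlab
% Sketch: recompute the listed formants from the listed roots.
% Fs = 44100 Hz is an assumption; the report does not state the sampling rate.
Fs  = 44100;
rts = [-0.5793 + 0.3562i;
       -0.1117 + 0.8102i;
        0.8231 + 0.3814i;
        0.9630 + 0.1123i];

% Pole angle -> frequency in Hz, pole radius -> bandwidth in Hz,
% using the same formulas as the Activity 8 script.
angz = atan2(imag(rts), real(rts));
[frqs, indices] = sort(angz .* (Fs/(2*pi)));
bw = -1/2 * (Fs/(2*pi)) * log(abs(rts(indices)));

% Keep candidates above 90 Hz with bandwidth below 400 Hz.
formants = frqs(frqs > 90 & bw < 400);
disp(round(formants.'))
```

The two roots farther from the unit circle are rejected because their bandwidths exceed the 400 Hz limit.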
Activity 8
Formant Estimation with LPC Coefficients
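[data, Fs] = audioread('sarigamapa.wav');
data = data(:,1);               % keep the first channel only
segmentlen = 100;               % spectrogram window length in samples
noverlap = 90;                  % overlap between adjacent windows in samples
NFFT = 128;                     % FFT length
spectrogram(data,segmentlen,noverlap,NFFT,Fs,'yaxis')
title('Signal Spectrogram')
dt = 1/Fs;                      % sampling period in seconds
I0 = round(0.1/dt);             % first sample of the analysed segment (0.10 s)
Iend = round(0.25/dt);          % last sample of the analysed segment (0.25 s)
x = data(I0:Iend);
x1 = x.*hamming(length(x));     % taper the segment with a Hamming window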
Apply a pre-emphasis filter. The pre-emphasis filter is a highpass all-pole (AR(1)) filter.
preemph = [1 0.63];
x1 = filter(1,preemph,x1);
A = lpc(x1,8);                  % 8th-order LPC polynomial coefficients
rts = roots(A)                  % poles of the LPC model (printed on purpose)
real_z1 = real(rts);
imag_z1 = imag(rts);
figure()
scatter(real_z1,imag_z1,100,'black','Marker',"x",'LineWidth',2)
xline(0,'LineWidth',2);
yline(0,'LineWidth',2);
grid on
xlabel('Real Axis')
ylabel('Imaginary Axis')
title('Roots of the LPC')
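rts = rts(imag(rts)>=0);                       % keep one root of each conjugate pair
angz = atan2(imag(rts),real(rts));             % pole angles in radians
[frqs,indices] = sort(angz.*(Fs/(2*pi)));      % angles to Hz, sorted ascending
bw = -1/2*(Fs/(2*pi))*log(abs(rts(indices)));  % bandwidths in Hz, same order as frqs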
Use the criterion that formant frequencies should be greater than 90 Hz with bandwidths less than 400 Hz to determine the formants.
formants = [];                  % start empty so disp works even if nothing passes
nn = 1;
for kk = 1:length(frqs)
    if (frqs(kk) > 90 && bw(kk) < 400)
        formants(nn) = frqs(kk);
        nn = nn+1;
    end
end
disp(formants)
Spectrum
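n = length(data);          % number of samples
y = fft(data);
f = (0:n-1)*(Fs/n);        % frequency of each FFT bin in Hz
power = abs(y).^2/n;       % periodogram estimate of the power spectrum
plot(f,power,'LineWidth',1.2);
xlim([0 Fs/2]);            % bins above Fs/2 mirror the lower half for real input
xlabel('Frequency (Hz)');
ylabel('Power');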
grid on
Other methods for estimating formants in speech signals
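Cepstral analysis: This method uses the cepstrum, which is the inverse Fourier transform of the logarithm of the magnitude (or power) spectrum, to separate the slowly varying spectral envelope from the fine harmonic structure. Keeping only the low-quefrency cepstral coefficients (liftering) and transforming back gives a smoothed log spectrum, and the formant frequencies are estimated from the peaks of that envelope.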
Autoregressive modelling: This method involves modelling the speech signal as an autoregressive (AR) process and then finding the roots of the AR polynomial. It is the same all-pole model that LPC uses, so the formant frequencies are found from the angles of the complex roots and the bandwidths from their magnitudes.
Harmonic Product Spectrum (HPS): This method multiplies the magnitude spectrum of the signal by copies of itself that have been compressed along the frequency axis by integer factors. The harmonics line up in the product, so its strongest peak marks the fundamental frequency of the speech signal. HPS therefore estimates pitch rather than formants directly.
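Mel-Frequency Cepstral Coefficients (MFCCs): This representation is often used in speech recognition systems. It is based on cepstral analysis, but it warps the frequency axis to the Mel scale with a bank of filters before taking the logarithm and a discrete cosine transform. MFCCs describe the overall shape of the spectral envelope rather than giving the formant frequencies explicitly.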
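The inverse filtering method: This method estimates the vocal tract filter, usually with LPC, and applies its inverse to the speech signal to remove the vocal tract resonances, leaving an estimate of the excitation. The formants are then found from the peaks of the frequency response of the estimated vocal tract filter.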
The formant tracking method: This method follows the formants of a speech signal over time by splitting the signal into short overlapping windows, estimating the formants in each window, and linking the estimates across windows into continuous tracks.
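Of the alternatives listed above, cepstral analysis is straightforward to illustrate with base MATLAB functions. The sketch below is a hedged example on a synthetic voiced frame, not a production formant estimator: the sampling rate, pitch period, resonance frequencies, pole radius, frame length, and lifter cutoff are all assumed values chosen for illustration. It computes the real cepstrum, keeps only the low-quefrency coefficients, and reads formant candidates from the peaks of the smoothed log spectrum.

```matlab
% Sketch: cepstral smoothing to find formant candidates (base MATLAB only).
% All numeric values below are illustrative assumptions.
Fs = 8000;                      % sampling rate in Hz
N  = 1024;                      % frame length and FFT size
T0 = 80;                        % pitch period in samples (100 Hz pitch)
F  = [700 1800];                % resonance frequencies in Hz
r  = 0.97;                      % pole radius shared by both resonances
L  = 30;                        % lifter cutoff in samples, below the pitch period

% Synthetic voiced frame: impulse train through two cascaded resonators.
a = 1;
for k = 1:numel(F)
    a = conv(a, [1, -2*r*cos(2*pi*F(k)/Fs), r^2]);
end
excitation = zeros(N + 400, 1);
excitation(1:T0:end) = 1;
x = filter(1, a, excitation);
frame = x(end-N+1:end);         % drop the start-up transient

% Hamming window written out so that no toolbox is needed.
w = 0.54 - 0.46*cos(2*pi*(0:N-1)'/(N-1));

% Real cepstrum: inverse FFT of the log magnitude spectrum.
logmag = log(abs(fft(frame .* w)) + eps);
c = real(ifft(logmag));

% Low-time lifter: keep quefrencies below L and their mirror images.
lifter = zeros(N, 1);
lifter(1:L) = 1;
lifter(N-L+2:N) = 1;
envelope = real(fft(c .* lifter));   % smoothed log spectrum

% Local maxima of the envelope below Fs/2 are the formant candidates.
f = (0:N/2-1)' * Fs/N;
env = envelope(1:N/2);
isPeak = [false; env(2:end-1) > env(1:end-2) & env(2:end-1) > env(3:end); false];
candidates = f(isPeak);
[~, imax] = max(env);
fprintf('Strongest envelope peak: %.0f Hz\n', f(imax));
disp(candidates.')
```

The rectangular lifter leaves some ripple in the envelope, so some local maxima may be spurious and would need further screening before being reported as formants.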