SSE / AVX FIR VST PC Channel divider (crossover), delay, EQ, DRC

First, read these articles.

A Sandy Bridge CPU has enough power to run a straight (direct-form) FIR filter for a 3-way crossover.
I'm using a Core i7-2630QM.

A channel divider with 3 or more ways (0-960 Hz, 960-3200 Hz, 3200 Hz and up), using 2048-tap FIR filters at 44.1 kHz,
just requires an up-to-date CPU.
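
To get a feel for the workload, here is a small worked calculation. It is only a sketch: the function name and the way channels and bands are counted are my own assumptions, and it counts one multiply-add per tap per output sample for a straight FIR filter.

```cpp
#include <cassert>

// Multiply-add operations per second for a bank of straight FIR filters.
// One output sample of one filter costs `taps` multiply-adds.
long long firMacsPerSecond(int taps, int sampleRateHz, int channels, int bands)
{
    assert(taps > 0 && sampleRateHz > 0 && channels > 0 && bands > 0);
    return static_cast<long long>(taps) * sampleRateHz * channels * bands;
}
```

For a stereo 3-way divider, firMacsPerSecond(2048, 44100, 2, 3) gives 541,900,800, so roughly 0.54 billion multiply-adds per second.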

System requirements for the AVX VST
  • Windows 7 SP1 or later
  • Intel Sandy Bridge CPU, Core i3 or better
  • Microsoft Visual C++ 2010 Redistributable Package
=======================================

SSE version: VST_SSEFIR
11/13/2011 Modified frequency control

(1) Source code: attached, VST_SSEFIR.cpp.txt
 I cannot post the full source code because of Steinberg's licensing terms. To build it, get Visual Studio 2010 C++ Express and the Steinberg VST SDK.
(2) VST DLL: attached. 32-bit, requires SSE2. Probably any Sandy Bridge CPU over 2 GHz will work.

You can see the straight FIR convolution around line 550.
Straight FIR Calculation

//for (int i = 0; i < sampleFrames; i++) {
//    dOut = 0.0f;
//    for (int j = 0; j < DEFTAPS; j++) {
//        dOut += vstData.FIRCoeff[j] * h_iData_L[i + j];
//    }
//    outL[i] = dOut;
//}


I unrolled the inner loop by four.

for (int i = 0; i < sampleFrames; i++) {
    dOut = 0.0;
    for (int j = 0; j < DEFTAPS; j = j + 4) {
        dOut1 = vstData.FIRCoeff[j + 0] * h_idata_L[i + j + 0];
        dOut2 = vstData.FIRCoeff[j + 1] * h_idata_L[i + j + 1];
        dOut3 = vstData.FIRCoeff[j + 2] * h_idata_L[i + j + 2];
        dOut4 = vstData.FIRCoeff[j + 3] * h_idata_L[i + j + 3];
        dOut += dOut1 + dOut2 + dOut3 + dOut4;
    }
    outL[i] = dOut;
}
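
For anyone who wants to try this outside the plugin, here is a self-contained sketch of the same calculation. It is not the attached source: the function names are hypothetical, std::vector replaces the plugin's buffers, and the unrolled version uses four independent accumulators, which is a common variation on the loop above rather than a copy of it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Straight FIR: out[i] = sum over j of coeff[j] * in[i + j].
// `in` must hold at least frames + taps samples (history plus new input).
std::vector<float> firStraight(const std::vector<float>& coeff,
                               const std::vector<float>& in,
                               std::size_t frames)
{
    assert(in.size() >= frames + coeff.size());
    std::vector<float> out(frames, 0.0f);
    for (std::size_t i = 0; i < frames; ++i) {
        float acc = 0.0f;
        for (std::size_t j = 0; j < coeff.size(); ++j)
            acc += coeff[j] * in[i + j];
        out[i] = acc;
    }
    return out;
}

// Same result with the inner loop unrolled by four.
// The four partial sums do not depend on each other inside the loop.
std::vector<float> firUnrolled4(const std::vector<float>& coeff,
                                const std::vector<float>& in,
                                std::size_t frames)
{
    const std::size_t taps = coeff.size();
    assert(taps % 4 == 0);               // unrolling by 4 needs a multiple of 4
    assert(in.size() >= frames + taps);
    std::vector<float> out(frames, 0.0f);
    for (std::size_t i = 0; i < frames; ++i) {
        float acc0 = 0.0f, acc1 = 0.0f, acc2 = 0.0f, acc3 = 0.0f;
        for (std::size_t j = 0; j < taps; j += 4) {
            acc0 += coeff[j + 0] * in[i + j + 0];
            acc1 += coeff[j + 1] * in[i + j + 1];
            acc2 += coeff[j + 2] * in[i + j + 2];
            acc3 += coeff[j + 3] * in[i + j + 3];
        }
        out[i] = (acc0 + acc1) + (acc2 + acc3);
    }
    return out;
}
```

Because the additions happen in a different order, the two functions can differ in the last bits for general float data.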

I specified /arch:SSE2. The compiler generated the following code:

; 548  : dOut3 = vstData.FIRCoeff[j + 2] * h_idata_R[i + j + 2];
; 549  : dOut4 = vstData.FIRCoeff[j + 3] * h_idata_R[i + j + 3];
; 550  : dOut += dOut1 + dOut2 + dOut3 + dOut4;

addss xmm2, xmm3
movss xmm3, DWORD PTR [eax]
mulss xmm3, DWORD PTR [edx]
addss xmm2, xmm3
movss xmm3, DWORD PTR [edx+4]
mulss xmm3, DWORD PTR [eax+4]
add esi, 16 ; 00000010H
add edx, 16 ; 00000010H
add eax, 16 ; 00000010H
dec ebx
addss xmm2, xmm3
addss xmm0, xmm2
jne SHORT $LL6@processRep


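With SSE2 enabled, the compiler uses the xmm registers, which are 128 bits wide. Note that movss, mulss and addss are scalar instructions: each one works on a single float, so this build does not use the full register width.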

Playing
One (stereo) track uses 2.2% of the CPU, so a 3-way or 4-way crossover is no problem.

I remember that 2 or 3 years ago, a straight FIR filter was an impossible task for a CPU.

=======================================

AVX version: VST_AVXFIR
 This version was compiled with Visual Studio 2010 Professional C++, using the "/arch:AVX" option.
! This VST will crash if your CPU / OS does not support AVX !
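
If you want to check for AVX before loading such a plugin, a sketch like the following can help. It is not part of the attached sources, and the function name is hypothetical. On MSVC it reads CPUID leaf 1 (ECX bit 27 = OSXSAVE, bit 28 = AVX) and then XCR0 to see whether the OS saves the ymm state. On GCC / Clang it relies on __builtin_cpu_supports, and I am assuming the compiler runtime also accounts for OS support there.

```cpp
#if defined(_MSC_VER)
#include <intrin.h>
#include <immintrin.h>

// True only if the CPU reports AVX and the OS has enabled ymm state saving.
bool cpuAndOsSupportAvx()
{
    int regs[4];                      // EAX, EBX, ECX, EDX
    __cpuid(regs, 1);
    const bool osxsave = (regs[2] & (1 << 27)) != 0;
    const bool avx     = (regs[2] & (1 << 28)) != 0;
    if (!osxsave || !avx)
        return false;
    // XCR0 bits 1 and 2: the OS saves xmm and ymm registers on context switch.
    const unsigned long long xcr0 = _xgetbv(0);
    return (xcr0 & 0x6) == 0x6;
}

#elif (defined(__GNUC__) || defined(__clang__)) && \
      (defined(__x86_64__) || defined(__i386__))

// GCC / Clang on x86: ask the compiler runtime.
bool cpuAndOsSupportAvx()
{
    return __builtin_cpu_supports("avx") != 0;
}

#else

// Unknown compiler or non-x86 target: report no AVX.
bool cpuAndOsSupportAvx()
{
    return false;
}

#endif
```

A host or installer could call cpuAndOsSupportAvx() and fall back to the SSE version when it returns false.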

This VST takes 2 parameters: Frequency Low Cut and High Cut. For an LPF, leave FreqLo = 0; for an HPF, set FreqHi = 44100.
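
How the two frequencies become FIR coefficients is not described on this page. As one possible approach, here is a windowed-sinc sketch with a Hamming window. Everything in it is an assumption for illustration: the function names, the window choice, clamping FreqHi to the Nyquist frequency, and the odd tap count (for example 2047 instead of 2048), which keeps the high-pass case well defined.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

const double kPi = 3.14159265358979323846;

// Normalized sinc: sin(pi x) / (pi x), with the value 1 at x = 0.
double normalizedSinc(double x)
{
    return x == 0.0 ? 1.0 : std::sin(kPi * x) / (kPi * x);
}

// Band filter passing freqLoHz..freqHiHz (hypothetical helper).
// freqLoHz = 0 gives a low-pass; freqHiHz >= Nyquist gives a high-pass.
std::vector<float> designBandFilter(double freqLoHz, double freqHiHz,
                                    double sampleRateHz, std::size_t taps)
{
    assert(taps >= 3 && taps % 2 == 1);   // odd length, see note above
    const double nyquist = sampleRateHz / 2.0;
    if (freqHiHz > nyquist)
        freqHiHz = nyquist;               // assumption: treat 44100 as "no high cut"
    assert(freqLoHz >= 0.0 && freqLoHz < freqHiHz);

    const double fl = freqLoHz / sampleRateHz;   // cycles per sample
    const double fh = freqHiHz / sampleRateHz;
    const double centre = (taps - 1) / 2.0;

    std::vector<float> h(taps);
    for (std::size_t n = 0; n < taps; ++n) {
        const double t = static_cast<double>(n) - centre;
        // Ideal band-pass = low-pass at fh minus low-pass at fl.
        const double ideal = 2.0 * fh * normalizedSinc(2.0 * fh * t)
                           - 2.0 * fl * normalizedSinc(2.0 * fl * t);
        // Hamming window limits the ripple caused by truncating the sinc.
        const double window = 0.54 - 0.46 * std::cos(2.0 * kPi * n / (taps - 1));
        h[n] = static_cast<float>(ideal * window);
    }
    return h;
}
```

With this sketch, designBandFilter(0, 960, 44100, 2047) would be the woofer band, designBandFilter(960, 3200, 44100, 2047) the mid band, and designBandFilter(3200, 44100, 44100, 2047) the tweeter band.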


 See the code for the details of the AVX programming.

; 611  : l_sum0   = _mm256_add_ps(l_sum0, l_multi);
; 612  : l_in    = _mm256_load_ps((float*)&h_idata_L1[i + j]);
; 613  : l_multi = _mm256_mul_ps(l_in, l_coeff);
; 614  : l_sum1   = _mm256_add_ps(l_sum1, l_multi);
; 615  : l_in    = _mm256_load_ps((float*)&h_idata_L2[i + j]);
; 616  : l_multi = _mm256_mul_ps(l_in, l_coeff);

mov edx, DWORD PTR tv3903[esp+224]
vaddps ymm2, ymm7, ymm2
vmulps ymm7, ymm0, YMMWORD PTR [eax]
vaddps ymm3, ymm7, ymm3
vmulps ymm7, ymm0, YMMWORD PTR [edx+eax]


ymm: 256-bit register, holds 8 floats.
vmulps: multiplies 8 floats by 8 floats, giving 8 products in one instruction.
vaddps: adds 8 floats to 8 floats, giving 8 sums in one instruction.

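The reason why the code is complicated:
(1) Aligned AVX loads (_mm256_load_ps) require 32-byte-aligned memory addresses.
(2) The FIR calculation steps through the input one sample (4 bytes) at a time, so most starting addresses are not 32-byte aligned.
So I prepared eight "Offset" input streams, type 0 to type 7 (the input shifted by 0 to 7 samples), and calculate from those.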
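
The idea can be modelled in plain C++ without intrinsics. The sketch below is my reading of the listing above, not the attached source: the names and the stream layout are assumptions, and each inner "lane" loop stands in for one 8-float vmulps / vaddps pair. Stream k holds the input shifted by k samples, so every block read starts at an index that is a multiple of 8. With 32-byte-aligned buffers that would be an aligned load.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

const std::size_t kLanes = 8;  // floats per 256-bit ymm register

// Reference: out[i] = sum over j of coeff[j] * in[i + j].
std::vector<float> firDirect(const std::vector<float>& coeff,
                             const std::vector<float>& in, std::size_t frames)
{
    std::vector<float> out(frames, 0.0f);
    for (std::size_t i = 0; i < frames; ++i)
        for (std::size_t j = 0; j < coeff.size(); ++j)
            out[i] += coeff[j] * in[i + j];
    return out;
}

// Same filter, computed 8 outputs at a time from 8 shifted input copies.
std::vector<float> firOffsetStreams(const std::vector<float>& coeff,
                                    const std::vector<float>& in, std::size_t frames)
{
    const std::size_t taps = coeff.size();
    assert(taps % kLanes == 0 && frames % kLanes == 0);
    assert(in.size() >= frames + taps);

    // stream[k][n] = in[n + k]: "type k" offset copy (hypothetical layout).
    std::vector<std::vector<float> > stream(
        kLanes, std::vector<float>(frames + taps, 0.0f));
    for (std::size_t k = 0; k < kLanes; ++k)
        for (std::size_t n = 0; n < frames + taps && n + k < in.size(); ++n)
            stream[k][n] = in[n + k];

    std::vector<float> out(frames, 0.0f);
    for (std::size_t i = 0; i < frames; i += kLanes) {
        float sum[kLanes][kLanes] = {};   // sum[k] models the accumulator for out[i + k]
        for (std::size_t j = 0; j < taps; j += kLanes)      // i + j is a multiple of 8
            for (std::size_t k = 0; k < kLanes; ++k)
                for (std::size_t lane = 0; lane < kLanes; ++lane)
                    sum[k][lane] += coeff[j + lane] * stream[k][i + j + lane];
        for (std::size_t k = 0; k < kLanes; ++k) {          // horizontal add per output
            float total = 0.0f;
            for (std::size_t lane = 0; lane < kLanes; ++lane)
                total += sum[k][lane];
            out[i + k] = total;
        }
    }
    return out;
}
```

The price of this layout is memory: the input is stored eight times so that the same 8 coefficients can be reused against eight aligned blocks.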


Now one track uses 0.9% of the CPU.

=======================================

OpenMP?
 I also tried OpenMP, but it was too slow. Starting new threads adds too much overhead for the short processing cycles of a VST.

=======================================

Delay processing: VST_KoonDelay

This is a very simple delay processor. It only delays the signal through a buffer and never changes the samples.

void MySimpleDelay::processReplacing (float** inputs, float** outputs, VstInt32 sampleFrames)
{
    float* inL  =  inputs[0];
    float* inR  =  inputs[1];
    float* outL = outputs[0];
    float* outR = outputs[1];

    //do nothing (pass-through, kept for reference)
    //for (int i = 0; i < sampleFrames; i++) {
    //    outL[i] = inL[i];
    //    outR[i] = inR[i];
    //}

    //add input to input buffer, behind the "Samples" delayed samples already stored
    for (int i = 0; i < sampleFrames; i++) {
        h_idata_L[Samples + i] = inL[i];
        h_idata_R[Samples + i] = inR[i];
    }
    //output the oldest sampleFrames samples
    for (int i = 0; i < sampleFrames; i++) {
        outL[i] = h_idata_L[i];
        outR[i] = h_idata_R[i];
    }

    //slide input buffer: move the remaining "Samples" samples to the front
    for (int i = 0; i < Samples; i++) {
        h_idata_L[i] = h_idata_L[i + sampleFrames];
        h_idata_R[i] = h_idata_R[i + sampleFrames];
    }
}
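
The same buffer logic can be tried outside a VST host. The class below is only a sketch with hypothetical names: it handles one channel, and it uses std::vector insert / erase for clarity, which allocates memory and is therefore not what you would want inside a real-time audio callback.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One-channel delay by a fixed number of samples (illustrative only).
class SampleDelay {
public:
    // The history starts as `delaySamples` zeros, so the first outputs are silence.
    explicit SampleDelay(std::size_t delaySamples)
        : history_(delaySamples, 0.0f) {}

    void process(const float* in, float* out, std::size_t frames)
    {
        assert(in != nullptr && out != nullptr);
        const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(frames);
        // Append the new input behind the delayed samples.
        history_.insert(history_.end(), in, in + frames);
        // Output the oldest `frames` samples.
        std::copy(history_.begin(), history_.begin() + n, out);
        // Drop what was output; the delayed remainder stays for the next call.
        history_.erase(history_.begin(), history_.begin() + n);
    }

private:
    std::vector<float> history_;
};
```

With SampleDelay d(7), feeding the samples 1, 2, 3, ... produces seven zeros first and then 1, 2, 3, ..., continuing without a gap across calls.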


When the parameter "Samples" = 7, this routine delays the signal by 7 PCM samples, which corresponds to a sound path of 340 m/s * 7 / 44100 = 54.0 mm.

AVX FIR based EQ : VST_AVXFIREQ


This is a simple 32-band EQ. You don't have to worry about phase shifts.

AVX FIR Based DRC : VST_AVXFIRDRC


This parameter is just a dummy. Change it to reload the frequency-gain file.

Index,Freq,Resp_dB
0,21.00,0.00000
1,43.00,0.00000
2,64.00,0.00000
3,86.00,0.00000
4,107.00,0.00000
5,129.00,-13.5678
6,150.00,0.00000
7,172.00,0.00000
8,193.00,0.00000
9,215.00,0.00000
10,236.00,-20.00000
11,258.00,-40.00000
12,279.00,-60.00000
13,301.00,-80.00000
14,322.00,-80.00000
15,344.00,-80.00000
16,366.00,-80.00000
17,387.00,-80.00000
18,409.00,-80.00000
19,430.00,-80.00000

You can specify the response gain in dB for each FIR frequency. Place this file at C:\VST\FreqRes_Input.txt.
The FIR filter can handle a gain change of around 20 dB per step.
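
A file in this format is easy to read back. The reader below is a sketch, not the plugin's own parser: the struct and function names are hypothetical, it assumes exactly one header line followed by three comma-separated columns as in the sample above, and it assumes Resp_dB is an amplitude ratio, so the linear gain is 10^(dB/20).

```cpp
#include <cmath>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

struct ResponsePoint {
    int    index;
    double freqHz;
    double gainDb;
    double gainLinear;   // amplitude factor derived from gainDb
};

// Reads "Index,Freq,Resp_dB" rows; lines that do not parse are skipped.
std::vector<ResponsePoint> readResponse(std::istream& in)
{
    std::vector<ResponsePoint> points;
    std::string line;
    std::getline(in, line);                      // skip the header line
    while (std::getline(in, line)) {
        std::istringstream row(line);
        ResponsePoint p;
        char comma1 = 0, comma2 = 0;
        if (row >> p.index >> comma1 >> p.freqHz >> comma2 >> p.gainDb
            && comma1 == ',' && comma2 == ',') {
            p.gainLinear = std::pow(10.0, p.gainDb / 20.0);
            points.push_back(p);
        }
    }
    return points;
}
```

To read the real file, pass a std::ifstream opened on C:\VST\FreqRes_Input.txt. For example, -20 dB becomes a factor of 0.1 and -80 dB a factor of 0.0001.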

=============================
Images

(1) running 4 way crossover on PC

Fanless PC: Core i3-2120T, Scythe Ninja3 cooler, picoPSU 150 XT, MSI H61 ITX


(2) exaU2I USB to I2S x4 interface, and 4 way full digital amplifier

(3) 3 way speaker (LINN Komponent 104) + modified BIC DV84 (network removed) as woofer

=============================
Don't have an audio interface?

(a) To try the PC channel divider (crossover), you only need an audio interface with 7.1 analog outputs.

(b) If you have an HDMI audio output and an AV amplifier, you can try an HDMI digital connection.

Configure your AV amp to accept 7.1 channels from HDMI.

Wrap the HDMI audio device with ASIO4ALL, then configure the VST host.
(I don't have an HDMI output and AV amplifier right now, so I cannot show how.)


You can then use the Front, Surround, and Surround Back outputs for the Low, Mid, and High outputs.

! Try carefully !
Use a single full-range speaker to listen to each channel and confirm that the Low / Mid / High outputs are configured correctly.

=============================

How to compile the source code
You need Steinberg's VST SDK (2.4), Visual Studio 2010 Professional C++, and some knowledge.
First, try to build a sample VST yourself. Then replace its source code with the attached file and adjust the project settings.

I keep these VSTs in the C:\VST folder. To be safe, I recommend that you place them in the same C:\VST folder.

Attached files
  • FreqRes_Input.txt (21k), koon 3876, Oct 15, 2011, 10:00 AM
  • VST_AVXFIR.cpp.txt (24k), koon 3876, Oct 8, 2011, 1:12 PM
  • VST_AVXFIRDRC.cpp.txt (21k), koon 3876, Oct 15, 2011, 9:44 AM
  • VST_AVXFIREQ.cpp.txt (25k), koon 3876, Oct 15, 2011, 9:37 AM
  • VST_KoonDelay.cpp.txt (5k), koon 3876, Oct 15, 2011, 9:32 AM
  • VST_SSEFIR.cpp.txt (16k), koon 3876, Nov 13, 2011, 2:17 PM