An efficient Threadpool engine that scales very well, version 3.2

Author: Amine Moulay Ramdane

Description:

An efficient Thread Pool Engine that scales very well.

I have updated my efficient Threadpool engine with priorities and my Threadpool engine to version 3.2. I have come up with a new algorithm that is more optimized and that scales very well.

The following have been added to my efficient Threadpool engine:

- I have used scalable counting networks to make my Threadpool engine scale very well.

- The worker threads enter a wait state when there is no job in the concurrent FIFO queues, for more efficiency.

- You can distribute your jobs to the worker threads and call any method with the threadpool's execute() method.

- It uses work-stealing to be more efficient.

- You can configure it to use stacks or FIFO queues. When you use stacks it will be cache efficient.

- Now it can use processor groups on Windows, so that it can use more than 64 logical processors, and it scales well.

- Now it distributes the jobs on multiple FIFO queues or stacks so that it scales well.

- You can wait for the jobs to finish with the wait() method.

- It's NUMA-aware and NUMA efficient.

- And it is portable to many operating systems.
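To make the ideas in the list above concrete, here is a minimal sketch of a pool with one FIFO queue per worker, workers that enter a wait state when there is no job, work-stealing, and a blocking wait. It is not the code of my Threadpool engine: Submit, Pop, TWorker, AddOne and NumQueues are hypothetical names, a simple round-robin ticket stands in for the counting network, and the sketch assumes Free Pascal in Delphi mode (under Delphi the Interlocked functions may need another unit).

```pascal
program PoolSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
uses
  {$IFDEF UNIX}cthreads,{$ENDIF}
  Classes, SysUtils, SyncObjs;
const
  NumQueues = 4; // hypothetical: one FIFO queue per worker thread
type
  TJobProc = procedure(Arg: Pointer);
  TJob = record
    Proc: TJobProc;
    Arg: Pointer;
  end;
  TWorker = class(TThread)
    procedure Execute; override;
  end;

var
  Locks: array[0..NumQueues - 1] of TCriticalSection;
  Items: array[0..NumQueues - 1] of array of TJob;
  Heads: array[0..NumQueues - 1] of Integer;
  Wake: TEvent;
  Ticket, Pending, NextIdx, Total: LongInt;

// Stand-in for execute(): a ticket picks the queue, then one sleeper is woken.
procedure Submit(Proc: TJobProc; Arg: Pointer);
var
  Q: Integer;
begin
  InterlockedIncrement(Pending);
  Q := (InterlockedIncrement(Ticket) and $7FFFFFFF) mod NumQueues;
  Locks[Q].Acquire;
  SetLength(Items[Q], Length(Items[Q]) + 1); // grows only; fine for a demo
  Items[Q][High(Items[Q])].Proc := Proc;
  Items[Q][High(Items[Q])].Arg := Arg;
  Locks[Q].Release;
  Wake.SetEvent;
end;

// Take the oldest job of queue Q, if there is one.
function Pop(Q: Integer; var J: TJob): Boolean;
begin
  Locks[Q].Acquire;
  Result := Heads[Q] < Length(Items[Q]);
  if Result then
  begin
    J := Items[Q][Heads[Q]];
    Inc(Heads[Q]);
  end;
  Locks[Q].Release;
end;

procedure TWorker.Execute;
var
  J: TJob;
  I, Me: Integer;
begin
  Me := InterlockedIncrement(NextIdx) - 1; // this worker's own queue index
  while not Terminated do
  begin
    I := 0; // own queue first, then try to steal from the other queues
    while (I < NumQueues) and not Pop((Me + I) mod NumQueues, J) do
      Inc(I);
    if I < NumQueues then
    begin
      J.Proc(J.Arg);
      InterlockedDecrement(Pending);
    end
    else
      Wake.WaitFor(10); // wait state; the timeout covers a missed wake-up
  end;
end;

procedure AddOne(Arg: Pointer);
begin
  InterlockedIncrement(PLongInt(Arg)^);
end;

var
  W: array[0..NumQueues - 1] of TWorker;
  I: Integer;
begin
  Wake := TEvent.Create(nil, False, False, '');
  for I := 0 to NumQueues - 1 do
    Locks[I] := TCriticalSection.Create;
  for I := 0 to NumQueues - 1 do
    W[I] := TWorker.Create(False);
  for I := 1 to 1000 do
    Submit(AddOne, @Total);
  while Pending > 0 do // stand-in for wait(): block until every job has run
    Sleep(1);
  for I := 0 to NumQueues - 1 do
  begin
    W[I].Terminate;
    W[I].WaitFor;
  end;
  WriteLn('jobs run: ', Total);
end.
```

With 1000 submitted jobs the program should report 1000 jobs run. The sketch leaves out priorities, stacks, processor groups and NUMA handling, and it polls in its wait loop only to stay short.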

Please read the HTML tutorial inside the zip.

More details about my efficient Threadpool that scales very well: my Threadpool is much more scalable than Microsoft's. On the workers side I am using scalable counting networks to distribute the jobs across the many queues or stacks, so it is scalable on the workers side. On the consumers side I am also using lock striping to be able to scale very well, so it is scalable on those parts. On the other part, which is work stealing, I am also using scalable counting networks, so globally it scales very well. And since work stealing is "rare", I think that my efficient Threadpool that scales very well is really powerful. It is much more optimized, the scalable counting networks eliminate false sharing, and it works with Windows and Linux.
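As an illustration of how a counting network can spread jobs over several queues without a single shared counter, here is a minimal sketch of a width-4 bitonic counting network built from six balancers. It is an assumption-laden example and not the network used inside my Threadpool engine: Bal, Toggle and Traverse are hypothetical names, the width is fixed at 4, and the code assumes Free Pascal, where InterlockedIncrement is available without extra units. Each balancer sends successive tokens alternately to its top and bottom output, and the wire on which a token leaves the network can serve as the index of the queue that receives the job.

```pascal
program CountingNetSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

var
  // One counter per balancer (hypothetical layout):
  //   Bal[0], Bal[1]  first layer, on wires (0,1) and (2,3)
  //   Bal[2], Bal[3]  second layer, the two inner mergers
  //   Bal[4], Bal[5]  last layer, feeding outputs (0,1) and (2,3)
  // Global variables start at zero, so every balancer first sends to "top".
  Bal: array[0..5] of LongInt;

// A balancer: successive tokens get 0 (top), 1 (bottom), 0, 1, ...
function Toggle(var B: LongInt): Integer;
begin
  Result := (InterlockedIncrement(B) - 1) and 1;
end;

// Route one token from input wire InWire (0..3) to an output wire (0..3).
function Traverse(InWire: Integer): Integer;
var
  X, Half: Integer;
begin
  // Layer 1: the top half yields positions 0 and 1, the bottom half 2 and 3.
  if InWire < 2 then
    X := Toggle(Bal[0])
  else
    X := 2 + Toggle(Bal[1]);
  // Layer 2: positions 0 and 3 share one balancer, positions 1 and 2 the other.
  if (X = 0) or (X = 3) then
    Half := Toggle(Bal[2])
  else
    Half := Toggle(Bal[3]);
  // Layer 3: top outputs of layer 2 feed outputs 0 and 1, bottom ones 2 and 3.
  if Half = 0 then
    Result := Toggle(Bal[4])
  else
    Result := 2 + Toggle(Bal[5]);
end;

var
  I: Integer;
begin
  // Eight tokens entering on wires 0,1,2,3,0,1,2,3; print where each one exits.
  for I := 0 to 7 do
    Write(Traverse(I mod 4), ' ');
  WriteLn;
end.
```

Run sequentially like this, the eight tokens should leave on wires 0 1 2 3 0 1 2 3, so the jobs are spread evenly over four queues. Under concurrency, different tokens mostly touch different balancers, which is why such a network contends less than one shared counter.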

Note that to enlarge the stack of the Threadpool's worker threads, which use TThread, you have to set the stack size for the executable.
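One way to set the stack size for the executable is with compiler directives in the main program file. The fragment below is only a sketch: it assumes a Windows target, the sizes are hypothetical values and are not taken from the Threadpool sources, and you should check that your compiler version supports {$MINSTACKSIZE} and {$MAXSTACKSIZE}.

```pascal
program StackSizeSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$MINSTACKSIZE 65536}     // initially committed stack, in bytes (hypothetical value)
{$MAXSTACKSIZE 16777216}  // reserved stack: 16 MB (hypothetical value)
begin
  // Create the Threadpool and run the jobs here.
end.
```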

Look into defines.inc, there are many options:

{$DEFINE CPU32} and {$DEFINE Windows32} for 32 bit systems

{$DEFINE CPU64} and {$DEFINE Windows64} for 64 bit systems

Look at the test.pas demo inside the zip file.

Language: FPC Pascal v2.2.0+ / Delphi 5+: http://www.freepascal.org/

Operating Systems: Windows, Linux and Mac (x86).

Required FPC switches: -O3 -Sd -dFPC -dWin32 -dFreePascal

-Sd for Delphi mode.

Required Delphi switches: -DDelphi -DMSWINDOWS -$H+

For Delphi XE-XE7 use the -DXE switch


Note: testpool.pas is a parallel program that multiplies a matrix by a vector, it uses SSE+ and it requires Delphi 5+. test.pas and test_thread.pas work with both FreePascal and Delphi.

Please click on the small arrow on the right of the zip file below to download.