Scalable RWLock

Scalable RWLock 4.1

Authors: Amine Moulay Ramdane.

Description:

A fast, and scalable and lightweight Multiple-Readers Exclusive-Writer Lock that is portable called LW_RWLock that works accross processes and threads and a fast, and scalable and starvation-free and lightweight Multiple-Readers-Exclusive-Writer Lock that is portable called LW_RWLockX and that works across processes and threads and that is FIFO fair on the writer side and FIFO fair on the reader side and that is of course Starvation-free, and also a fast a scalable Multiple-Readers-Exclusive-Writer Lock that is portable called RWLock and that doesn't spin-wait but uses my SemaMonitor so it is energy efficient and also a fast and scalable and lightweight Multiple-Readers-Exclusive-Writer Lock that is portable called RWLockX, this one is FIFO fair on the writer side and FIFO fair on the reader side and of course it is starvation-free and it doesn't spin-wait but uses my portable SemaMonitor and portable event objects , so it is energy efficient, and also LW_Asym_RWLockX that is a lightweight scalable Asymmetric Reader-Writer Mutex that uses a technic that looks like Seqlock without looping on the reader side like Seqlock, and this has permited the reader side to be costless, it is FIFO fair on the writer side and FIFO fair on the reader side and it is of course Starvation-free and it does spin-wait, and also Asym_RWLockX a lightweight scalable Asymmetric Reader-Writer Mutex that uses a technic that looks like Seqlock without looping on the reader side like Seqlock, and this has permited the reader side to be costless, it is FIFO fair on the writer side and FIFO fair on the reader side and it is of course Starvation-free and it does not spin-wait, but waits on my SemaMonitor, so it is energy efficient.

Asym_RWLockX and LW_Asym_RWLockx are limited to 400 threads but you can manually extended the maximum number of threads by setting the NbrThreads parameter of the constructor, and you have to start once and for all your threads and work with all your threads, don't start every time a thread and exit from the thread. Asym_RWLockX and LW_Asym_RWLockx don't use any atomic operations and/or StoreLoad style memory barriers on the reader side, so it's scalable and very fast.

As you have noticed i have invented 6 variants of my scalable RWLock, the ones that ends with an X in there names are FIFO fair on the writer side and FIFO fair on the reader side and they are of course starvation-free and they are suitable for reatime critical systems, the others are not.

Why have i decided to come with scalable and starvation-free RWLocks? cause you can have frequent reads and infrequent writes but from time to time you can have frequent writes, so i think it is important to have a scalable and starvation-free RWLock , this is why i have come up with two variants of scalable and starvation-free RWLocks.

My lightweight variant and starvation-free RWLock called LW_RWLockX and also my LW_RWLOCK both scales even at 3% of writes, that's also very interresting to know...

And you have to know also that even if we are in a scenario with many more writes than 3% of writes and my scalable LW_RWLockX don't scale globally in the timing you have to know that LW_RWLockX will still be useful , cause even if it doen't scale globally in the timing , you have to understand that if from the time t1 to the time t2 there is writes and reads and from the time t2 to time t3 there is only reads, so even if the time from t1 to t2 will add more to the overall time cause there is writes threads, the perceived throughput from time t2 to t3 will be higher and the waiting time from t2 to t3 will be lower cause all my RWLocks will be scalable from t2 to t3 and this will make all my scalable RWLocks useful cause for example the database clients from internet or intranet waiting from t2 to t3 will be served more quickly and this is still useful and this will make my scalable RWLocks still useful even if it doesn't scale above 3% of writes.

I have read the following paper about RCU (Read-Copy Update), and as you will notice they are testing this RCU implementation against the pthread reader-writer lock, and as you have noticed the pthread reader-writer lock doesn't scale well cause the reader side of the pthread reader-writer lock is expensive...

Here is the paper:

https://www.efficios.com/pub/rcu/urcu-main.pdf

But as you will notice that the quiescent-state based reclamation (QSBR) and RCU scale very well cause there reader side functions scale very well, but don't worry , you don't need the RCU, cause my scalable RWLocks are also scaling very well on read-mostly scenarios, why my scalable RWLocks are scaling very well ? Cause read the following about my LW_RWLock algorithm:

"Notice carefully that my RWLock is scalable cause each element of the FCount1^ array resides in a seperate cache line , hence when i am incrementing with LockedExchangeAdd(FCount1^[myid].fcount1,1) it's scaling, notice also that i am using the following: myid:=GetCurrentProcessorNumber so i am puting the processor number inside myid variable."

Read this:

http://pages.videotron.com/aminer/rwlock1.html

So as you have noticed in my algorithm, the threads that have the same "myid" will have and will increment the "FCount1^[myid].fcount1" in the same local cache, so there will be no cache-lines transfers between the cores on the reader side of my scalable RWlocks algorithms, this is why my RWLock algorithms are scaling very well on read-mostly scenarios.

Here is my new invention that is my new algorithm:

I have invented a new algorithm of my scalable Asymmetric Distributed Reader-Writer Mutex, and this one is costless on the reader side, this one doesn't use any atomic operations and/or StoreLoad style memory barriers on the reader side, my new algorithm has added a technic that looks like Seqlock, but this technic doesn't loop as Seqlock. Here is my algorithm:

On the reader side we have this:

--

procedure TRWLOCK.RLock(var myid:integer);

var myid1:integer;

id:long;

begin

myid1:=0;

id:=FCount5^.fcount5;

if (id mod 2)=0

then FCount1^[myid1].fcount1:=1

else FCount1^[myid1].fcount1:=2;

if ((FCount3^.fcount3=0) and (id=FCount5^.fcount5) and (FCount1^[myid1].fcount1=1))

then

else

begin

LockedExchangeAdd(nbr^.nbr,1);

if FCount1^[myid1].fcount1=2

then LockedExchangeAdd(FCount1^[myid1].fcount1,-2)

else if FCount1^[myid1].fcount1=1

then LockedExchangeAdd(FCount1^[myid1].fcount1,-1);

event2.wait;

LockedExchangeAdd(FCount1^[myid1].fcount1,1);

LockedExchangeAdd(nbr^.nbr,-1);

end;

end;

--

The writer side will increment FCount5^.fcount5 like does

a Seqlock, and the reader side will grap a copy of FCount5^.fcount5

and copy it on the id variable, if (id modula 2) is equal to zero

that means the writer side has not modified yet Fcount3^.fcount3,

and the reader side will test again if FCount3^.fcount3 equal 0, and

if id=FCount5^.fcount5 didn't change and if FCount1^[myid1].fcount1 that we have assigned before didn't change and that means that we are sure that the writer side will block on FCount1^[myid1].fcount1 equal 1.

And notice with me that i am not looping like in Seqlock.

And the rest of my algorithm is easy to understand.

This technic that looks like Seqlock without looping like Seqlock

will allow us to be sure that although the x86 architecture will reorder the loads of the inside reader critical section , the loads

inside the reader critical section will not go beyond the load of FCount5^.fcount5 and this will allow my algorithm to work correctly.

My algorithm is FIFO fair on the writer side and FIFO fair on the

Reader side , and of course it is Starvation-free, and it is

suitable for realtime critical systems.

My Asym_RWLockX and LW_Asym_RWLockX algorithms work the same.

You will find the source code of my new algorithm here:

https://sites.google.com/site/aminer68/scalable-distributed-reader-writer-mutex

It is the version 2 that is my own algorithm.

and you can download the source code of my Asym_RWLockX and LW_Asym_RWLockX algorithms that work the same from here:

https://sites.google.com/site/aminer68/scalable-rwlock

So hope that you will be happy with my RWLocks algorithms...

A Read/Write Lock is a performance improvement over a standard mutex for cases where reads outnumber writes. with a Read/Write Lock multiple simultaneous read locks may be held, but write locks are exclusively held.

The exclusive writing lock ensures that race conditions do not occur, since if one client is writing to the data no other client may read or write.

Also, the allowance for multiple simultaneous read locks decreases resource contention since multiple readers can safely use the shared data.

Click here to see the explanation of my scalable LW_RWLock algorithm: Algorithm explanation

Language: FPC Pascal v2.2.0+ / Delphi 7+: http://www.freepascal.org/

Operating Systems: Windows, Mac OSX , Linux...

Required FPC switches: -O3 -Sd -dFPC -dFreePascal

-Sd for delphi mode....

Required Delphi switches: -DMSWINDOWS -$H+ -DDelphi

{$DEFINE CPU32} and {$DEFINE Windows32} for 32 bit systems

{$DEFINE CPU64} and {$DEFINE Windows64} for 64 bit systems

Please click on the small arrow on the right of the zip file bellow to download...