Reentrancy, thread safety, asynchronous signal safety, interrupt safety, and cancellation safety

Let's define reentrancy first.

Reentrancy refers to a piece of code that can be re-entered by the same process (which may, or may not, be the same thread) before its current invocation completes, without any side effect on the previous invocation.

Another definition of reentrancy is:

An operation is reentrant if it can be safely called again, by the same process, before its previous invocation has completed (i.e. it can be safely executed concurrently).

For a lengthier alternative definition, see:

http://en.wikipedia.org/wiki/Reentrancy_%28computing%29

All the definitions suggest the same thing: the same code may run concurrently, and the concurrent invocations should not impact each other in terms of "side effects".

So what is a "side effect"?

A precise definition of "side effect" is a rather large topic. Here is a quick quote from the C standard (ISO/IEC 9899, §5.1.2.3):

  • side effect [ISO/IEC 9899:2011]
  • Changes in the state of the execution environment achieved by accessing a volatile object, modifying an object, modifying a file, or calling a function that does any of those operations.

To make it short, “without any side effect” in our context can be defined as “as if the second invocation in the middle of the first invocation did not happen”. What this definition entails is that the second invocation and the first invocation should not have any “business” to do with each other at all.

Let’s have a quick example:

int a;

void not_reentrant ( void ) {
    a = 0;                          /* step A1; B1 for the second invocation */
    a++;                            /* step A2; B2 for the second invocation */
    fprintf(stdout, "a is %d", a);  /* step A3; B3 for the second invocation */
}

Now if we call the above API, and somehow the same code is invoked again between step A1 and step A2, then the output will be "a is 2" instead of "a is 1". That is a side effect: the second invocation changed the behavior of the first invocation, therefore the code is not reentrant.

Reentrancy requires that the code can be executed in any interleaved order, without any side effect on each other, for example:

A1, A2, B1, B2, A3, B3

A1, B1, A2, A3, B2, B3

From the reentrancy definition above, two things need to be highlighted.

1. The two invocations happen in parallel. How could this happen?

2. How do the two invocations relate to each other?

Answers to the above questions lead to different principles for writing reentrant code.

Next we will try to answer the above questions in different scenarios.

1. reentrancy with respect to thread

If a piece of code can be safely reentered again by a different thread, it is reentrant with respect to thread.

Reentrancy is often related to thread safety. A quick definition of thread-safe is: "a function whose side effects, when called by two or more threads, is guaranteed to be as if the threads each executed the function one after another in an undefined order, even if the actual execution is interleaved" (ISO/IEC 9945:1-1996, §2.2.2).

Consider the following example:

int a;

void not_reentrant ( void ) {
    a = 0;                          /* step A1; B1 for the second invocation */
    a++;                            /* step A2; B2 for the second invocation */
    fprintf(stdout, "a is %d", a);  /* step A3; B3 for the second invocation */
}

If thread "A" runs the above code and gets context switched out after step A2, and another thread "B" then calls the above code, completes it, and returns the CPU back to thread "A", thread "A" will see a value for "a" other than the one it expects.

Thread "B" can cut in due to the following reasons:

1. on a single-processor or multiprocessor system, thread "B" can have a higher priority if scheduling is priority based, or thread "A" may simply run out of its time quota after step A1 if scheduling is round robin.

2. on a multiprocessor system, thread "A" and thread "B" can just happen to run on two different processors at the same time and happen to produce the above code sequence.

To make the above code thread-safe, we can add a mutex to protect the global variable:

int a;
pthread_mutex_t lock_for_a = PTHREAD_MUTEX_INITIALIZER;

void i_am_thread_safe_but_not_reentrant ( void ) {
    pthread_mutex_lock( &lock_for_a );   /* only one thread can go beyond this point at a time */
    a = 0;                               /* step A1 for thread "A", B1 for thread "B" */
    a++;                                 /* step A2 for thread "A", B2 for thread "B" */
    fprintf(stdout, "a is %d", a);       /* step A3 for thread "A", B3 for thread "B" */
    pthread_mutex_unlock( &lock_for_a );
}

The above code is thread-safe: any thread will run steps A1, A2, A3 atomically, and no other thread can run the same code at the same time.

The question is: is the above code reentrant? The answer is no. While one invocation is executing step A1, the same API cannot be re-entered and run step A1 again, because the mutex is already held.

"A1, A2 and A3" are collectively called a "critical section". Only one thread can enter the critical section at a time. This "only one thread can enter at a time" is clearly against the definition of reentrancy: "the same code can be invoked twice at the same time".

In general, reentrancy requires that code executed by different threads can interleave in any combination, whereas thread safety only requires that the code executed by each thread completes serially, as a block.

Reentrancy requires that A1, A2, A3 and B1, B2, B3 can run in any interleaved order:

A1, B1, A2, A3, B2, B3,

A1, B1, B2, A2, A3, B3

B1, B2, A1, A2, A3, B3

B1, B2, A1, B3, A2, A3

Thread safety requires that A1, A2, A3 execute as one atomic block with respect to B1, B2, B3 as another block. There are only two possible orders: A1, A2, A3, B1, B2, B3, or B1, B2, B3, A1, A2, A3. One invocation must finish before the other can run and finish.

Therefore reentrancy is a much stronger proposition than thread safety. It is possible to carefully cook up a reentrant function that is not thread-safe; in general, however, a reentrant function is also thread-safe, but not the other way round.

Bottom line: a mutex is often used around a critical section to make an API thread-safe. A function that holds a mutex is never reentrant. In general, a reentrant operation is thread-safe but not vice versa.

Following are some general rules for writing thread-safe code, with comments with respect to reentrancy:

using re-entrant operation ====> reentrant automatically

using mutual exclusion ====> not reentrant

using thread-local storage ====> can be reentrant

using atomic operations ====> can be reentrant

using immutable objects ====> can be reentrant

2. reentrancy with respect to recursion

A recursive function directly calls itself. Therefore, by definition, a recursive function “should be” a reentrant function because the same code got “re-entered”.

Consider the following example:

int a;

void not_reentrant ( void ) {
    a++;
    fprintf(stdout, "a is %d", a);  /* prints 1, 2, 3, 4, 5 as the recursion deepens */
    if ( a < 5 ) {
        not_reentrant();
    }
}

The above is a carefully designed recursive function. The expected result is to print "a" five times, with the value increased each time. It is reentrant, since the same code gets re-entered by itself.

Now consider the above code running in different threads. The invocations clearly have side effects on each other, so the above code is not thread-safe.

Note that we are discussing reentrancy with respect to recursion, so multi-threading is not applicable here. The above code is reentrant in the context of recursion, but not in the context of multi-threading.

Bottom line: a recursive function is reentrant only in the context of recursion. A recursive function written for a multi-threaded environment should be thread-safe too.

3. Reentrancy with respect to signal

If a piece of code can be safely reentered again inside a signal handler, it is reentrant with respect to signal.

The above definition is conceptually related to "asynchronous-signal safe", or "asynchronous safe". Following is the definition of "asynchronous-signal safe" from GNU Pth:

asynchronous-safe function [GNU Pth]

A function is asynchronous-safe, or asynchronous-signal safe, if it can be called safely and without side effects, without interfering with other operations, from within a signal handler context. That is, it must be able to be interrupted at any point to run linearly out of sequence without causing an inconsistent state. It must also function properly when global data might itself be in an inconsistent state.

The C99 Rationale (revision 5.10) explains how a signal is handled and how that relates to reentrancy in a very concise way (page 141, section 7.14.1.1):

When a signal occurs, the normal flow of control of a program is interrupted. If a signal occurs that is being trapped by a signal handler, that handler is invoked. When it is finished, execution continues at the point at which the signal occurred. This arrangement could cause problems if the signal handler invokes a library function that was being executed at the time of the signal. Since library functions are not guaranteed to be reentrant, they should not be called from a signal handler that returns.

In short, the signal-safety problem is in essence a reentrancy problem. In order to be asynchronous-signal safe, a function must be reentrant.

Consider again the following example:

int a;

void i_am_not_signal_safe ( void ) {
    a = 0;                          /* step A1 for thread "A"; B1 when run inside a signal handler */
    a++;                            /* step A2 for thread "A"; B2 when run inside a signal handler */
    fprintf(stdout, "a is %d", a);  /* step A3 for thread "A"; B3 when run inside a signal handler */
}

Consider the following two scenarios:

Scenario 1:

Thread "A" is running at A2 and gets context switched out to thread "B", which then gets interrupted by a signal. The signal handler runs inside the context of thread "B", and inside the signal handler we call the above API.

Scenario 2:

Thread "A" is running at A2 and gets interrupted by a signal. The signal handler runs inside the context of thread "A", and inside the signal handler we call the above API again.

In scenario 1, since the same code runs inside two threads, the code can execute in any order, e.g.

A1, B1, A2, A3, B2, B3,

A1, B1, B2, A2, A3, B3

B1, B2, A1, A2, A3, B3

B1, B2, A1, B3, A2, A3

The execution orders above are exactly the strongest requirement for reentrancy.

In scenario 2, since the same code runs inside the same thread, the signal handler must finish before thread "A" can continue from where it was interrupted. The order of execution would be:

A1, A2, (B1, B2, B3 inside signal handler ), A3

The above execution order is very similar to the case where the API itself is recursive.

From the above analysis, one can tell that in order to be reentrant with respect to signals, the code must use absolutely no mutexes, no global variables, etc.

By the way, there are also synchronous signals. Synchronous signals are caused by the thread itself running abnormally, and the signal is delivered to the same thread right away without any delay; examples are SIGILL, SIGFPE, and SIGSEGV.

Bottom line: reentrancy with respect to signals is a much stronger proposition than reentrancy with respect to threads. In fact, it is the strongest among all aspects of reentrancy.

4. reentrancy with respect to interrupt

As a signal is a software interrupt, the discussion here is very similar to reentrancy with respect to signals. However, we should outline the differences between an interrupt and a signal first.

1. both signals and interrupts can be nested. An ISR (interrupt service routine) can run from inside another ISR, and a signal handler can run from inside a signal handler. An ISR can even run from within a signal handler.

2. a signal handler runs within the context of the thread that the signal is delivered to. The signal handler becomes part of the call stack of the thread and may use the stack space of the thread (though POSIX also allows a dedicated signal stack).

3. a signal handler is a schedulable entity. The kernel can choose to context switch out a signal handler, run some other code, and then context switch back to the same signal handler to continue from where it left off.

4. an ISR is not a schedulable entity and does not run within the context of any thread. An ISR will run to completion before the processor can be scheduled to any other user code.

Consider again the following example:

int a;

void i_am_not_signal_safe ( void ) {
    a = 0;                          /* step A1 for thread "A"; B1 when run inside an ISR */
    a++;                            /* step A2 for thread "A"; B2 when run inside an ISR */
    fprintf(stdout, "a is %d", a);  /* step A3 for thread "A"; B3 when run inside an ISR */
}

Consider the following two scenarios:

Scenario 1:

Thread "A" is running at A2 on one processor, and thread "B" is running on another processor when an interrupt arrives. The ISR (interrupt service routine) runs on the processor that is running thread "B", but not under the context of thread "B", and inside the interrupt service routine we call the above API.

Because two processors are involved, the sequence of A1, A2, A3 and B1, B2, B3 can be any order. This scenario is identical to the earlier scenario for signals.

Scenario 2:

Thread "A" is running at A2 and gets interrupted by an interrupt. The ISR runs on the same processor, but not inside the context of thread "A", and inside the ISR we call the above API again.

The code running sequence here is A1, A2, B1, B2, B3, A3 (note that B1, B2, B3, as part of the ISR, run to completion without the possibility of being context switched out). This scenario is identical to the earlier scenario for signals too.

The above two scenarios are identical to those for "reentrancy with respect to signal", therefore we can conclude that "reentrancy with respect to interrupt" is identical to "reentrancy with respect to signal".

A quick way to make a piece of code signal/interrupt safe is to disable signals/interrupts inside the code until it completes. This is not really a practical general approach.

Bottom line: on multi-processor systems, "reentrancy with respect to interrupt" is the same as “reentrancy with respect to signal”.

5. reentrancy with respect to thread cancellation

Thread cancellation means terminating a thread before it has finished on its own.

There are two types of cancellation: asynchronous cancellation and deferred cancellation. The former means a thread can be cancelled at any point of its run, as soon as the cancellation request comes in. The latter means a thread can only be cancelled at certain points of its run (called cancellation points, where the code checks whether a cancellation request is pending and, if so, acts upon it); the request remains pending while the code is running outside a cancellation point.

Imagine that a thread locks a mutex and gets cancelled asynchronously: the mutex will be left in the locked state forever.

Imagine that a thread allocates some memory from the heap and gets cancelled before it can free the memory: the memory is leaked forever.

pthread_setcancelstate(3) has a very concise yet clear description on the problems with cancelling a thread.

So what is a cancellation safe function?

POSIX.1 defines a cancellation-safe function as:

"An async-cancel-safe function is one that can be safely called in an application where asynchronous cancellability is enabled."

Compared with signal safety or thread safety, cancellation safety is such a strong proposition that POSIX.1 requires only three APIs to be cancellation safe:

pthread_cancel()

pthread_setcancelstate()

pthread_setcanceltype()

How is cancellation safety related to reentrancy? Well, the proposition of cancellation safety is so strong that any API that is cancellation safe is automatically reentrant with respect to recursion, signals and threads!

If an operation can be stopped in the middle, and the rest of the operation cancelled, without any side effect, then the same operation can naturally be re-entered without side effects either.

6. how to write reentrant code ( with respect to recursion, signal and thread )?

Following is a list of things that will make code non-reentrant.

1. using a mutex, spinlock, condvar, or semaphore. These primitives are meant to be shared among threads or processes, which makes them global variables or shared-memory variables.

2. dynamic memory allocation via malloc()/free() etc. The malloc() family typically uses a mutex internally, whatever the implementation.

3. returning static memory pointer.

4. holding static data over different invocations.

5. reading or writing global variables or data structure, with, or without mutex protection, with, or without lock free atomicity.

6. maintaining state information between different invocations.

7. self modifying code.

8. calling other non-reentrant code.

9. changing errno. Although errno is a per-thread variable, a signal handler can modify it.

10. reading and writing software resources ( global variable, etc ) shared among threads.

11. accessing hardware resources, including I/O devices.