On async signal safety and "fork()"

This post is a follow-up to the post below:

https://sites.google.com/site/embeddedmonologue/home/c-programming/reentrancy-thread-safe-asynchronous-signal-safe-interrupt-safe-and-cancellation-safe

1. Definition of Asynchronous Signal Safe

"Asynchronous-signal safe", or "async safe", describes an API that is safe to use with asynchronous signals: it can be interrupted by a signal handler and then be called again from within that same handler. It is essentially a reentrancy problem.

The following recaps the definition of "asynchronous signal safe" from GNU Pth and the C99 Rationale:

asynchronous-safe function [GNU Pth]

A function is asynchronous-safe, or asynchronous-signal safe, if it can be called safely and without side effects, without interfering with other operations, from within a signal handler context. That is, it must be able to be interrupted at any point to run linearly out of sequence without causing an inconsistent state. It must also function properly when global data might itself be in an inconsistent state.

The C99 Rationale V5.10 explains very concisely how a signal is handled and how that relates to reentrancy (page 141, section 7.14.1.1):

When a signal occurs, the normal flow of control of a program is interrupted. If a signal occurs that is being trapped by a signal handler, that handler is invoked. When it is finished, execution continues at the point at which the signal occurred. This arrangement could cause problems if the signal handler invokes a library function that was being executed at the time of the signal. Since library functions are not guaranteed to be reentrant, they should not be called from a signal handler that returns.

In short, the signal safety problem is in essence a reentrancy problem with respect to signals. An API that is async signal safe is safe for interrupts and recursion as well. It is also thread safe. In other words, an async signal safe API is reentrant under any circumstances.

A signal handler has some notable characteristics:

1. Signals can nest: a signal handler can run from inside another signal handler.

2. A signal handler runs in the context of the thread that the signal was delivered to. The handler becomes part of that thread's call stack and uses the thread's stack space (though POSIX also allows a dedicated signal stack).

3. A signal handler is schedulable like any other code. The kernel can context switch out of a signal handler, run some other code, and then switch back so that the handler continues from where it left off.
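Nesting (item 1) can be limited per handler with the "sa_mask" field of "sigaction()". The following is a minimal sketch, not a definitive recipe: the names "events", "record", "on_usr1", "on_usr2" and "install_handlers" are made up for illustration. It assumes a POSIX system and shows SIGUSR2 being held pending while the SIGUSR1 handler runs.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Hypothetical event log, filled in by the handlers below. */
static volatile sig_atomic_t events[4];
static volatile sig_atomic_t n_events;

static void record(int id) {
    if (n_events < 4)
        events[n_events++] = id;
}

static void on_usr2(int sig) {
    (void)sig;
    record(3);                 /* SIGUSR2 handler ran */
}

static void on_usr1(int sig) {
    (void)sig;
    record(1);                 /* SIGUSR1 handler entered */
    raise(SIGUSR2);            /* stays pending: SIGUSR2 is in sa_mask */
    record(2);                 /* SIGUSR1 handler about to return */
}

/* Install both handlers; SIGUSR2 is blocked while on_usr1 runs. */
static int install_handlers(void) {
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_usr2;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR2, &sa, NULL) != 0)
        return -1;

    sa.sa_handler = on_usr1;
    sigaddset(&sa.sa_mask, SIGUSR2);
    return sigaction(SIGUSR1, &sa, NULL);
}
```

Calling install_handlers() and then raise(SIGUSR1) from normal code should log the SIGUSR1 handler's entry and exit before the SIGUSR2 handler. Without SIGUSR2 in "sa_mask", the second handler would run nested inside the first.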

To be reentrant with respect to async signals, an API must use no mutex, must keep no global state or variables, must not call any async signal unsafe API, and so on.

Let's see whether "fork()" satisfies the above requirements.
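As a rough illustration of these rules, a handler can restrict itself to setting a flag of type "volatile sig_atomic_t" and calling "write()", which is on the POSIX list of async signal safe functions. This is only a sketch; "got_signal", "on_signal" and "install_flag_handler" are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* The only shared state: a flag the main program polls later. */
static volatile sig_atomic_t got_signal;

/* No mutex, no malloc(), no stdio: only a flag and write(). */
static void on_signal(int sig) {
    static const char msg[] = "signal caught\n";
    int saved_errno = errno;   /* write() may change errno */
    ssize_t n;

    (void)sig;
    got_signal = 1;
    n = write(STDERR_FILENO, msg, sizeof msg - 1);
    (void)n;
    errno = saved_errno;
}

static int install_flag_handler(int signo) {
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}
```

The main program would call install_flag_handler(SIGUSR1) once and check "got_signal" from its normal control flow, where any heavier work can be done safely.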

2. Multi-threading and Async Signal Safe

It should be clear by now that the pthread mutex APIs are not async signal safe. In fact, any code that uses a mutex is not reentrant: if a signal interrupts a thread while it holds the mutex and the handler tries to take the same mutex, the thread deadlocks against itself.

It is equally clear that any synchronization API built on a mutex is not async signal safe. That covers pretty much all pthread synchronization APIs.

This suggests that no pthread API should be used in a signal handler, or inside any function that a handler calls.

Now, if you are using a third party multi-threaded library, how do you know whether the library code can be used inside a signal handler? There is no easy answer. You have to check the library code to make sure that, even though it is multi-threaded, it is friendly to async signals.

In summary: thread-safe APIs and async signals do not mix well. A signal handler should be as short and simple as possible and do minimal work, much like an interrupt service routine (ISR).

Now let's move on to "fork()" and see whether it is async signal safe.
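One common way to keep handlers out of the picture entirely is to block the signal and collect it synchronously with "sigwait()", so the processing runs in normal context where non-reentrant calls are allowed. This is a swapped-in technique, not something the text above prescribes. In the sketch below "block_signal" and "wait_and_describe" are hypothetical helpers, and a multi-threaded program would use "pthread_sigmask()" in place of "sigprocmask()".

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>

/* Block signo so that it is never delivered to an async handler. */
static int block_signal(int signo) {
    sigset_t set;

    sigemptyset(&set);
    sigaddset(&set, signo);
    return sigprocmask(SIG_BLOCK, &set, NULL);
}

/* Wait for signo and process it in normal context, where calls such as
 * snprintf() (or a mutex lock) are fine.
 * Returns the number of characters formatted, or -1 on error. */
static int wait_and_describe(int signo, char *buf, size_t len) {
    sigset_t set;
    int received;

    sigemptyset(&set);
    sigaddset(&set, signo);
    if (sigwait(&set, &received) != 0)
        return -1;
    return snprintf(buf, len, "handled signal %d synchronously", received);
}
```

A program would call block_signal() early, before creating any threads, and then loop on wait_and_describe() (or its own equivalent) in one dedicated place.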

3. Definition of "fork()"

"fork()" is defined by POSIX at:

http://pubs.opengroup.org/onlinepubs/009695399/functions/fork.html

The parts of the "fork()" definition that relate to signal safety are:

  • A process shall be created with a single thread. If a multi-threaded process calls fork(), the new process shall contain a replica of the calling thread and its entire address space, possibly including the states of mutexes and other resources. Consequently, to avoid errors, the child process may only execute async-signal-safe operations until such time as one of the exec functions is called.
  • Fork handlers may be established by means of the pthread_atfork() function in order to maintain application invariants across fork() calls.
    • When the application calls fork() from a signal handler and any of the fork handlers registered by pthread_atfork() calls a function that is not asynch-signal-safe, the behavior is undefined.

The above is the most controversial part of the "fork()" definition. It allows "fork()" to be used in a multi-threaded environment, and we know that a multi-threaded environment is not async signal friendly. To deal with this, POSIX suggests two things.

First: "the child process may only execute async-signal-safe operations until such time as one of the exec functions is called"

Once the parent calls "fork()", the kernel prepares a child process. The child contains only a replica of the parent's calling thread; all other threads no longer exist. However, mutexes that were held by those other threads still show up in the child's address space, still held by threads that do not exist in the child. Those mutexes are in an inconsistent state that the child would have difficulty cleaning up. Therefore POSIX suggests that, once the child is born, it should do no pthread mutex related work, only "async-signal-safe" work, and should call "exec()" as soon as possible to wipe out the parent's code.
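The "fork, then exec right away" rule can be sketched as follows. This is an illustration under assumptions: "spawn_and_wait" is a hypothetical helper, and the exit code 127 for a failed exec is a shell convention rather than anything POSIX requires here. The child touches only "execv()", "write()" and "_exit()", all of which are on the POSIX async signal safe list.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork, then exec `path` in the child using only async signal safe calls.
 * Returns the child's exit status, or -1 on failure. */
static int spawn_and_wait(const char *path, char *const argv[]) {
    static const char msg[] = "exec failed\n";
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: no malloc(), no stdio, no mutexes before exec. */
        execv(path, argv);
        /* Only reached if execv() failed. */
        ssize_t n = write(STDERR_FILENO, msg, sizeof msg - 1);
        (void)n;
        _exit(127);
    }

    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, passing "/bin/sh" with the arguments "sh", "-c", "exit 7" should make spawn_and_wait() return 7 on a system where /bin/sh exists.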

It is typically not a problem for the child to call "exec()" as soon as it is born. Sometimes, however, the child may want to do some bookkeeping before running the new binary, so it may call a logging API, a database API, or some other 3rd party API. Since we don't know whether a 3rd party API is async-signal-safe, the child had better not do so. Instead, it should hand any complicated operation back to the parent via a quick and safe mechanism, e.g., a semaphore, and let the parent take up the bookkeeping work.
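A minimal sketch of handing the bookkeeping back to the parent is shown below. Note the substitution: it uses a pipe rather than the semaphore mentioned above, because "write()" on a pipe is async signal safe and can also carry a small payload. "fork_and_collect_event" is a hypothetical name, and the "fprintf()" call merely stands in for whatever logging or database API the parent would really use.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* The child reports one event byte through a pipe; the parent does the
 * heavyweight bookkeeping. Returns the event received, or -1 on error. */
static int fork_and_collect_event(char event) {
    int fds[2];
    char received = 0;
    ssize_t n;
    pid_t pid;

    if (pipe(fds) != 0)
        return -1;

    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: close(), write() and _exit() are async signal safe. */
        close(fds[0]);
        n = write(fds[1], &event, 1);
        _exit(n == 1 ? 0 : 1);     /* a real child would exec here */
    }

    /* Parent: free to call logging or database APIs. */
    close(fds[1]);
    n = read(fds[0], &received, 1);
    close(fds[0]);
    waitpid(pid, NULL, 0);
    if (n != 1)
        return -1;
    fprintf(stderr, "bookkeeping: child reported '%c'\n", received);
    return received;
}
```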

Second: "Fork handlers may be established by means of the pthread_atfork() function in order to maintain application invariants across fork() calls"

"Invariants" above means keeping state and mutexes consistent, with cleanup if needed. If the child will not call "exec" to run a new binary but instead continues running the parent's code, it must clean up any mutex or global state that is no longer meaningful in its address space before it continues. POSIX offers a way to do that cleanup: install a "pthread_atfork()" handler. This suggestion might be the worst idea ever, because it directly makes "fork()" async signal unsafe: to make "fork()" safe we need to clean up mutexes and global state, yet inside the cleanup handlers we operate on locks, which renders "fork()" not async signal safe whenever the cleanup is more than a straightforward reset.

Let's take a look at why:

int pthread_atfork(void (*prepare)(void), void (*parent)(void),
       void (*child)(void));

The "prepare" handler is called right at the beginning of "fork()", before the child is created. The "parent" handler runs in the parent once "fork()" has done its work, and the "child" handler runs in the child right after it is born, before any other code runs. Inside these handlers we are expected to do the mutex and other global state cleanup.

"pthread_atfork()" can be called multiple times with different handlers, and all the handlers are chained together. It is difficult to ensure that every 3rd party library does a good job of cleanup. In other words, allowing "pthread_atfork()" in order to make "fork()" multi-thread friendly can render "fork()" async signal unsafe.

GNU "fork()" has experienced both of these problems in its history:

1. It directly tried to take a lock, which rendered it async signal unsafe.

2. GNU malloc() installed a "pthread_atfork()" handler, which tries to lock a mutex.
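The pattern in item 2 can be sketched roughly as below. This is not glibc's code; "lib_lock", the counters, and the helper names are hypothetical and stand in for a library that protects its state across "fork()". The point to notice is that "on_prepare" calls "pthread_mutex_lock()", which is not async signal safe, so any "fork()" issued from a signal handler inherits that problem.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical library lock protecting some global state. */
static pthread_mutex_t lib_lock = PTHREAD_MUTEX_INITIALIZER;
static int prepare_calls, parent_calls, child_calls;

/* Take the lock before fork so the child never sees it half-updated. */
static void on_prepare(void) { pthread_mutex_lock(&lib_lock); prepare_calls++; }
/* Release it again on both sides once fork has completed. */
static void on_parent(void)  { parent_calls++; pthread_mutex_unlock(&lib_lock); }
static void on_child(void)   { child_calls++; pthread_mutex_unlock(&lib_lock); }

/* Register the handlers once, e.g. from library initialization. */
static int register_fork_handlers(void) {
    return pthread_atfork(on_prepare, on_parent, on_child);
}

/* Fork once; the child exits with the number of child-handler calls it saw.
 * Returns that exit status, or -1 on failure. */
static int fork_and_report(void) {
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(child_calls);
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

After register_fork_handlers(), a single fork_and_report() call should run each handler once: "prepare" and "parent" in the calling process, "child" in the new one. The caller of "fork()" never sees these handlers, which is exactly why it cannot vouch for them.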

Refer to the link below for a history of GNU "fork()":

https://sourceware.org/bugzilla/show_bug.cgi?id=4737

"fork()" has since been kicked out of the list of async signal safe APIs, at least according to the Austin Group record below:

http://www.opengroup.org/austin/aardvark/latest/xshbug3.txt

However, a review of the latest GNU "fork()" code (glibc 2.7) shows that violation #1 above has since been removed:

nptl/sysdeps/unix/sysv/linux/fork.c

/* Reset the file list. These are recursive mutexes. */
fresetlockfiles ();

/* Reset locks in the I/O code. */
_IO_list_resetlock ();

/* Reset the lock the dynamic loader uses to protect its data. */
__rtld_lock_initialize (GL(dl_load_lock));

However, violation #2 above is at the mercy of 3rd party libraries and whatever "pthread_atfork()" handlers they install. The user of "fork()" has no control over that and has to be constantly on the watch.

You can still claim that "fork()" is async signal safe and blame 3rd party code for violating the "async signal safe" rule, but that is not very practical. It is much more practical to just say that "fork()" is not async signal safe: don't use it inside a handler, and shut down signal handling right before calling "fork()".

One last interesting note: the POSIX edition linked below never officially removed "fork()" from its "async signal safe" list:
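One way to read "shut down signal handling right before calling fork()" is to block every signal around the call and restore the mask afterwards. The sketch below makes that assumption; "fork_with_signals_blocked" is a hypothetical wrapper, and a multi-threaded program would use "pthread_sigmask()" instead of "sigprocmask()".

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* Call fork() with every blockable signal masked, so that no handler can
 * run (or call fork() itself) while the fork is in progress.
 * Both processes get the caller's original mask back afterwards. */
static pid_t fork_with_signals_blocked(void) {
    sigset_t all, saved;
    pid_t pid;

    sigfillset(&all);
    if (sigprocmask(SIG_SETMASK, &all, &saved) != 0)
        return -1;

    pid = fork();

    /* Runs in the parent and, when fork() succeeded, in the child. */
    sigprocmask(SIG_SETMASK, &saved, NULL);
    return pid;
}
```

The return value follows the "fork()" convention: 0 in the child, the child's pid in the parent, and -1 on failure. Signals that arrive during the call stay pending and are delivered once the mask is restored.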

http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_04

Summary: To be "async signal safe", code has to be reentrant, which means it cannot change global state, lock any mutex, call "malloc/free", or call any other code that does so. The GNU implementation of "fork()" mostly satisfies this, except that it allows 3rd party libraries to install "pthread_atfork()" handlers. Those handlers can easily violate the reentrancy rules, rendering "fork()" unsafe to use with signals.

Refer to the post below on how to write reentrant code:

https://sites.google.com/site/embeddedmonologue/home/c-programming/reentrancy-thread-safe-asynchronous-signal-safe-interrupt-safe-and-cancellation-safe