Even for die-hard PHP coders, pcntl_fork() can be quite daunting. The trouble is that most PHP users have learnt to code linearly, and mostly only for outputting dynamic content such as web pages.
The pcntl functions (and to some extent the posix functions) are different: they provide the means to write interactive, long-running applications that are unsuitable for a web environment like Apache. It's quite possible to write your own web server in PHP using these functions.
If you have done some serious programming in languages like C, pcntl_fork() shouldn't hold any secrets for you: it behaves almost identically to fork(). However, to the novice application programmers amongst us (like me), pcntl_fork() can seem highly illogical.
So why use it at all? Simple: PHP itself offers no threading or asynchronous processing. Therefore, you have to fork if you want your application to do two or more things at the same time.
Say you are writing a networked server which listens on a certain port, accepts connections and keeps them open. That can be done synchronously with while-loops and arrays containing socket resources, but that poses one problem: when an action taken on one connection takes a while to finish, all other actions are stalled, because the program is busy with that one connection. Threading would solve this: you simply launch a thread for each connection (well, kinda).
With pcntl_fork(), this can be done too. However, it's important to know what it actually does.
First, let's take a look at what a process usually is. Every process has a unique ID, called the PID. This is the reference to that process, which can be used to send it signals and the like. Every process also has a PPID, which is a reference to its parent process: the process that created it. Only processes started directly by the kernel, such as init (PID 1), have a PPID of 0; if a parent exits before its child, the child is re-parented, usually to init.
What fork() does (and pcntl_fork() does too, since it is basically the same) is copy the current process. That's right: copy. It does not create an empty process, nor does it 'call' another program; it just creates an identical copy of the current process. Well, almost identical.
There are three differences to be noted:
The first difference is logical: it's a new process, so it receives a new PID. The second is logical too: the parent process of the new process is the old process, so the PPID of the copy is the PID of the original. From now on, we'll call the copy 'the child' and the original 'the parent'.
There is a third difference, which can look illogical at first glance: the return value of pcntl_fork().
pcntl_fork() returns an int. To the child, it returns 0. To the parent, it returns either -1 or the PID of the new child. Now, this is interesting.
When the return value is -1, something went wrong: the underlying fork() system call failed, so no child was created. Execution simply continues in the original process as if pcntl_fork() had never been called.
If pcntl_fork() returns 0, you know that you are the child. This is the most reliable way to separate the child from the parent.
For any other value, you know that you are the parent and that the child was successfully created with a new PID. The return value of pcntl_fork() is that PID. This opens up a lot of possibilities for the parent: we know that we are the parent, and we know the PID of the child.
Let's look at some code:
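A minimal sketch of such a program, assuming PHP CLI with the pcntl and posix extensions loaded (the variable $message is just for illustration). It forks once and uses an if-elseif-else on the return value to give each process its own branch:

```php
<?php
// Minimal sketch: fork once and branch on pcntl_fork()'s return value.
// Assumes PHP CLI with the pcntl and posix extensions enabled.

$message = "defined before the fork";   // both processes get their own copy

$pid = pcntl_fork();

if ($pid === -1) {
    // Fork failed: no child was created; only the original process runs on.
    die("could not fork\n");
} elseif ($pid === 0) {
    // Child: pcntl_fork() returned 0.
    echo "child:  PID " . posix_getpid() . ", parent " . posix_getppid() . ", sees '$message'\n";
    exit(0);   // end the child here so it never runs the parent's code
} else {
    // Parent: pcntl_fork() returned the child's PID.
    echo "parent: PID " . posix_getpid() . ", created child $pid, sees '$message'\n";
    pcntl_waitpid($pid, $status);   // wait for the child and reap it (no zombie)
}
```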
What we have here is a program that copies itself once. Both copies have the same defined variables and execute the same functions (both echos). The if-elseif-else part is the main point where you can separate the two processes.
When you want the child to do something altogether different from the parent, branch on the returned $pid. Remember: everything that is defined before pcntl_fork() (variables, functions, classes) exists in both the parent and the child. This can be undesirable, so handle it carefully.
You can exit each process without consequences for the other. Well, almost.
When a child process dies, its death should be handled by the parent. If it isn't, the child becomes a zombie: it no longer uses memory or CPU, but it still occupies an entry in the process table, with a PID and all that. This is undesirable, since operating systems put an upper limit on the number of processes they can handle.
When a child dies, the parent receives a signal (SIGCHLD). The parent can then handle the death of the child for its internal bookkeeping. The correct way to reap a zombie child is pcntl_waitpid(). You can use that function to wait until a child dies, or to detect that a child has already died. pcntl_wait() waits for any child at all, which is handy when you have a myriad of them. Look at the relevant section of the PHP manual for more options, including WNOHANG, which makes the call return immediately instead of suspending the parent.
Using SIGCHLD, however, is not always foolproof. When you quickly create many short-lived children, handling SIGCHLD in combination with pcntl_waitpid() might not reap every zombie: several SIGCHLD signals arriving close together can be merged into one, so reaping only one child per signal leaves the others behind. I find this way to work best:
When we examine the above code, we see that each child also gets a copy of the $children array as it was at the moment of forking. That might be desirable, since the children then know their brothers and sisters. Note that $children in the child does not contain its own PID!
When you don't need $children in the child, it's a good idea to unset() it. The parent still has its own copy, and the child no longer carries it around. This goes for all variables defined before pcntl_fork() was called, of course.
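A sketch of that approach, assuming PHP CLI with pcntl: the parent keeps the PIDs of its children in $children and, instead of relying on SIGCHLD alone, polls each one with pcntl_waitpid() and WNOHANG until all have been reaped. The number of children and the sleep times are arbitrary.

```php
<?php
// Sketch: fork several short-lived children, keep their PIDs in $children,
// and reap them by polling pcntl_waitpid() with WNOHANG rather than
// relying on SIGCHLD alone. Assumes PHP CLI with pcntl enabled.

$children = [];   // PIDs of children that have not been reaped yet
$reaped   = 0;

for ($i = 0; $i < 5; $i++) {
    $pid = pcntl_fork();
    if ($pid === -1) {
        die("could not fork\n");
    }
    if ($pid === 0) {
        // Child: its copy of $children lists only the siblings forked earlier.
        unset($children);              // not needed in the child
        usleep(100000 * ($i + 1));     // stand-in for real work
        exit(0);
    }
    $children[] = $pid;                // parent: remember the new child
}

// Parent: keep working, and on every pass reap whichever children have exited.
while ($children) {
    foreach ($children as $key => $pid) {
        $res = pcntl_waitpid($pid, $status, WNOHANG);   // 0 = still running
        if ($res === $pid) {
            $reaped++;
        }
        if ($res !== 0) {
            unset($children[$key]);    // exited (or no longer our child)
        }
    }
    usleep(50000);                     // stand-in for the parent's own work
}

echo "reaped $reaped children\n";
```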
In UNIX (and the like), a daemon is a process that runs in the background, detached from any terminal. It has no (living) parent: the process that started it has exited. It is really easy to daemonize your PHP program using pcntl_fork():
We need to consider some other things, though: a PHP script usually times out after a while (max_execution_time), causing it to exit; the CLI has no limit by default, but web server setups do. For a decent daemon, this is undesirable. Use the following at the start of your program to prevent it:
When you want the program (daemon or not) to handle signals, use this bit of code:
We need the declare(ticks = 1) part so that PHP checks for pending signals after every statement, which means each signal is handled shortly after it is received. Read up on ticks and pcntl_signal() for more details.
We've now seen that pcntl_fork() creates a copy of the current process, and what the differences between the parent and the child are. We've covered the basics of separating the two processes and handling the death of a child. We've also learnt how to daemonize our PHP script and how to live forever. And lastly, we've learnt how to handle signals.
Now, let's write a basic server which daemonizes, accepts connections (from telnet or something similar) and handles them properly. Connect to it with telnet on port 2007, type something followed by a return and see what it does. It echoes the PID of the daemon, which you can use to send a SIGTERM, causing the program to exit cleanly, or a SIGHUP, causing it to reinitialize.
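A minimal sketch, assuming PHP CLI with pcntl and posix: the original process exits right after forking, and the child calls posix_setsid() to start a new session without a controlling terminal. The function name daemonize() is hypothetical.

```php
<?php
// Sketch: turn the current CLI process into a background daemon.
// Assumes the pcntl and posix extensions; daemonize() is a hypothetical name.

function daemonize(): void
{
    $pid = pcntl_fork();
    if ($pid === -1) {
        fwrite(STDERR, "could not fork\n");
        exit(1);
    }
    if ($pid > 0) {
        exit(0);   // original process ends; the child is re-parented (usually to init)
    }
    // Child: start a new session so it has no controlling terminal.
    if (posix_setsid() === -1) {
        fwrite(STDERR, "could not start a new session\n");
        exit(1);
    }
}
```

Call daemonize() once at the start of the program; everything after that call runs in the background process.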
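A minimal sketch using standard PHP functions: set_time_limit(0) removes the execution time limit, and ignore_user_abort(true) only matters when the script was started through a web server.

```php
<?php
// Sketch: lift PHP's execution time limit for a long-running script.
set_time_limit(0);           // 0 = no time limit (sets max_execution_time)
ignore_user_abort(true);     // web SAPIs: keep running if the client disconnects
```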
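A sketch, assuming PHP CLI with pcntl and posix, that installs handlers for SIGTERM (stop) and SIGHUP (reinitialize). To stay self-contained it sends both signals to itself; in practice they would come from something like `kill -HUP <pid>`. The names handle_signal(), $running and $reloads are hypothetical. On PHP 7.1 and later, pcntl_async_signals(true) is an alternative to declare(ticks = 1).

```php
<?php
// Sketch: react to SIGTERM (stop) and SIGHUP (reinitialise).
// Assumes PHP CLI with pcntl and posix. declare(ticks = 1) makes PHP check
// for pending signals after every statement.
declare(ticks = 1);

$running = true;   // cleared by SIGTERM
$reloads = 0;      // counts SIGHUPs

function handle_signal(int $signo): void
{
    global $running, $reloads;
    if ($signo === SIGTERM) {
        $running = false;      // let the main loop finish cleanly
    } elseif ($signo === SIGHUP) {
        $reloads++;            // re-read configuration here
    }
}

pcntl_signal(SIGTERM, 'handle_signal');
pcntl_signal(SIGHUP, 'handle_signal');

// Demo only: signal ourselves. Normally these come from e.g. `kill -HUP <pid>`.
posix_kill(posix_getpid(), SIGHUP);
posix_kill(posix_getpid(), SIGTERM);

while ($running) {
    usleep(100000);            // real work would go here
}
echo "stopped after $reloads reload(s)\n";
```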
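A compact sketch of such a server, assuming PHP CLI with pcntl and posix. The greeting text, the `--run` switch and the function names run_server(), handle_client() and on_signal() are hypothetical choices, and SIGHUP only sets a flag at the point where configuration could be re-read. Each connection is served by its own forked child, and the daemon reaps finished children with WNOHANG on every pass.

```php
<?php
// Sketch of a forking daemon on TCP port 2007: greets each client with the
// daemon's PID and echoes every line back. SIGTERM stops it, SIGHUP sets a
// reload flag. Assumes PHP CLI with pcntl and posix.
declare(ticks = 1);

$running  = true;   // cleared by SIGTERM
$reload   = false;  // set by SIGHUP
$children = [];     // PIDs of connection handlers, keyed by PID

function on_signal(int $signo): void
{
    global $running, $reload;
    if ($signo === SIGTERM) {
        $running = false;
    } elseif ($signo === SIGHUP) {
        $reload = true;
    }
}

// Serve one client: greet it, then echo each line until it disconnects.
function handle_client($conn, int $daemonPid): void
{
    fwrite($conn, "Hello, the daemon's PID is $daemonPid\n");
    while (($line = fgets($conn)) !== false) {
        fwrite($conn, "You said: " . rtrim($line, "\r\n") . "\n");
    }
    fclose($conn);
}

function run_server(int $port): void
{
    global $running, $reload, $children;

    $pid = pcntl_fork();                 // daemonize
    if ($pid === -1) { exit(1); }
    if ($pid > 0)   { exit(0); }         // original process leaves
    posix_setsid();

    pcntl_signal(SIGTERM, 'on_signal');
    pcntl_signal(SIGHUP, 'on_signal');
    $daemonPid = posix_getpid();

    $server = stream_socket_server("tcp://0.0.0.0:$port", $errno, $errstr);
    if ($server === false) { exit(1); }

    while ($running) {
        if ($reload) {
            $reload = false;             // re-read configuration here
        }
        // Wait at most 1 s so signals and finished children are noticed.
        $conn = @stream_socket_accept($server, 1);
        if ($conn !== false) {
            $child = pcntl_fork();
            if ($child === 0) {
                pcntl_signal(SIGTERM, SIG_DFL);  // handlers simply die on SIGTERM
                fclose($server);
                handle_client($conn, $daemonPid);
                exit(0);
            }
            if ($child > 0) {
                $children[$child] = true;
            }
            fclose($conn);               // the daemon's copy is no longer needed
        }
        while (($done = pcntl_waitpid(-1, $status, WNOHANG)) > 0) {
            unset($children[$done]);     // reap finished handlers
        }
    }

    fclose($server);                     // SIGTERM: stop listening...
    foreach (array_keys($children) as $child) {
        posix_kill($child, SIGTERM);     // ...and stop the handlers
    }
}

if (in_array('--run', $argv ?? [], true)) {
    run_server(2007);
}
```

Start it with `php server.php --run` (the file name is only an example), connect with `telnet localhost 2007`, and stop it with `kill -TERM` followed by the PID it greets you with.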
The sonic server daemon has a plugin-type architecture, if you don't want to build your own...
How can I check if a daemon is running?
By anonymous (not verified) at Wed, 20/08/2008 - 5:15pm
That depends on where you want to check from. If you want to check from the parent, you already know the PID of the daemon (it is returned by pcntl_fork). You can then check by doing:
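A minimal sketch, assuming PHP CLI with pcntl and posix; is_running() is a hypothetical helper name. Keep in mind that a child which has exited but has not been reaped yet is a zombie and still counts as existing, and that posix_kill() also returns false when the process exists but belongs to another user.

```php
<?php
// Sketch: check from the parent whether a child (e.g. the daemon) is alive.
// Assumes PHP CLI with pcntl and posix; is_running() is a hypothetical helper.

function is_running(int $pid): bool
{
    // Signal 0 is not delivered; the call only checks that $pid can be signalled.
    return posix_kill($pid, 0);
}

$pid = pcntl_fork();
if ($pid === -1) {
    die("could not fork\n");
}
if ($pid === 0) {
    usleep(200000);                 // child: pretend to work briefly
    exit(0);
}

$aliveBefore = is_running($pid);    // true: running (or an unreaped zombie)
pcntl_waitpid($pid, $status);       // wait for the child and reap it
$aliveAfter  = is_running($pid);    // false: the PID no longer exists
```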
posix_kill() sends a given signal (in this case, 0) to a given PID. It returns true when successful and false when not. Signal 0 is never actually delivered: the system only checks that the process exists and that you are allowed to signal it, so it is the preferred way to check.
Remember, after doing pcntl_fork there are two processes: the child and the parent. The child will see 0 as the return value of pcntl_fork, and the parent will get the PID of the child as the return value.
If you want to check whether the daemon is running from anywhere other than the parent, you should create a pidfile. This is done by writing the PID to a file named after the daemon, which is often also used to prevent multiple instances from running. The child should write the pidfile, and it can get its own PID using posix_getpid().
There are also ways to do IPC (InterProcess Communication) in PHP, allowing communication between the forks (or between two PHP scripts).
SEM (semaphores), which let you control access to shared resources to avoid conflicts: http://en.wikipedia.org/wiki/Semaphore_%28programming%29
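A sketch of the pidfile approach, assuming PHP CLI with posix. The file location and the helper names write_pidfile() and pidfile_running() are hypothetical; in a real daemon, write_pidfile() would be called by the child right after daemonizing. The check-then-write sequence is not atomic, so two instances started at the same moment could both pass it; flock() on the pidfile is the usual way to close that gap.

```php
<?php
// Sketch: write a pidfile from the daemon and check it from elsewhere.
// The path and helper names are hypothetical; assumes the posix extension.

function write_pidfile(string $path): void
{
    file_put_contents($path, posix_getpid() . "\n", LOCK_EX);
}

// Returns true when the pidfile names a process that is still alive.
function pidfile_running(string $path): bool
{
    if (!is_file($path)) {
        return false;
    }
    $pid = (int) trim((string) file_get_contents($path));
    return $pid > 0 && posix_kill($pid, 0);
}

$pidfile = sys_get_temp_dir() . '/pcntl-demo.pid';

if (pidfile_running($pidfile)) {
    exit("already running\n");   // refuse to start a second instance
}
write_pidfile($pidfile);
```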
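A minimal sketch of semaphore use, assuming PHP CLI with the pcntl and sysvsem extensions: three children append to a shared file, and a binary semaphore makes sure only one of them writes at a time. The file name is hypothetical.

```php
<?php
// Sketch: a System V semaphore serialises access to a shared file.
// Assumes PHP CLI with pcntl and sysvsem; the file name is hypothetical.

$file = sys_get_temp_dir() . '/sem-demo.log';
file_put_contents($file, '');                 // start empty (ftok needs an existing file)
$sem = sem_get(ftok($file, 's'), 1);          // at most 1 holder at a time

$children = [];
for ($i = 0; $i < 3; $i++) {
    $pid = pcntl_fork();
    if ($pid === -1) {
        die("could not fork\n");
    }
    if ($pid === 0) {
        sem_acquire($sem);                    // blocks while another child holds it
        file_put_contents($file, "child $i\n", FILE_APPEND);
        sem_release($sem);
        exit(0);
    }
    $children[] = $pid;
}

foreach ($children as $pid) {
    pcntl_waitpid($pid, $status);             // reap every child
}
sem_remove($sem);                             // destroy the semaphore
```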
Examples of implementation:
Thanks for the input!
If one is going to do anything serious with multiple processes and intercommunicating processes, those links are indeed useful.
If only I could speak French... ;-)
Note, however, that if you are going to use a background PHP daemon together with a front-end PHP web page, I would advise against those techniques and recommend communicating through files or databases instead.
This is a class (with English documentation) which implements pseudo-threads in PHP using IPC: http://www.phpclasses.org/browse/package/1136.html
Look at the code; it's very easy to understand (with the PHP manual at hand). :)
Personally, I use declare(ticks = 1) + register_tick_function() to check the message queue, rather than pcntl signals like the class above.
Hi! This is a very good post! Thanks!
I still have a question, though: I'm designing a data-aware object, and one of its methods (a queue) must run in the background. The problem is that as soon as the parent process dies, the child loses the database connection.
How do we avoid this? Do I have to connect to the database from the child method?
First of all: thanks! :-)
I'm not completely sure about this, but I believe it comes down to the fact that a connection to a database lasts until it is explicitly closed or until the process that opened it ends.
When using mysql_connect() in a "regular web page" from PHP, you don't *have* to call mysql_close(), as long as you are sure the PHP process is not going to last forever. This might be the problem with your situation.
I need to test this, which I haven't done yet, but I assume that the connection to the database is shared between the parent and its children if the call to open it was made before pcntl_fork(). If that is the case, the connection might be implicitly closed as soon as the parent process dies. I wonder what happens when a child dies, though.
If all that is true, you need to re-establish the connection after pcntl_fork(). With everything I know and assume now, the best practice would be to close all existing connections before forking and open a fresh one in each process afterwards. Alternatively, you could check the resource identifier before each request. Maybe using mysql_pconnect() helps too.
By the way: my first suggestion is a good design practice in any case, just in case the database has unusual per-connection handling of locks or transactions. Besides, I hope the background process is not started via pcntl_fork() from a PHP script that is requested as a web page. Things get nasty when you do it that way. Use something like cron, or a daemon that reads requests for that purpose from a database or file.
... to say: to make things worse, although I don't have much PHP practice, I'm using its OOP features... The aforementioned queue is a method of an object called (surprise!) "queue".
I used pcntl_fork to maintain all the variables in the background... Is it possible to mimic this with exec(), shell_exec() or any other function?
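A sketch of that suggestion, assuming PHP CLI with pcntl and pdo_sqlite. PDO with SQLite stands in for the mysql_* functions, which no longer exist in current PHP; the database file name and the connect() helper are hypothetical. Any connection opened before the fork is closed first, and each process then opens its own.

```php
<?php
// Sketch: give each process its own database connection, opened after
// pcntl_fork(), instead of sharing one that was opened before it.
// PDO with SQLite stands in for mysql_connect(); names are hypothetical.

function connect(string $dsn): PDO
{
    return new PDO($dsn, null, null, [
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
        PDO::ATTR_TIMEOUT => 5,        // SQLite: wait up to 5 s on a locked database
    ]);
}

$dsn = 'sqlite:' . sys_get_temp_dir() . '/fork-demo.sqlite';

$db = connect($dsn);
$db->exec('CREATE TABLE IF NOT EXISTS log (who TEXT)');
$db->exec('DELETE FROM log');
$db = null;                            // close it before forking

$pid = pcntl_fork();
if ($pid === -1) {
    die("could not fork\n");
}

$db = connect($dsn);                   // parent and child each connect separately
$stmt = $db->prepare('INSERT INTO log (who) VALUES (?)');
$stmt->execute([$pid === 0 ? 'child' : 'parent']);

if ($pid === 0) {
    exit(0);                           // the child's connection closes with it
}
pcntl_waitpid($pid, $status);
```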
I think I would go with cron-jobs calling a PHP script. The variables can be stored in a database or flat file (I'd use SQLite, which is magnificent). That also works with the various exec() calls, which might be more suitable for your environment. You can always call the script from a remote machine using some sort of scheduling mechanism.
You could run a foreach routine right before the end of execution to store all data, and then read it into a new object the next time the script runs.
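A sketch of such a storage routine, assuming PHP 7.4 or later (for the typed property). Queue and the state-file path are hypothetical stand-ins for the commenter's class, and JSON replaces the foreach loop simply to keep it short.

```php
<?php
// Sketch: save an object's data at the end of a run and load it at the start
// of the next one, instead of keeping a process alive just to hold variables.
// Queue and the state-file path are hypothetical stand-ins.

class Queue
{
    public array $items = [];

    public function save(string $path): void
    {
        file_put_contents($path, json_encode($this->items), LOCK_EX);
    }

    public static function load(string $path): self
    {
        $queue = new self();
        if (is_file($path)) {
            $data = json_decode((string) file_get_contents($path), true);
            $queue->items = is_array($data) ? $data : [];   // ignore corrupt state
        }
        return $queue;
    }
}

$stateFile = sys_get_temp_dir() . '/queue-state.json';

$queue = Queue::load($stateFile);          // start of the run: restore
$queue->items[] = 'job added at ' . date('c');
$queue->save($stateFile);                  // end of the run: store for next time
```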
It might indeed be tempting to keep the process running, just to maintain data. Storing and retrieving data is indeed overhead, but not much. Look at how much memory a typical PHP process consumes. Without fiddling too much with the settings an empty script can easily take 30 Megs of memory. That's a lot of waste, especially since PHP slows down dramatically when it needs to swap.
On the other hand, I do not know your exact needs. But given what I do know, I'd write a storage routine for the object.
Hi, I managed to rewrite my process so it's divided into two parts: a daemon-like server (receiving requests through MySQL) and web page clients...
I want to keep the server running (in the background) until a specific flag is added to the table. It works fine! YEAH! Thanks for your help with this!
Now the "bad news" part: if the process keeps doing nothing (waiting for a new record to process) it is automatically killed after 2 minutes... I tried using
Can you help me again?
The first one:
set_time_limit(0) only does so much. The two ini_set()'s are needed too, provided that the server supports the call.
Good luck. Try the first solution; if it doesn't work, I can possibly help you set up a remote scheduler, but I will need more info.
Hi, I tried the first option but didn't succeed. Now, I'm using the second option (a loop), I spawn the background process and keep its PID. Then, every second I check that PID (through ps and grep) and if it's not running, I spawn it again. :)
There's a little side effect: when I try to stop it (through a flag in the database), it takes time... I think I have to "stop" it many times, once for each "resurrection"... I'll study this a little more...
Hi, I gave you some time to answer before coming back; now I have some more questions... :)
I got rid of the database access problem (opened the connection only inside the spawned process)...
Since the day I saw your article, I've kept coming across warnings about calling pcntl_fork from a web process, just like yours (thanks once again), so it must be right. I've been lucky and haven't had any problems yet, probably because I only spawn ONE background process (and check to avoid starting any other instance).
Unfortunately, I'll have to find an alternative to pcntl_fork anyway... My webserver offers PHP5 but has disabled pcntl functions... :'(
Since I only need to monitor a table to read, process and flag any new records added to it, I can think of two alternatives:
What do you think? Any other ideas?
Please note the parse error line 67:
Thanks for this great post.
By Nicolas PESTANA (not verified) at Fri, 01/06/2007 - 3:35pm
I recently revamped this site, adding GeSHi capabilities (code highlighting), and edited all the pages accordingly. Apparently Drupal doesn't accept < properly: it just skips everything until the next >.
I'm actually surprised that was the only parse error. Quite a coincidence. Note that a great deal of code was missing before the fix. Nasty business.
Anyway, I replaced < and > with &lt; and &gt;. That should do the trick. Some of the other blocks of code might still be invalid, though. I'll look at them later.