http://software.intel.com/en-us/articles/e-book-on-multicore-programming/
http://members.verizon.net/~babkin/tpopp/
http://en.wikipedia.org/wiki/Chip-level_multiprocessing
http://www.info.ucl.ac.be/~pvr/book.html book Concepts, Techniques, and Models of Computer Programming
http://en.wikipedia.org/wiki/MapReduce MapReduce
http://labs.google.com/papers/sawzall.html SawZall
http://en.wikipedia.org/wiki/Concurrent_programming
http://en.wikipedia.org/wiki/Lamport's_bakery_algorithm
http://www.ecst.csuchico.edu/~beej/guide/ipc/ IPC on Unix
http://www.intel.com/cd/software/products/asmo-na/eng/294797.htm
http://www.khronos.org/opencl/
http://www.spiral.net/
http://www.ibm.com/developerworks/linux/library/l-sprace.html
Shared-state Concurrency: This approach assumes multiple threads of execution with shared data using locks, monitors and transactions. It is the typical Java approach to concurrency and it is also the most widespread form of concurrency. However, shared-state concurrency is also the most complicated model to program to as it requires a fine level of synchronization and scheduling between threads.
Message-passing Concurrency: Uses asynchronous messaging to communicate between multiple threads over a specified port. This is the model used (very successfully) by Erlang. This model provides for a coarser level of interaction between threads but is easier to program to.
Declarative/Dataflow Concurrency: This approach is data-driven and extends sequential declarative-type programming with multiple threads of execution. It is a simpler approach than approaches #1 & #2 and it is very useful for certain types of problems; however, it is not widely used.
A fork creates a new process that looks exactly like the old (parent) process, but it is a different process with a different process ID and its own memory. The parent process creates a separate address space for the child. Both the parent and the child possess the same code segment, but they execute independently of each other.
The simplest example of forking is running a command in a Unix/Linux shell. Each time a user issues a command, the shell forks a child process in which the task is done.
When a fork system call is issued, a copy of all the pages of the parent process is created and loaded into a separate memory location by the OS for the child process (modern kernels defer this work with copy-on-write). In certain cases the copy is not needed at all: with the 'exec' family of system calls there is no need to copy the parent's pages, because execv replaces the address space of the calling process with a new program.
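As a sketch of the fork-plus-exec pattern just described (my own minimal example, assuming a Unix-like system; running /bin/ls is purely illustrative):

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    pid_t pid = fork();
    if (pid < 0) {                     /* fork failed */
        perror("fork");
        return 1;
    }
    if (pid == 0) {                    /* child: new PID, copy of the parent */
        /* execv's argv is non-const for historical reasons */
        char *argv_child[] = { (char *)"ls", (char *)"-l", NULL };
        execv("/bin/ls", argv_child);  /* replaces the child's address space */
        perror("execv");               /* reached only if execv failed */
        _exit(1);
    }
    waitpid(pid, NULL, 0);             /* parent: wait for the child to finish */
    return 0;
}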
Few things to note about forking are:
The child process has its own unique process ID.
The child process has its own copy of the parent's file descriptors.
File locks set by the parent process are not inherited by the child process.
Any semaphores that are open in the parent process are also open in the child process.
The child process has its own copy of the parent's message queue descriptors.
The child has its own address space and memory.
Forking is more widely accepted than threading for the following reasons:
Development is much easier on fork-based implementations.
Fork-based code is more maintainable.
Forking is much safer and more secure because each forked process runs in its own virtual address space. If one process crashes or has a buffer overrun, it does not affect any other process at all.
Threaded code is much harder to debug than forked code.
Forking is more portable than threading.
Forking is faster than threading on a single CPU, as there is no locking overhead or thread context switching.
Some of the applications in which forking is used are: telnetd (FreeBSD), vsftpd, proftpd, Apache 1.3, Apache2, thttpd, PostgreSQL.
Pitfalls in fork:
In fork, every new process gets its own memory/address space, hence longer startup and shutdown times.
If you fork, you have two independent processes which need to talk to each other in some way. This inter-process communication is really costly.
When the parent exits before the forked child, you get an orphan process. All of this is much easier with a thread: you can end, suspend and resume threads from the parent easily, and if the parent exits suddenly its threads are ended automatically.
Insufficient storage space can cause the fork system call to fail.
Threads are Light Weight Processes (LWPs). Traditionally, a thread is just a CPU state (registers and some other minimal state), with the process containing the rest (data, stack, I/O, signals). Threads require less overhead than forking or spawning a new process, because the system does not initialize a new virtual memory space and environment for them. Threads are most effective on a multiprocessor system, where the flow of work can be scheduled to run on another processor and thus gain speed through parallel or distributed processing, but gains are also found on uniprocessor systems by exploiting latency in I/O and other system functions which would otherwise halt process execution.
Threads in the same process share:
Process instructions
Most data
Open files (descriptors)
Signals and signal handlers
Current working directory
User and group ID
Each thread has a unique:
Thread ID
Set of registers and stack pointer
Stack for local variables and return addresses
Signal mask
Priority
errno value (each thread gets its own copy)
Few things to note about threading are:
Threads are most effective on multi-processor or multi-core systems.
Only one process/thread table and one scheduler are needed.
All threads within a process share the same address space.
A thread does not maintain a list of created threads, nor does it know the thread that created it.
Threads reduce overhead by sharing fundamental parts of the process.
Threads are more efficient in memory terms because they use the memory of the parent process instead of allocating a new address space.
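A minimal pthread sketch (my own example, not from the text) showing that threads created with pthread_create share the globals of their process; link with -lpthread:

#include <stdio.h>
#include <pthread.h>

int shared_counter = 0;                        /* shared by all threads */
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void *worker(void *arg) {
    (void)arg;                                 /* unused */
    for (int i = 0; i < 1000; ++i) {
        pthread_mutex_lock(&lock);             /* guard the shared data */
        ++shared_counter;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);   /* both threads run worker() */
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);                    /* wait for both to finish */
    pthread_join(t2, NULL);
    printf("counter = %d\n", shared_counter);  /* always 2000, thanks to the mutex */
    return 0;
}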
Pitfalls in threads:
Race conditions: The big loss with threads is that there is no natural protection against multiple threads working on the same data at the same time without knowing that others are messing with it. This is called a race condition. While the code may appear on the screen in the order you wish it to execute, threads are scheduled by the operating system and run in an unpredictable order. It cannot be assumed that threads execute in the order they are created, and they may also run at different speeds. When threads race to complete, they may give unexpected results (a race condition). Mutexes and joins must be utilized to achieve a predictable execution order and outcome.
Thread safe code: The threaded routines must call functions which are “thread safe”. This means that there are no static or global variables which other threads may clobber or read assuming single threaded operation. If static or global variables are used then mutexes must be applied or the functions must be re-written to avoid the use of these variables. In C, local variables are dynamically allocated on the stack. Therefore, any function that does not use static data or other shared resources is thread-safe. Thread-unsafe functions may be used by only one thread at a time in a program and the uniqueness of the thread must be ensured. Many non-reentrant functions return a pointer to static data. This can be avoided by returning dynamically allocated data or using caller-provided storage. An example of a non-thread safe function is strtok which is also not re-entrant. The “thread safe” version is the re-entrant version strtok_r.
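For example (a sketch): strtok stores its position in hidden static data, so two threads tokenizing at once clobber each other, while strtok_r keeps that state in a caller-provided pointer:

#include <stdio.h>
#include <string.h>

int main(void) {
    char line[] = "alpha,beta,gamma";
    char *saveptr;                              /* per-caller parse state */
    for (char *tok = strtok_r(line, ",", &saveptr);
         tok != NULL;
         tok = strtok_r(NULL, ",", &saveptr))
        printf("%s\n", tok);                    /* alpha, beta, gamma */
    return 0;
}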
Advantages of threads:
Threads share the same memory space, so sharing data between them is fast; in effect, their "inter-process" communication costs almost nothing.
If properly designed and implemented, threads give you more speed because there is no process-level context switching in a multithreaded application.
Threads are really fast to start and terminate.
Some of the applications in which threading is used are: MySQL (back to the 3.23 series), Firebird, Apache2.
1. Which should I use in my application?
Ans: That depends on a lot of factors. Forking is more heavyweight than threading and has higher startup and shutdown costs. Inter-process communication (IPC) is also harder and slower than inter-thread communication; threads really win the race when it comes to communication, because they share the same address space as the parent process and need only a reduced, more efficient context switch. Conversely, if a thread crashes it takes down all of the other threads in the process, and if a thread has a buffer overrun it opens up a security hole in all of the threads.
2. Which one is better, threading or forking?
Ans: That depends entirely on what you are looking for. Still, to answer: in a contemporary Linux (2.6.x) there is not much difference in performance between a process context switch and a thread context switch (only the MMU state is extra work for the process switch). There remains the issue of the shared address space: a faulty pointer in a thread can corrupt memory of the parent process or of another thread within the same address space.
3. What kinds of things should be threaded or multitasked?
Ans: If you are a programmer and would like to take advantage of multithreading, the natural question is which parts of the program should and should not be threaded. Here are a few rules of thumb (if you say “yes” to these, have fun!):
Are there groups of lengthy operations that don’t necessarily depend on other processing (like painting a window, printing a document, responding to a mouse-click, calculating a spreadsheet column, signal handling, etc.)?
Will there be few locks on data (the amount of shared data is identifiable and “small”)?
Are you prepared to worry about locking (mutually excluding data regions from other threads), deadlocks (a condition where two threads each hold a lock the other is trying to acquire) and race conditions (a nasty, intractable problem where data is not locked properly and gets corrupted through threaded reads & writes)?
Could the task be broken into various “responsibilities”? E.g. Could one thread handle the signals, another handle GUI stuff, etc.?
Whether you have to use threading or forking, totally depends on the requirement of your application.
Threads are more powerful than events, but that power is not always needed.
Threads are much harder to program than forking, so they are for experts only.
Use threads mostly for performance-critical applications.
Multi-threading refers to an application with multiple threads running within a process, while multi-processing refers to an application organized across multiple OS-level processes.
A thread is a stream of instructions within a process. Each thread has its own instruction pointer, set of registers and stack memory. The virtual address space is per-process, i.e. common to all threads within a process, so data on the heap can be readily accessed by all threads, for good or ill.
Multi-threading is a more "light weight" form of concurrency: there is less context per thread than per process. As a result thread lifetime, context switching and synchronisation costs are lower. The shared address space (noted above) means data sharing requires no extra work.
Windows has several ways to control access to a shared resource, including semaphores, mutexes, event objects, waitable timers, and critical sections. This level of flexibility cannot be easily designed into a language because the capabilities of operating systems differ.
The flags used to control access to a shared resource and provide synchronization between threads (and processes) are called semaphores.
When using a semaphore, a resource can be completely synchronized, in which case one and only one thread or process can access it at any one time, or the semaphore can allow no more than a small number of processes or threads access at any one time. Semaphores are implemented using a counter that is decremented when a task is granted the semaphore and incremented when the task releases it.
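As an illustration of the counter behavior just described, here is a sketch using POSIX semaphores (the Windows analogue would be CreateSemaphore/WaitForSingleObject/ReleaseSemaphore; the names and the limit of 3 are mine):

#include <semaphore.h>

sem_t slots;

void init(void)  { sem_init(&slots, 0, 3); }  /* counter starts at 3: at most 3 holders */
void enter(void) { sem_wait(&slots); }        /* decrement; blocks while counter is 0 */
void leave(void) { sem_post(&slots); }        /* increment; may wake a blocked waiter */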
The second synchronization object is the mutex semaphore, or just mutex, for short. A mutex synchronizes a resource such that one and only one thread or process can access it at any one time. In essence, a mutex is a special case version of a standard semaphore.
The third synchronization object is the event object. It can be used to block access to a resource until some other thread or process signals that it can be used. (That is, an event object signals that a specified event has occurred.)
The fourth synchronization object is the waitable timer. A waitable timer blocks a thread’s execution until a specific time. You can also create timer queues, which are lists of timers.
You can prevent a section of code from being used by more than one thread at a time by making it into a critical section using a critical section object. Once a critical section is entered by one thread, no other thread may use it until the first thread has left the critical section.
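A sketch of a Win32 critical section object in use (the shared counter is just an illustrative resource):

#include <windows.h>

CRITICAL_SECTION cs;
long counter = 0;

void Init(void)      { InitializeCriticalSection(&cs); }
void Increment(void) {
    EnterCriticalSection(&cs);    /* only one thread may be inside at a time */
    ++counter;
    LeaveCriticalSection(&cs);
}
void Cleanup(void)   { DeleteCriticalSection(&cs); }

The C++ template below wraps the same acquire/release pairing in a constructor/destructor pair so that the release cannot be forgotten.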
template <class T> class Lock {
    T& obj_;
public:
    Lock(T& obj) : obj_(obj) {
        obj.AcquireMutex();    // acquire in the constructor...
    }
    ~Lock() {
        obj_.ReleaseMutex();   // ...release in the destructor (RAII)
    }
};
So now you can use Lock<BankAccount> to lock a BankAccount:
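For instance (a sketch; this BankAccount with the AcquireMutex()/ReleaseMutex() members the template expects is hypothetical, built here on a pthread mutex):

#include <pthread.h>

class BankAccount {
    pthread_mutex_t mutex_;
    double balance_;
public:
    BankAccount() : balance_(0) { pthread_mutex_init(&mutex_, NULL); }
    ~BankAccount() { pthread_mutex_destroy(&mutex_); }
    void AcquireMutex() { pthread_mutex_lock(&mutex_); }
    void ReleaseMutex() { pthread_mutex_unlock(&mutex_); }
    void Deposit(double amount) { balance_ += amount; }
};

void Transfer(BankAccount& account) {
    Lock<BankAccount> guard(account);  // constructor acquires the mutex
    account.Deposit(100.0);
}                                      // destructor releases it on every exit path

Because the release happens in the destructor, the mutex is freed even on early returns and exceptions; this is the RAII idiom.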
Self-deadlock occurs when a single thread attempts to lock a mutex twice: the second attempt will block indefinitely. This can easily happen when the same resource is used at multiple levels within an algorithm.
In particular, consider a class that attempts to provide a thread-safe interface by synchronising all member function calls with a single internal mutex. The mutex is locked at the beginning of every method and unlocked on method return. If that class now calls one member function from within another, there will be a self-deadlock.
To counter this problem there is the concept of recursive mutexes. A recursive mutex will allow multiple locks from within a single thread to succeed, though that thread must unlock the mutex as many times as it has locked it. The disadvantage of a recursive mutex is a slight performance decrease.
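With POSIX threads, a recursive mutex is requested through the mutex attributes (a sketch; function names are mine):

#include <pthread.h>

pthread_mutex_t rmutex;

void init_rmutex(void) {
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    pthread_mutex_init(&rmutex, &attr);
    pthread_mutexattr_destroy(&attr);
}

void inner(void) {
    pthread_mutex_lock(&rmutex);    /* second lock by the same thread: OK */
    pthread_mutex_unlock(&rmutex);
}

void outer(void) {
    pthread_mutex_lock(&rmutex);    /* first lock */
    inner();                        /* would self-deadlock with a plain mutex */
    pthread_mutex_unlock(&rmutex);  /* must unlock once per lock */
}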
http://bisqwit.iki.fi/story/howto/openmp/
http://en.wikipedia.org/wiki/Software_transactional_memory
The Intel Threading Building Blocks are cross-platform (Windows, Linux, and Mac OS X), support 32-bit and 64-bit applications, and work with Intel, Microsoft, and GNU compilers.
This library is specifically designed to work in concert with other threading technologies, such as Win32*, POSIX*, and OpenMP* threads, providing a high degree of design and development flexibility. The templates implemented in Intel Threading Building Blocks rely on generic programming in order to provide high-speed and flexible algorithms with very few implementation constraints.
Intel Threading Building Blocks adds to the functionality of Intel® Thread Checker, Intel® Thread Profiler, and the Intel® Compilers, to enable the rapid implementation of high-performance threads in applications.
http://msdn.microsoft.com/msdnmag/issues/06/09/CLRInsideOut/
http://msdn.microsoft.com/msdnmag/issues/05/08/Concurrency
http://msdn.microsoft.com/msdnmag/issues/06/06/ConcurrentAffairs
In the Microsoft® .NET Framework locks are implemented by the System.Threading.Monitor class. The Monitor class is a bit unusual because it does not define instances. This is because lock functionality is effectively implemented by System.Object, and so any object can be a lock. Here is how to use a lock to fix the race associated with totalRequests:
System.Threading.Monitor.Enter(totalRequestsLock);
try {
totalRequests = totalRequests + 1;
} finally {
System.Threading.Monitor.Exit(totalRequestsLock);
}
This pattern is common enough that both C# and Visual Basic® .NET have a special statement construct to support it. The following C# code is equivalent to the try/finally statement just shown:
lock(totalRequestsLock) {
totalRequests = totalRequests + 1;
}
While locks provide a way of keeping threads out of each other's way, they really don't provide a mechanism for them to cooperate (synchronize).
In the .NET Framework, synchronization functionality is implemented in the ManualResetEvent and AutoResetEvent classes.
These classes provide Set, Reset, and WaitOne methods. The WaitOne method causes a thread to block as long as the event is in the reset state. When another thread calls the Set method, an AutoResetEvent allows one thread that called WaitOne to unblock, while a ManualResetEvent unblocks all waiting threads.
Creating a New Process Under UNIX
#include <stdio.h>
#include <unistd.h>

void a(void) {
    for (;;)
        printf("a");   /* runs in the child */
}

void b(void) {
    for (;;)
        printf("b");   /* runs in the parent */
}

int main(void) {
    if (fork() == 0)   /* fork() returns 0 in the child process */
        a();
    b();               /* parent: fork() returned the child's PID */
    return 0;
}
Unix pipes:
http://www.cs.cf.ac.uk/Dave/C/node23.html#SECTION002330000000000000000
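A minimal pipe sketch (my own example): the parent writes a message that the child reads:

#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void) {
    int fds[2];                       /* fds[0]: read end, fds[1]: write end */
    char buf[64];
    if (pipe(fds) == -1) { perror("pipe"); return 1; }
    if (fork() == 0) {                /* child */
        close(fds[1]);                /* child only reads */
        ssize_t n = read(fds[0], buf, sizeof(buf) - 1);
        if (n > 0) { buf[n] = '\0'; printf("child got: %s\n", buf); }
        close(fds[0]);
        _exit(0);
    }
    close(fds[0]);                    /* parent only writes */
    write(fds[1], "hello", 5);
    close(fds[1]);
    wait(NULL);                       /* reap the child */
    return 0;
}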
Note that under UNIX it is possible to start a new process running a function from the original address space: a new address space (a copy of the parent's) is created, in which that function executes.
Example
A routine has a shared variable int x = 0.
This routine launches 2 threads without any synchronization code; each thread calls the same function:
int foo(){
    x = x + 1;   /* non-atomic: compiles to load, add, store */
    x = x + 1;
    x = x + 1;
    return x;
}
What are the possible values of x after both threads have completed?
Answer: any value from 2 to 6, assuming each x = x + 1 is a separate load/add/store. The maximum, 6, occurs when no increments interleave. Lower values occur when updates are lost because both threads loaded the same value; even 2 is reachable: thread A loads 0, thread B completes two increments (x = 2), A stores 1, B loads 1 for its final increment, A completes its remaining two increments (x = 3), and B finally stores 2.
What Is a Process?
Most often the word process is used to indicate a program, or at least program characteristics.
When a processor switches from one process to the next, it needs a lot of information to be able
to continue with the next process at exactly the same place/context where it left off last time.
This is the only way process execution can continue as if it was never interrupted by a task switch.
A process is therefore defined by the information that is needed during a task switch. Think of:
The values of the CPU registers: these contain the context created by executed processor instructions.
A stack: this contains the context created by function calls, variable definitions, and so on.
Address space: this memory is sometimes called the apartment or working set. It is the part of system memory (the range of memory addresses) that is available to the process.
Meta data: think of security privileges, current working directory, process name, process priority, and so on.
Thread
Most often the word thread is used to indicate a path of execution within a process.
As a thread lives within a certain process, it needs less defining information:
The values of the CPU registers
A stack
Thread priority (Often specified in relation to the priority of the process in which the thread lives.)
All threads of a process inherit the remaining defining information from the process:
Address space
Meta data
There are three conclusions to be drawn from the preceding two bulleted lists:
- Task switching between two threads in the same process does not introduce much overhead.
- All threads in the same process share the same address space and can therefore use the same global data (variables, file handles, and so on). They can of course also mess up each other's memory.
- Task switching between threads of different processes introduces the same overhead as any other task switch between processes.
Note that each active process has at least one thread of execution. This thread is called the main thread.
Cooperative Multitasking
An OS that uses cooperative multitasking assigns an equal amount of time to each task.
The importance, or priority, of the task is not taken into consideration. Tasks that do not need a full time slice can decide to be cooperative and give up
the remainder of their slice so another task can be scheduled earlier.
A problem here can be that, since priority is not taken into account, tasks of very low importance can take up a disproportionate amount of time.
Preemptive Multitasking
An OS that uses preemptive multitasking assigns time slices according to task importance or priority.
This means that a task with a high priority will be scheduled more often (and perhaps even with a larger time slice) than a task with a low priority.
A danger here is that setting the priority of a task too high can mean that other tasks have little or no opportunity to do any processing.
This includes a task that you might try to start in order to lower the priority of the task that is causing the problems.
Tasks can also be preempted when they block themselves for synchronization or enter an idle state, such as is done with the Sleep function.
Fibers
It is, of course, always possible to implement your own paths of execution through a process and take care of task switching within your own code.
The tasks created this way are often called fibers—because they are a subset of a thread.
Exactly which information is accessible within a fiber and which information is needed upon a task switch is dependent on how you implement a fiber;
however, it will be difficult to shield the rest of the process from damage the fiber might inadvertently do.
The fiber will almost always inherit the address space of the process (and therefore also that of any other threads within the process).
Some OSes offer supporting calls to help you create your own fibers and switch between them.
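On Windows, for example, those supporting calls are the fiber API. A sketch (my own example; a fiber runs until it explicitly switches away):

#include <windows.h>
#include <stdio.h>

LPVOID mainFiber;

VOID CALLBACK FiberProc(LPVOID param) {
    (void)param;
    printf("in fiber\n");
    SwitchToFiber(mainFiber);          /* cooperative: yield back explicitly */
}

int main(void) {
    mainFiber = ConvertThreadToFiber(NULL);          /* current thread becomes a fiber */
    LPVOID worker = CreateFiber(0, FiberProc, NULL); /* 0 = default stack size */
    SwitchToFiber(worker);                           /* runs FiberProc until it yields */
    DeleteFiber(worker);
    return 0;
}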
Deadlock
A deadlock occurs when the following happens:
ThreadA claims resourceA
A task switch occurs, which makes ThreadB active
ThreadB claims resourceB
ThreadB then tries to claim resourceA as well, and blocks
ThreadA resumes, tries to claim resourceB, and blocks too
No matter what happens after step 3, ThreadA and ThreadB will ultimately become deadlocked, because ThreadB tries to claim resourceA before it releases resourceB, and ThreadA will not even consider releasing resourceA before it is able to claim resourceB. ThreadA blocks on the call EnterCriticalSection(&resourceB) and ThreadB blocks on the call EnterCriticalSection(&resourceA), as sketched below.
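A sketch of that scenario in code, using the Win32 critical sections named above (the Sleep calls are only there to force the unlucky interleaving):

#include <windows.h>

CRITICAL_SECTION resourceA, resourceB;

DWORD WINAPI ThreadAProc(LPVOID p) {
    (void)p;
    EnterCriticalSection(&resourceA);  /* step 1: ThreadA claims resourceA */
    Sleep(100);                        /* ThreadB gets scheduled meanwhile... */
    EnterCriticalSection(&resourceB);  /* ...so this blocks forever */
    LeaveCriticalSection(&resourceB);
    LeaveCriticalSection(&resourceA);
    return 0;
}

DWORD WINAPI ThreadBProc(LPVOID p) {
    (void)p;
    EnterCriticalSection(&resourceB);  /* step 3: ThreadB claims resourceB */
    Sleep(100);
    EnterCriticalSection(&resourceA);  /* blocks forever: ThreadA holds it */
    LeaveCriticalSection(&resourceA);
    LeaveCriticalSection(&resourceB);
    return 0;
}

The standard cure is a global lock order: if every thread that needs both resources always claims resourceA before resourceB, the cycle cannot form.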
http://manpages.courier-mta.org/htmlman7/pipe.7.html
http://www.cs.uml.edu/~fredm/courses/91.308/files/pipes.html
http://www.theillien.com/Sys_Admin_v12/html/v02/i04/a6.htm
-----------------------------------------------------
section 15.2 Pipes
http://book.chinaunix.net/special/ebook/addisonWesley/APUE2/
-------------------------------------------------
File: spin-condvar.c condvar.c job_queue3.c
http://www.advancedlinuxprogramming.com/downloads.html
http://www.advancedlinuxprogramming.com/listings
---------------PRODUCER - CONSUMER
http://felix.sourceforge.net/doc/rtl/flx_pthread/en_flx_pthread_0012.html
http://stackoverflow.com/questions/206857/how-to-implement-blocking-read-using-posix-threads
http://www.google.com/search?q=producer-consumer+queue++pthreads+examples
http://publib.boulder.ibm.com/infocenter/iseries/v5r3/index.jsp?topic=/rzahw/rzahwe17rx.htm
http://www.cs.umass.edu/~emery/classes/cmpsci377/current/notes/lecture_8_synch.pdf
http://cswilliams.ncat.edu/comp755/pthreadPC.mht
http://www.bayimage.com/code/pcpaper.html
http://msmvps.com/blogs/vandooren/archive/2008/12/02/470800.aspx
-----------------------------------------------
http://www.cs.binghamton.edu/~nael/cs350/notes/lec5.pdf
http://www.cilk.com/multicore-blog/?Tag=pthread
http://www.codeproject.com/KB/interviews/Interview-Dmitriy-Vyukov.aspx
---------------
PTHREADS
---------------------------------------
http://www.lambdacs.com/cpt/FAQ.html
https://computing.llnl.gov/tutorials/pthreads/
http://www.yolinux.com/TUTORIALS/LinuxTutorialPosixThreads.html
http://sourceware.org/pthreads-win32/index.html
http://programming-in-linux.blogspot.com/2008/03/multithreading-example-in-cc-using.html
http://www.opengroup.org/onlinepubs/000095399/functions/waitpid.html
http://108leaves.com/dailyc++/?cat=4
http://publib.boulder.ibm.com/iseries/v5r2/ic2924/index.htm?info/rzahw/rzahwrzahwcexcodex.htm
What you need is a shared, counted queue of structures.
In C++, one might define the queue as:
Shared<Counted<Queue<YourStruct> > > queue;
Shared<> is a wrapper template that gives any struct a mutex with acquire()/release() members.
Counted<> is a wrapper template that gives any struct a semaphore with raise()/lower() members.
// Producer does this:
queue.acquire(); // acquire exclusivity on queue
// Now you can put one of YourStruct into the queue
queue.release(); // release exclusivity on queue
queue.raise();   // declare that there is now at least one new YourStruct waiting in queue
// Consumer does this:
queue.lower();   // block until at least one new YourStruct is waiting in queue
queue.acquire(); // acquire exclusivity on the queue
// Now you can get one of YourStruct out of the queue
queue.release(); // release exclusivity on queue
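A concrete sketch of the same pattern with POSIX threads (my own example; Shared<> and Counted<> above are pseudocode): the mutex plays the role of Shared<>, and the condition variable plus count plays the role of Counted<>:

#include <pthread.h>

#define QUEUE_SIZE 16

static int items[QUEUE_SIZE];
static int head = 0, tail = 0, count = 0;
static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t not_empty = PTHREAD_COND_INITIALIZER;

void produce(int item) {               /* producer side */
    pthread_mutex_lock(&mtx);          /* acquire exclusivity on the queue */
    items[tail] = item;                /* (sketch ignores the queue-full case) */
    tail = (tail + 1) % QUEUE_SIZE;
    count++;
    pthread_cond_signal(&not_empty);   /* "raise": at least one item waiting */
    pthread_mutex_unlock(&mtx);        /* release exclusivity */
}

int consume(void) {                    /* consumer side */
    pthread_mutex_lock(&mtx);
    while (count == 0)                 /* "lower": block until an item exists */
        pthread_cond_wait(&not_empty, &mtx);
    int item = items[head];
    head = (head + 1) % QUEUE_SIZE;
    count--;
    pthread_mutex_unlock(&mtx);
    return item;
}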