https://habrahabr.ru/post/326138/ pthreads
https://c9x.me/articles/gthreads/intro.html Green Threads
https://www.youtube.com/watch?v=MfEkOcMILDo C++ parallels
https://habrahabr.ru/post/328348/
https://habrahabr.ru/post/319350/
https://www.nextplatform.com/2017/01/05/essentials-multiprocessor-programming/
https://computing.llnl.gov/tutorials/parallel_comp/
https://habrahabr.ru/company/infopulse/blog/311134/
https://habrahabr.ru/post/318374/
https://www.kernel.org/doc/Documentation/memory-barriers.txt
http://joeduffyblog.com/2016/11/30/15-years-of-concurrency/
Video lectures from Leningrad:
https://www.youtube.com/watch?v=kbERSWTGtKw&list=PLlb7e2G7aSpQCPeKTcVBHJns_JOxrc_fT
http://preshing.com/20111124/always-use-a-lightweight-mutex/
http://concurrencyfreaks.blogspot.com/
https://www.lektorium.tv/lecture/13557
https://jackmott.github.io/programming/2016/08/30/think-before-you-parallelize.html
http://www.crankyotaku.com/2016/04/linux-programming-threading-basics.html
https://kernel.org/pub/linux/kernel/people/paulmck/perfbook/perfbook.html
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/perfbook.git
http://kcsrk.info/ocaml/multicore/2016/06/11/lock-free/
https://rcrowley.org/2010/01/06/things-unix-can-do-atomically.html
http://ithare.com/implementing-queues-for-event-driven-programs/
http://c9x.me/art/gthreads/intro.html
http://coldattic.info/shvedsky/pro/blogs/a-foo-walks-into-a-bar/posts/tagged/concurrency
https://habrahabr.ru/post/278413/
https://habrahabr.ru/post/278661/
https://habrahabr.ru/post/278685/
http://preshing.com/20160201/new-concurrent-hash-maps-for-cpp/
https://habrahabr.ru/post/211717/
https://deadlockempire.github.io/
http://www.loic-yvonnet.com/articles/multithreading-in-cpp14-part-8/
https://news.ycombinator.com/item?id=10984616 in C++
http://zeroturnaround.com/rebellabs/what-are-fibers-and-why-you-should-care/ FIBER
https://medium.com/@markpapadakis/coroutines-and-fibers-why-and-when-5798f08464fd#.d1qzqhvvd
http://eli.thegreenplace.net/2016/c11-threads-affinity-and-hyperthreading/
http://www.amazon.com/gp/product/1451536615?ie=UTF8&tag=thepracofpara-20&linkCode=as2&camp=1789&creative=390957&creativeASIN=1451536615 Sergei Babkin book
http://research.microsoft.com/en-us/um/people/lamport/tla/hyperbook.html
http://dgoldblatt.com/going-from-lock-free-to-wait-free.html
http://concurrencyfreaks.blogspot.com/
http://habrahabr.ru/post/266901/
http://habrahabr.ru/post/271549/
http://habrahabr.ru/post/271583/
https://news.ycombinator.com/item?id=13102721
http://www.reddit.com/r/cpp/comments/3b8ku9/multiproducer_single_consumer_lockfree_queue/
https://news.ycombinator.com/item?id=9584099
http://jvns.ca/blog/2015/10/31/papers-are-amazing-profiling-threaded-programs-with-coz/
The most commonly taught method of ensuring data integrity during concurrent operations is known as pessimistic locking: placing a lock around the entire read-and-update operation.
Lock
Read
Transform
Update
Fig 1: Pessimistic Locking
Although this simple model does guarantee that no other writer can modify the data while one holds the lock, it can lead to abysmal performance on low-contention systems, where writes and conflicts are infrequent.
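As a rough illustration, a pessimistic read-transform-update in C with a POSIX mutex might look like the following (a minimal sketch; the shared counter and the increment transform are made up for the example):

#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static long shared_value = 0;            /* the protected data */

void pessimistic_update(void)
{
    pthread_mutex_lock(&lock);           /* Lock */
    long v = shared_value;               /* Read */
    v = v + 1;                           /* Transform (done entirely under the lock) */
    shared_value = v;                    /* Update */
    pthread_mutex_unlock(&lock);
}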
What we hope to achieve with optimistic locking is to minimise the time we spend inside a lock, and by doing so increase our throughput, because while we hold a lock we block every other thread that needs it. To do this we need a system that satisfies the following assumptions:
There are very few conflicts. This is the most important factor; if it is not met, optimistic locking often turns out slower than pessimistic locking.
Data transformations are pure and have no side effects. That is, given a transformation function T(x) = k, the same input x always yields k regardless of any external changes.
It is cheap to retry transactions with new versions of data. In some systems this may not hold true, especially if it is computationally intensive to calculate the next version of the data.
We can determine conflicts between writes easily and relatively cheaply. There are a few ways of doing this: via a version number that records how many times the data has been written, or by using immutable data structures, where checking whether something changed reduces to an object-identity comparison.
The process of applying all changes or none at all is known as a transaction. Transactions guarantee (or they should) atomicity, which means all or nothing: you either get all changes or no changes. This also helps maintain the integrity of our data.
Read/Copy
Transform
Lock
Check
Swap
Fig 2: Optimistic Locking
Note that the transform step mutates/updates the local copied version. The check step validates whether the version that was copied is outdated. This can be done as previously mentioned via comparing the version or the object identity. If the check fails, the transaction is aborted and restarted. Else, the changes are committed.
The CUDA kernel function is identified by the type qualifier __global__. The built-in variables threadIdx and blockIdx provide the indices of the local thread and block, and allow a unique index to be computed for each 2D array element. Threads within the same block have access to 16 KB of on-chip shared memory (declared with the __shared__ type qualifier), which has much lower latency than device memory. The __syncthreads() call provides a barrier at which all threads within a block synchronize before execution continues.
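A minimal kernel sketch using the qualifiers described above (the kernel name, array layout, and 16x16 launch configuration are made-up example choices; the shared-memory staging is only there to show __shared__ and __syncthreads):

__global__ void add2d(const float *a, const float *b, float *c, int nx, int ny)
{
    __shared__ float tile[16][16];   /* low-latency on-chip memory shared by the block */
    int x = blockIdx.x * blockDim.x + threadIdx.x;   /* unique 2D element index */
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < nx && y < ny)
        tile[threadIdx.y][threadIdx.x] = a[y * nx + x];  /* stage one operand in shared memory */
    __syncthreads();                 /* barrier: the whole block waits here */
    if (x < nx && y < ny)
        c[y * nx + x] = tile[threadIdx.y][threadIdx.x] + b[y * nx + x];
}

/* host side: launch with a 2D grid of 16x16-thread blocks
   dim3 block(16, 16);
   dim3 grid((nx + 15) / 16, (ny + 15) / 16);
   add2d<<<grid, block>>>(d_a, d_b, d_c, nx, ny);        */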
MULTICORE PROCESSORS
http://en.wikipedia.org/wiki/Multi-core_processor
http://www.cpu-world.com/cgi-bin/CPUID.pl?CPUID=12373 Xeon E7-4870
http://www.siliconmechanics.com/files/WestmereEXInfo.pdf
http://stackoverflow.com/questions/8547778/why-is-one-loop-so-much-slower-than-two-loops
Important: if we are working with versioned objects, remember to bump the version at the end of the compare-and-swap operation.
Although in simple systems we can assume there is only a binary choice, abort or apply changes, in more complex systems we have more options. Take the VCS tool Git, for example. If we commit to a local copy of the shared data (the repository) and someone else commits as well, we can either:
Win - our changes are committed and written to the repo.
Lose - we need to "replay" our changes atop the changed data structure; essentially, the latest version.
Merge changes - both our changes can be merged if they do not modify the same files. Else we would need to resolve the conflicts.
Having a side-effect-free transformation is also important. For example, if we had stored all the transformations performed on the data and we wanted an arbitrary version, we could just apply the transformations to the initial version. A concrete example:
changes = [lambda x: x.append(1), lambda x: x.append(2)]
data = []
for item in changes:
    item(data)
assert data == [1, 2]
No matter how many times we apply the changes, as long as we start from the same initial data we will always get the same final result. This is especially useful if, say, we wanted to get an older version of our code. We wouldn't want that to differ due to some external conditions.
Let's see what a trivial optimistic locking implementation might look like in Python. First we will need some machinery for the versioned objects:
from threading import RLock, Thread

class Versioned:
    def __init__(self, data):
        self.data = data
        self.version = 0
        self.lock = RLock()

    def get(self):
        return self.data, self.version

    def commit(self, new_data, copied_version):
        with self.lock:
            if copied_version != self.version:
                raise Exception
            self.data = new_data
            self.version += 1
If you look at the Versioned.get method you will notice that there is no locking. There needn't be a lock because we are guaranteed that when the data is modified by the commit method, a lock will be held.
Then we need the threads to update their copies of the data and automatically retry failed transactions:
def task():
    while True:
        try:
            data, version = obj.get()
            obj.commit(data + 1, version)
            break
        except:
            pass

obj = Versioned(0)
threads = [Thread(target=task) for _ in range(10)]
[thr.start() for thr in threads]
[thr.join() for thr in threads]
assert obj.get() == (10, 10)
It won't solve all your concurrency issues, but when you encounter a situation that fits the four assumptions above, consider using optimistic locking.
http://jug.ru/meetings/308 Akka
There are several kinds of locks:
Blocking locks sleep a thread while it waits for another thread to release the lock. This is the usual behavior.
Spinlocks use a busy loop to constantly check to see if a lock has been released. This is more efficient if waiting is rare, but wastes CPU time if waiting is common.
Reader/writer locks allow multiple "reader" threads to enter a region simultaneously, but exclude all other threads (including readers) when a "writer" thread acquires the lock. This can be useful as many data structures are safe to read from multiple threads simultaneously, but unsafe to write while other threads are either reading or writing.
Recursive locks allow a single thread to acquire the same lock multiple times. Non-recursive locks can deadlock, crash, or otherwise misbehave when re-entered from the same thread.
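For the recursive case, POSIX exposes this behaviour through a mutex attribute; a minimal sketch (error checking omitted, function names are just for illustration):

#include <pthread.h>

pthread_mutex_t m;

void init_recursive_mutex(void)
{
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    pthread_mutex_init(&m, &attr);
    pthread_mutexattr_destroy(&attr);
}

void inner(void)
{
    pthread_mutex_lock(&m);     /* second acquisition by the same thread: allowed */
    /* ... */
    pthread_mutex_unlock(&m);   /* each lock must be balanced by an unlock */
}

void outer(void)
{
    pthread_mutex_lock(&m);     /* first acquisition */
    inner();                    /* re-enters the same lock without deadlocking */
    pthread_mutex_unlock(&m);
}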
https://www.arangodb.com/2015/08/lockfree-protection-of-data-structures-that-are-frequently-read/
http://greenteapress.com/semaphores/ Little Book of Semaphores
http://preshing.com/20150316/semaphores-are-surprisingly-versatile/
http://www.vldb.org/pvldb/vol8/p209-yu.pdf
http://goparallel.sourceforge.net/wp-content/uploads/2015/02/MPI-Application-Tune-Up-r5.pdf
http://habrahabr.ru/post/250383/ concurrent hashmap
http://habrahabr.ru/post/250523/
http://habrahabr.ru/post/232361/ Book about distributed systems
https://www.justsoftwaresolutions.co.uk/threading/locks-mutexes-semaphores.html
http://deathbytape.com/post/110008612055/cpp-threading
http://queue.acm.org/detail.cfm?id=2698990
http://habrahabr.ru/company/intel/blog/243385/
https://tech.yandex.ru/events/yagosti/cpp-user-group/talks/1797/
http://levin-matveev.livejournal.com/84750.html
https://view.officeapps.live.com/op/view.aspx lockless
http://concurrencyfreaks.blogspot.com/2014/10/linked-lists-with-wait-free-reads-in-c.html
http://moodycamel.com/blog/2014/a-fast-general-purpose-lock-free-queue-for-c++
POSIX threads
http://habrahabr.ru/post/247903/
http://deathbytape.com/post/110008612055/cpp-threading
https://news.ycombinator.com/item?id=9013768
http://prog-books.livejournal.com/265.html
https://news.ycombinator.com/item?id=7815443
http://existentialtype.wordpress.com/2014/04/09/parallelism-and-concurrency-revisited/
http://rocksdb.org/blog/677/avoid-expensive-locks-in-get/ TLS
http://austingwalters.com/multithreading-common-pitfalls/
https://gist.github.com/staltz/868e7e9bc2a7b8c1f754 Reactive Programming
https://news.ycombinator.com/item?id=7964873
http://aseigo.blogspot.com/2014/07/multi-process-architectures-suck.html
Python
http://blog.pirx.ru/media/files/2014/python-concurrency-talk/python-concurrency.html#1
https://speakerdeck.com/trent/parallelism-and-concurrency-with-python
http://www.reddit.com/r/Python/comments/276xlw/why_coroutines_in_an_oop_language/
http://www.yosefk.com/blog/parallelism-and-concurrency-need-different-tools.html
http://diesel.io/ Non-blocking IO in python
https://channel9.msdn.com/Events/Lang-NEXT/Lang-NEXT-2014/From-Parallel-to-Concurrent
Everything You Always Wanted to Know About Synchronization but Were Afraid to Ask
http://sigops.org/sosp/sosp13/papers/p33-david.pdf
http://kernel.org/pub/linux/kernel/people/paulmck/perfbook/perfbook.html
http://syprog.blogspot.com/2012/03/linux-threads-through-magnifier-local.html
http://syprog.blogspot.com/2012/03/linux-threads-through-magnifier-remote.html
http://habrahabr.ru/post/201826/ async
Actors
http://habrahabr.ru/company/tiktokcoach/blog/206300/
http://www.macs.hw.ac.uk/~rs46/posts/2014-02-03-objects-boxes-actors-agents.html
http://habrahabr.ru/post/143237/ multithreading 1
http://habrahabr.ru/post/209128/ multithreading 2
http://habrahabr.ru/post/200924/ Reader-Writer C++
https://github.com/Amanieu/asyncplusplus
http://habrahabr.ru/company/ifree/blog/196548/
http://habrahabr.ru/company/ifree/blog/195770/ lock-free
http://habrahabr.ru/company/ifree/blog/195948/
http://habrahabr.ru/post/174369/
http://natsys-lab.blogspot.ru/2013/05/lock-free-multi-producer-multi-consumer.html
http://blog.libtorrent.org/2012/12/principles-of-high-performance-programs/
http://software.intel.com/en-us/academic#pid-13124-91
http://www.ops.rsu.ru/en/about.shtml
http://habrahabr.ru/company/xakep/blog/210480/
volatile only guarantees that the compiler will not cache the value in a register (optimizing compilers are very fond of doing this: the more registers, the more they cache). That is, a read of a volatile variable always means a read from memory, and a write to a volatile variable always means a write to memory.
A shared object is called a lock-free (non-blocking) object if it guarantees that some thread will complete an operation on the object in a finite number of steps regardless of how the other threads behave (even if those other threads have crashed).
The stricter notion of a wait-free object says: an object is wait-free if every thread completes an operation on the object in a finite number of steps.
The lock-free condition guarantees progress for at least one thread, whereas the stronger wait-free condition guarantees successful completion for all threads.
http://michaelrbernste.in/2013/09/23/formalizing-concurrency-distribution-and-mobility.html
http://developerblog.redhat.com/2013/08/15/c-cpp11-parallelism/
http://www.youtube.com/watch?feature=player_embedded&v=EJOl9VDbslI
http://herbsutter.com/2013/02/11/atomic-weapons-the-c-memory-model-and-modern-hardware/
http://www.ibm.com/developerworks/aix/library/au-aix-multicore-multiprocessor/index.html
http://pavpanchekha.com/blog/concurrency.html
http://pavpanchekha.com/blog/time-clocks.html
http://www.teigfam.net/oyvind/home/technology/072-pike-sutter-concurrency-vs-concurrency/
http://www.youtube.com/watch?v=f6kdp27TYZs
http://si14.livejournal.com/42022.html
https://news.ycombinator.com/item?id=5711232
https://news.ycombinator.com/item?id=6038341
http://queue.acm.org/detail.cfm?id=2513575
http://habrahabr.ru/post/184436/ C++
http://habrahabr.ru/post/185706/ C++
http://habrahabr.ru/post/186200/ C++
http://www.infoq.com/news/2013/07/core-async Clojure and ClojureScript
http://queue.acm.org/detail.cfm?id=2492433 non-blocking algorithms
http://rfw.name/blog/2013/07/13/asynchrony.html ASYNC
Java
http://lmax-exchange.github.io/disruptor/
C#
http://higherlogics.blogspot.ca/2013/09/clr-concurrency-preventing-torn-reads.html
http://msdn.microsoft.com/en-us/library/windows/desktop/ms684251(v=vs.85).aspx
http://www.albahari.com/threading/
http://www.codeproject.com/Articles/152765/Task-Parallel-Library-1-of-n C#
http://www.codeproject.com/Articles/362996/Multi-core-programming-using-Task-Parallel-Library
http://stackoverflow.com/questions/6584397/how-to-start-a-thread-at-specific-coresolved
http://www.chrisstucchio.com/blog/2013/why_not_python.html
https://medium.com/code-adventures/174f1fe66127
https://justin.harmonize.fm/Development/2008/09/09/threading-model-overview.html
http://kachayev.github.com/talks/kharkivpy%230/index.html#/
https://blog.heroku.com/archives/2013/2/24/concurrency_is_not_parallelism/
http://www.infoq.com/interviews/coutts-haskell
http://journal.stuffwithstuff.com/2013/02/24/iteration-inside-and-out-part-2/
http://erratasec.blogspot.co.uk/2013/02/multi-core-scaling-its-not-multi.html
http://ak.channel9.msdn.com/ch9/b54e/8ceeaa3d-551e-4184-a964-9fd4012cb54e/GoingNative2012ThreadsSharedVar_med_ch9.mp4
http://www.reddit.com/r/programming/comments/1bm16t/simon_peyton_jones_explains_how_to_deal_with/
http://www.yosefk.com/blog/checkedthreads-bug-free-shared-memory-parallelism.html
http://blog.paralleluniverse.co/post/44146699200/spaceships
http://concurrencykit.org/index.html
http://preshing.com/20130605/the-worlds-simplest-lock-free-hash-table
https://habrahabr.ru/post/317882/ lock free list
Blocking
http://habrahabr.ru/company/nordavind/blog/176541/
TLS thread local storage
http://en.wikipedia.org/wiki/Thread-local_storage
http://itw66.ru/blog/c_plus_plus/537.html
Green Threads
http://c9x.me/gthreads/intro.html
Wiki
http://en.wikipedia.org/wiki/Readers-writers_problem
http://en.wikipedia.org/wiki/Producers-consumers_problem
http://en.wikipedia.org/wiki/Peterson's_algorithm
http://en.wikipedia.org/wiki/Non-blocking_algorithm
http://en.wikipedia.org/wiki/Synchronization_(computer_science)
http://en.wikipedia.org/wiki/Compare-and-swap
http://en.wikipedia.org/wiki/Non-blocking_synchronization
http://en.wikipedia.org/wiki/Memory_barrier
http://en.wikipedia.org/wiki/Lock-free_and_wait-free_algorithms
http://en.wikipedia.org/wiki/Inter-process_communication
https://github.com/kaisellgren/Concurrency-concepts
http://habrahabr.ru/post/164487/ in Java
Lock-free algorithms
http://www.yosefk.com/blog/its-locking-if-its-blocking.html
http://habrahabr.ru/post/182722/
http://habrahabr.ru/company/ifree/blog/219201/ Queue
http://msdn.microsoft.com/en-us/library/windows/desktop/ee418650(v=vs.85).aspx
http://preshing.com/20120612/an-introduction-to-lock-free-programming
http://www.1024cores.net/home/lock-free-algorithms/introduction
http://woboq.com/blog/introduction-to-lockfree-programming.html
http://www.infoq.com/presentations/Lock-Free-Algorithms
Lock-free queue:
http://habrahabr.ru/post/174369/
http://moodycamel.com/blog/2013/a-fast-lock-free-queue-for-c++
http://www.drdobbs.com/article/print?articleId=210604448&siteSectionName=parallel
http://www.drdobbs.com/article/print?articleId=211601363&siteSectionName=parallel
http://www.drdobbs.com/article/print?articleId=212201163&dept_url=/cpp/
http://www.reddit.com/r/programming/comments/yhv2v/writing_lockfree_code_a_corrected_queue/
http://dev.hasenj.org/post/31396790746/coroutines-vs-explicitly-async-apis
http://jordanorelli.tumblr.com/post/31533769172/why-i-went-from-python-to-go-and-not-node-js
http://www.codeproject.com/Articles/153898/Yet-another-implementation-of-a-lock-free-circular
http://habrahabr.ru/company/ideco/blog/147086/#habracut
http://habrahabr.ru/post/147107/
http://habrahabr.ru/post/147099/
http://habrahabr.ru/post/150801/
http://habrahabr.ru/post/118850/ Lock-free
http://habrahabr.ru/post/118515/ Lock-free
http://habrahabr.ru/post/118390/
http://habrahabr.ru/post/130101/ Non-blocking queue
http://habrahabr.ru/post/118917/
http://habrahabr.ru/post/118466/
http://habrahabr.ru/post/108857/ Observer in C++
CONTINUATIONS
http://blog.rfw.name/2012/10/11/continuations.html
http://matt.might.net/articles/by-example-continuation-passing-style/
http://blogs.msdn.com/b/wesdyer/archive/2007/12/22/continuation-passing-style.aspx
https://speakerdeck.com/u/multicoreworld/p/james-reinders-intel-united-states
http://kernel.org/pub/linux/kernel/people/paulmck/perfbook/perfbook.html
http://mortoray.com/2011/12/16/how-does-a-mutex-work-what-does-it-cost/
http://www.albahari.com/threading/
http://scalibq.wordpress.com/2012/06/01/multi-core-and-multi-threading/
http://www.ibm.com/developerworks/aix/library/au-multithreaded_structures1/index.html
http://www.infoq.com/presentations/Thinking-Parallel-Programming
http://yow.eventer.com/events/1004/talks/1055
http://blog.incubaid.com/2012/03/28/the-game-of-distributed-systems-programming-which-level-are-you/
http://www.alexonlinux.com/multithreaded-simple-data-type-access-and-atomic-variables
http://www.youtube.com/watch?v=yHi6tPwAy98 Java multiprocessor programming
http://www.reddit.com/r/programming/comments/zv8ib/coroutines_vs_explicitly_async_apis/
http://kernel.org/pub/linux/kernel/people/paulmck/perfbook/perfbook.html BOOK
http://attractivechaos.wordpress.com/2011/10/06/multi-threaded-programming-efficiency-of-locking/
http://www.igvita.com/2010/08/18/multi-core-threads-message-passing/
http://video.ch9.ms/ch9/0579/28a77412-2541-4857-97f1-9f8e012b0579/SPLASH2011DavidUngar_ch9.wmv
http://glyph.twistedmatrix.com/2012/01/concurrency-spectrum-from-callbacks-to.html
Java and Scala Concurrency
http://channel9.msdn.com/Shows/Going+Deep/Hewitt-Meijer-and-Szyperski-The-Actor-Model-everything-you-wanted-to-know-but-were-afraid-to-ask
http://nichol.as/zeromq-an-introduction
http://think-async.com/Asio/AsioAndBoostAsio
Pipes
The constant PIPE_SIZE establishes the number of bytes allocated for a pipe's buffer (in practice the pipe buffer is PAGE_SIZE). The constant PIPE_BUF (limits.h) defines the atomic write limit: writes of at most PIPE_BUF bytes to a pipe are guaranteed to be atomic.
Parallel compression
http://www.iasylum.net/writings/parallel-compression.html
OpenMP
http://habrahabr.ru/blogs/cpp/134547/
http://software.intel.com/en-us/articles/intel-guide-for-developing-multithreaded-applications/
C++11
http://en.cppreference.com/w/cpp/atomic
http://www.baptiste-wicht.com/2012/07/c11-concurrency-tutorial-part-4-atomic-type/
http://channel9.msdn.com/Events/GoingNative/GoingNative-2012/Threads-and-Shared-Variables-in-C-11
http://www.corensic.com/Learn/Resources/ConcurrencyTutorialPartTwo.aspx C++ 11
http://solarianprogrammer.com/2011/12/16/cpp-11-thread-tutorial/
http://www.informit.com/articles/printerfriendly.aspx?p=1750198
http://return1.net/blog/2012/Jun/14/multithreading-with-c11-protecting-data
Coroutines
http://en.wikipedia.org/wiki/Coroutine
http://habrahabr.ru/post/143318/
http://www.embeddedrelated.com/showarticle/455.php
http://www.1024cores.net/home/lock-free-algorithms/so-what-is-a-memory-model-and-how-to-cook-it
http://blog.ashodnakashian.com/2011/07/almost-everything-you-read-about-threading-is-wrong/
http://www.codeproject.com/KB/threads/
http://mejedi.livejournal.com/15674.html
http://slowfrog.blogspot.com/2011/05/taking-advantage-of-multiple-cores-with.html
epoll, select
http://banu.com/2011/06/how-to-use-epoll-complete-example/
http://www.kegel.com/c10k.html
http://www.wangafu.net/~nickm/libevent-book/TOC.html
http://www.nightmare.com/medusa/async_sockets.html
Callbacks: synchronous and asynchronous:
http://blog.ometer.com/2011/07/24/callbacks-synchronous-and-asynchronous/
http://news.ycombinator.com/item?id=2907856
http://swtch.com/~rsc/talks/threads07/
http://www.usenix.org/events/hotos03/tech/vonbehren.html
All threads in a process share the same address space.
In other words, data in a static or global variable is located at the same memory address when referred to by any thread of the same process.
Variables on the stack however are local to threads, because each thread has its own stack, residing in a different memory location.
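A small sketch of this with pthreads (the variable and function names are made up for the example): every thread prints the same address for the global, but a different address for its own stack variable.

#include <pthread.h>
#include <stdio.h>

int shared_global = 0;                 /* one copy, same address in every thread */

void *worker(void *arg)
{
    (void)arg;
    int stack_local = 0;               /* lives on this thread's own stack */
    printf("global at %p, local at %p\n",
           (void *)&shared_global, (void *)&stack_local);
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    /* both threads report the same address for shared_global,
       but different addresses for stack_local */
    return 0;
}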
Atomic operations
http://jfdube.wordpress.com/2011/11/30/understanding-atomic-operations/
http://en.cppreference.com/w/cpp/atomic
http://habrahabr.ru/post/157163/
CAS: compare and swap
CAS: Atomicity guaranteed for the following 3 steps together
Read the original value from a memory location.
Compute the new value to be set.
Set the new value only if the memory location is still the original value.
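A minimal sketch of these three steps as a lock-free increment, implemented as a CAS retry loop with GCC's legacy __sync builtins (see the GCC link just below):

/* Atomically add 1 to *p using compare-and-swap.
   Retries if another thread changed *p between the read and the CAS. */
void cas_increment(volatile int *p)
{
    int old, new_val;
    do {
        old = *p;                 /* 1. read the original value   */
        new_val = old + 1;        /* 2. compute the new value     */
        /* 3. set it only if *p still holds the original value    */
    } while (!__sync_bool_compare_and_swap(p, old, new_val));
}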
Deadlock occurs when two or more threads are blocked while waiting for each other. For example, the first thread is blocked on the second thread, waiting for a resource that the second thread holds. The second thread does not release this resource until it acquires a resource held by the first thread.
An operation that cannot be interrupted is said to be atomic.
Most machines have a Test-and-Set (TS) machine instruction, which returns the old value of a shared variable and assigns a constant value to it in one atomic action.
http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Atomic-Builtins.html
http://en.wikipedia.org/wiki/Compare-and-swap
Critical section: a series of statements that must be performed atomically, and only executed by one thread at a time
The mutex (mutual exclusion) guarantees that only one thread accesses the shared variable at a time.
http://stackoverflow.com/questions/62814/difference-between-binary-semaphore-and-mutex
http://stackoverflow.com/questions/62814/difference-between-binary-semaphore-and-mutex/86021#86021
The synchronized keyword in Java implements mutual exclusion.
When a synchronized method is called, the calling thread attempts to acquire an exclusive lock on the object. If any other thread holds that lock, then the calling thread stalls until the lock is released. However, this must be used with care because it can lead to deadlock.
http://book.opensourceproject.org.cn/embedded/cmprealtime/index.html
http://www.cs.duke.edu/courses/cps110/spring99/slides/
http://pages.cs.wisc.edu/~swift/classes/cs537-sp09/lectures.html
http://cseweb.ucsd.edu/classes/fa05/cse120/lectures/
deadlock
livelock
starvation
spinlock - A mutex that uses busy waiting is called a spinlock.
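A minimal spinlock sketch built on the test-and-set idea mentioned above, using GCC's __sync_lock_test_and_set / __sync_lock_release builtins (names of the wrapper functions are made up for the example):

/* 0 = unlocked, 1 = locked */
typedef volatile int spinlock_t;

void spin_lock(spinlock_t *l)
{
    /* atomically write 1 and get the old value;
       if the old value was already 1, someone else holds the lock, so keep spinning */
    while (__sync_lock_test_and_set(l, 1))
        ;   /* busy-wait */
}

void spin_unlock(spinlock_t *l)
{
    __sync_lock_release(l);   /* atomically write 0 with release semantics */
}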
Semaphore
1. When you create the semaphore, you can initialize its value to any integer, but after that the only operations you are allowed to perform are increment (increase by one) and decrement (decrease by one). You cannot read the current value of the semaphore.
2. When a thread decrements the semaphore, if the result is negative, the thread blocks itself and cannot continue until another thread increments the semaphore.
3. When a thread increments the semaphore, if there are other threads waiting, one of the waiting threads gets unblocked.
At any time, the value of the semaphore represents the number of additional threads that may enter. If the value is zero, then the next thread will block until one of the threads inside exits and signals. When all threads have exited, the value of the semaphore is restored to n.
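A sketch with POSIX semaphores (sem_init / sem_wait / sem_post), initialized to n = 3 so that at most three threads are inside the section at once (POSIX semaphores block at zero rather than going negative, but the effect is the same; the worker body is just a placeholder):

#include <semaphore.h>
#include <pthread.h>

sem_t slots;                      /* counting semaphore */

void *worker(void *arg)
{
    (void)arg;
    sem_wait(&slots);             /* decrement; blocks if no slots are left */
    /* ... at most 3 threads execute this section at the same time ... */
    sem_post(&slots);             /* increment; wakes one waiter, if any   */
    return NULL;
}

int main(void)
{
    sem_init(&slots, 0, 3);       /* n = 3 additional threads may enter */
    /* create/join worker threads as needed */
    sem_destroy(&slots);
    return 0;
}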
Barrier synchronization comprises three actions:
- the task posts its arrival at the barrier,
- the task waits for other participating tasks to reach the barrier, and
- the task receives notification to proceed beyond the barrier.
typedef struct {
mutex_t br_lock; /* guarding mutex */
cond_t br_cond; /* condition variable */
int br_count; /* num of tasks at the barrier */
int br_n_threads; /* num of tasks participating in the barrier synchronization */
} barrier_t;
barrier(barrier_t *br){
mutex_lock(&br->br_lock);
br->br_count++;
if (br->br_count < br->br_n_threads)
cond_wait(&br->br_cond, &br->br_lock);
else{
br->br_count = 0;
cond_broadcast(&br->br_cond);
}
mutex_unlock(&br->br_lock);
}
Condition variables
A condition variable is a synchronization mechanism that works in partnership with monitors, or with mutexes used in the style of monitors. There are two basic operations on a condition variable: wait and notify. (Some systems use the name signal instead of notify.)
http://stackoverflow.com/questions/70773/pthread-cond-wait-versus-semaphore
When a task examines a condition variable, the task must have exclusive access to that condition variable. Without exclusive access, another task could alter the condition variable's conditions at the same time, which could cause the first task to get an erroneous indication of the variable's state. Therefore, a mutex is always used in conjunction with a condition variable. The mutex ensures that one task has exclusive access to the condition variable until that task is finished with it.
POSIX condition variables are initialized with pthread_cond_init independent of any particular mutex; the mutex is instead passed as an argument to pthread_cond_wait, along with the condition variable being waited on. This is a somewhat error-prone arrangement, because all concurrent waiters need to pass in the same mutex.
The operations corresponding to Java's Object methods notify() and notifyAll() are called pthread_cond_signal and pthread_cond_broadcast.
The API allows a thread to invoke pthread_cond_signal or pthread_cond_broadcast without holding a corresponding mutex, but using this flexibility without introducing a race bug is difficult.
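A minimal wait/notify sketch following the POSIX conventions above (the ready flag is a made-up condition for the example); the condition is re-checked in a loop because a wakeup does not by itself guarantee the condition holds:

#include <pthread.h>

pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t  c = PTHREAD_COND_INITIALIZER;
int ready = 0;                       /* the condition being waited on */

void wait_until_ready(void)
{
    pthread_mutex_lock(&m);
    while (!ready)                   /* re-check: protects against spurious wakeups */
        pthread_cond_wait(&c, &m);   /* releases m while waiting, re-acquires on wakeup */
    /* ... use the protected state ... */
    pthread_mutex_unlock(&m);
}

void set_ready(void)
{
    pthread_mutex_lock(&m);          /* change the condition under the same mutex */
    ready = 1;
    pthread_cond_signal(&c);         /* or pthread_cond_broadcast to wake all waiters */
    pthread_mutex_unlock(&m);
}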
Producer-consumer problem
http://msdn.microsoft.com/en-us/magazine/cc163599.aspx
Producers create items of some kind and add them to a data structure; consumers remove the items and process them.
Example: event-driven programs. An "event" may be the user pressing a key or moving the mouse. Whenever an event occurs, a producer thread creates an event object and adds it to the event buffer. Concurrently, consumer threads take events out of the buffer and process them. In this case, the consumers are called "event handlers."
Readers-writers problem
1. Any number of readers can be in the critical section simultaneously.
2. Writers must have exclusive access to the critical section
In other words, a writer cannot enter the critical section while any other
thread (reader or writer) is there, and while the writer is there, no other thread may enter.
A readers/writers lock is much like a mutex, except that when a thread locks the lock, it specifies whether it is planning to do any writing to the protected data structure or only reading from it. Just as with a mutex, the lock operation may not immediately complete; instead, it waits until the lock can be acquired. Any number of readers can hold the lock at the same time; they will not wait for each other. A reader will wait, however, if a writer holds the lock. A writer will wait if the lock is held by any other thread, whether by another writer or by one or more readers.
Readers/writers locks are particularly valuable in situations where some of the read-only operations are time consuming, as when reading a file stored on disk. This is especially true if many readers are expected. The choice between a mutex and a readers/writers lock is a performance trade-off: because the mutex is simpler, it has lower overhead, but the readers/writers lock may pay for its overhead by allowing more concurrency.
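A sketch using the POSIX readers/writers lock (the pthread_rwlock_* procedures also listed in the Pthreads section below); the protected table is a made-up example:

#include <pthread.h>

pthread_rwlock_t rw = PTHREAD_RWLOCK_INITIALIZER;
int table[100];                          /* example shared data */

int reader(int i)
{
    pthread_rwlock_rdlock(&rw);          /* many readers may hold this at once */
    int v = table[i];
    pthread_rwlock_unlock(&rw);
    return v;
}

void writer(int i, int v)
{
    pthread_rwlock_wrlock(&rw);          /* exclusive: waits for all readers and writers */
    table[i] = v;
    pthread_rwlock_unlock(&rw);
}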
Starvation
Thread starvation is the possibility that one thread might wait indefinitely while others proceed.
For most concurrent applications, starvation is unacceptable, so we enforce the requirement of bounded waiting, which means that the time a thread waits on a semaphore (or anywhere else, for that matter) has to be provably finite.
If a writer arrives while there are readers in the critical section, it might wait in queue forever while readers come and go. As long as a new reader arrives before the last of the current readers departs, there will always be at least one reader in the room. This situation is not a deadlock, because some threads are making progress, but it is not exactly desirable. A program like this might work as long as the load on the system is low, because then there are plenty of opportunities for the writers. But as the load increases the behavior of the system would deteriorate quickly (at least from the point of view of writers).
Pthreads
http://www.linuxforu.com/2011/08/light-weight-processes-dissecting-linux-threads/
https://computing.llnl.gov/tutorials/pthreads/
http://www.ibm.com/developerworks/library/l-posix3/
http://www.ibm.com/developerworks/aix/library/au-multithreaded_structures1/index.html
http://www.ibm.com/developerworks/linux/library/l-memory-leaks/index.html
There are around 100 Pthreads procedures, all prefixed with "pthread_", and they can be categorized into four groups:
1) Thread management - creating, joining threads, etc.
2) Mutexes
3) Condition variables
4) Synchronization between threads using read/write locks and barriers
The POSIX standard, for example, includes readers/writers locks with procedures such
as pthread_rwlock_init, pthread_rwlock_rdlock, pthread_rwlock_wrlock,
and pthread_rwlock_unlock. The POSIX standard leaves it up to each individual
system how to prioritize new readers versus waiting writers.
The POSIX standard also includes a more specialized form of readers/writers locks specifically associated with files.
In the POSIX standard, file locks are available only through the complex fcntl procedure.
However, most UNIX-family operating systems also provide a simpler interface, flock.
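A small sketch of the simpler flock interface (the file name is just an example; error handling omitted):

#include <sys/file.h>
#include <fcntl.h>
#include <unistd.h>

void with_file_lock(void)
{
    int fd = open("/tmp/example.lock", O_CREAT | O_RDWR, 0644);
    flock(fd, LOCK_EX);       /* exclusive lock; LOCK_SH gives the shared (reader) form */
    /* ... read or modify the file ... */
    flock(fd, LOCK_UN);       /* release */
    close(fd);
}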
Python
http://blog.incubaid.com/2012/04/02/tracking-asynchronous-io-using-type-systems/
http://docs.python.org/library/threading.html
http://docs.python.org/library/queue.html
http://www.doughellmann.com/PyMOTW/threading/
http://pypi.python.org/pypi/proconex/0.3
https://github.com/jango/calculon
http://dabeaz.blogspot.com/2009/09/python-thread-synchronization.html
http://javouhey.wordpress.com/2011/01/16/concurrency-use-case-producer-consumer/
http://kurt.seifried.org/2010/05/31/python-performance-part-1/
http://appcrawler.com/wordpress/2011/01/19/python-multiple-producerconsumer-queue/
http://www.oluyede.org/blog/2007/04/09/producerconsumer-in-python/
http://effbot.org/zone/thread-synchronization.htm
http://www.java2s.com/Tutorial/Python/0340__Thread/Catalog0340__Thread.htm
http://bytes.com/topic/python/answers/803276-producer-consumer-threading-problem
Links
http://members.verizon.net/~babkin/tpopp/
http://en.wikipedia.org/wiki/Mutex
http://en.wikipedia.org/wiki/Spinlock
http://en.wikipedia.org/wiki/Monitor_%28synchronization%29
http://en.wikipedia.org/wiki/Producers-consumers_problem
http://www.justsoftwaresolutions.co.uk/threading/
http://www.owlnet.rice.edu/~comp422/lecture-notes/
http://softpixel.com/~cwright/programming/threads/
http://stackoverflow.com/questions/62814/difference-between-binary-semaphore-and-mutex
http://www.dalkescientific.com/writings/diary/archive/2007/10/07/wide_finder.html
http://memojo.com/~sgala/blog/2007/09/29/Python-Erlang-Map-Reduce
http://code.google.com/p/hpcource/wiki/Materials
http://www.multicoreinfo.com
WideFinder
http://wikis.sun.com/display/WideFinder/Results
http://wikis.sun.com/display/WideFinder/Wide+Finder+Home
http://effbot.org/zone/wide-finder.htm
GPU Programming - CUDA
http://gpgpu.org/
http://developer.nvidia.com/object/cuda_3_0_downloads.html
http://developer.nvidia.com/object/cuda_training.html
A kernel—a function callable from the host and executed on the device by many threads in parallel.