Application into Parallel Search Structures
Parallel programming approach to data structure:
- Process independently access data structures via concurrency control (locks)
- Non-blocking binary search tree - cost depends on number of concurrent processes and not necessarily balanced
- CBTree - concurrent splay tree, performs very well empirically but does not provide same guarantees of normal splay tree
- Processes use data structures that are designed to run operations in parallel
- Parallel 2-3 tree: can run p operations on p processors in O( logn + logp ) time
BUT:
- Difficult to use - programmer must know about the structure very well in order to coordinate access (or use concurrency control)
Easier to let the programmer write programs normally and use a parallel black box data structure
Implicit Batching:
- Programmer writes a parallel program with calls to a parallel data structure
- Runtime environment understands and connects the calls to the structure and the structure itself
- Scheduler batches operations implicitly, and schedules accordingly
Kunal Agrawal, Seth Gilbert, Wei Quan Lim in 2018 designed such a black box parallel structure supporting this:
- Based on Iacono's Working Set Structure
- First parallel structure that provides worst-case bounds that adapt to the access sequence
- Takes work nearly proportional to the working set bound
- Uses a program DAG model
- vertices are calls to the data structure
- edges between nodes when it requires a previous call
- BUT requires a weak priority or greedy scheduler, which are often not used in practice
- p is the number of processes
- eL is the number of small-ops, defined as operations in L that are performed on the map when its size is less than n
- can usually ignore eL*logp term
- d is the number of calls to the data structure
- T1 is total number of calls (nodes in DAG), T_infinity is longest path in DAG
Parallelism is within O [ (logp)^2 +logn ] of the optimal
- Uses batched search structure instead of a tree
- If element is in last tree and many threads access at once, will take b*logn time when it should take logn+b
- Combine duplicate accesses in batching