Application into Parallel Search Structures

Parallel programming approach to data structure:

  • Process independently access data structures via concurrency control (locks)
    • Non-blocking binary search tree - cost depends on number of concurrent processes and not necessarily balanced
    • CBTree - concurrent splay tree, performs very well empirically but does not provide same guarantees of normal splay tree
  • Processes use data structures that are designed to run operations in parallel
    • Parallel 2-3 tree: can run p operations on p processors in O( logn + logp ) time

BUT:

  • Difficult to use - programmer must know about the structure very well in order to coordinate access (or use concurrency control)

Easier to let the programmer write programs normally and use a parallel black box data structure

Implicit Batching:

  • Programmer writes a parallel program with calls to a parallel data structure
  • Runtime environment understands and connects the calls to the structure and the structure itself
  • Scheduler batches operations implicitly, and schedules accordingly

Kunal Agrawal, Seth Gilbert, Wei Quan Lim in 2018 designed such a black box parallel structure supporting this:

    • Based on Iacono's Working Set Structure
    • First parallel structure that provides worst-case bounds that adapt to the access sequence
    • Takes work nearly proportional to the working set bound
    • Uses a program DAG model
      • vertices are calls to the data structure
      • edges between nodes when it requires a previous call
    • BUT requires a weak priority or greedy scheduler, which are often not used in practice
  • p is the number of processes
  • eL is the number of small-ops, defined as operations in L that are performed on the map when its size is less than n
    • can usually ignore eL*logp term
  • d is the number of calls to the data structure
  • T1 is total number of calls (nodes in DAG), T_infinity is longest path in DAG

Parallelism is within O [ (logp)^2 +logn ] of the optimal

  • Uses batched search structure instead of a tree
  • If element is in last tree and many threads access at once, will take b*logn time when it should take logn+b
    • Combine duplicate accesses in batching