Iacono's Working Set Structure

What is a working set?

The working set, w(x), of an element x is the set of all distinct elements that have been accessed since x was most recently accessed.

Consider a linked-list implementation of a double-ended queue whose nodes correspond one-to-one with the elements of a BST. The element at the front of this queue is the most recently accessed, while the element at the back is the least recently accessed. An element's position in this queue therefore equals the size of its working set.

Whenever an element is accessed in the BST, its node carries a pointer to the corresponding node in the queue, so that node can be moved to the front. If we splice the node out, update the nodes before and after it to point to each other, and reinsert it at the head of the queue, the size of every node's working set is properly maintained.
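A minimal sketch of this queue in Python (class and method names are my own, for illustration only): accessing a key splices its node out of the doubly linked list and pushes it to the front, so a node's distance from the head equals the size of its working set.

```python
from typing import Optional

class Node:
    def __init__(self, key):
        self.key = key
        self.prev: Optional["Node"] = None
        self.next: Optional["Node"] = None

class WorkingSetQueue:
    def __init__(self):
        self.head: Optional[Node] = None
        self.nodes = {}  # key -> Node, standing in for the BST-to-queue pointers

    def _push_front(self, node):
        node.prev, node.next = None, self.head
        if self.head:
            self.head.prev = node
        self.head = node

    def access(self, key):
        node = self.nodes.get(key)
        if node is None:              # first access: create and push to the front
            node = Node(key)
            self.nodes[key] = node
        else:
            if node is self.head:     # already most recently accessed
                return
            node.prev.next = node.next  # splice the node out of its position
            if node.next:
                node.next.prev = node.prev
        self._push_front(node)

    def working_set_size(self, key):
        # Position in the queue = number of distinct keys accessed since `key`.
        size, node = 0, self.head
        while node.key != key:
            size += 1
            node = node.next
        return size
```

Replaying the example above (accesses Y, 3, 4, 7, 5 in that order) gives node 3 a working set of size 3 and node Y a working set of size 4; re-accessing 3 resets its working set to empty while leaving Y's unchanged.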

Consider an arbitrary working set queue for accesses on the given tree:

The working set of node X is currently [5, 7, 4].

The working set of node Y is [5, 7, 4, 3].

If we now access 3, the new queue looks like:

The working set of node X is now the empty set; nothing has been accessed after it.

Notice that the working set of node Y hasn't changed, since no new distinct element was accessed; node X was already part of its working set.

The Working Set Structure - Preliminary Details

Iacono's Working Set Structure is a comparison based dictionary that utilizes this concept of a working set queue to ultimately achieve the bounds given by the Working Set Theorem of Splay Trees. It maintains a dynamic set S of n keys using a series of queues with balanced BST links to achieve such bounds.

Before we consider the actual structure, let's look at how a single queue maintaining this working set invariant would behave and perform.

All inserts would be at the head of the queue. Searching for an element x in the queue would take O(w(x)) time, where w(x), the size of x's working set, is exactly its position in the queue.

Linking a BST to a single such queue would just yield a normal BST with extra overhead; the working set queue would be of no help.

A single queue containing every element in the set is not very helpful, so Iacono's idea was to partition such a queue.

For an arbitrary partition, the working set invariant can still be easily maintained. Say we want to split the queue into many queues of size 3, except perhaps the last. We just need to keep track of the head of each queue, storing them, for example, in an array that preserves their order.

If we want to insert a node into this partitioned working set queue, we insert it into the first queue, then shift the tail node of each queue to the head of the next, continuing until the last queue. Create a new tail queue if necessary.

Let's look at how an insertion of the element 1 would propagate through this queue:

1) The element would be inserted in the first queue


2) The tail of the first queue will become the head of the next queue

3) Finally, we reach the last queue, and since it was not already full, no extra queues need to be created

An access works much the same way: the element is removed from queue k, where it is found, and inserted into the first queue. The tail shifts then proceed as before, except they stop at queue k instead of the last queue.

A deletion works in the opposite direction: instead of shifting tail nodes of earlier queues to the heads of later queues, we shift head nodes of later queues to the tails of earlier queues. If queue k is the queue from which an element is removed, the head of queue k+1 becomes the tail of queue k, the head of queue k+2 becomes the tail of queue k+1, and so on until the last queue.

This partition by itself won't improve the runtime of the structure; the point is that it is equivalent to the single queue in maintaining the working set invariant: traversing from the head of the first queue to the tail of the last queue, all nodes before an element are still part of its working set.
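The insert, access, and delete shifts described above can be sketched with fixed-capacity deques (a toy model with queues of size 3, as in the figures; the real structure also maintains a balanced BST per queue, and the function names here are illustrative):

```python
from collections import deque

CAP = 3  # queue capacity, matching the size-3 example above

def insert(queues, key):
    # Insert at the head of the first queue, then push each overflowing
    # tail down into the head of the next queue, creating a new tail
    # queue if necessary.
    queues[0].appendleft(key)
    for i in range(len(queues)):
        if len(queues[i]) > CAP:
            if i + 1 == len(queues):
                queues.append(deque())
            queues[i + 1].appendleft(queues[i].pop())

def access(queues, key):
    # Remove the element from queue k, reinsert it at the head of the
    # first queue, and stop the tail shifts at queue k.
    k = next(i for i, q in enumerate(queues) if key in q)  # linear scan; a sketch
    queues[k].remove(key)
    queues[0].appendleft(key)
    for i in range(k):
        queues[i + 1].appendleft(queues[i].pop())

def delete(queues, key):
    # Remove from queue k, then pull the head of each later queue back
    # to the tail of the queue before it.
    k = next(i for i, q in enumerate(queues) if key in q)
    queues[k].remove(key)
    for i in range(k, len(queues) - 1):
        queues[i].append(queues[i + 1].popleft())
    if len(queues) > 1 and not queues[-1]:
        queues.pop()
```

Note that the membership scan in `access` and `delete` is linear; in the full structure, finding the right queue is exactly the job of the balanced trees introduced below.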

This propagation of head enqueues and tail dequeues through the queues is called a shift, and it is the fundamental operation of Iacono's Working Set Structure.

Although it was not depicted in the figures above, we should also store the tail of each queue along with its head, so that pointers can be updated in constant time rather than by traversing from head to tail.

With these preliminaries in place, we can now look at how Iacono's Working Set Structure is actually organized.

The Working Set Structure - Organization

Iacono's Working Set Structure consists of a series of queues of increasing size paired with balanced binary search trees. The contents of queue i and tree i are identical, and pointers map their nodes to each other. The queues, however, are pairwise disjoint: no element appears in more than one. This means that to find an element stored in the last queue, we must first search all of the previous trees.

This is where the trees come into play: by intelligently partitioning the queue, we can begin searching at the smaller trees, which contain the elements with the smallest working sets, while still guaranteeing an O(logn) worst case if the element ends up in a later queue.

The set of elements is partitioned into queues of increasing size.

Queue i is of size 2^(2^i); tree i therefore has height log(2^(2^i)) = 2^i.

There are log(logn) different queues.
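As a quick sanity check of these sizes in Python (a sketch; queue indices start at 1, as above):

```python
import math

def queue_sizes(n):
    # Queue i holds 2^(2^i) elements, so ceil(log2(log2(n))) queues
    # suffice: the last queue alone can hold 2^(2^k) >= 2^(log2(n)) = n keys.
    k = math.ceil(math.log2(math.log2(n)))
    return [2 ** (2 ** i) for i in range(1, k + 1)]

sizes = queue_sizes(10**6)                    # [4, 16, 256, 65536, 4294967296]
heights = [int(math.log2(s)) for s in sizes]  # tree i has height 2^i: [2, 4, 8, 16, 32]
```

The doubly exponential growth is what keeps the number of levels down to log(logn) while letting the per-level tree heights 2^i form a geometric series.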

The Working Set Structure - Operations

The Shift Operation

Recall that when we inserted an element into the first queue, the queue increased in size by 1. Then, to keep the size the same, we removed the tail node, inserted it as the head of the next queue, and repeated this process until the queues had proper sizes.

This operation is called a shift, and ensures that the working set of an element is maintained between operations on the structure.

A shift in Iacono's Working Set Structure works in much the same way as the independent working set queue. However, when we insert an element into a queue, and remove the tail node, we must perform the same operations on the tree that the queue is associated with.

Since a shift is performed with every major operation, analogous to splaying in Splay Trees, we must analyze its runtime to ensure it achieves the working set bound.

For each queue-tree level, we perform a constant number of pointer operations at the queue level. The limiting factor is the runtime of the search, delete, and insert operations on the associated tree.

Since it is a balanced binary search tree, the height and therefore worst case per operation is bounded by the logarithm of its size: log(2^(2^i)) = 2^i at the i-th queue.

Shifting from the a-th queue to the b-th queue will take total runtime:

For a < b (shifting from lower queues to higher queues, for inserts/searches): O(2^a + 2^(a+1) + ... + 2^b) = O(2^b)

For a > b (shifting from higher queues to lower queues, for deletions): O(2^b + 2^(b+1) + ... + 2^a) = O(2^a)

It is important to note that this runtime analysis uses the geometric series property where the total runtime is dominated by the highest term in the series.

In this case, this would be the height of the tree at the greater queue.

This illuminates why we could not simply use a size of 2^i per tree/queue; they would not grow large enough to make use of this property, and would add an extra log factor to the runtime.

For the worst possible shift from queue 1 to queue k, where k is the highest queue, a=1 and b=log(logn). The worst-case runtime is then O(2^(log(logn))) = O(logn), which is acceptable.

Insertion

For an insertion, insert the element into tree 1 and queue 1, then perform a shift from 1 to k, the index of the last queue and tree. If queue k is full, create another queue and tree.

This will take runtime O(2^k) = O(2^(log(logn))) = O(logn).

Deletion

For a deletion, remove the element from tree i and queue i, where it is found, and perform a shift from k to i.

This takes the same runtime as insertion: the shift starts at the kth queue, where k = log(logn), so its tree of height 2^k = logn dominates the runtime: O(logn)

Access

To access an element x, search the trees starting at t1, then t2, and so on, until either x is found in some tree j, or it is not found in the last tree k, at which point we know it is not in the structure.

Since we search from tree 1 to tree j, the runtime is O(2^1 + 2^2 + ... + 2^j) = O(2^j).

We then delete x from tree j, insert it into tree 1, and perform a shift from tree 1 to tree j, which as we have seen also takes O(2^j) time.
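The access procedure can be sketched as follows, using sorted Python lists searched with bisect as stand-ins for the balanced BSTs (each level pairs a sorted list with its queue, smallest level first; all names here are illustrative):

```python
import bisect
from collections import deque

def access(levels, key):
    # levels[j] = (sorted list standing in for tree j+1, its queue)
    for j, (tree, queue) in enumerate(levels):
        pos = bisect.bisect_left(tree, key)
        if pos < len(tree) and tree[pos] == key:   # found in tree j
            tree.pop(pos)                          # delete from tree/queue j
            queue.remove(key)
            bisect.insort(levels[0][0], key)       # insert into tree/queue 1
            levels[0][1].appendleft(key)
            for i in range(j):                     # shift from level 1 to level j
                demoted = levels[i][1].pop()       # tail of queue i...
                levels[i][0].remove(demoted)
                levels[i + 1][1].appendleft(demoted)   # ...heads queue i+1
                bisect.insort(levels[i + 1][0], demoted)
            return True
    return False  # not in the last tree, so not in the structure
```

A real implementation would use balanced trees with O(logn) insert and delete instead of sorted lists, but the control flow (search up to level j, reinsert at level 1, shift back down to level j) is the same.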


The main point of this structure is to achieve the working set bound, so further analysis is required to verify that it does:

Once we find x in tree j, consider its life cycle since the most recent access (insertion if it was never accessed):

  • It was inserted into tree 1. If it is still in tree 1, then access time is O(1); otherwise
  • It was dequeued from q1 and enqueued into q2
  • Deleted from t1 and inserted into t2
  • This cascades until x reaches tree and queue j

Every element in the trees before j is a part of x's working set.

Therefore, the size of the working set of x, w(x), is at least 2^(2^(j-1)), since tree j-1 alone holds that many elements. From this, the working set theorem holds: the access time is O(2^j) = O(2 * 2^(j-1)) = O(log(2^(2^(j-1)))) = O(log(w(x))).

Limitations

Unfortunately, there are several desirable properties the Working Set Structure does not have:

  • Tree Structure
    • Although the structure is composed of trees, it is not a tree itself. Therefore, we can't perform all normal tree operations.
    • Fortunately, we can still generate an inorder representation in O(n) time by merging every tree
  • Sequential Access Lemma, Dynamic Finger Theorem, Dynamic Optimality
    • Splay Trees have these properties, but none are implied by the Working Set Structure because they are not implied by the Working Set Theorem
    • Sequential Access Lemma - if elements are accessed in sorted order repeatedly, it will cost O(1) amortized
    • Dynamic Finger Theorem - "an access should be fast if it is close, in terms of rank distance, to the previous access" [Badoiu et al]: achieved in amortized O(log(rank distance + 2))
    • Splay Trees are conjectured to be dynamically optimal, meaning they would perform asymptotically as well as the optimal rotating binary search tree on any access sequence