Q-Heap (Fredman & Willard, 1994)

Q-Heaps are similar to fusion trees, in that they, as a data structure, operate based on the different bits in the words they have to sort. Fusion trees are, however, beyond the scope of this project, which will focus mainly on Q-heaps. Q-heaps allow for the use of Boolean logic in algorithms that can lead to a significant speed-up.

The input into a q-heap is a sorted array of numbers, similar to the lists that are attached to each node in the catalog tree. Let b be the number of bits used to represent each number. Given a set S = {u1,u2,u3,u4…uk} where ui>ui-1 for 0<i<k+1, we construct a set B(S) = {c1,c2,c3,c4…ck-1} where ci = msb(ui,ui+1), the most significant bit that is different between each number. This set is not a multi-set and thus, may have fewer than k-1 elements. Let zs be the sequence c1,c2,c3,c4…ck-1 (has k-1 elements). We can also represent zs with ts = d1,d2,d3,d4…dk-1 where di = rankB(S)(ci), the rank (position from the maximum) of the element in the set B(S), mapping it from B(S).

Recursively, take the maximum of c, cj in zs and construct a binary search tree with the root labeled as cj, such that for i<j, ci will label nodes in the left subtree, and the rest, labeling nodes in the right subtree. Querying would work as such: if a number had a 0 at the differing bit, proceed down the left subtree, if not, proceed down the right subtree. We can then augment the tree with leaves numbering from 1 to k, left to right, representing these numbers. Computing the number would be central to querying for a value through the path it takes.

As an example, we will use a set of 8-bit unsigned numbers consisting of 5, 19, 30, 49, 52, 63, 68, 75, 88, 93, 127.

Querying

Lemma: For a query, u, Tree(zs ), i = Leaf(zs,u), the leaf traced by the path taken by u through the tree, and rankB(S)(msb(u,ui )) can determine ranks(u) after comparing u with ui(less than, equal or greater than)

Let r = |B(S)| and m = rankB(S)(msb(u,ui )). By induction on r-m, if r-m = 0 and u != ui , the most significant bit is outside of B(S) and if u < ui , ranks(u) = 0, else ranks(u) = |S|. If u = ui, there is nothing to prove.

If r-m > 0, the elements in B(S) determine the largest possible bit that can differ between any two elements. As m<r, u has to agree with ui in terms of what bit lies in the largest possible bit. The subtrees of the root at S, can then be recursed with the same argument from the start, to eventually obtain ranks(u), by adding up the number of leaves that are to the left of where u's place will be.

This lemma will be used as the basis to construct a look-up table such that traversal is unnecessary.

Calculating rankB(S)(msb(u,ui ))

In essence, msb(u,ui) is the leftmost bit of (u XOR ui). Let the result of (u XOR ui) be v. Assume that v != 0, as this can be verified in one machine instruction.

Here, we introduce the concept of sparsity, where we consider a number d-sparse if the position of its one-bits can take the form of {a+di| 0<= i< d}. If v is d-sparse, there exists 2 constants y1 and y2 such that for a y= (y1v) AND y2 , the ith bit of the significant part of y is the bit in position a+di| 0<= i< d .

Consider a partition of s = √(b) + 1 bits within v. Two computations are done, the first to find the block with the first one-bit, and the second, to find the bit position of the first one-bit in that block. Define a C1 such that in each block, the leftmost position is a 1, and a C2 that is the bit-wise complement of C1. The first function on v, is to compute (C1 - [(C1 - x AND C2) AND C1]) OR (x AND C1) which gives a value, p, in which the leading bit of each block is one, only if there is a one in the block. It is then possible to compress this value, as this value is d-sparse, where d is the size of the block.

Let P = {bin(0), bin(1) ... bin(b/s -1)}. Calculating the rank of obtained p above, in P allows us to know the position of the block containing the leading one. Concatenate together the powers of 2 in P to the length of b/s blocks (for e.g. 0100 0010 0001) if each block was 4 bits in length (can be pre-computed). Compute a value 10..p 10..p.... with each field being the size of the block. This can be obtained through bitwise multiplication and OR instruction. If we subtract the concatenation from the computed value repeating 10..p, and count the number of one-bits in the leading bit of each field by a suitable multiplication, we can obtain the rank, which tells us which block the first one-bit is in.

Repeat this procedure on the block with the first one-bit, performing rank-computation on the bit-block using the method in the previous paragraph. msb(u,ui) can thus be calculated in constant time.

rankB(S)(msb(u,ui )) can then be computed using the same rank computation, replacing the powers of 2 with a concatenation of the elements in B(S).

Getting Leaf(zs,u)

To compute Leaf(zs,u), we have to be able to map the nodes to relevant bit values of u.

First we construct a mask to remove the irrelevant bit values of u with reference to the tree. In the example above, the mask would be 01111100. Then, we construct a multiplier to align these bit numbers in contiguous order before extracting a field to be decoded that corresponds to a mapping of bit values from u to zs. Let bin(a1, a2, ... ak) = 2a1 + 2a2 ... + 2ak . There exists a multiplier M = bin(m1, m2, ... mk) that pairs each mi with each ai in B(S), relocating the important bits to a narrow field for extraction.

To figure out which leaf it lands in in constant time, we have to construct a decoder that extracts the path through the tree from the field that results. This decoder holds a tuple of sets, where the ith set corresponds to the bit positions of the extracted field that the ith element within B(S) affected. Let r = |B(S)| and M denote the union of these sets. The ith set can also be defined as {mj+ai-b | 1<=j<= r and b <= mj + ai < b +f}. In our example word size, b, is 8.

We will use the first 4 elements with a possible multiplier as an example.

For any given a' in B(S), let B(S)' denote the set of those ai s that precede a' (relative to insertion ordering) and let m' denote the mi value paired with a'. f in this case is 5L3 where L is the size of the Q-heap.

Conditions for the multiplier listed in the paper include the following:

  • The residues (mod f) of the various mj's including m' are distinct.
  • For t'=m'+a'-b we have that 2L<=t'<f
  • The interval [t'-2L, t'] avoids all members of M', M' given as {mj+ai -b | mj precedes m', ai precedes a', and b<=mj+ai <b+f}
  • For no i and h, with ai and ah in B(S)', does m'+ ai - b fall into the protected interval [th -2L, th].

The idea behind these conditions, is such that we can obtain an algorithm that modifies the multiplier by adding/removing in a power of 2 should a new ai be added in constant time to B(S), thus preserving the time speed-up when inserting or deleting, by using the table look up to update ts. A caveat is the L<= O(log1/5 n) as a bound on the multiplier to prevent an overly large multiplier.

The decoder then should be written such that it can map the values of the extracted field after multiplication to each edge in Tree(zs ). We construct a table to map this, such that table-lookup can find the leaf in constant time.

Conclusion and Thoughts

As Tree(zs ), i, rankB(S)(msb(u,ui )) and the comparison between u and ui can determine ranks(u) pre-processing can be done such that a table can be constructed to accurately map these values to ranks(u). Table-look up can be done in constant time, such that finding the rank, and subsequently the successor for the purpose of fractional cascading, is constant as well.

The algorithms here however are dependent on |B(S)| and shortcuts through the use of boolean instructions. Although the space complexity is not large, the computations for pre-processing might make this too complicated to be practical. The authors of the paper have acknowledged that it was done from a theoretical perspective.