Advanced Tutorial I: Range Queries

Post date: May 28, 2012 5:57:38 AM

Suppose you have n bags numbered from 1 to n, each of which contains some marbles. Using an integer array a, the amount of marbles in the i'th bag can be stored as a[i]. If you want to find the total number of marbles in some range, say a[i..j], you could compute the sum a[i]+a[i+1]+...+a[j]. A more efficient way to handle such queries is with a Fenwick Tree, also known as a Binary Indexed Tree: http://community.topcoder.com/tc?module=Static&d1=tutorials&d2=binaryIndexedTrees.

More generally, suppose you have an associative binary operation, which we'll denote by @. For example, x@y could mean x+y or perhaps min(x,y). Being associative means that x@(y@z) = (x@y)@z for any x,y,z. Thus, parentheses are redundant, and the expression can be abbreviated as x@y@z or @(x,y,z). In other words, the operation @ generalizes to any number of arguments. For completeness, we define the unary and nullary extensions of @: @(x) = x and @() = e where e is the operation's identity, e.g. 0 for addition, 1 for multiplication, +infinity for min and -infinity for max.

Now consider a similar problem where, given an array, you want to efficiently compute a[i]@a[i+1]@...@a[j]. Associativity allows a divide and conquer approach. Since parentheses can be placed any way we like, we split the expression into (a[i]@...@a[m]) @ (a[m+1]@...@a[j]) and recursively evaluate the two parts. This is the essence of the range query tree (sometimes known as a segment or interval tree at the risk of confusion with data structures that actually store intervals). The case where a@b = min(a,b) bears mentioning because it can be used to find the lowest common ancestor of a pair of nodes in a rooted tree. You can read more about this at http://community.topcoder.com/tc?module=Static&d1=tutorials&d2=lowestCommonAncestor.

The tree in the TopCoder tutorial gives more information than just the minimum value in a range: it returns the array index at which the minimum value can be found. This makes the implementation slightly more difficult and error-prone. However, consider the operation @ such that for any pair of indices (i,j), i@j is i or j depending on whether a[i] or a[j] is smaller. Then @ is associative! Now you can abstract away the operation by writing a helper function which takes parameters (i,j) and returns i@j. This operation's identity element can be a special flag, say -1, representing a non-existent array position. When initializing the tree to represent an empty array, all nodes should be set to the identity.

If the range query tree is strictly more general than the binary indexed tree, why use the latter? The two data structures are fundamentally similar. However, the existence of additive inverses implies that any range query in the BIT can be met using two prefix sums: a[i]+...+a[j] = (-a[i-1])+...+(-a[1]) + a[1]+...+a[j] = -(a[1]+...+a[i-1]) + (a[1]+...+a[j]). This simplifies dynamic updates since we can just add a newly inserted value to existing sums. The result is a concise implementation which extends to multiple dimensions (where the query regions are rectangles or boxes) for almost no extra work. Furthermore, the BIT in the UBC code archive contains a specialized function that efficiently computes order statistics, i.e. it can find the position of the n'th marble counting from left to right. A similar implementation on the range min query tree can be used to find the smallest index i at which a[i] is smaller than a given number. Details are left to the reader.

To compete at the highest levels, it's essential to understand advanced algorithms and data structures such as those in the UBC code archive: not only their blackbox usage but also how they work. You will need to modify them in order to solve some problems. The regionals and World Finals contain few "standard" problems, so you may have to get creative with your toolkit!