An Introduction to Tango Trees

Introduction

Tango Trees are a specialized form of Binary Search Tree (BST). When originally proposed in 2004 by Demaine, Harmon, Iacono, and Patrascu, they were the first (and only) progress on achieving dynamic optimality in almost 20 years. You can read the original paper here: http://erikdemaine.org/papers/Tango_SICOMP/paper.pdf

Tango Trees achieve a competitive ratio of O(lg lg n).

It is important to note that Tango Trees do not achieve an O(lg lg n) runtime, but rather an O(lg lg n)-competitive ratio. The concept of dynamic optimality and competitive runtimes is explained in more detail here.

If you need a refresher on basic BSTs, click here.

If you need a refresher on Red Black Trees, click here.

Although not strictly necessary to understand the content on this page, Tango Trees have many similarities to Splay Trees, which you can read about here.

Tango Trees

Tango trees are implemented using underlying red black trees. For simplicity, in the discussion below we will assume we have a static dataset, and we are only searching. Since tango trees are implemented using underlying Red Black Trees, we can easily extend the insertion and deletion operations of Red Black trees to Tango trees.

Additionally, the ability to store (and maintain) black heights and depths in each node carries over trivially from Red Black Trees. It is important to note, however, that we are keeping track of the depths in the reference tree, not the depth in the Tango tree.

The Reference Tree

In order to understand tango trees, the concept of a reference tree is used. For simplicity, we assume a perfectly balanced BST, but any reasonably balanced BST will work just as well. The reference tree does not actually show up in the implementation, but is helpful for explaining the concepts. Also for simplicity, we assume that we are working with integers, and that we just have every integer from 1 up to n (the number of nodes).

Preferred Paths

For each node, let its "preferred child" be the node that was accessed most recently. In the reference trees below, preferred edges are marked in red. In other words, in the tree below, 6 is the preferred child of 4. If we then access 2 (even if just passing through on our way to 1), then 2 would become the preferred child of 4.

Any trail of red edges is called a preferred path. So 9, 10, and 12 are all on the same preferred path, as are 8, 4, 6, and 5.

Structure of a Tango Tree

In order to build a tango tree, we take the elements on each preferred path and put them into their own Red Black tree. We call these auxiliary trees going forward.

Since the number of nodes in any auxiliary tree is at most the height of the reference tree, there are O(lg n ) nodes in any given auxiliary tree. Therefore, the maximum height of any auxiliary tree is O(lg lg n).



In every node of every auxiliary tree, we store the following information:

  • The depth of the node in the reference tree. For instance, 8 would store depth d = 0, and 7 would store depth d = 3.
  • We also store the minimum and maximum depth along this preferred path in each node of the auxiliary tree. So every node in the 4, 5, 6, 8 auxiliary tree contains min_depth = 0, and max_depth = 3. Every node in the 9, 10, 12 auxiliary tree contains min_depth = 1, and max_depth = 3. Each of the singleton trees stores min_depth = max_depth = 3.
    • We store max_depth because we may not have a complete BST. For instance, we might not have nodes 11, 13, and 15, which would affect the max_depth in those auxiliary trees.

Constructing the tree

Put the auxiliary tree containing the root of the reference tree at the root of the tango tree. Then for every non-preferred edge which is incident to the preferred path, hang that auxiliary tree off of that tree where it would go in a normal BST. (So the trees containing 2, 7, and 12 should be hung off of the tree at the root.)

Do this recursively for each preferred path, until you have a single structure.

Although multiple shapes are possible (for instance, we could rotate 4 and 5; or we could move 5 to the root, set 8 as the right child of 5, and set 6 as the left child of 8), one potential tango tree is shown below.

Reference Tree

Tango Tree

Updating the Tango Tree

Now that we have built a tango tree, we want to be able to update it. Similar to splay trees, we want to move items in the main preferred path to the auxiliary tree at the root.

Suppose we search for an arbitrary node. Every time we hit a non-preferred edge, we will perform an operation that we will call a "tango."

A tango consists of two (or fewer) cuts, and two joins.

Recall that each node contains its own depth, as well as the minimum and maximum depth of its auxiliary tree. Note, as well, that all nodes below a certain depth on any particular path fall inside a particular range. As this is perhaps easier to see in a larger tree, one is provided here.

The Red Black tree of the largest preferred path might look like either of these:

All nodes with at least a certain depth in the reference tree could be extracted into their own red-black tree by splitting their current auxiliary tree one or two times. (For a refresher on splitting red-black trees, click here.)

  • d = 4; values = [5]; split at 4.5 and 5.5
  • d = 3; values = [5, 6]; split at 4.5 and 7
  • d = 2; values = [4, 5, 6]; split at 7
  • d = 1; values = [4, 5, 6, 8]; split at 12

Tango Operations

We can grab the necessary depths from the tree being moved into the root auxiliary tree. Suppose we want to access 2 in the tree at right. The tree containing 1 and 2 would store a min_depth of 2, and a max_depth of 3. Therefore, we can perform two cuts to remove everything of the same depth from the root auxiliary tree.


A Tango Operation

Reference Tree

Tango Tree

Then, we join the two outside trees back into one tree and then merge the tree containing 4 and 8 to the tree containing 1 and 2. Finally, hang the tree containing 6 and 5 off of the root tree.

Since the height of each of these trees is O(lg lg n), and cuts and joins can be done in O(height), each of these operations take O(lg lg n) time. We therefore do O(4 * lg lg n) = O(lg lg n) work for every non-preferred edge that we traverse in the tree.

Therefore, the overall runtime for a single tango operation is O(lg lg n).

The resulting reference tree is below, and the resulting tango tree is to the right.

For explicitness, we will define a Tango operation as Tango(T, y), where T is the auxiliary tree containing the root of the tango tree, and y is the auxiliary tree which we want to merge. Note that even when doing a search that requires multiple tango operations, each will be done sequentially, so we will always be cutting things out of the root tree, and merging with the root tree.

Further Examples

Now suppose that we want to access 3. We perform a single tango operation to cut 2 out of the root auxiliary tree and merge 3's auxiliary tree. the resulting tango tree is to the right.

Reference Tree

Tango Tree

A Quick Note on Tango Tree shapes

As mentioned before, many shapes are possible for the Tango tree. For instance, with the reference tree just above, we could end up with a tree like the one at right. The root auxiliary tree is still a valid red black tree. To make it simple to go from step to step, we will keep the root tree shape the same throughout. However, we felt it important to point out that the tree shapes presented are not the only ones possible.

Accessing 15

Tango(T, tree-rooted-at-10)

Perform a tango with the tree rooted at 10. We cut out everything below 8 in the reference tree. (We can see that this can be done with a single cut in the tree above.) Then we merge 8 with the tree rooted at 10.

Affix the auxiliary trees as you would in a normal BST.

Reference Tree

Tango Tree

Tango(T, tree-rooted-at-14)

Perform a tango with the tree rooted at 14. We cut out everything below 10 in the reference tree. (We need two cuts, at 8.5 and 11.) Then we merge the root auxiliary tree with the tree rooted at 14.

15 is in the root auxiliary tree, so the Tango tree property has been restored.

The overall runtime for this search is O(2 * lg lg n).

Reference Tree

Accessing 12, 15, 14, 15

This sequence of operations results in no tango operations, since all nodes are already in the root auxiliary tree. Therefore a search for these nodes takes O(lg lg n).

Accessing 7

Accessing 7 in the tree above, however, results in 3 tango operations.

Runtime

The runtime of a single search in a Tango Tree is bounded by the number of non-preferred edges traversed (call this number E) times the amount of work done in a single Tango operation. This equals O(E * lg lg n).

Another way to describe the number of non-preferred edges is the Interleave Lower Bound, also known as the Wilber 1 bound. The Interleave Lower Bound is also described in Demaine, Harmon, Iacono, and Patrascu's paper introducing Tango Trees.

If we can find a tree which requires some constant factor times the interleave lower bound (E) in order to access any arbitrary element, we will have discovered OPT. However, Tango Trees have a runtime of O(E * lg lg n) = O(OPT * lg lg n) = O(lg lg n)-competitive.

What about insertion and deletion?

We perform a tango operation any time that we access an element - the type of access does not matter. If we want to insert an element, we will tango elements towards the root (following the usual search path) until the only option is to insert into the root tree.

For instance, suppose we wanted to insert 8.5 into the tango tree above (the one with 7 at the root). We would perform 2 tango operations, resulting in 8 and 9 both being in the root auxiliary tree, and then insert 8.5 into the root auxiliary tree. We still end up doing an amount of work that is bounded by the number of non-preferred edges traversed times the amount of work done in a Tango operation. This equals O(E * lg lg n).

Extending this argument to deletion is trivial, as you perform the same search, and then delete.

So What?

It is conjectured that Splay Trees are optimal, but this conjecture is currently unproven.

Tango Trees represent the first (and only) real progress in the search for OPT since the problem was put forth in the mid-1980s, and there has been very little progress since they were introduced in 2004. However, they improved on the previous best known/proven competitive ratio of O(lg n), held by Red-Black Trees and AVL Trees.