Red Black Trees

Sample Implementation

A polymorphic C implementation of Red Black trees is available here: https://github.com/Tyresius92/red-black

Coloring a Red Black Tree

A red black tree is colored following a certain set of rules. These will be called the "Red Black properties" going forward.

  1. All nodes are either red or black.
  2. The root of the tree is always black.
  3. We add black "dummy" leaves so that every "real" node has 2 children
  4. A black node may have a parent of either color, but every red node has a black parent.
  5. All paths from the root to the leaves must have the same number of black nodes (including the dummy nodes).

Black Height

It is frequently useful to talk about Red Black trees in terms of the number of black nodes leading down to the NIL pointers. We call this the black height. We will annotate the black height of a particular node or tree as BH(X).

In the tree above:

  • All of the NIL nodes have a black height of 1.
  • 15 has a black height of 2 (NIL, plus itself).
  • 6 has a black height of 1 (just the NIL).
  • 13 has a black height of 3 (NIL, 1, and itself; or any other path leading down to the NILs)
  • 17 has a black height of 2 (NIL, and 25; or NIL and 15)

The black height of each node can be stored and modified for trivial additional constant cost during all operations described below.

Searching a Red Black Tree

Searching in a Red Black Tree follows exactly the same pattern as a regular BST. Left if it's smaller than the current node, right if it's bigger. If you hit a NIL (aka a dummy node), the item is not in the tree.

Runtime

O(lg n)

Insertion into a Red Black Tree

Insertion is similar to insertion into a regular BST. Simply insert the new value into the tree as a red node. If it creates a violation of the Red Black properties, run the red black insertion fixup algorithm. (This algorithm is described in CLRS, chapter 13, section 3. The pseudocode is available on page 316 of the third edition.)

You can also check out the GeeksForGeeks Tutorial which includes an implementation in C++.

Runtime

It can be shown that the insertion fixup algorithm has to traverse at most the height of the tree, which is known to be O(lg n). It also performs a constant number of rotations (at most 2). This algorithm therefore runs in O(lg n) time.

Deletion from a Red Black Tree

Deletion is similar to deletion from a regular BST. However, like insertion, it may cause a violation of the red black properties. If it does, run the red black deletion fixup algorithm. (This algorithm is described in CLRS, chapter 13, section 4. The pseudocode is available on page 326 of the third edition.)

There is also a description in this GeeksForGeeks Tutorial.

Runtime

It can be shown that the deletion algorithm also traverses at most the height of the tree, which is known to be O(lg n). It also performs at most 3 rotations. Therefore, the algorithm runs in O(lg n) time.

Joining two Red Black Trees

Consider two red black trees, T1 and T2, such that all keys in T1 are less than T2. You are also given a key K that is between MAX(T1) and MIN(T2). If no key is given, simply remove MIN(T2) from T2 and let K be equal to that.

Trivial Join

Suppose BH(T1) equals BH(T2). Simply color K black and set T1 as the left child of K, T2 as the right child of K, and you're done.


Non-Trivial Join

Without loss of generality (since the reverse case is symmetric), suppose that the BH(T1) > BH(T2).

In T1, start at the root and go right until you find a black node N such that BH(N) equals BH(T2).

Color K red. Set K as the right child of N's parent. Then set N as K's left child, set T2 as K's right child. If K's (new) parent is red, fix violations as you would when inserting.

In this example, T2 has a black height of 2, so we search for the node on the rightmost path (sometimes called the "spine") of T1 that has black height of 2. That node is 11.

So, we set 11 as the left child of 13, and we set 15 as the right child of 13. Finally, set 13 as the right child of 7 (because it is 11's old parent). If 7 were red, we would also have to fix red-red violations, as in insertion.

Runtime of joining Red Black Trees

The trivial case takes constant time, as it is simply setting a few pointers. O(1)

The non-trivial case could take O(lg n) = O(height) to find the node with the appropriate black height in the worst case, and O(lg n) = O(height) to fix any violations created as a result. Overall runtime: O(lg n) + O(lg n) = O(lg n) = O(height)

More specifically, the time to find the appropriate node and fix violations is proportional to the difference in black heights.

Splitting a Red Black Tree

We would like to be able to split a Red Black tree on a particular value V, to end up with two trees - one containing all values smaller than V.

For simplicity, suppose that V is not currently in the tree. In our example, let V = 10.





Insert V into the tree, and mark each edge going down the tree.




Break the tree into pieces along every marked edge.

You will be left with a number of "pennants." These are valid (or almost valid) Red Black trees, that also happen to have an arbitrarily colored node above the root.

Ignoring the one offshoot (7, 9, 11, and 13), recolor all of the roots (3, 8, 12, and 15) black if necessary, and then you have valid Red Black Trees.







Without loss of generality, focus on those trees with values smaller than V.

The tree rooted at 3 can have no values larger than 7, by the BST property. Additionally, the tree rooted at 8 must have values larger than 7, or those values would have gone to the left of 7 when being inserted.

So, use the offshoot (7) to join those two red black trees. Do this repeatedly until you are left with only one tree, and one offshoot remaining. In our example, that value is 9.

Finally, insert your leftover offshoot value (9) into your red black tree.

Finally, you end up with 2 trees, split at 7.

Runtime of Splitting a Red Black Tree

The initial insertion of V and marking of edges takes O(lg n).

Each merge takes O(lg X) where X is the subtree size. As mentioned previously, the time needed in merge to find the appropriate node and fix violations is proportional to the difference in black heights. If you prioritize merging smaller trees, then you can add up all of the differences in black height, and you will get at most log n. All of the merges therefore take O(lg n).

Overal runtime: O(lg n) + O(lg n) + O(lg n) = O(lg n) = O(height of the tree)