Binary Search Tree Basics

Sample Implementation

A polymorphic C implementation of a basic binary search tree is available here: https://github.com/Tyresius92/binary-search-tree

BST Basics

A Binary Search Tree (BST) is a data structure which has the following characteristics:

  • Given any node n in the binary search tree, all nodes to the left of n have a value less than n, and all values to the right of n have a value greater than n.
    • For instance, in the tree at right, all values less than 6 (i.e. 1, 3, and 4) are found to the left of 6, and all values to the right (i.e. 7, 8, 10, 13, and 14) are found to the right.
  • The topmost node in the BST is called the root. The node immediately down and to the left is called its left child, and the node immediately down and to the right is called its right child. You can find whether a value is in the tree by comparing it to the current node, and then going to its left or right child, accordingly.
    • For instance, to find if 5 is in the tree, you would first compare to 8. Since 5 < 8, you would go to 8's left child. Then you would compare 5 to 3, and go right. Then 5 is less than 6, so left again. You compare 5 to 4, and attempt to go right, only to find that you are out of nodes, so 5 is not in the tree.

Building a BST

Building a standard BST is similar to the search above. Simply follow the same path you would while searching for your number, and then insert it when you hit an empty spot. For instance, if inserting 5, it would become the right child of 4, which is where we fell off the tree when searching for 5.

Expected BST Runtime

In a relatively balanced BST (like the one at left, where all the nodes are as close to the root as possible), you would perform approximately O(lg n) comparisons (all logarithms are base 2). For instance, in the tree at left, there are 9 nodes. The log of 9 is ~3.16. We therefore expect to do 3 to 4 comparisons on average. In this tree, we end up doing 4 comparisons when searching for 5, but only 3 comparisons when searching for 2.

In order to build a BST, you have to do n insertions, and each one takes O(lg n) time on average. Therefore, the expected time to build a binary search tree is O(n lg n). Similarly, doing n searches also takes O(n lg n) time.

If you would like a more formal discussion of this information, there is a slide deck here, and a video lecture here.

If you need a refresher on runtimes, please click here.

BST Depth

The depth d of a node refers to how many levels you need to go through in order to reach the node. For instance, in the tree above, 10 has a depth of d = 2, since you have to go through 8 at the root, as well as 13 on the next level. Similarly, 7 has a depth of d = 3. We say that the root has a depth of zero (0).

You can store each node's depth as an additional bit of data in each node. It is trivial to maintain this data, even with rotations (discussed later).

Weaknesses of a standard BST

Standard binary search trees have a few weaknesses. First, although they have an expected runtime of O(log n) for a single search, they actually have a worst case runtime of O(n). Suppose that, when building the tree above, we had inserted the nodes in increasing order: 1, 3, 4, 6, 7, 8, 10, 13, 14. We would have put 1 at the root, then 3 as the right child of 1, then 4 as the right child of 3, and 6 as the right child of 4, and so on, and we would end up with just a single path of nodes, which is not any better than our original list.

Worst case runtime of building a BST

In the scenario presented in the previous paragraph, we end up building one long list, where we always go right. In order to insert the n-th element, we would end up doing n - 1 comparisons. This evaluates to n x (n - 1) = O(n^2) work. Searching and other operations take a similar amount of work, by the same arguments.

This runtime defeats the entire point of building the BST in the first place. It would be nice to have a way to ensure that we always end up with a balanced binary search tree instead of just one long list of elements.

Dynamic Binary Search Trees

With (most) dynamic BSTs, the goal is to keep the tree relatively balanced. We don't necessarily need a perfectly balanced tree, where all nodes have minimum depth. We just need it to be reasonably balanced. If the tree becomes unbalanced, rebalancing is accomplished using a series of rotations.

In a Red Black tree, we color each node red or black, and maintain several properties to ensure we remain reasonably balanced. You can read more about Red Black Trees here.

Splay trees have a completely different access pattern that does not care about balance, but they still rely on rotations. You can read more about Splay Trees here.

What is a rotation?

There are two basic rotations - left rotate and right rotate. Since they are symmetric, we will describe only one here.

Looking at the gif at right, there are three triangles. By definition, all items in the leftmost triangle are smaller than items anywhere else in this tree, and all items in the rightmost triangle are larger. All items in the middle triangle must be between the two circle items.

We can therefore manipulate the relationships between the nodes without having to worry about invalidating the BST properties.

In a left rotation, node L is at the root of the subtree, and R is L's right child. To perform the rotation, we make the left child of R (the middle triangle) the right child of L. Then we make L the left child of R, and finally, point L's parent at R.

Types of Dynamic BST

We will discuss the following kinds of Dynamic Binary Search Trees:

Red Black Trees

Splay Trees

Tango Trees