125 avltree

1 Introduction

1.1 How to measure the balance of a tree?

Besides red-black tree, are there any other intuitive solutions of self-balancing binary search tree? In order to measure how balancing a binary search tree, one idea is to compare the height of the left sub-tree and right sub-tree. If they differs a lot, the tree isn’t well balanced. Let’s denote the difference height between two children as below

δ(T) = |L|- |R |

Where |T| means the height of tree T, and L, R denotes the left sub-tree and right sub-tree.

If δ(T) = 0, The tree is definitely balanced. For example, a complete binary tree has N = 2h - 1 nodes for height h. There is no empty branches unless the leafs. Another trivial case is empty tree. δ(φ) = 0. The less absolute value of δ(T) the more balancing the tree is.

We define δ(T) as the balance factor of a binary search tree.

2 Definition of AVL tree

An AVL tree is a special binary search tree, that all sub-trees satisfying the following criteria.

|δ(T)| ≤ 1

The absolute value of balance factor is less than or equal to 1, which means there are only three valid values, -1, 0 and 1. Figure 1 shows an example AVL tree.

Figure 1: An example AVL tree

Why AVL tree can keep the tree balanced? In other words, Can this definition ensure the height of the tree as O(lg N) where N is the number of the nodes in the tree? Let’s prove this fact.

For an AVL tree of height h, The number of nodes varies. It can have at most 2h - 1 nodes for a complete binary tree. We are interesting about how many nodes there are at least. Let’s denote the minimum number of nodes for height h AVL tree as N(h). It’s obvious for the trivial cases as below.

  • For empty tree, h = 0, N(0) = 0;
  • For a singleton root, h = 1, N(1) = 1;

What’s the situation for common case N(h)? Figure 2 shows an AVL tree T of height h. It contains three part, the root node, and two sub trees A,B. We have the following fact.

h = max(height(L),height(R )) +1

We immediately know that, there must be one child has height h - 1. Let’s say height(A) = h - 1. According to the definition of AVL tree, we have. |height(A) - height(B)|≤ 1. This leads to the fact that the height of other tree B can’t be lower than h - 2, So the total number of the nodes of T is the number of nodes in tree A, and B plus 1 (for the root node). We exclaim that.

N (h) = N(h - 1)+ N(h - 2) + 1

Figure 2: An AVL tree with height h, one of the sub-tree with height h - 1, the other is h - 2

This recursion reminds us the famous Fibonacci series. Actually we can transform it to Fibonacci series by defining N(h) = N(h) + 1. So equation 4 changes to.

N′(h) = N ′(h - 1) +N ′(h- 2)

Lemma 2.1. Let N(h) be the minimum number of nodes for an AVL tree with height h. and N(h) = N(h) + 1, then

N ′(h) ≥ φh

Where φ = √5+1   2 is the golden ratio.

Proof. For the trivial case, we have

  • h = 0, N(0) = 1 φ0 = 1
  • h = 1, N(1) = 2 φ1 = 1.618...

For the induction case, suppose N(h) φh.

N ′(h + 1) = N ′(h)+ N ′(h - 1) {F ibonacci}           ≥ φh + φh-1                     √-           = φh-1(φ +1)       {φ + 1 = φ2 =-5+23}           = φh+1


From Lemma 2.1, we immediately get

h ≤ logφ(N + 1) = logφ(2)⋅lg(N + 1) ≈ 1.44lg(N + 1)

It tells that the height of AVL tree is proportion to O(lg N), which means that AVL tree is balanced.

During the basic mutable tree operations such as insertion and deletion, if the balance factor changes to any invalid value, some fixing has to be performed to resume |δ| within 1. Most implementations utilize tree rotations. In this chapter, we’ll show the pattern matching solution which is inspired by Okasaki’s red-black tree solution[2]. Because of this modify-fixing approach, AVL tree is also a kind of self-balancing binary search tree. For comparison purpose, we’ll also show the procedural algorithms.

Of course we can compute the δ value recursively, another option is to store the balance factor inside each nodes, and update them when we modify the tree. The latter one avoid computing the same value every time.

Based on this idea, we can add one data field δ to the original binary search tree as the following C++ code example 1.

template  <class  T >  
struct node{ 
  int delta; 
   T  key; 
  node* left; 
  node* right; 
  node* parent; 

In purely functional setting, some implementation use different constructor to store the δ information. for example in [1], there are 4 constructors, E, N, P, Z defined. E for empty tree, N for tree with negative 1 balance factor, P for tree with positive 1 balance factor and Z for zero case.

In this chapter, we’ll explicitly store the balance factor inside the node.

data AVLTree a  =  Empty 
               | Br (AVLTree a) a (AVLTree a) Int

The immutable operations, including looking up, finding the maximum and minimum elements are all same as the binary search tree. We’ll skip them and focus on the mutable operations.

Please read the PDF version for better format.
Xinyu LIU,
2011年8月14日 下午11:14
Xinyu LIU,
2011年8月14日 下午11:14