Algorithms‎ > ‎Trees‎ > ‎

LCA of Binary Tree


        _______3______
       /              \
    ___5__          ___1__
   /      \        /      \
   6      _2       0       8
         /  \
         7   4

If you are not so sure about the definition of lowest common ancestor (LCA), please refer to my previous post:Lowest Common Ancestor of a Binary Search Tree (BST) or the definition of LCA here. Using the tree above as an example, the LCA of nodes 5 and 1 is 3. Please note that LCA for nodes 5 and 4 is 5.

Hint:
Top-down or bottom-up? Consider both approaches and see which one is more efficient.

A Top-Down Approach (Worst case O(n2) ):
Let’s try the top-down approach where we traverse the nodes from the top to the bottom. First, if the current node is one of the two nodes, it must be the LCA of the two nodes. If not, we count the number of nodes that matches either p or q in the left subtree (which we call totalMatches). If totalMatches equals 1, then we know the right subtree will contain the other node. Therefore, the current node must be the LCA. If totalMatches equals 2, we know that both nodes are contained in the left subtree, so we traverse to its left child. Similar with the case wheretotalMatches equals 0 where we traverse to its right child.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
// Return #nodes that matches P or Q in the subtree.
int countMatchesPQ(Node *root, Node *p, Node *q) {
  if (!root) return 0;
  int matches = countMatchesPQ(root->left, p, q) + countMatchesPQ(root->right, p, q);
  if (root == p || root == q)
    return 1 + matches;
  else
    return matches;
}
 
Node *LCA(Node *root, Node *p, Node *q) {
  if (!root || !p || !q) return NULL;
  if (root == p || root == q) return root;
  int totalMatches = countMatchesPQ(root->left, p, q);
  if (totalMatches == 1)
    return root;
  else if (totalMatches == 2)
    return LCA(root->left, p, q);
  else /* totalMatches == 0 */
    return LCA(root->right, p, q);
}

What is the run time complexity of this top-down approach?

First, just for fun, we assume that the tree contains n nodes and is balanced (with its height equals to log(n) ). In this case, the run time complexity would be O(n). Most people would guess a higher ordered complexity than O(n) due to the function countMatchesPQ() traverses the same nodes over and over again. Notice that the tree is balanced, you cut off half of the nodes you need to traverse in each recursive call of LCA() function. The proof that the complexity is indeed O(n) is left as an exercise to the reader.

What if the tree is not necessarily balanced? Then in the worst case the complexity could go up to O(n2). Why? Could you come up with such case? (Hint: The tree might be a degenerate tree).

A Bottom-up Approach (Worst case O(n) ):
Using a bottom-up approach, we can improve over the top-down approach by avoiding traversing the same nodes over and over again.

We traverse from the bottom, and once we reach a node which matches one of the two nodes, we pass it up to its parent. The parent would then test its left and right subtree if each contain one of the two nodes. If yes, then the parent must be the LCA and we pass its parent up to the root. If not, we pass the lower node which contains either one of the two nodes (if the left or right subtree contains either p or q), or NULL (if both the left and right subtree does not contain either p or q) up.

Sounds complicated? Surprisingly the code appears to be much simpler than the top-down one.

1
2
3
4
5
6
7
8
Node *LCA(Node *root, Node *p, Node *q) {
  if (!root) return NULL;
  if (root == p || root == q) return root;
  Node *L = LCA(root->left, p, q);
  Node *R = LCA(root->right, p, q);
  if (L && R) return root;  // if p and q are on both sides
  return L ? L : R;  // either one of p,q is on one side OR p,q is not in L&R subtrees
}

Notes:
The LCA problem had been studied extensively by many computer scientists. There exists efficient algorithms for finding LCA in constant time after initial processing of the tree in linear time. For the adventurous reader, please read this article for more details: Range Minimum Query and Lowest Common Ancestor in Topcoder.

for the Bottom-up Approach, if p is not in the tree, but q is, it returns the root as LCA, which does not make sense. Should it return null in this case?

Further Thoughts:
What if each node in the binary tree has a link to its parent? Could you devise a non-recursive approach without using extra space?




Note:
This is Part II of Lowest Common Ancestor of a Binary Tree. If you need to find the lowest common ancestor without parent pointers, please read Lowest Common Ancestor of a Binary Tree Part I.

        _______3______
       /              \
    ___5__          ___1__
   /      \        /      \
   6      _2       0       8
         /  \
         7   4

If you are not so sure about the definition of lowest common ancestor (LCA), please refer to my previous post:Lowest Common Ancestor of a Binary Search Tree (BST) or the definition of LCA here. Using the tree above as an example, the LCA of nodes 5 and 1 is 3. Please note that LCA for nodes 5 and 4 is 5.

In my last post: Lowest Common Ancestor of a Binary Tree Part I, we have devised a recursive solution which finds the LCA in O(n) time. If each node has a pointer that link to its parent, could we devise a better solution?

Hint:
No recursion is needed. There is an easy solution which uses extra space. Could you eliminate the need of extra space?

An easy solution:
As we trace the two paths from both nodes up to the root, eventually it will merge into one single path. The LCA is the exact first intersection node where both paths merged into a single path. An easy solution is to use a hash table which records visited nodes as we trace both paths up to the root. Once we reached the first node which is already marked as visited, we immediately return that node as the LCA.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
Node *LCA(Node *root, Node *p, Node *q) {
  hash_set<Node *> visited;
  while (p || q) {
    if (p) {
      if (!visited.insert(p).second)
        return p; // insert p failed (p exists in the table)
      p = p->parent;
    }
    if (q) {
      if (!visited.insert(q).second)
        return q; // insert q failed (q exists in the table)
      q = q->parent;
    }
  }
  return NULL;
}

The run time complexity of this approach is O(h), where h is the tree’s height. The space complexity is also O(h), since it can mark at most 2h nodes.

The best solution:
A little creativity is needed here. Since we have the parent pointer, we could easily get the distance (height) of both nodes from the root. Once we knew both heights, we could subtract from one another and get the height’s difference (dh). If you observe carefully from the previous solution, the node which is closer to the root is alwaysdh steps ahead of the deeper node. We could eliminate the need of marking visited nodes altogether. Why?

The reason is simple, if we advance the deeper node dh steps above, both nodes would be at the same depth. Then, we advance both nodes one level at a time. They would then eventually intersect at one node, which is the LCA of both nodes. If not, one of the node would eventually reach NULL (root’s parent), which we conclude that both nodes are not in the same tree. However, that part of code shouldn’t be reached, since the problem statement assumed that both nodes are in the same tree.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
int getHeight(Node *p) {
  int height = 0;
  while (p) {
    height++;
    p = p->parent;
  }
  return height;
}
 
// As root->parent is NULL, we don't need to pass root in.
Node *LCA(Node *p, Node *q) {
  int h1 = getHeight(p);
  int h2 = getHeight(q);
  // swap both nodes in case p is deeper than q.
  if (h1 > h2) {
    swap(h1, h2);
    swap(p, q);
  }
  // invariant: h1 <= h2.
  int dh = h2 - h1;
  for (int h = 0; h < dh; h++)
    q = q->parent;
  while (p && q) {
    if (p == q) return p;
    p = p->parent;
    q = q->parent;
  }
  return NULL;  // p and q are not in the same tree
}

Further Thoughts:
Isn’t the solution above neat? It requires some creative thought and is quite subtle.

A variation of this problem which seemed to be more popular is:

Given two singly linked lists and they both intersect at one point (ie, forming a Y shaped list). Find where the two linked lists merge.

I am sure you know how to answer this question now :)





Comments