Extreme Recursion with Binary Tree Topologies

How many binary tree topologies are possible given n nodes?

Problem Statement

Write an algorithmic function that tells me how many binary tree topologies are possible, given n nodes in the tree.

A binary tree topology is defined as any configuration of the tree, given a number of nodes, that is agnostic of node labels. So for example, given two nodes, you can only make two binary topologies: one with a root and a child on the left, and another with a root and a child on the right. In both topologies, the node in the root and its child are interchangeable without increasing the number of topologies.

Hint: f(3) is 5

Evaluation

  • Converting the problem to a recursive definition that works up to n=3 (25%)
  • A completely correct recursive definition that works for an arbitrary n (25%)
  • Optimize your solution given reasonable resources (25%)
  • What is the time and space complexity of the optimized solution? (25%)

Solution

A one-liner solution

One-liner solutions in Ruby are my favorite, especially to such complex problems:

def topology_count(n)
  n.zero? ? 1 : (0..n-1).map do |i|
    topology_count(i) * topology_count(n-i-1)
  end.reduce(0, &:+)
end

For those unfamiliar with Ruby or recursion, let me explain what the above does. It says if I'm given zero nodes, there's only one tree topology possible (i.e. The tree of "no tree!").

Otherwise, we'll do recursion as follows: We'll implicitly assume that we have a node fixed as the root of the n-node tree. So we have n-1 nodes left. The root can have a left-subtree topology with as few as zero or as many as n-1 nodes. We'll consider each of these situations, where the left-subtree has i nodes ('i' ranging from 0 to n-1)

If I have a root and my left sub-tree has i nodes, then my right sub-tree has to have n-i-1 nodes, for the whole tree to have n nodes. And I need to multiply the number of configurations possible on the left and the number of configurations possible on the right to get the total possible configurations for that particular i. And I need to take the sigma (sum) for all the possible values of i. That last "reduce" step with a plus operator is summing all the values.

This answer will test as follows:

> topology_count(0)
 => 1 
> topology_count(1)
 => 1 
> topology_count(2)
 => 2 
> topology_count(3)
 => 5 
> topology_count(4)
 => 14 
> topology_count(5)
 => 42 
> topology_count(6)
 => 132 
> topology_count(7)
 => 429 
> topology_count(8)
 => 1430 
> topology_count(9)
 => 4862 
> topology_count(10)
 => 16796 
> topology_count(100) ^C ^C ^C
# Had to abort because it would take FOREVER (more than a few minutes).

Obviously this is suboptimal.

How would you optimize this solution?

Guess how different the optimal solution is? We simply Memoize it. Here goes:

def topology_count(n, memo = Hash.new)
  memo[n] ||= n.zero? ? 1 : (0..n-1).map do |i|
    topology_count(i, memo) * topology_count(n-i-1, memo)
  end.reduce(0, &:+)
end

Memoization is a way of avoiding recalculation. If you expand the mathematical formula for calculating f(5), you realize you're calling f(4), f(3), f(2), f(1) and f(0) many times. To avoid the expensive recalculations, we simply keep a hash table around (called memo) and upon calling f(n), first check if we had calculated it before. If not, we calculate it and assign the value to the key n inside the table.

Now let's see how long it takes to calculate the number of trees with 100 nodes:

> topology_count(100)
 => 668231847014668019694558244358915428258266349406867439688
# BOOM! Less than a second

> topology_count(500)
 => 181950390092306198065096450225765585041829137662760876156769862811466929740618445434444657497756291668167556809518811030679915659771619653364826038171140299439640378727262057091419481283290258430737116280661704440160851832060392419105021771165045783108705421738358182051266692370876335963202410816 
# Only 1 second!!!

> topology_count(1000)
SystemStackError: stack level too deep
# ... Oh well. Win some, lose some :-)

Interviewer's perspective

  • The weak candidate spends half the time solving a problem he doesn't understand. A simple clarification of what a "topology" means and running through a few cases made of a few number of nodes with the interviewer would eliminate the confusion. But for some reason, the moment weak candidates hear a word or definition they don't understand, they attempt to hide their not knowing it. Not knowing a fact is completely okay in a technical setting. These questions are about problem solving abilities. In the real work place, we'd want every candidate to freely admit that they don't know x, y and z. Instead, the ability you're trying to demonstrate is that you can collect all the facts and clarify all the assumptions before jumping into the work.
  • The intermediate candidate for some reason jumps into recursion without re-examining how to set up a recursive solution. I've seen at least 50% of the time that candidates start their thought process with "Okay, we have a tree of K nodes. How many ways can we add a node to that?" They completely ignore the inductive reasoning that recursion starts with Base Cases first. Start with the simplest cases of 1, 2 and 3 nodes, and the inductive (recursive) step should stare you in the face (eventually!). It is not "how do I add a node", but rather "given a node as root, how do I add subtrees?"
  • The strong candidate spends quite a bit of time walking through base cases. They make sure that they understand why given 3 nodes, 5 topologies are possible. Very slowly, they move their understanding to an attempt to solve for 4 nodes, and quickly realize it is very hard to "add" nodes, because despite knowing f(n-1), you still have to calculate how many ways the new node can attach to the k-node tree, which is non-trivial. At this step, they go into a 10 minute thoughtful reflection. Half of them get lost in the depths of thought and need rescuing a little. But I'd rather see them thoughtful than forcing through with the wrong insight. Eventually they take a step back and repeat the question to themselves: "I know how to make trees with 3 nodes. I want to make trees with 4 nodes" and suddenly it hits them: "Don't add a node! Keep the root fixed. It's a combination of a root, a left subtree and a right subtree!" -- That's the key insight of the whole thing.

Code Correctness and Elegance

In the end, the testing matters. It turns out that some strong candidates still end up trying to solve for a 4-node tree; they do a root, a left 1-node subtree and a right 2-node subtree. Then they decide there's a sigma for all the left-subtree sizes, but for example, forget to multiply the combinations possible on the left side with the combinations possible on the right side.

Calculation errors are completely okay, as long as the candidate starts testing for base cases and checks f(1), f(2), f(3), f(4), etc.

Another key observation is that once the recursive step is understood, some candidates come up with all sorts of conditional hacks to account for the situation where one side of the tree has a zero-sized subtree. The best of the strong candidates simply tell you "I'm dictating that f(0) is 1 as a base case". This is very telling in their level of understanding code elegance. Extra if conditions and unnecessary base cases are evil. All one needs for this recursion is a base case for n==0.

References

Lars, a Google engineer and a University of Waterloo alumnus, asked me this during a two hour Google screening interview on the phone. I had to code it live in a shared Google Doc in 20 minutes.

Recommended reading

Author
Amin Ariana
Published
January 2011