Validate binary tree as BST
1. Problem statement
Check if a given binary tree is also (already) a binary search tree (BST) (Aziz et al., 2018, p. 213).
2. Insights
2.1. Acceptable ranges
Each node holds a value, and there is a special relationship between the parent and its left and right children. Namely, the left child cannot hold any value that is greater than the parent. Likewise, the right child cannot hold a value that is smaller than the parent value. These ranges (called "intervals" in math) must be respected by all subsequent child nodes.
Figure 1: Ranges for BST subtrees
In Figure 1, node A can contain nodes from negative infinity (assuming we have infinite memory) all the way up to 24. Similar rules apply for all other nodes. We can summarise the acceptable interval ranges as follows:
\begin{align*} A &= ({-\infty},{25}) \\ B &= (25,50) \\ C &= (50,75) \\ D &= (75, {\infty}). \end{align*}In Aziz et al. (2018, p. 213), the authors use closed interval ranges, such that the above would instead look like this:
\begin{align*} A &= ({-\infty},{25}] \\ B &= [25,50] \\ C &= [50,75] \\ D &= [75, {\infty}). \end{align*}The square brackets denote that the number it is next to is included in the range. For example, assuming we're using integers and not real numbers, \([25,50]\) means the range [25, 26, …, 50], whereas \((25,50)\) means the range [26, 27, …, 49].
But, we don't use this definition because, for simplicity, we assume that our BST does not allow the same key to appear on different nodes (i.e., duplicate keys are not allowed in our BST).
3. Solution
We'll be using our binary tree implementation to solve this problem.
3.1. Brute force?
In Aziz et al. (2018, p. 213), the authors describe a brute force approach where we essentially pick out the maximum or minimum values in each node of the tree (treating each node as its own independent binary tree), and then check that those minimums and maximums are in line with the value at the corresponding root node.
Ease of implementation, and the ensuing trust that the code is obviously free of bugs, is the most important consideration for any brute force solution.
This could be considered "brute force" but it seems overly contrived. The problem with this approach is that it appears more difficult to code than the better alternatives presented in the text. Therefore we skip implementing this solution.
3.2. DFS
3.2.1. By recording traversal history
A fundamental property of BSTs is that if you do an in-order traversal (depth-first search (DFS) where we visit the left child, current node, then the right child, recursively), you'll get keys in sorted order. So then we can do an in-order traversal, checking if we are visiting nodes in equal or increasing value than the previously visited node.
We write x.val
to get the "key" of the node, because this is a regular
BinaryTree
, not a BinarySearchTree
that has both key
and val
fields. For
a discussion about why we need keys and values for BSTs but not regular binary
trees, see the discussion about
binary search trees.
The time complexity is \(O(n)\) where \(n\) is the number of nodes in the tree. This
version needs additional \(O(n)\) space complexity because it stores all nodes it
visits before doing the check with is_sorted()
.
3.2.2. Without recording traversal history
This solution improves on the space complexity, by using a fixed last
variable. So the space complexity drops to just \(O(1)\).
Sadly, problem with both dfs_inorder_manual()
and dfs_inorder()
is that they
will traverse the left subtree completely before traversing the right subtree.
So if the problematic node is near the top of the tree on the right, we might
potentially waste a lot of CPU cycles before detecting the (obvious) issue.
3.2.3. Using ranges
As mentioned in 2, the key to this problem is to check if each node satisfies the range constraints imposed by the parent nodes. We can do this with a depth-first search (DFS) by recursing down the subtrees and checking that the nodes encountered fall within the expected range. As we descend down the tree, we continually refine our intervals as needed. When we are done traversing the entire tree, we're done.
The DFS should be a preorder traversal, because we should check the node that we are currently on as soon as possible before descending down to the subtrees.
While we are using the ranges idea here, we still perform DFS where the left subtree is computed first. And so it suffers from the same issues described for the DFS solution described in 3.2.2 above.
3.3. Optimal (BFS)
Using breadth-first search (BFS), we can check each level of the binary tree, top to bottom. For each node we visit, we just check that its key fits within the range that we expect it to be in.
We also simplify the code a bit by using floats instead of integers, which lets
us use negative and positive infinity. This avoids the extra stanza we need for
the root node if we're only using integer types and using None
to represent
these infinite values.
Time complexity is still \(O(n)\), but we solve the problem of traversing the left subtree needlessly in situations where we only need to check something in the right subtree.
4. Tests
4.1. Basic tests
4.2. Property-based tests
First we need a converter that can convert a BinarySearchTree
into a
BinaryTree
.
By using our binary search tree implementation, we can confidently generate any random number of BSTs, convert them to regular binary trees, and then check whether they pass our validation functions.
Let's check that our converter works, by doing the same preorder traversal and checking that we get the same values (keys for BSTs) back for both trees.
Generate random BSTs, and convert them to binary trees. Our validation functions are all expected to pass for these inputs.
Now let's check the opposite. Construct BSTs. Convert them to binary trees. But then tweak one of the nodes in the tree at random to be out of line to break the BST property. All such trees should fail BST validation.
Note that breaking the BST property involves choosing a very high or very low value. Even if we know that our BST keys are all positive numbers, we cannot always choose a minimum value like "-1" because if we assign this value to the leftmost child, the tree will still be valid.
Note that all BSTs generated here should be at least 2 nodes, because the root node can be of any value.