Binary trees
1. Problem statement
Implement a binary tree (Aziz et al., 2018, p. 123).
2. Insights
2.1. Similarity to linked lists
A binary tree is eerily similar to a linked list: whereas a node in a linked list points to a single "next" node, a node in a binary tree points to two other nodes, called the "left" and "right" children (or subtrees). Indeed, if a binary tree has only left children (or only right children), the structure devolves into a linked list.
As we will see, even though the two are "similar" in the above sense, binary trees have wildly different performance properties and require much more complicated algorithms than linked lists.
2.2. Relation to binary search trees (BSTs)
Binary search trees, or BSTs, are binary trees with an additional ordering guarantee: all values in a node's left subtree are smaller than the node's value, and all values in its right subtree are larger. Whereas BSTs are all about maintaining this ordering guarantee, plain binary trees are concerned more with the actual structure (or position) of the nodes in relation to each other. This latter point is important because whereas the same set of values can be represented by many different BSTs (depending on the order in which the values are inserted into the BST), with plain binary trees the structure itself matters.
3. Solution
We could define Node as a nested class inside the BinaryTree class, but for simplicity we choose not to. In a "production" implementation, we would have to concern ourselves with API boundaries and encapsulation, but those concerns are beyond the scope of our discussion, and so we choose to ignore them here.
We use a two-class scheme. The Node class stores the links to the children (and additional metadata about the node and its children). The BinaryTree class has all of the interesting methods that act on the overall tree, because it's expected that users will never have to deal with the Node class directly.
3.1. Initialization
3.1.1. Node initialization
We set the left and right links to None. These point to other Node instances, if any. Later, when we add more nodes into this tree, these links will become populated.
We also have the val field to actually store the value we want inside this node, as well as the count field to tell us how large the binary tree is (if we treat the current node as the root).
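A minimal sketch of such a Node initializer (the field names val, count, left, and right come from the discussion above; the optional constructor argument is an assumption):

```python
class Node:
    def __init__(self, val=None):
        self.val = val        # the value stored at this node
        self.count = 1        # size of the subtree rooted at this node
        self.left = None      # link to the left child Node, if any
        self.right = None     # link to the right child Node, if any
```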
3.1.2. Tree initialization
When creating the tree, we just want to make sure that there is either a single node or nothing (an empty tree).
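Concretely, this could look like the following sketch (the Node class is repeated here so the snippet stands alone; a constructor taking an optional initial value is an assumption):

```python
class Node:
    def __init__(self, val=None):
        self.val, self.count = val, 1
        self.left = self.right = None

class BinaryTree:
    def __init__(self, val=None):
        # Either a single root node, or nothing (an empty tree).
        self.root = Node(val) if val is not None else None
```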
3.2. Insertion
For each value we want to insert, we take a list of directions. These directions are a list of 0's and 1's that tell us to go either left or right (left is 0, right is 1), starting from the root node. When we run out of directions, we insert the value at the node we've navigated to.
If there are any nodes missing on the way to the final destination, we create nodes along the way. This way, we don't force our callers to only create new nodes exactly at a particular leaf node (they could choose to, for example, create the leaf node first, followed by filling in the parent nodes).
The main workhorse is _insert(), a private method (insert() is meant for external use).
We navigate down the tree as long as there are directions. As we do so, we make note of all the nodes we've seen, adding them to hist.
Note that it is important that we return the initial node (start) we started from. This way we can assign it back into self.root in insert().
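A sketch of this scheme, assuming minimal definitions of Node and BinaryTree (the names insert(), _insert(), start, and hist come from the text; the details of the count bookkeeping are an assumption):

```python
class Node:
    def __init__(self, val=None):
        self.val, self.count = val, 1
        self.left = self.right = None

class BinaryTree:
    def __init__(self):
        self.root = None

    def insert(self, val, directions):
        # Assign the returned start node back into self.root.
        self.root = self._insert(self.root, val, directions)

    def _insert(self, start, val, directions):
        if start is None:
            start = Node()
        node, hist = start, [start]
        for d in directions:          # 0 means left, 1 means right
            child = node.right if d else node.left
            if child is None:         # create missing nodes along the way
                child = Node()
                if d:
                    node.right = child
                else:
                    node.left = child
            node = child
            hist.append(node)
        node.val = val                # out of directions: insert here
        for n in reversed(hist):      # refresh subtree sizes bottom-up
            n.count = (1 + (n.left.count if n.left else 0)
                         + (n.right.count if n.right else 0))
        return start
```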
3.3. Size
The size of the tree is the number of all the nodes in the tree, including its children. We update the count field every time we insert new nodes, including the counts for all parents. As we shall see later, we also update the count for all relevant nodes whenever we perform a deletion. This is why we don't have to do any extra work when trying to figure out the size of the tree: it's just a direct lookup of the x.count field itself.
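That direct lookup might be sketched as follows (the helper name size() is an assumption; the Node class is repeated so the snippet stands alone):

```python
class Node:
    def __init__(self, val=None):
        self.val, self.count = val, 1
        self.left = self.right = None

def size(x):
    # A direct lookup of the count field; an empty tree (None) has size 0.
    return x.count if x is not None else 0
```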
3.4. Lookup
Looking up a node is similar to an insertion operation, because we have to navigate to the node by following the provided directions. However, if there is an error at any point in navigating to the desired node, we raise an exception instead. From the caller's point of view, a value is only returned as a successful lookup if we were able to navigate to the desired node by following the provided directions.
In some sense this is a lot like a pointer dereference operation: we are merely checking that we aren't "dereferencing" (looking up) an invalid "address" (directions to get to the node).
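A sketch of this lookup, assuming a minimal Node; the exception type and the standalone function form are assumptions (the real method presumably lives on BinaryTree):

```python
class Node:
    def __init__(self, val=None):
        self.val, self.count = val, 1
        self.left = self.right = None

def lookup(root, directions):
    # Navigate exactly like insertion does, but instead of creating
    # missing nodes, raise if navigation fails at any point.
    node = root
    for d in directions:
        if node is None:
            raise LookupError("invalid directions: %r" % (directions,))
        node = node.right if d else node.left
    if node is None:
        raise LookupError("invalid directions: %r" % (directions,))
    return node.val
```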
3.5. Deletion
Deletion is tricky. Not only do we have to update the counts, but we could be moving nodes around in order to preserve the overall structure of the tree. Let's think about preserving the structure first (and see why this would even be a concern).
If we want to delete a node in the tree, we must consider 3 possible cases:
- the to-be-deleted node has 0 children (leaf node),
- the to-be-deleted node has 1 child, or
- the to-be-deleted node has 2 children.
The first case is easy: from its parent's point of view, we just set the corresponding left or right link (pointer) to None.
The second case requires us to promote the only child of the to-be-deleted node to be its replacement. It's just like deleting a node in a linked list (where the "next" node takes the place of the current node).
The third case is tricky. We have two child nodes (and they may also have children of their own), but we can only assign one child to be the replacement. Obviously we cannot make both children the replacement because then it won't be a binary tree any more.
Here we just do the simplest thing possible: we recurse down the left subtrees until we are at a leaf node, and promote this one to be the replacement. This way we don't have to affect the structure of the other children that much. We could also recurse down the right subtree to get the rightmost leaf node, or perhaps even pick a leaf node at random, but choosing the leftmost child seems like the simplest thing and is what we do in _delete().
Now it may very well be the case that getting the leftmost leaf node results in getting the node that we started searching from. That is, if we have 2 children and we start the "leftmost leaf node" search on the left child, it may be that that child is already a leaf node. In that case the while loop would break immediately and we would return the argument we got (x) as is.
As for the public method delete(), it first tries to do a lookup of whether the node to be deleted can be navigated to. That is, it checks whether the node we want to delete even exists. If it does not exist, then we abort and do nothing, returning None.
3.6. Traversal
Traversing a binary tree means visiting (performing some action on) each node in the tree exactly once.
For the different kinds of traversals, a simple way to remember is the timing of acting on the current node. If the node is acted upon first (before acting on the child nodes), it is called pre-order traversal. If it is acted on last (after acting on the child nodes), it is called post-order traversal. If it is acted on in-between the left and right children, it's called in-order traversal.
"In-order" traversal is named as such, because if the tree is a binary search tree (such that all values in the left subtree are less than or equal to the current node's value, and all values in the right subtree are greater than the current node's value), then we end up visiting the nodes in sorted order.
As all trees (binary ones included) are graphs, depth-first search (DFS) and breadth-first search (BFS) apply here as well. It turns out that the three types of traversal we covered above all fall into the category of DFS, because they are all concerned with visiting children recursively (and each recursive call is a descent down a level in the tree).
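The three DFS orderings differ only in where the visit happens relative to the recursive calls; a sketch (the free-function form and the visit callback are assumptions):

```python
class Node:
    def __init__(self, val=None):
        self.val = val
        self.left = self.right = None

def preorder(x, visit):
    if x is None:
        return
    visit(x.val)              # act on the node before its children
    preorder(x.left, visit)
    preorder(x.right, visit)

def inorder(x, visit):
    if x is None:
        return
    inorder(x.left, visit)
    visit(x.val)              # act in between the two children
    inorder(x.right, visit)

def postorder(x, visit):
    if x is None:
        return
    postorder(x.left, visit)
    postorder(x.right, visit)
    visit(x.val)              # act on the node after its children
```

Note how, on a BST-shaped tree, inorder yields the values in sorted order, as described above.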
3.6.1. BFS
BFS is also possible. This is also called level-order traversal, because we visit all children at a level in the tree first before moving down to their children. Unlike DFS, BFS requires an auxiliary data structure to keep track of which nodes to visit, because we need a way to visit the nodes beyond the left and right links included natively in each binary tree node.
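A sketch of a two-pass variant, using a plain list called nodes_at_current_depth as the auxiliary structure (the Node class and the visit callback are assumptions):

```python
class Node:
    def __init__(self, val=None):
        self.val = val
        self.left = self.right = None

def level_order(root, visit):
    # Visit each level completely before descending to the next.
    if root is None:
        return
    nodes_at_current_depth = [root]
    while nodes_at_current_depth:
        for node in nodes_at_current_depth:   # first pass: visit this level
            visit(node.val)
        next_depth = []
        for node in nodes_at_current_depth:   # second pass: gather children
            if node.left is not None:
                next_depth.append(node.left)
            if node.right is not None:
                next_depth.append(node.right)
        nodes_at_current_depth = next_depth
```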
The version above needs to loop through nodes_at_current_depth twice in each outer while loop iteration. The version below only uses a single for loop inside the while loop, at the cost of needing to rename the nodes_at_current_depth variable to q (it no longer holds only nodes at the current level, but nodes at the current and next level, as it gets mutated in-place).
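A sketch of that single-loop variant; using collections.deque for the queue q is an assumption (a plain list with pop(0) would also work, just less efficiently):

```python
from collections import deque

class Node:
    def __init__(self, val=None):
        self.val = val
        self.left = self.right = None

def level_order(root, visit):
    if root is None:
        return
    q = deque([root])
    while q:
        # len(q) is exactly the number of nodes at the current level,
        # even though children get appended to q as we go.
        for _ in range(len(q)):
            node = q.popleft()
            visit(node.val)
            if node.left is not None:
                q.append(node.left)
            if node.right is not None:
                q.append(node.right)
```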
4. Tests
4.1. Initialization
4.2. Insertion
4.3. Lookup
4.4. Deletion
4.5. Traversal
For these traversals, we construct the following binary tree (the keys are integers and the values are just the string representations of the keys, so these redundant values are omitted from the illustration).
4.6. Property based tests
To make test data more uniform, we define make_complete_tree() to create complete binary trees. A complete binary tree grows by filling out nodes at every level (left to right) before running out of room and filling in nodes at the next level (again left to right). Perfect trees have every level filled completely. For example, a perfect tree of depth 3 (which has \(2^3 - 1 = 7\) nodes) looks like this:
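One way to sketch make_complete_tree() is with heap-style indexing, where node k's children sit at indices 2k and 2k+1; this yields exactly the left-to-right, level-by-level fill order. Storing each node's 1-based index as a string value mirrors the keys-as-integers, values-as-strings convention used earlier, but the actual implementation may differ:

```python
class Node:
    def __init__(self, val=None):
        self.val = val
        self.left = self.right = None

def make_complete_tree(n):
    # Build a complete binary tree with n nodes, returning its root.
    if n == 0:
        return None
    # nodes[i] holds the node with 1-based heap index i + 1.
    nodes = [Node(str(k)) for k in range(1, n + 1)]
    for i, node in enumerate(nodes):
        if 2 * i + 1 < n:
            node.left = nodes[2 * i + 1]
        if 2 * i + 2 < n:
            node.right = nodes[2 * i + 2]
    return nodes[0]
```

For n = 7 this produces the perfect depth-3 tree above: "1" at the root, "2" and "3" below it, then "4" through "7" at the bottom level.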
Below we create a tree and perform a random number of deletions, all at the root node.
The following test is similar, but we delete at random points in the tree.
5. Export
from __future__ import annotations
from typing import Any, Callable, Optional, List