Binary Search Trees
Binary search trees are an important data structure for maintaining a map.
Maps
A map data structure stores some number of key/value pairs. Given a key, the map can look up the value associated with the key. There are also operations to insert a new key/value pair in the map and to remove an existing key/value pair.
Here is one possible way to specify a map as a Java interface:
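A minimal sketch of such an interface follows. The name SimpleMap and the type parameters K and V are assumptions made for illustration; only the method names add, find, and remove and their return conventions come from the description in this section.

```java
// Hypothetical interface name and type parameters; the three methods
// follow the behavior described in the surrounding text.
interface SimpleMap<K, V> {
    /** Adds the pair; returns true if added, false if the key is already present. */
    boolean add(K key, V value);

    /** Returns the value associated with the key, or null if no matching key exists. */
    V find(K key);

    /** Removes the pair with this key; returns true if removed, false if no such key exists. */
    boolean remove(K key);
}
```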
This interface is similar to the built-in java.util.Map interface.
The add method adds a new key/value pair to the map, returning true if the pair was added or false if the key is already present in the map. The find method looks up the value associated with a given key, returning null if no matching key exists. The remove method removes the key/value pair whose key is given, returning true if successful or false if no such key exists.
The expectation with a map data structure is that add, find, and remove can all be implemented efficiently. A trivial map implementation, such as an unsorted linked list of key/value pairs, would require O(n) steps in the worst case for all three operations, where n is the number of pairs stored. We will see that binary search trees can perform all of these operations in O(log n) time, provided we keep the tree from growing into pathological (badly unbalanced) configurations.
Binary Search Trees
A binary search tree is a tree in which each node stores a key/value pair. The keys are ordered, meaning that for any pair of keys a and b, it is possible to determine whether a<b, a>b, or a==b. Each node obeys the binary search tree property:
In a binary search tree, the left subtree contains nodes whose keys are less than the root’s key, and the right subtree contains keys that are greater than the root’s key.
This property applies recursively: not just at the root of the overall binary search tree, but also at the root of every subtree.
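To illustrate why the property must hold in every subtree and not only between a node and its immediate children, here is a small checker. It is a sketch under assumptions: the class BstCheck, its Node type with int keys, and the method isBst are all hypothetical names introduced for this example.

```java
// Hypothetical helper for illustration; not part of the map interface.
final class BstCheck {
    static final class Node {
        final int key;
        final Node left, right;
        Node(int key, Node left, Node right) {
            this.key = key;
            this.left = left;
            this.right = right;
        }
    }

    // Returns true if every key in the subtree rooted at n lies strictly
    // between lo and hi (null means "no bound") and the same holds,
    // with tightened bounds, for both of its subtrees.
    static boolean isBst(Node n, Integer lo, Integer hi) {
        if (n == null) return true;                   // an empty tree is a valid BST
        if (lo != null && n.key <= lo) return false;  // key too small for this position
        if (hi != null && n.key >= hi) return false;  // key too large for this position
        return isBst(n.left, lo, n.key) && isBst(n.right, n.key, hi);
    }
}
```

A tree with root 10 whose left child 5 has a right child 12 passes a purely local parent/child comparison (12 is greater than 5) but fails this check, because 12 sits in the left subtree of 10.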
This property leads to an extremely simple algorithm for the find method:
Starting out at the root of the overall tree, we compare the key we’re searching for with the key in the current node. If they are equal, then we’re done and we return the value in the node. Otherwise, we continue the search in the left or right subtree depending on whether the search key is less than or greater than the current node’s key, respectively. If the subtree we need to continue into is empty, the key is not in the tree and we return null.
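The search just described can be sketched as a short loop. The class BstFind and its Node type are hypothetical names for this example, and keys are assumed to implement Comparable so that they can be ordered.

```java
// Hypothetical names for illustration; keys are assumed to be Comparable.
final class BstFind {
    static final class Node<K, V> {
        K key;
        V value;
        Node<K, V> left, right;
        Node(K key, V value) {
            this.key = key;
            this.value = value;
        }
    }

    // Returns the value stored with key, or null if the key is not in the tree.
    static <K extends Comparable<K>, V> V find(Node<K, V> root, K key) {
        Node<K, V> cur = root;
        while (cur != null) {
            int cmp = key.compareTo(cur.key);
            if (cmp == 0) return cur.value;          // found the key
            cur = (cmp < 0) ? cur.left : cur.right;  // descend left or right
        }
        return null;                                 // reached an empty subtree
    }
}
```

Each iteration moves one level down the tree, so the number of comparisons is at most the number of nodes on the path from the root to the deepest node visited.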
Insertion
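Insertion is also a relatively simple operation. In a process very similar to find, we descend through the tree, going left or right at each node according to the key comparison, until the subtree we would descend into is empty. The last node visited becomes the parent, and we attach a new node containing the key/value pair at that empty position. If we encounter the key along the way, it is already present and nothing is inserted.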
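One way to sketch this insertion is shown below. The class names BstInsert, Node, and Tree are assumptions for illustration; the add method follows the return convention described earlier for the map interface (true if added, false if the key is already present).

```java
// Hypothetical names for illustration; keys are assumed to be Comparable.
final class BstInsert {
    static final class Node<K, V> {
        K key;
        V value;
        Node<K, V> left, right;
        Node(K key, V value) {
            this.key = key;
            this.value = value;
        }
    }

    static final class Tree<K extends Comparable<K>, V> {
        Node<K, V> root;

        // Returns true if the pair was added, false if the key already exists.
        boolean add(K key, V value) {
            if (root == null) {                 // empty tree: new node becomes the root
                root = new Node<>(key, value);
                return true;
            }
            Node<K, V> cur = root;
            while (true) {
                int cmp = key.compareTo(cur.key);
                if (cmp == 0) return false;     // key already present
                if (cmp < 0) {
                    if (cur.left == null) {     // found the parent: attach on the left
                        cur.left = new Node<>(key, value);
                        return true;
                    }
                    cur = cur.left;
                } else {
                    if (cur.right == null) {    // found the parent: attach on the right
                        cur.right = new Node<>(key, value);
                        return true;
                    }
                    cur = cur.right;
                }
            }
        }
    }
}
```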
Deletion
Deleting a key/value pair from the tree is more complicated than inserting. If the deleted key is in a leaf node of the tree, then there is no problem; we can just delete the node containing the key.
Deleting a node with a single child is also easy: we can just delete the node and “pull up” the child to replace its deleted parent.
However, if the key is in a node that has two children, we need to make sure that the children (and all of their descendants) remain in the tree. This is difficult, since there are two children to replace the single deleted node.
We can implement deletion by picking a “victim” node that is easy to delete, and then moving its contents (key and value) to the node containing the key/value pair that will actually be deleted. The only problem is that we need to be able to find a node that (1) is easy to remove from the tree, and (2) contains a key that is valid to put in the node whose key is being removed. There are two such nodes:
- The node with the maximum key in the left subtree.
- The node with the minimum key in the right subtree.
Let’s use the node with the minimum key in the right subtree. It is easy to see how it fits our criteria. First, its left subtree must be empty (if it weren’t, there would be a node with a lesser key), so it has at most one child and can be removed using one of the easy cases above. Second, its key is less than all other keys in the right subtree and greater than all keys in the left subtree. Thus, it is legal to copy it to the root of these subtrees (which is the node containing the key we want to delete).
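The three deletion cases can be sketched as a single recursive method. This is an illustration under assumptions: the class BstDelete and its Node type (with int keys and String values) are hypothetical, and for brevity the method returns the new root of the subtree instead of the boolean result described for the map interface.

```java
// Hypothetical names for illustration; int keys and String values keep it short.
final class BstDelete {
    static final class Node {
        int key;
        String value;
        Node left, right;
        Node(int key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    // Removes key from the subtree rooted at n and returns the subtree's
    // new root. The subtree is returned unchanged if the key is absent.
    static Node remove(Node n, int key) {
        if (n == null) return null;                      // key not found
        if (key < n.key) { n.left = remove(n.left, key); return n; }
        if (key > n.key) { n.right = remove(n.right, key); return n; }

        // n holds the key to delete.
        if (n.left == null) return n.right;              // leaf, or only a right child: pull it up
        if (n.right == null) return n.left;              // only a left child: pull it up

        // Two children: the victim is the minimum node of the right subtree.
        Node victim = n.right;
        while (victim.left != null) victim = victim.left;
        n.key = victim.key;                              // move the victim's contents up
        n.value = victim.value;
        n.right = remove(n.right, victim.key);           // victim has no left child, so this is an easy case
        return n;
    }
}
```

A caller would write root = BstDelete.remove(root, key), since deleting the root's key when the root has fewer than two children changes which node is the root.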
Performance
A complete binary tree is one in which
- Every level of the tree except the bottom level contains the maximum number of nodes: 2^l, where l is the level of the tree (counting the root as level 0), and
- All of the leaves in the bottom level of the tree are as far to the left as possible.
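Example: a six-node tree in which the root has two children, the left child has two children, and the right child has only a left child. Levels 0 and 1 are full, and the three nodes on the bottom level are as far to the left as possible.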
It is easy to prove that a complete binary tree has height O(log n), where n is the number of nodes in the tree. If a binary search tree is a complete binary tree, or approximates one closely enough, then any binary search tree operation will complete in O(log n) steps. This is because in the worst case, each search tree operation (insert, find, and remove) takes a number of steps proportional to the distance along a path from root to leaf: in other words, the height of the tree.
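The height bound can be derived in a few lines. As a sketch, assume the height h is counted in edges, so that the bottom level is level h (the symbol h is introduced here for illustration). Levels 0 through h-1 are full, and the bottom level contains at least one node:

```latex
\begin{align*}
n &\ge \underbrace{\sum_{l=0}^{h-1} 2^{l}}_{\text{full levels } 0,\dots,h-1}
      + \underbrace{1}_{\text{bottom level}}
   = (2^{h} - 1) + 1 = 2^{h} \\
\Rightarrow\quad h &\le \log_2 n = O(\log n)
\end{align*}
```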
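Unfortunately, it is easy to construct a binary search tree that is very far from being complete. In particular, consider what happens if we insert a sequence of keys that are already in sorted order, such as the sequence 5, 10, 15, 20, 25. Each new key is greater than every key already in the tree, so it is always attached as the right child of the most recently inserted node.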
Now the tree is essentially a linked list, with O(n) worst-case running time for insert, delete, and find. The problem is that the search tree is not properly balanced. In a balanced search tree, the heights of the left and right subtrees are roughly equal everywhere in the tree.
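The effect of insertion order on the tree's shape can be checked with a small experiment. This is a sketch under assumptions: the class BstShape and its methods are hypothetical names, values are omitted since only the shape matters, and height is counted in edges with an empty tree having height -1.

```java
// Hypothetical names for illustration; keys only, since shape is what matters here.
final class BstShape {
    static final class Node {
        final int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    // Standard unbalanced insertion; duplicate keys are ignored.
    static Node insert(Node n, int key) {
        if (n == null) return new Node(key);
        if (key < n.key) n.left = insert(n.left, key);
        else if (key > n.key) n.right = insert(n.right, key);
        return n;
    }

    // Height in edges; an empty tree has height -1.
    static int height(Node n) {
        return n == null ? -1 : 1 + Math.max(height(n.left), height(n.right));
    }

    // Builds a tree by inserting the keys in the given order and returns its height.
    static int heightAfterInserting(int... keys) {
        Node root = null;
        for (int k : keys) root = insert(root, k);
        return height(root);
    }
}
```

With this sketch, heightAfterInserting(5, 10, 15, 20, 25) returns 4, one edge per inserted key after the first, while inserting the same keys in the order 15, 5, 20, 10, 25 gives a height of 2.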