26. Binary Search Trees
With binary trees covered, a more specific type of binary trees can be discussed
Namely, the binary search tree
Compared to a binary tree, binary search trees have specific properties and constraints that must be followed
These properties and constraints can be taken advantage of for improvements in certain functionality
26.1. Binary Search Tree Definition
A binary search tree is a binary tree where:
All values in the left subtree are less than the value in the root node
All values in the right subtree are greater than or equal to the value in the root node
Note
One could make the left subtree less than or equal to and the right subtree strictly greater than. All that matters is that only on subtree includes the equal to part.
Alternatively, one could make the right subtree exclusively greater than, but this would disallow duplicates.
In the above example, note that the leftmost node is the smallest value in the tree and the rightmost is the largest
26.1.1. A Binary Search Tree is a Binary Tree
A binary search tree is a special case of a binary tree
Therefore, it will have all the operations a binary tree would have
But a few additional operations will be added
Add, but based on the important ordering
Remove, but must preserve ordering
Remove max
Remove min
26.2. Searching a Binary Search Tree
The special ordering property enables a more efficient search through the tree when compared to a search regular binary tree
For this section, assume the use of a linked implementation of a binary search tree
More on this below
26.2.1. Naive Search
Since the binary search tree is a binary tree, the search discussed in the previous topic can be used
This search strategy has a computational complexity of \(O(n)\), where \(n\) is the number of nodes in the tree
1public boolean contains(T needle) {
2 return contains(root, needle);
3}
4
5private boolean contains(Node<T> current, T needle) {
6 if (current == null) {
7 return false;
8 } else if (Objects.equals(current.getData(), needle)) {
9 return true;
10 } else {
11 return contains(current.getLeft(), needle) || contains(current.getRight(), needle);
12 }
13}
26.2.2. Binary Search
Consider the above binary search tree
Based on the special ordering binary search trees have, can an improved search be devised?
Hint: If looking for a number greater than 14, which subtree must the element be in if it exists in the tree?
241 @Override
242 public boolean contains(T element) {
243 return binarySearch(element, root) != null;
244 }
245
246 /**
247 * Helper method enabling a recursive binary search for a given element.
248 *
249 * @param element Element being searched for
250 * @param current Current node being investigated
251 * @return Node containing the element searched for, or null if not found
252 */
253 private Node<T> binarySearch(T element, Node<T> current) {
254 if (current == null) {
255 return null;
256 }
257 int comparison = current.getData().compareTo(element);
258 if (comparison == 0) {
259 return current;
260 } else {
261 if (comparison > 0) {
262 return binarySearch(element, current.getLeft());
263 } else {
264 return binarySearch(element, current.getRight());
265 }
266 }
267 }
Notice the similarity between the naive search and the binary search
They are nearly the same, except that the naive search may search both subtrees
However, the binary search will only ever travel down one subtree due to the
if
statements on the value ofcomparison
In the above example, if searching for the number 12, it can be concluded that it exist in the tree since
12 is less than 14, so it must be in the left subtree of the node containing 14
12 is greater than 8, thus it must be in the right subtree
12 is equal to 12, therefore we conclude that we found it
If, however, searching for the number 13, it can be concluded that it must not exist in the tree since
13 is less than 14, so it must be in the left subtree of the node containing 14
13 is greater than 8, thus it must be in the right subtree
13 is greater than 12, therefore it must be in the right subtree
12 has no right subtree, therefore 13 must not be in the tree
In the above example, if searching for the number 22, it can be concluded that it must not be in the tree since
22 is greater than 14, so go right
22 is less than 26, go left
22 is greater than 19, so go right
22 is less than 23, so go left
23 has no left subtree, therefore 22 must not be in the tree
26.3. Binary Search Tree Operations
Getting the size of the collection, checking if it’s is empty, and getting iterators should be relatively simple by this stage
The adding to and removing from the binary search tree is where the complexity comes in
26.3.1. Add
Adding is going to be similar to a binary search
Do a binary search until there is no child
Insert the element where the child does not exist
New nodes will be added as a leaf
1Define add
2 If the node does not exist
3 Insert the new node here with the element to be added
4
5 If the node exists
6 If the element is less than the node
7 Call add on the left child node
8
9 If the element is greater than or equal to the node
10 Call add on the right child node
Starting an existing binary search tree, follow the pseudocode to add the value 13
Starting with an empty tree, follow the pseudocode to add the following 6 elements
26.3.2. Min & Remove Min
Given the nature of the binary search tree ordering, the node with the minimum element must be in the leftmost node in the tree
There are three cases to consider when looking for the minimum node:
The leftmost node is the root (root has no left child)
The leftmost is a leaf node (no children)
The leftmost node is an interior node
Finding the minimum is simple
Assuming a root node exists, keep going to the left subtree until there is no more left subtree
Removing will be a little trickier since it may be removing something that has children that must remain in the tree
And further, not only must the children stay in the tree, but the proper binary search tree ordering must be preserved
Consider the following examples and think of how one would need to manage potential children of a node being removed
Case 1
The minimum value is in the root node, this means that
There is no left subtree
There may be a right subtree
Therefore, simply replace the root with the root of the right subtree (which may be null)
The ordering is preserved since
Nothing is to the left of the root
Everything to the right of the root is larger than the value in the root
The right subtree is a binary search tree (by definition)
Therefore, if the right subtree becomes the new root, the binary search tree’s order will be maintained
Case 2
The minimum is a leaf node, this means that
There are no children to deal with
Therefore, just remove the node
By eliminating only a leaf node, the ordering will not be affected
Case 3
The minimum is an interior node, this means that
The node has no left subtree — otherwise the node would not contain the minimum value since the minimum must be the leftmost node
A right subtree exists
Therefore, replace it with the node’s right subtree’s root
The node being removed’s parent’s left child will become the node being removed’s right child
The ordering is preserved in the same way as case 1
Nothing is to the left of the node
Everything to the right of the node to be removed is larger than the value in the node
The right subtree is a binary search tree (by definition)
Therefore, if the right subtree replaces the node being removed, the binary search tree’s order will be maintained
All the Cases are the Same?
One may have noticed that the rules for each case are actually the same
Replace the node with the right subtree’s root
This is perhaps more obvious for cases 1 and 3
Consider that a leaf node’s right subtree is
null
26.3.3. Max & Remove Max
Finding and/or removing the maximum value will be very similar to finding and/or removing the minimum
Instead of always going to the left subtree, go to the right to find the maximum
There are still the three cases to deal with, but like with minimum, they can all be addressed in the same way
Replace the node with the left subtree’s root
26.3.4. General Remove
A general remove is a little more complex than the remove min or max
In the above example, there may not be an immediately obvious or clear way to address the problem
Case 1
If the node being removed is a leaf node, then simply remove it
In the above example, if 7 or 34 is to be removed, just remove it
Case 2
If the node being removed is an interior node, then things get hairy
In some cases this may seem simple — if 15 is removed, just replace it with it’s child
But what happens if removing 26 from the tree?
The trick here is to replace the node with its inorder predecessor or successor
If 26 is removed
15, the value in the tree that comes right before 26, could replace it
Or 31, the value in the tree that comes right after 26, could replace it
The reason this will work is, if looking for the predecessor
All values in the left subtree are smaller than the root and everything in the right subtree
The largest value in the left subtree will be greater than or equal to all other values in the left subtree
The largest value in the left subtree will be less than all values in the right subtree
Therefore, the largest value in the left subtree can replace 26 without destroying the ordering since
It’s greater than or equal to everything in the left subtree
And smaller than everything in the right subtree
Below is an example of removing multiple elements from a binary search tree
26.3.5. Contains
Like the other data structures, a way to check if a given element exists within the collection is needed
Although an exhaustive depth first search through the tree would work, as discussed for the general binary tree would work
\(O(n)\)
Here, due to the nature of the binary search tree ordering, a binary search can be used
\(O(log_{2}(n))\)
26.3.6. Counting the Number of a Given Element
Similar to contains, to count the number of times a given element exists within a binary search tree, a binary search can be used
However, instead of returning
True
orFalse
as soon as the element is found, continue the search after finding an element to continue counting
26.4. Degenerate vs. Balanced
A balanced tree has the property that for any node in the tree, the height of its left and right subtrees can differ by at most 1
Remember, the height of an empty tree is -1
The balanced property is important since it’s part of the reason the binary search tree is efficient to search
Consider adding the numbers
3, 5, 9, 12, 18, 20
in that order to an empty binary search tree
This particular case is called a degenerate binary tree
It’s effectively a linear data structure, not a tree
Consider a binary search
With a balanced binary search tree, going to the left or right subtrees eliminates roughly half the elements
This is what gives the \(O(log(n))\) search
With a degenerate tree, where it’s basically a linear data structure, all \(n\) elements must be searched through
Only one element is eliminated from the search every time a node is investigated
Thus, the search through a degenerate tree would be \(O(n)\)
Fortunately there exist strategies for keeping binary search trees balanced, but these are outside the scope of this course
Two popular examples are AVL Trees and Red-Black Trees
26.5. For next time
Have a look at the
BinarySearchTree
interfaceHave a look at the
LinkedBinarySearchTree
implementationCheck out
LinkedBinarySearchTreeTest
26.6. For Next Time
Read Chapter 11 Sections 1 – 3
17 pages
26.6.1. Playing Code
Download and play with