23. Searching
- For the purposes of this topic, searching is the process of looking for a specified thing in some collection to know if it exists within the collection 
- For example, does a given number exist within an array of numbers? 
- To write an algorithm to solve this problem, all that’s needed is something that says if it’s there or not - That’s all that’s required 
- Any algorithm that works would suffice 
 
- However, the focus here will be on two intuitive ideas that people commonly use in one way or another throughout their lives - Linear Search 
- Interpolation Search - A specific kind of interpolation search called a binary search will be discussed 
 
 
- Throughout this topic - The thing being searched for will be referred to as the - needle
- The collection being searched through will be referred to as the - haystack
 
23.1. Linear Search
- Start at the beginning of the - haystackand sequentially look at each element until the- needleis found- If the - needleis found, then conclude that it is there
- If the end of the - haystackis reached without finding the- needle, then conclude that it is not there
 
- Notice that this description of the algorithm doesn’t go into any more details - It is fairly high-level — abstract 
 
- The computational complexity of this algorithm is linear — \(O(n)\) - Worst case scenario, every element in the - haystackwas looked at before making any conclusion — linear \(O(n)\)
- Best case scenario, the - needleis the first element in the- haystack— constant \(O(1)\)
- On average, the element will be in the middle — linear \(O(n)\) - For every time - needleis the first element, it could be the last element in another search
- For every time the - needleis the second element, it could be in the second last position in another search
- … 
 
 
 
Example linear collection of data. With a linear search, to guarantee if some needle exists within this
haystack, each element must be looked at.
23.1.1. Iterative
- Below is a generic implementation of a linear search on an array of type - T
 8    public static <T> int iterativeLinearSearch(T needle, T[] haystack) {
 9        for (int i = 0; i < haystack.length; i++) {
10            if (Objects.equals(haystack[i], needle)) {
11                return i;
12            }
13        }
14        return NOT_FOUND;
15    }
- This iterative implementation is fairly similar to the high-level description of the algorithm - Start at - i = 0— the beginning of the- haystack
- If the element at index - iis the- needle, conclude that it’s there
- Otherwise continue and increment - i
- If the end of the - haystackis reached without finding the- needle, then conclude that it is not there
 
23.1.2. Recursive
- Below is a recursive implementation of a linear search following the same high-level abstract idea 
19    public static <T> int recursiveLinearSearch(T needle, T[] haystack, int currentIndex) {
20        if (currentIndex == haystack.length) {
21            return NOT_FOUND;
22        } else if (Objects.equals(haystack[currentIndex], needle)) {
23            return currentIndex;
24        } else {
25            return recursiveLinearSearch(needle, haystack, currentIndex + 1);
26        }
27    }
- If I wanted to call this method, I would start with - currentIndexas- 0- recursiveLinearSearch(someNeedle, someHaystack, 0)
 
- This may seem quite different from the iterative implementation, but the high-level idea is the same - Start - currentIndexat- 0
- If the end of the - haystackis reached without finding the- needle, then conclude that it is not there
- If the element at index - currentIndexis the- needle, conclude that it’s there
- Otherwise continue by calling the recursive function again with - currentIndexincremented
 
23.2. Binary Search
- Imagine looking for page 554 in a roughly 1000 page textbook 
- It would be reasonable to start by opening the textbook somewhere in the middle-ish 
- If the page landed on was page 402, which half of the book must page 554 be in? - Left set of pages, or right set of pages? 
 
- Since 554 is greater than 402, it must be in the right set of pages - This means, after looking at a single page, the 401 pages in the left set can be ignored 
 
- With this information, imagine turning to another page in the right set of pages and landing on page 621 - Which remaining set of pages must the page 554 be in? 
- In the left set of pages (403 – 620), or the right set (622 – 1000)? 
 
- This general idea is the basis of binary search - Technically, this is interpolation search 
- A binary search is a special kind of interpolation search 
 
- There is, however, a catch with binary search when compared to a linear search 
- The - haystackbeing searched must be sorted
- If it was not sorted, it would not be possible to conclude which half the - needleis in after investigating an element
- The complexity analysis of binary search may feel intimidating, but the trick is to visualize the work being done 
 
Visualization of how to search for a number within the range 1 – 15. The initial guess would be the halfway point and each subsequent guess would be the halfway point of the remaining elements. Elements at the bottom would take 4 guesses to find.
- To analyze the complexity, consider the worst case scenario - Look for an element at the bottom as it would take the most number of guesses to find 
- Looking for an element that does not exist would also work 
 
- First, imagine performing a linear search where the - needleis- 15
- In this case, all \(15\) elements in the - haystackwould need to be investigated before the- needlewas found- Since it is at the end of the - haystack
 
- When considering a binary search, how many things would need to be looked at before - 15is found?- \(4\) 
- 8->- 12->- 14->- 15
 
- The reason this takes fewer steps is because roughly half of the remaining elements are ignored after each check - For example, - 15is greater than- 8, therefore the numbers- 1–- 7can be ignored
- In other words, with linear search, after a single guess, only the element checked can be eliminated 
- But with binary search, the element checked, plus roughly half the remaining elements, can be ignored 
 
- There is a relationship between the maximum number of checks and the number of elements \(n\) in the - haystack- \(n = 2^{h + 1} - 1\), where h is the “height” of the “tree”, or, the number of steps needed to get from the top to the bottom 
- \(h = log_{2}(n + 1) - 1\) 
 
- Thus, the computational complexity of binary search is \(O(log_{2}(n))\) - As \(n\) grows, the maximum number of steps that could be taken grows like \(log_{2}(n)\) 
 
Warning
The above explanation uses details not discussed in this course yet, but they be covered in subsequent topics. In other words, don’t be too concerned if these ideas are not 100% clear yet.
23.2.1. Iterative
- Below is a generic implementation of an iterative binary search - Take note that - Tor one of their superclasses must extend- Comparable
- This is because the elements must be ordered 
 
31    public static <T extends Comparable<? super T>> int iterativeBinarySearch(T needle, T[] haystack) {
32        int lowIndex = 0;
33        int highIndex = haystack.length;
34        int midpoint = (highIndex - lowIndex) / 2;
35
36        while (lowIndex < highIndex) {
37            if (haystack[midpoint].compareTo(needle) == 0) {
38                return midpoint;
39            } else if (Objects.compare(haystack[midpoint], needle, T::compareTo) > 0) {
40                highIndex = midpoint - 1;
41                midpoint = lowIndex + (highIndex - lowIndex) / 2;
42            } else {
43                lowIndex = midpoint + 1;
44                midpoint = lowIndex + (highIndex - lowIndex) / 2;
45            }
46        }
47        return NOT_FOUND;
48    }
- Here is what’s happening - While search space has not been exausted ( - lowIndex < highIndex)- If - lowIndexis ever greater than or equal to- highIndex, there are no more indices the element could exist
 
- Look at the middle 
- If the element in the middle is the - needle- Done 
 
- If the element in the middle is less than the - needle- Continue the search on the remaining upper half by looking at the midpoint of the remaining elements 
 
- If the element in the middle is greater than the - needle- Continue the search on the remaining lower half by looking at the midpoint of the remaining elements 
 
 
23.2.2. Recursive
- Below is a recursive implementation of a binary search 
- Notice that, other than being recursive, the underlying high-level algorithm is the same as the iterative implementation 
52    public static <T extends Comparable<? super T>> int recursiveBinarySearch(T needle, T[] haystack, int lowIndex, int highIndex) {
53        if (lowIndex >= highIndex) {
54            return NOT_FOUND;
55        }
56        int midpoint = lowIndex + (highIndex - lowIndex) / 2;
57        if (haystack[midpoint].compareTo(needle) == 0) {
58            return midpoint;
59        } else if (Objects.compare(haystack[midpoint], needle, T::compareTo) > 0) {
60            return recursiveBinarySearch(needle, haystack, lowIndex, midpoint - 1);
61        } else {
62            return recursiveBinarySearch(needle, haystack, midpoint + 1, highIndex);
63        }
64    }
- To call this method to initiate a search, start with - lowIndexas- 0and- highIndexas- someHaystack.length- recursiveBinarySearch(someNeedle, someHaystack, 0, someHaystack.length)
 
23.3. For Next Time
- Read Chapter 9 Section 1 - 7 pages 
 
23.3.1. Playing Code
- Download and play with