24. Searching and Complexity
Before continuing, we should take a moment and reflect on what we have learned so far
The Python programming environment is a relatively familiar sight (Colab, IPython, PyCharm, Spyder)
You know how to
print
andinput
informationYou know about types, values, and variables
You have been making use of functions
You have learned about basic logic and know how to use
if
andelse
for
andwhile
loops are tools you have usedYou learned how to use objects
Linear data structures, like a
List
, are familiar and you are used to working with reference variablesYou know how to read and write data from/to files
You have learned how to write classes and define your own objects
You can look at Python code and have a reasonable understanding of what it is doing
You know how to write your own code, one step at a time
You know the importance of testing your code and how to write tests
The last few topics of the course have been heavy on data and data structures
At this stage of the course, we have actually equipped ourselves with the majority of tools programmers use on a daily basis
Here, we will start to focus more on algorithms and how we can use all the tools we have learned to solve more interesting problems
Further, we will take some time to think about the amount of work the algorithm does and what makes an algorithm good
24.1. Linear Search
By now we have seen linear search multiple times
Given our familiarity of the problem and the linear search algorithm, we will use it as a starting point to begin thinking a little deeper into our algorithms
1def linear_search(haystack, needle):
2 for element in haystack:
3 if element == needle:
4 return True
5 return False
Having a correct algorithm for a problem is clearly important
But we also want to make sure we have a good algorithm
However, what does it mean for an algorithm to be good?
Activity
Discuss this with your neighbours:
On average, how many iterations of the loop does
linear_search
need before it finds theneedle
?If the element you are looking for is not in the list, how many times will the loop run?
What would be the situation to cause the loop to only need to run one iteration?
How much space does your algorithm require?
Is your solution the best possible?
Might there exist some super clever algorithm that is somehow better (faster) than yours?
24.1.1. Complexity Analysis
The above activity highlights the major ideas used to discuss the goodness of an algorithm
More precisely, we analyze the Computational Complexity of an algorithm
The amount of resources the algorithm needs to run
The main focus is the time complexity and the space complexity
Time complexity is the number of basic operations the algorithm requires
Space complexity is the amount of memory the algorithm requires
To keep things simple, it is common to think about the worst case scenario for the complexity analysis
We also like to consider the amount of work required based on the size of the input
For example, the worst case scenario for
linear_search
is if theneedle
does not exist withinhaystack
If
haystack
has length of 10, the loop will run 10 times before we conclude thatneedle
is not thereIf
haystack
has length of 100, the loop will run 100 timesIf
haystack
has length of 10,000,000, the loop will run 10,000,000 timesIf
haystack
has length of \(n\), the loop will run \(n\) times
As for space complexity, this
linear_search
only requires space for storinghaystack
Assuming
haystack
has length \(n\), then we require \(n\) amount of space
24.2. Binary Search
Activity
I will guess a number between \(0\) – \(1023\) in \(10\) or less guesses.
There is a catch however — you have to tell me if the number is higher or lower than my guess before I guess again.
This is an example of a binary search
Like linear search, binary search is used to find an element within some collection
However, if I was doing a linear search for a number between \(0\) – \(1023\), I could only guarantee that I would find the element in \(1024\) guesses
But with binary search, I was able to do it in \(10\) or less guesses
Though, this required the higher/lower information — the data was sorted
A linear search has no such requirement
24.2.1. Complexity Analysis
The magic with binary search is that, with every guess I made, I was able to eliminate half of the remaining numbers
My first guess was \(512\) — if you said lower I know the number is between \(0\) – \(511\), if you said higher I know it’s between \(513\) – \(1023\)
To generalize the idea, if I had \(n\) numbers, and I guess the number \(\frac{n}{2}\)
If you say lower, then the number must be between \(0\) – \((\frac{n}{2} - 1)\)
If you said higher, then the number must be between \((\frac{n}{2} + 1)\) – \((n - 1)\)
With linear search, every guess only eliminated one number
Once again, let’s consider the worse case scenario — the
needle
is not within thehaystack
of size \(1024\)One guess gets me to \(512\) numbers
Two guesses gets me to \(256\) numbers
Three guesses get me to \(128\)
Four gets me to \(64\)
Five gets me to \(32\)
Six gets me \(16\)
Seven \(8\)
Eight \(4\)
Nine \(2\)
Ten \(1\)
Originally, with linear search, the relationship between the input \(n\) and the amount of work is \(n\)
If there are \(n\) things in the
haystack
, I have to look at all \(n\)If we doubled the size to \(2n\), the amount of things needed to be looked at also doubles to \(2n\)
With binary search however, the relationship between the size of the input \(n\) and the amount of work is \(log_{2}(n)\)
Doubling the size to \(2n\) only adds one more guess
Given that binary search requires \(log{2}(n)\) basic operations vs. linear search’s \(n\), binary search is the clear winner
But, there is no such thing as a free lunch
With binary search, we have the catch that the data must be sorted
This is a very common pattern in developing algorithms
The more general your algorithm is, the worse the solution
The more you know about the structure of your problem, the more opportunities you have to use that knowledge to improve your algorithm
24.3. Linear Search in Other Programming Languages
At this stage we have been programming exclusively in Python
However, there are many other programming languages
Learning a new programming language may feel intimidating, but you may be surprised at how similar many of them are
First, the underlying algorithms are the same, regardless of the language — a linear search is a linear search
Second, even the syntax between many languages are remarkably similar
Below is a collection of linear search algorithms in various popular programming languages
The purpose of their inclusion here is to get a sense of how similar and dissimilar programing languages can be
Despite never learning the various languages, chances are you can still understand much of the code completely
24.3.1. Python
1def linear_search(haystack: list[int], needle: int) -> bool:
2 for i in range(len(haystack)):
3 if haystack[i] == needle:
4 return True
5 return False
24.3.2. Java
1public static boolean linearSearch(int[] haystack, int needle){
2 for(int i = 0 ; i < haystack.length ; i++){
3 if(haystack[i] == needle){
4 return true;
5 }
6 }
7 return false;
8}
24.3.3. C#
1public static boolean linearSearch(int[] haystack, int needle){
2 for(int i = 0 ; i < haystack.length ; i++){
3 if(haystack[i] == needle){
4 return true;
5 }
6 }
7 return false;
8}
24.3.4. C++
1bool linear_search(std::vector<int> haystack, int needle){
2 for(int i = 0 ; i < haystack.size() ; i++){
3 if(haystack[i] == needle){
4 return true;
5 }
6 }
7 return false;
8}
24.3.5. C
1bool linear_search(int haystack[], int n, int needle){
2 for(int i = 0 ; i < n ; i++){
3 if(haystack[i] == needle){
4 return true;
5 }
6 }
7 return false;
8}
24.3.6. Haskell
Below you will see a linear search that looks quite different from the previous
Haskell is an entirely different kind of programming language — it is a functional programming language
It is, for better or worse, not nearly as popular as the languages seen in the above examples
1linear_search :: Eq a => [a] -> a -> Bool
2linear_search [] _ = False
3linear_search (x:xs) y = x==y || linear_search xs y