25. Sorting Algorithms
Warning
At this stage of the course we are starting to see nontrivial algorithms. Although you may have been able to quickly understand the previous algorithms, going forward you should not expect to observe an algorithm and immediately understand how it works.
Although you are all equipped with the requisite skills to understand these algorithms, they require careful study to understand. The best strategy for approaching these algorithms is to carefully trace through their operation, step by step, making no assumptions of the code.
Similar to searching, sorting is a problem you are all familiar with
Unlike searching, we have not investigated any sorting algorithms
Believe it or not, there are many ways one could sort a list
Some are better than others
Some are better than others under certain conditions
Some are just plain terrible
Activity
Have you ever sorted things in your life?
Take a moment to talk amongst yourselves about how you you sorted things.
What was the step-by-step algorithm?
You can keep your discussion high-level, but try to explain it such that the person you are explaining it to could follow your instructions to sort things.
Note
Each of the following algorithms have many possible implementations. What makes the algorithm the algorithm is the high-level idea, not the actual, literal implementations. This idea will be emphasized as we discuss the individual sorting algorithms below.
25.1. Selection Sort
Selection sort is fairly accessible, and chances are you have sorted things in real life using this algorithm
Selection sort works by repeatedly selecting the smallest element from the collection
The high-level algorithm is as follows
Start with the unsorted list and an empty list for the sorted elements
For each element in the collection
Perform a linear search on the unsorted list for the current smallest element
Remove the current smallest element from the unsorted list and append it to the sorted list
To reason about how and it works, consider
The first time the linear search finds the smallest element, that element must be the first element in the sorted collection
Every other subsequent linear search finds the remaining smallest element and appends it to the sorted collection
It cannot come before anything in the sorted collection since we know it must be greater than (or equal to) all elements in the sorted collection
Activity
Perform a selection sort, with pencil and paper, on the list [3,7,4,1,5,2]
. Keep track of both the unsorted list
and sorted list at each step of the algorithm.
Now that we know a little bit about how to analyze algorithms, let’s figure out how much work this algorithm needs in order to solve the problem
Selection sort requires a linear search to find the current smallest element in the collection
We know that linear search takes \(n\) amount of work for a list of size \(n\)
But, selection sort needs to perform a linear search for each element in the collection
If our unsorted list is of size \(n\), that means the linear search must be run \(n\) times
If we put it together, we need to run linear search, which takes \(n\) amount of work, a total of \(n\) times
We need to do \(n\) work \(n\) times
This would be a total of \(n^{2}\) work for an unsorted list of size \(n\)
1def selection_sort(collection):
2 sorted_collection = []
3 for _ in range(len(collection)):
4 current_smallest = collection[0]
5 for element in collection:
6 if element < current_smallest:
7 current_smallest = element
8 collection.remove(current_smallest)
9 sorted_collection.append(current_smallest)
10 return sorted_collection
25.2. Insertion Sort
Insertion sort works by repeatedly inserting elements into their proper location within a collection
The high-level algorithm is as follows
Start with the unsorted list and an empty list for the sorted elements
For each element in the collection
Remove the element from the unsorted list
Perform a linear search on the sorted list to find the index where the new element should be inserted
Insert the new element into the sorted list at the index where it belongs
To reason about how and it works, consider
Elements are inserted into their proper relative location to the elements already within the sorted collection
Any subsequent insertion cannot disrupt the ordering of the whole list if it is being inserted in its proper location
For example, consider the list
[1, 4, 7, 9]
If we need to insert the number
6
, it obviously goes between the4
and7
since4 < 6 < 7
—[1, 4, 6, 7, 9]
The order of the elements
1
and4
remain unchanged, and similarly with7
and9
Activity
Perform a insertion sort, with pencil and paper, on the list [3,7,4,1,5,2]
. Keep track of both the unsorted list
and sorted list at each step of the algorithm.
The analysis of the algorithm is very similar to that of selection sort
We need to perform a linear search for each of the \(n\) elements in the unsorted list
We know linear search takes \(n\) amount of work for a list of size \(n\)
Therefore, we need to do \(n\) work \(n\) times — a total of \(n^{2}\) work for an unsorted list of size \(n\)
1def insertion_sort(collection):
2 sorted_collection = []
3 for element in collection:
4 i = 0
5 # Scan sorted collection to find insertion spot
6 while i < len(sorted_collection) and sorted_collection[i] < element:
7 i += 1
8 sorted_collection.insert(i, element)
9 return sorted_collection
25.3. Bubble Sort
Bubble sort works a little differently than selection or insertion sort
The general idea is to perform multiple scans of the unsorted list comparing each adjacent pair of elements
If the elements are out of order, swap them, otherwise, leave them alone
Move to the next adjacent pair of elements
Repeat
Consider the below unsorted list
\(4, 3, 2, 1\) — Four is greater than three, so they swap
\(3, 4, 2, 1\) — Four is greater than two, so they swap
\(3, 2, 4, 1\) — Four is greater than one, so they swap
\(3, 2, 1, 4\) — There are two important things to notice at the end of the first pass
A single pass is not enough to guarantee the list is sorted
After the first pass, the largest element in the unsorted list will be in its correct location
The largest element will always win the swap
If that element wins the swap, it will be considered in the next comparison of adjacent elements
If we repeat this process by doing another pass
\(3, 2, 1, 4\) — Three is greater than two, so they swap
\(2, 3, 1, 4\) — Three is greater than 1, so they swap
\(2, 1, 3, 4\) — Three is less than four, so they do not swap
\(2, 1, 3, 4\)
Notice that the second largest element in the unsorted list is now in its correct location
To generalize this idea
If after the \(i^{th}\) pass the \(i^{th}\) largest element is in its correct location, how many passes do we need?
Assuming an unsorted list of size \(n\), we need \(n\) passes to guarantee the list is sorted
First pass has the largest in its correct location
Second has the second largest in its correct location
Third has the third largest in the correct location
…
\(n^{th}\) pass has the \(n^{th}\) largest in its correct location
The high-level algorithm is as follows
Repeat the following n times
Perform a scan on the unsorted list comparing each adjacent pair of elements
If the elements are out of order, swap them
Move to the next adjacent pair of elements
Repeat
Activity
Perform a bubble sort, with pencil and paper, on the list [3,7,4,1,5,2]
. Keep track of both the unsorted list
and sorted list at each step of the algorithm.
The analysis of the algorithm should feel familiar at this point
We need to do a scan of \(n\) elements a total of \(n\) times
therefore, we need to do \(n\) work \(n\) times — a total of \(n^{2}\) work for an unsorted list of size \(n\)
1def bubble_sort(collection):
2 collection = collection[:]
3 for j in range(len(collection)):
4 for i in range(len(collection) - 1 - j):
5 if collection[i] > collection[i + 1]:
6 collection[i], collection[i + 1] = collection[i + 1], collection[i]
7 return collection
We could improve the algorithm slightly
Consider being asked to sort an already sorted list with bubble sort
It would be rather silly doing \(n\) passes on the list to sort it if we know it’s already sorted
Instead, we can repeatedly do passes on the list until we complete a full scan without any swaps
If there was no swaps, it means nothing was out of order, which means the list is sorted
1def bubble_sort_improved(collection):
2 collection = collection[:]
3 has_swapped = True
4 complete_cells = 0
5 while has_swapped:
6 has_swapped = False
7 for i in range(len(collection) - 1 - complete_cells):
8 if collection[i] > collection[i + 1]:
9 collection[i], collection[i + 1] = collection[i + 1], collection[i]
10 has_swapped = True
11 complete_cells += 1
12 return collection