25. Sorting Algorithms

Warning

At this stage of the course we are starting to see nontrivial algorithms. Although you may have been able to quickly understand the previous algorithms, going forward you should not expect to observe an algorithm and immediately understand how it works.

Although you are all equipped with the requisite skills to understand these algorithms, they require careful study to understand. The best strategy for approaching these algorithms is to carefully trace through their operation, step by step, making no assumptions of the code.

Similar to searching, sorting is a problem you are all familiar with
Unlike searching, we have not investigated any sorting algorithms
Believe it or not, there are many ways one could sort a list
- Some are better than others
- Some are better than others under certain conditions
- Some are just plain terrible

Activity

Have you ever sorted things in your life?
Take a moment to talk amongst yourselves about how you you sorted things.
- What was the step-by-step algorithm?

You can keep your discussion high-level, but try to explain it such that the person you are explaining it to could follow your instructions to sort things.

Note

Each of the following algorithms have many possible implementations. What makes the algorithm the algorithm is the high-level idea, not the actual, literal implementations. This idea will be emphasized as we discuss the individual sorting algorithms below.

25.1. Selection Sort

Selection sort is fairly accessible, and chances are you have sorted things in real life using this algorithm
Selection sort works by repeatedly selecting the smallest element from the collection
The high-level algorithm is as follows
- Start with the unsorted list and an empty list for the sorted elements
- For each element in the collection
  Perform a linear search on the unsorted list for the current smallest element
  
  Remove the current smallest element from the unsorted list and append it to the sorted list
To reason about how and it works, consider
- The first time the linear search finds the smallest element, that element must be the first element in the sorted collection
- Every other subsequent linear search finds the remaining smallest element and appends it to the sorted collection
  It cannot come before anything in the sorted collection since we know it must be greater than (or equal to) all elements in the sorted collection

Activity

Perform a selection sort, with pencil and paper, on the list [3,7,4,1,5,2]. Keep track of both the unsorted list and sorted list at each step of the algorithm.

Now that we know a little bit about how to analyze algorithms, let’s figure out how much work this algorithm needs in order to solve the problem
- Selection sort requires a linear search to find the current smallest element in the collection
  We know that linear search takes \(n\) amount of work for a list of size \(n\)
- But, selection sort needs to perform a linear search for each element in the collection
  If our unsorted list is of size \(n\), that means the linear search must be run \(n\) times
- If we put it together, we need to run linear search, which takes \(n\) amount of work, a total of \(n\) times
  We need to do \(n\) work \(n\) times
  
  This would be a total of \(n^{2}\) work for an unsorted list of size \(n\)

def selection_sort(collection):
    sorted_collection = []
    for _ in range(len(collection)):
        current_smallest = collection[0]
        for element in collection:
            if element < current_smallest:
                current_smallest = element
        collection.remove(current_smallest)
        sorted_collection.append(current_smallest)
    return sorted_collection

25.2. Insertion Sort

Insertion sort works by repeatedly inserting elements into their proper location within a collection
The high-level algorithm is as follows
- Start with the unsorted list and an empty list for the sorted elements
- For each element in the collection
  Remove the element from the unsorted list
  
  Perform a linear search on the sorted list to find the index where the new element should be inserted
  
  Insert the new element into the sorted list at the index where it belongs
To reason about how and it works, consider
- Elements are inserted into their proper relative location to the elements already within the sorted collection
- Any subsequent insertion cannot disrupt the ordering of the whole list if it is being inserted in its proper location
  For example, consider the list [1, 4, 7, 9]
  
  If we need to insert the number 6, it obviously goes between the 4 and 7 since 4 < 6 < 7 — [1, 4, 6, 7, 9]
  
  The order of the elements 1 and 4 remain unchanged, and similarly with 7 and 9

Activity

Perform a insertion sort, with pencil and paper, on the list [3,7,4,1,5,2]. Keep track of both the unsorted list and sorted list at each step of the algorithm.

The analysis of the algorithm is very similar to that of selection sort
- We need to perform a linear search for each of the \(n\) elements in the unsorted list
- We know linear search takes \(n\) amount of work for a list of size \(n\)
- Therefore, we need to do \(n\) work \(n\) times — a total of \(n^{2}\) work for an unsorted list of size \(n\)

def insertion_sort(collection):
    sorted_collection = []
    for element in collection:
        i = 0
        # Scan sorted collection to find insertion spot
        while i < len(sorted_collection) and sorted_collection[i] < element:
            i += 1
        sorted_collection.insert(i, element)
    return sorted_collection

25.3. Bubble Sort

Bubble sort works a little differently than selection or insertion sort
The general idea is to perform multiple scans of the unsorted list comparing each adjacent pair of elements
- If the elements are out of order, swap them, otherwise, leave them alone
- Move to the next adjacent pair of elements
- Repeat
Consider the below unsorted list
\(4, 3, 2, 1\) — Four is greater than three, so they swap

\(3, 4, 2, 1\) — Four is greater than two, so they swap

\(3, 2, 4, 1\) — Four is greater than one, so they swap

\(3, 2, 1, 4\) — There are two important things to notice at the end of the first pass
- A single pass is not enough to guarantee the list is sorted
- After the first pass, the largest element in the unsorted list will be in its correct location
  The largest element will always win the swap
  
  If that element wins the swap, it will be considered in the next comparison of adjacent elements
If we repeat this process by doing another pass

\(3, 2, 1, 4\) — Three is greater than two, so they swap

\(2, 3, 1, 4\) — Three is greater than 1, so they swap

\(2, 1, 3, 4\) — Three is less than four, so they do not swap

\(2, 1, 3, 4\)
Notice that the second largest element in the unsorted list is now in its correct location
To generalize this idea
If after the \(i^{th}\) pass the \(i^{th}\) largest element is in its correct location, how many passes do we need?
- Assuming an unsorted list of size \(n\), we need \(n\) passes to guarantee the list is sorted
- First pass has the largest in its correct location
- Second has the second largest in its correct location
- Third has the third largest in the correct location
- …
- \(n^{th}\) pass has the \(n^{th}\) largest in its correct location
The high-level algorithm is as follows
- Repeat the following n times
  Perform a scan on the unsorted list comparing each adjacent pair of elements
  
  If the elements are out of order, swap them
  
  Move to the next adjacent pair of elements
  
  Repeat

Activity

Perform a bubble sort, with pencil and paper, on the list [3,7,4,1,5,2]. Keep track of both the unsorted list and sorted list at each step of the algorithm.

The analysis of the algorithm should feel familiar at this point
- We need to do a scan of \(n\) elements a total of \(n\) times
- therefore, we need to do \(n\) work \(n\) times — a total of \(n^{2}\) work for an unsorted list of size \(n\)

def bubble_sort(collection):
    collection = collection[:]
    for j in range(len(collection)):
        for i in range(len(collection) - 1 - j):
            if collection[i] > collection[i + 1]:
                collection[i], collection[i + 1] = collection[i + 1], collection[i]
    return collection

We could improve the algorithm slightly
Consider being asked to sort an already sorted list with bubble sort
It would be rather silly doing \(n\) passes on the list to sort it if we know it’s already sorted
Instead, we can repeatedly do passes on the list until we complete a full scan without any swaps
- If there was no swaps, it means nothing was out of order, which means the list is sorted

def bubble_sort_improved(collection):
    collection = collection[:]
    has_swapped = True
    complete_cells = 0
    while has_swapped:
        has_swapped = False
        for i in range(len(collection) - 1 - complete_cells):
            if collection[i] > collection[i + 1]:
                collection[i], collection[i + 1] = collection[i + 1], collection[i]
                has_swapped = True
        complete_cells += 1
    return collection

25. Sorting Algorithms

25.1. Selection Sort

25.2. Insertion Sort

25.3. Bubble Sort

25.4. For Next Class