25. Sorting Algorithms

Warning

At this stage of the course we are starting to see nontrivial algorithms. Although you may have been able to quickly understand the previous algorithms, going forward you should not expect to observe an algorithm and immediately understand how it works.

Although you are all equipped with the requisite skills to understand these algorithms, they require careful study to understand. The best strategy for approaching these algorithms is to carefully trace through their operation, step by step, making no assumptions of the code.

  • Similar to searching, sorting is a problem you are all familiar with

  • Unlike searching, we have not investigated any sorting algorithms

  • Believe it or not, there are many ways one could sort a list

    • Some are better than others

    • Some are better than others under certain conditions

    • Some are just plain terrible

Activity

  1. Have you ever sorted things in your life?

  2. Take a moment to talk amongst yourselves about how you you sorted things.

    • What was the step-by-step algorithm?

You can keep your discussion high-level, but try to explain it such that the person you are explaining it to could follow your instructions to sort things.

Note

Each of the following algorithms have many possible implementations. What makes the algorithm the algorithm is the high-level idea, not the actual, literal implementations. This idea will be emphasized as we discuss the individual sorting algorithms below.

25.1. Selection Sort

../../_images/selection_sort.gif
  • Selection sort is fairly accessible, and chances are you have sorted things in real life using this algorithm

  • Selection sort works by repeatedly selecting the smallest element from the collection

  • The high-level algorithm is as follows

    • Start with the unsorted list and an empty list for the sorted elements

    • For each element in the collection

      • Perform a linear search on the unsorted list for the current smallest element

      • Remove the current smallest element from the unsorted list and append it to the sorted list

  • To reason about how and it works, consider

    • The first time the linear search finds the smallest element, that element must be the first element in the sorted collection

    • Every other subsequent linear search finds the remaining smallest element and appends it to the sorted collection

      • It cannot come before anything in the sorted collection since we know it must be greater than (or equal to) all elements in the sorted collection

Activity

Perform a selection sort, with pencil and paper, on the list [3,7,4,1,5,2]. Keep track of both the unsorted list and sorted list at each step of the algorithm.

  • Now that we know a little bit about how to analyze algorithms, let’s figure out how much work this algorithm needs in order to solve the problem

    • Selection sort requires a linear search to find the current smallest element in the collection

      • We know that linear search takes \(n\) amount of work for a list of size \(n\)

    • But, selection sort needs to perform a linear search for each element in the collection

      • If our unsorted list is of size \(n\), that means the linear search must be run \(n\) times

    • If we put it together, we need to run linear search, which takes \(n\) amount of work, a total of \(n\) times

      • We need to do \(n\) work \(n\) times

      • This would be a total of \(n^{2}\) work for an unsorted list of size \(n\)

 1def selection_sort(collection):
 2    sorted_collection = []
 3    for _ in range(len(collection)):
 4        current_smallest = collection[0]
 5        for element in collection:
 6            if element < current_smallest:
 7                current_smallest = element
 8        collection.remove(current_smallest)
 9        sorted_collection.append(current_smallest)
10    return sorted_collection

25.2. Insertion Sort

../../_images/insertion_sort.gif
  • Insertion sort works by repeatedly inserting elements into their proper location within a collection

  • The high-level algorithm is as follows

    • Start with the unsorted list and an empty list for the sorted elements

    • For each element in the collection

      • Remove the element from the unsorted list

      • Perform a linear search on the sorted list to find the index where the new element should be inserted

      • Insert the new element into the sorted list at the index where it belongs

  • To reason about how and it works, consider

    • Elements are inserted into their proper relative location to the elements already within the sorted collection

    • Any subsequent insertion cannot disrupt the ordering of the whole list if it is being inserted in its proper location

      • For example, consider the list [1, 4, 7, 9]

      • If we need to insert the number 6, it obviously goes between the 4 and 7 since 4 < 6 < 7[1, 4, 6, 7, 9]

      • The order of the elements 1 and 4 remain unchanged, and similarly with 7 and 9

Activity

Perform a insertion sort, with pencil and paper, on the list [3,7,4,1,5,2]. Keep track of both the unsorted list and sorted list at each step of the algorithm.

  • The analysis of the algorithm is very similar to that of selection sort

    • We need to perform a linear search for each of the \(n\) elements in the unsorted list

    • We know linear search takes \(n\) amount of work for a list of size \(n\)

    • Therefore, we need to do \(n\) work \(n\) times — a total of \(n^{2}\) work for an unsorted list of size \(n\)

1def insertion_sort(collection):
2    sorted_collection = []
3    for element in collection:
4        i = 0
5        # Scan sorted collection to find insertion spot
6        while i < len(sorted_collection) and sorted_collection[i] < element:
7            i += 1
8        sorted_collection.insert(i, element)
9    return sorted_collection

25.3. Bubble Sort

../../_images/bubble_sort.gif
  • Bubble sort works a little differently than selection or insertion sort

  • The general idea is to perform multiple scans of the unsorted list comparing each adjacent pair of elements

    • If the elements are out of order, swap them, otherwise, leave them alone

    • Move to the next adjacent pair of elements

    • Repeat

  • Consider the below unsorted list

    \(4, 3, 2, 1\) — Four is greater than three, so they swap

    \(3, 4, 2, 1\) — Four is greater than two, so they swap

    \(3, 2, 4, 1\) — Four is greater than one, so they swap

    \(3, 2, 1, 4\) — There are two important things to notice at the end of the first pass

    • A single pass is not enough to guarantee the list is sorted

    • After the first pass, the largest element in the unsorted list will be in its correct location

      • The largest element will always win the swap

      • If that element wins the swap, it will be considered in the next comparison of adjacent elements

  • If we repeat this process by doing another pass

    \(3, 2, 1, 4\) — Three is greater than two, so they swap

    \(2, 3, 1, 4\) — Three is greater than 1, so they swap

    \(2, 1, 3, 4\) — Three is less than four, so they do not swap

    \(2, 1, 3, 4\)

  • Notice that the second largest element in the unsorted list is now in its correct location

  • To generalize this idea

  • If after the \(i^{th}\) pass the \(i^{th}\) largest element is in its correct location, how many passes do we need?

    • Assuming an unsorted list of size \(n\), we need \(n\) passes to guarantee the list is sorted

    • First pass has the largest in its correct location

    • Second has the second largest in its correct location

    • Third has the third largest in the correct location

    • \(n^{th}\) pass has the \(n^{th}\) largest in its correct location

  • The high-level algorithm is as follows

    • Repeat the following n times

      • Perform a scan on the unsorted list comparing each adjacent pair of elements

        • If the elements are out of order, swap them

        • Move to the next adjacent pair of elements

        • Repeat

Activity

Perform a bubble sort, with pencil and paper, on the list [3,7,4,1,5,2]. Keep track of both the unsorted list and sorted list at each step of the algorithm.

  • The analysis of the algorithm should feel familiar at this point

    • We need to do a scan of \(n\) elements a total of \(n\) times

    • therefore, we need to do \(n\) work \(n\) times — a total of \(n^{2}\) work for an unsorted list of size \(n\)

1def bubble_sort(collection):
2    collection = collection[:]
3    for j in range(len(collection)):
4        for i in range(len(collection) - 1 - j):
5            if collection[i] > collection[i + 1]:
6                collection[i], collection[i + 1] = collection[i + 1], collection[i]
7    return collection
  • We could improve the algorithm slightly

  • Consider being asked to sort an already sorted list with bubble sort

  • It would be rather silly doing \(n\) passes on the list to sort it if we know it’s already sorted

  • Instead, we can repeatedly do passes on the list until we complete a full scan without any swaps

    • If there was no swaps, it means nothing was out of order, which means the list is sorted

 1def bubble_sort_improved(collection):
 2    collection = collection[:]
 3    has_swapped = True
 4    complete_cells = 0
 5    while has_swapped:
 6        has_swapped = False
 7        for i in range(len(collection) - 1 - complete_cells):
 8            if collection[i] > collection[i + 1]:
 9                collection[i], collection[i + 1] = collection[i + 1], collection[i]
10                has_swapped = True
11        complete_cells += 1
12    return collection

25.4. For Next Class