11. Selection and Population Management

  • One of the main driving forces of evolutionary computation algorithms is the variation operators

  • Another major driving force is the selection strategy

    • This determines an individual’s ability to survive and have offspring based on its fitness value

  • Selection is based on the relative fitness values of the individuals

  • It works independently of the actual problem and representation

11.1. Generational vs. Steady State

  • All the algorithms provided so far have been implemented as generational algorithms

    • A whole new population is created based on the current population

    • The new population becomes the current population

    • The population size remains the same

    • There are clear, discrete transitions between generations

  • With the generational algorithm, depending on how the genetic operators are applied, the new population is filled with

    • Copies of unchanged selected chromosomes

    • Mutated selected chromosomes

    • Offspring resulting from crossover on selected chromosomes

  • On the other hand, steady state algorithms provide a more continuous type of evolution

    • The transitions between generations are not as clear or discrete

  • With steady state

    • Given a single population of size \(\mu\)

    • Select some number of chromosomes \(\lambda\) and apply the variation operators to produce the offspring

    • Replace \(\lambda\) chromosomes in the population with the offspring (a sketch of one such iteration is shown at the end of this section)

  • The generational gap is the proportion of the population that was replaced

    • \(\frac{\lambda}{\mu}\)

  • If the number of chromosomes selected is equal to the population size, it is equivalent to a generational algorithm

    • If \(\lambda = \mu\)

    • When the generational gap is \(\frac{\lambda}{\mu} = 1\)

  • With a generational algorithm, the whole population is replaced

  • But with a generational gap less than one, there needs to be another selection mechanism for replacement

    • Determining which candidate solutions of the population are to be replaced with the \(\lambda\) offspring
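
Below is a minimal sketch, in Python, of one steady-state iteration with generational gap \(\frac{\lambda}{\mu}\). The names population, fitness, select, and vary are hypothetical stand-ins for the chromosome list, the fitness function, a parent selection strategy, and the variation operators; replacing the \(\lambda\) worst individuals is just one of the replacement rules discussed later in this topic.

```python
def steady_state_iteration(population, fitness, select, vary, lam):
    """One steady-state iteration with generational gap lam / len(population).

    population -- list of chromosomes
    fitness    -- maps a chromosome to its fitness value (maximization)
    select     -- parent selection strategy, returns one chromosome
    vary       -- applies the variation operators to a list of parents
                  and returns a list of lam offspring
    lam        -- number of offspring created and individuals replaced
    """
    parents = [select(population) for _ in range(lam)]
    offspring = vary(parents)

    # Survivor selection: here the lam worst individuals are replaced,
    # but any of the replacement rules discussed later could be used
    population.sort(key=fitness)          # ascending, worst individuals first
    population[:lam] = offspring
    return population
```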

11.2. Tournament Selection

  • Two very basic selection strategies could be implemented

  1. Uniform selection

    • Each chromosome has an equal chance at being selected

      • \(p(i) = \frac{1}{\mu}\)

      • Where \(p(i)\) is the probability of an individual chromosome being selected

    • Some forms of evolutionary computation use this selection exclusively

    • However, in general, it’s not going to perform well as there is nothing guiding the search

      • The algorithm will not converge

      • There is low selection pressure

  2. Select the top chromosomes only and apply the genetic operators

    • As already seen, this will not perform well as the population will converge too quickly on some local optimum

    • There is high selection pressure

  • An alternative, highly effective selection strategy lies somewhere in between these two ideas

  • This is where tournament selection comes in

    • Randomly pick a subset of \(k\) chromosomes from the population

    • Select the best of these \(k\) chromosomes (see the sketch at the end of this section)

../../_images/tournament_selection.png

Tournament selection being performed on a population of size 13. Here, \(k = 3\), meaning three chromosomes were selected at random. Of the three, the candidate solution with the highest fitness is then returned as the selected chromosome.

  • The value of \(k\) is typically kept low, but can be adjusted as needed

    • If \(k = 1\), this would be the same as a uniform selection

    • If \(k = \mu\), this would be the same as always selecting the best

    • As \(k\) increases, so does the selection pressure
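
Below is a minimal sketch of tournament selection in Python, assuming a maximization problem; population and fitness are hypothetical stand-ins for the chromosome list and the fitness function.

```python
import random


def tournament_selection(population, fitness, k=3):
    """Select one chromosome by holding a tournament of size k.

    Randomly pick k chromosomes from the population and return the one
    with the highest fitness. Increasing k increases selection pressure.
    """
    competitors = random.sample(population, k)
    return max(competitors, key=fitness)


# Toy usage: 13 chromosomes where the fitness is the value itself
population = [4, 1, 7, 3, 9, 2, 8, 5, 6, 0, 11, 10, 12]
winner = tournament_selection(population, fitness=lambda x: x, k=3)
```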

11.3. Fitness Proportional Selection

  • There is a collection of selection strategies that are fitness proportional

    • The probability of selecting an individual depends on its fitness value compared to the whole population’s fitness

  • The simplest of these is to have the probability be the individual’s proportion of the total population fitness

    • \(p(i) = \frac{f(i)}{\sum^{\mu}_{j=1}f(j)}\)

    • Where \(p(i)\) is the probability of selecting individual \(i\)

    • And \(f(i)\) is the fitness of individual \(i\)

    • Note that the sum of the probabilities must be one \(\sum^{\mu}_{j=1}p(j) = 1\)

  • The benefit here is that by selecting highly fit individuals, the search may produce high quality solutions quickly

  • However, this may cause the population to get stuck in a local maximum

    • Premature convergence

    • It is important to ensure the population stays diverse

  • Further, when the population begins to converge, this strategy becomes similar to a uniform selection

    • Learning will stagnate when the fitness values of the chromosomes within the population are similar

  • Modifications to this strategy include adding some constant or windowing

    • Windowing is subtracting the minimum fitness within the population from all individuals within the population

  • Several selection probabilities are shown in the table below for some maximization problem, with a worked example following the table

    • One regular, one with adding some constant, and one with windowing

  • Here, adding a constant made the lowest fit individual more likely to be selected, while also making the selection probabilities more similar

  • With windowing, the lowest fit individual had zero probability of being selected, while also making the selection probabilities more different

Modifications to Fitness Proportional Selection

| \(i\)   | \(f(i)\) | \(p(i)\) for \(f(i)\) | \(f(i) + 10\) | \(p(i)\) for \(f(i) + 10\) | Windowing \(p(i)\) for \(f(i)\) |
|---------|----------|-----------------------|---------------|----------------------------|---------------------------------|
| A       | \(1\)    | \(0.1\)               | \(11\)        | \(0.275\)                  | \(0\)                           |
| B       | \(4\)    | \(0.4\)               | \(14\)        | \(0.350\)                  | \(0.429\)                       |
| C       | \(5\)    | \(0.5\)               | \(15\)        | \(0.375\)                  | \(0.571\)                       |
| Total   | \(10\)   | \(1.0\)               | \(40\)        | \(1.0\)                    | \(1.0\)                         |

11.3.1. Rank Based Selection

  • An alternative is to take the same idea, but rank the candidate solutions

  • Then, base the probability of being selected on rank rather than on the raw fitness value (a sketch follows at the end of this subsection)

Rank Based Proportional Selection

| \(i\)   | \(f(i)\) | \(p(i)\) for \(f(i)\) | Ranking | \(p(i)\) for Ranking |
|---------|----------|-----------------------|---------|----------------------|
| A       | \(1\)    | \(0.1\)               | \(0\)   | \(0\)                |
| B       | \(4\)    | \(0.4\)               | \(1\)   | \(0.333\)            |
| C       | \(5\)    | \(0.5\)               | \(2\)   | \(0.666\)            |
| Total   | \(10\)   | \(1.0\)               | \(3\)   | \(1.0\)              |

  • The above table shows a simple linear ranking being used, but more complex ranking schemes may also be used

  • The benefit of this strategy is that, as the population converges, selection pressure will not become lower

  • The downside is the computational overhead of ranking each candidate solution
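
Below is a minimal sketch of linear rank based selection in Python, assuming a maximization problem; ranks run from \(0\) for the worst individual to \(\mu - 1\) for the best, matching the table above, and population/fitness are the same hypothetical names as before.

```python
import random


def rank_based_selection(population, fitness):
    """Select one chromosome with probability proportional to its rank.

    Sorting the population is the ranking overhead mentioned above. The
    worst individual gets rank 0 (and therefore zero probability of
    being selected) and the best gets rank mu - 1.
    """
    ranked = sorted(population, key=fitness)        # worst ... best
    ranks = list(range(len(ranked)))                # 0, 1, ..., mu - 1
    return random.choices(ranked, weights=ranks, k=1)[0]
```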

11.3.2. Roulette Wheel Selection

  • Fitness proportional selection is commonly referred to as roulette wheel selection (a sketch is shown after the figure below)

../../_images/roulette_wheel_selection.png

Visualization of roulette wheel selection. Each individual is assigned a piece of the wheel proportional to that individual’s proportion of fitness of the population’s total fitness.
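
Below is a minimal sketch of roulette wheel selection in Python, assuming a maximization problem with non-negative fitness values (see the warning later in this section); population and fitness are the same hypothetical names as before.

```python
import random


def roulette_wheel_selection(population, fitness):
    """Select one chromosome with probability f(i) / total population fitness.

    Each chromosome's slice of the wheel is proportional to its share of
    the population's total fitness; random.choices handles the spin.
    """
    fitnesses = [fitness(chromosome) for chromosome in population]
    return random.choices(population, weights=fitnesses, k=1)[0]
```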

11.3.3. Stochastic Universal Sampling

  • Typically, the roulette wheel has one arm and is spun \(\lambda\) times

  • An alternative is to spin a roulette wheel with \(\lambda\) arms once

    • The \(\lambda\) arms are evenly spaced around the wheel, so the wheel only needs to be spun once

    • This produces a more evenly spread, diverse set of selected individuals (see the sketch after the figure below)

../../_images/stochastic_universal_sampling_selection.png

Example of a roulette wheel where \(\lambda=2\).
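
Below is a minimal sketch of stochastic universal sampling in Python, under the same assumptions as the roulette wheel sketch above; a single random offset positions all \(\lambda\) evenly spaced arms.

```python
import random


def stochastic_universal_sampling(population, fitness, lam):
    """Select lam chromosomes with a single spin of a lam-armed wheel.

    The arms are spaced total_fitness / lam apart, so one random offset
    (the single spin) determines every selection.
    """
    fitnesses = [fitness(chromosome) for chromosome in population]
    spacing = sum(fitnesses) / lam
    start = random.uniform(0, spacing)              # the single spin
    arms = [start + i * spacing for i in range(lam)]

    selected = []
    cumulative = 0.0
    index = 0
    for arm in arms:                                # arms are in increasing order
        # Advance to the wheel slice that this arm points at
        while cumulative + fitnesses[index] < arm:
            cumulative += fitnesses[index]
            index += 1
        selected.append(population[index])
    return selected
```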

Warning

Although possible, fitness proportional selection often requires some alterations to work with algorithms that minimize fitness or that have negative fitness values.

11.4. Survivor Selection

  • With a steady state algorithm, survivor selection is needed

    • Which individuals survive and which are replaced

  • Any rules could be used, but common ones include

    • Random

      • Randomly select \(\lambda\) to replace

    • Age based

      • Replace the \(\lambda\) oldest individuals

      • Ignores fitness

      • Might kill off the best individuals

    • Fitness based

      • Replace the \(\lambda\) worst individuals based on fitness

      • Will kill off the worst candidate solutions

      • May cause the population to converge prematurely

    • Diversity based

      • Replace \(\lambda\) candidate solutions that are the most similar

      • The idea is, replace those that add little diversity to the population

    • Similarity based

      • Replace the \(\lambda\) candidate solutions with the most similar fitness to the offspring

      • Similar to the diversity based rule and easy to implement, given an assumption (sketched after this list)

      • Those with a similar fitness may be similar in the genotype space
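
Below is a minimal sketch of the similarity based replacement rule in Python, with the same hypothetical names as before; each offspring replaces the existing individual whose fitness is closest to its own, relying on the assumption that similar fitness implies similar genotypes.

```python
def similarity_based_replacement(population, offspring, fitness):
    """Replace the individuals whose fitness is most similar to the offspring.

    For each offspring, find the index of the existing individual whose
    fitness is closest to the offspring's fitness and overwrite it.
    """
    for child in offspring:
        child_fitness = fitness(child)
        closest = min(
            range(len(population)),
            key=lambda i: abs(fitness(population[i]) - child_fitness),
        )
        population[closest] = child
    return population
```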

11.4.1. Elitism

  • Regardless of using a generational or steady state algorithm, elitism is commonly used in evolutionary computation

  • The idea is to always keep a copy of the best \(x\) chromosomes in the population (see the sketch below)

    • Typically \(x=1\)

    • Having \(x\) too high will cause premature convergence
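
Below is a minimal sketch of elitism for a generational algorithm in Python, assuming a maximization problem and the usual hypothetical names; the best \(x\) chromosomes of the current population are copied into the new population, displacing its \(x\) worst members so the population size stays the same.

```python
def apply_elitism(current_population, new_population, fitness, x=1):
    """Carry copies of the best x chromosomes into the new population.

    The x worst members of the new population are dropped so that the
    population size remains constant.
    """
    elites = sorted(current_population, key=fitness, reverse=True)[:x]
    keep = len(new_population) - x
    survivors = sorted(new_population, key=fitness, reverse=True)[:keep]
    return survivors + elites
```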

11.5. Diversity

  • Diversity is how much the members of the population differ from one another

  • Diversity is important within a population

    • Balances exploration and exploitation

    • It helps to prevent premature convergence

  • As with anything in evolutionary computation, there are no rules on what should be done

  • Below are some high-level ideas, but the list is in no way exhaustive

    • Explicitly add a diversity measure to the fitness calculation

    • Only have similar individuals compete with one another

    • The Island Model

      • Distribute the population into sub-populations that evolve independently with periodic migrations

    • Ring Species

      • Only allow individuals to mate if they are close to one another within the population (see the sketch after the figures below)

../../_images/island_model.png

Island model layout with three sub-populations. Each of the three sub-populations evolves independently. This allows each sub-population to explore the search space along its own path, thereby preserving diversity between the sub-populations. Periodically, members of the sub-populations migrate to other sub-populations to introduce diversity to the individual sub-populations.

../../_images/ring_species.png

Ring species treats the population as a ring/circle and mating can only occur between chromosomes if they exist within some distance (number of indices) of one another. In this example, the distance \(d=3\), therefore only the candidate solutions at indices 22, 23, 24, 1, 2, 3, and 4 would be eligible for mating.
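
Below is a minimal sketch, in Python, of the ring species eligibility check from the figure above; indices are positions in the population and two chromosomes may only mate if their distance around the ring is at most \(d\) (all names are hypothetical).

```python
def ring_distance(i, j, population_size):
    """Distance between indices i and j when the population forms a ring."""
    gap = abs(i - j)
    return min(gap, population_size - gap)


def can_mate(i, j, population_size, d):
    """Chromosomes at indices i and j may mate if they are within distance d."""
    return ring_distance(i, j, population_size) <= d


# With a population of size 24 and d = 3, the chromosome at index 1 can
# mate with those at indices 22, 23, 24, 2, 3, and 4, as in the figure
eligible = [j for j in range(1, 25) if j != 1 and can_mate(1, j, 24, d=3)]
```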

  • Further, there are no rules on where the diversity should be measured

    • Genotype space

      • How similar the chromosomes are (a Hamming distance example is sketched after this list)

      • Depends on the encoding

      • Hamming distance?

      • Levenshtein distance?

    • Phenotype space

      • How different are the phenotypes (what the chromosomes represent)

      • May be very different from the genotype space’s distance

    • Algorithm space

      • Some distance based on the algorithm’s framework

      • For example, ring species and the island model
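
As one concrete genotype space example, below is a minimal sketch in Python that measures population diversity as the average pairwise Hamming distance over equal-length chromosomes (e.g. bit strings); the names are hypothetical.

```python
from itertools import combinations


def hamming_distance(chromosome_a, chromosome_b):
    """Number of positions at which two equal-length chromosomes differ."""
    return sum(a != b for a, b in zip(chromosome_a, chromosome_b))


def average_pairwise_hamming_distance(population):
    """Average Hamming distance over all pairs; larger means more diverse."""
    pairs = list(combinations(population, 2))
    return sum(hamming_distance(a, b) for a, b in pairs) / len(pairs)


# Toy usage with a population of four bit string chromosomes
population = ["00000", "00111", "11111", "01010"]
diversity = average_pairwise_hamming_distance(population)
```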

11.6. For Next Class

  • TBD