16. What Comparing Distributions Means

  • Given two evolutionary computation instances, how does one determine which is better?

16.1. Random Variables

Warning

The content within this topic is kept at a high level and deliberately skips some formalisms and rigor. For the purpose of this course, the level of detail provided should be sufficient.

  • Consider a random variable as some phenomenon that can be observed

  • When observing the random variable, some quantitative measurement can be taken

    • The time it takes to recover from the flu

    • The amount of nuts a squirrel buries a day

    • The number of people with blond hair that walk by this building between 10am – 11am on Tuesdays

    • The best fitness value of a genetic algorithm run

  • Whatever the measurement is, they can be recorded and represented as distributions

  • Consider the following two random variables

../../_images/random_variable_1.png

Output of some random variable represented as a box. The data is shown as a list of numbers and as a distribution. A total of 50 data points were observed and recorded.

../../_images/random_variable_2.png

Output of another random variable represented as a box. The data is shown as a list of numbers and as a distribution. A total of 50 data points were observed and recorded.

  • Regardless of what these random variables are, one may wonder, are these random variables actually different?

    • If considering the flu recovery time

      • The random variables may be recovery times of two different groups of people

      • Does one group recover faster?

    • With the squirrels, does one squirrel bury more nuts in a day than the other?

    • Does one year have more blond people walking past the building on Tuesdays from 10am – 11am?

    • Does one evolutionary computation algorithm implementation perform better than another?

../../_images/random_variables_together.png

The two random variables’ measurements plotted against each other.

  • When visually analyzing the above comparison of the two random variables, there does appear to be a difference

    • But the data is clearly noisy

    • A range of values are obtained for each random variable

    • And there are only 50 observations of each random variable

  • It is difficult to say if these random variables are truly producing different distributions

    • They may effectively be the same

    • The two groups do not take different amounts of time to recover from the flu

    • The squirrel I named Jimbo does not actually bury more nuts than the squirrel I named Sir Lora

    • There really isn’t more blond people this year vs. last

    • My algorithm isn’t really performing better than the next persons’

16.2. Example — Drug Trial

  • Consider a drug trial

  • A group of 50 people is given a new drug that is intended to reduce the recovery time of the flu

  • Another group is given a placebo

  • No one in either group knows if they are given the new drug or the placebo

  • Here, the random variables are the two groups and the observations are the recovery times of the people in each group

    • Further, the recovery time is itself a random variable

    • This is the property/characteristic trying to be understood

../../_images/random_variables_same_distro.png

Distributions of group recovery times for individuals in the placebo group and drug group.

  • With the above example

    • The average recovery time for the placebo group was roughly 15.4 days

    • The average recovery time for the drug group was roughly 13.8 days

    • The drug group recovered, on average, 1.6 days faster

  • One may be tempted to conclude that the drug clearly works

  • However, the average recovery time is a summary statistic of a distribution

  • When observing the distributions, it is clear that there is more nuance

  • Additionally, there were only 50 observations from each group

  • Every individual is different and every observation is different

  • If I were to do this again, the distributions would look different

  • What are the odds that this result just happened by chance?

16.2.1. Null Hypothesis

  • Always start by assuming that there is no real difference

  • This is called the Null Hypothesis

  • It may feel like an arbitrary position to take, but with this assumption, it allows one to start to form an argument

  1. Assume the drug has no actual impact on the recovery time

  2. If the drug has no impact, then it could be expected that the average recovery time would be no different than the placebo

  3. If both groups had the placebo, it really would not have mattered which person was assigned to which group

  4. Thus, it should be possible to assign each individual and their corresponding recovery time to one of the two groups randomly

  5. Therefore, the difference in the average recovery times between the two randomly assigned groups would be an example of what one would get by dumb luck

16.2.2. Permutation/Randomization Test

  • Assuming the null hypothesis

  • Shuffling the groups once and calculating the difference between the group averages will give one example of a chance difference

  • It is possible to generate all possible combinations of assigning the 100 people to two groups

  • If all possible combinations are generated, all possible average by chance differences can be calculated

    • Although, in this example, there are a lot

    • A total of \(1.01 \times 10^{29}\) combinations of splitting 100 people into two groups of 50

    • In cases where it is intractable to generate all combinations, simply generate some large number of combinations

    • Here, \(1,000,000\) combinations will be generated

../../_images/chance_average_group_differences.png

Distribution of the “by chance” average group differences after 1,000,000 shuffles of the two groups. This distribution is not the recovery times like in the above distributions.

16.2.3. Interpreting Results

  • The key question to ask now is, how likely is it that the original observation actually happened by chance?

../../_images/chance_average_group_differences_with_observation.png

Original observed average group difference in recovery time between the drug group and the placebo group shown on the distribution of the chance average group differences. The number of chance observations with the same or lower recovery times than the originally observed will inform how likely our observation could happen by chance.

  • Of the \(1,000,000\) combinations randomly generated

  • There were \(205\) with a by chance recovery time the same or better than the originally observed 1.6 days faster

  • In other words, there is a 0.0205% chance that the original observation happened by dumb luck

    • \(\frac{205}{1000000} = 0.000205 = 0.0205\%\)

    • There is a roughly one in 50,000 chance this would have happened by dumb luck

  • This is where the idea of a probability value (p-value) comes in

  • There is a 0.000205 probability that the drug’s improved recovery time happened by chance

16.2.3.1. T-Tests and Mann-Whitney U tests

  • The permutation test is explained to provide the intuition into what p-values actually mean

  • In practice, using a t-test or Mann-Whitney U test is sufficient for this course

16.3. For Next Class