STAT 113: Comparing More Than Two Means

Learning Goals

Explore the process of hypothesis testing when the explanatory variable is a categorical variable with more than two levels and the response variable is quantitative

Exercise and Changes in Brain Size

Researchers in China investigated whether different kinds of exercise/activity might help to prevent brain shrinkage or perhaps even lead to an increase in brain size (Mortimer et al., 2012).

The researchers randomly assigned elderly adult volunteers into one of four activity groups: tai chi, walking, social interaction, and no intervention. Each participant had an MRI to determine brain volume before the study began and again at its end. The researchers measured the percentage change in brain volume during that time (positive values meant brain size increased, negative values meant brain size decreased).

The question of interest was whether one or more of the activities could prevent decreases and possibly even produce increases in brain size.

Cases and Variables

The cases of interest are the elderly adult volunteers.

The explanatory variable is the activity group a person was assigned to. This variable is categorical with 4 levels.

The response variable was the percent change in brain volume. This variable is quantitative.

Parameters and Hypotheses

What parameters might we want to focus on here? (Hint: What do we usually focus on when we have a quantitative response variable?)

SOLUTION

Most hypotheses that involve an explanatory variable can be framed in terms of a possible association between the explanatory and response. In the context of this study, we might pose the following generic null and alternative hypotheses:

H0: There is no relationship between assigned activity group and change in brain volume (in other words, whatever brain volume changes occur would have occurred regardless of which activity the person was assigned to)

H1: There is a relationship between assigned activity group and change in brain volume.

The researchers likely have more specific hypotheses involving particular differences among groups, but to start with, we should investigate whether there is sufficient evidence for any differences. If there is, follow-up analyses can ask which differences we have significant evidence for.

If there is no relationship between activity group and change in brain volume, what would that imply about the four pMeans?

SOLUTION

The Data

Here is the data from the study. Since we have a quantitative response variable and a categorical explanatory variable, it is helpful to plot the distribution of the response, grouped by the explanatory variable. Either a set of boxplots or a set of histograms would be a natural choice.

library(mosaic)
BrainVolume <- read.file("http://colindawson.net/data/brain_size.txt")

gf_boxplot(BrainChange ~ Treatment, data = BrainVolume) +
  scale_y_continuous(name = "% Change in Brain Volume", breaks = seq(-4, 3, by = 0.5))

gf_histogram(~BrainChange | Treatment, data = BrainVolume, binwidth = 0.5) +
  scale_x_continuous(name = "% Change in Brain Volume", breaks = seq(-4,3,by = 0.5))

Here are some summary statistics about the response variable, split by group.

favstats(BrainChange ~ Treatment, data = BrainVolume)

##   Treatment    min       Q1 median     Q3   max       mean        sd  n missing
## 1      None -2.034 -1.16875 -0.585 0.9725 2.011 -0.2401250 1.2584309 24       0
## 2    Social -1.359  0.00750  0.596 0.8060 1.796  0.4056296 0.6968969 27       0
## 3    TaiChi -1.829  0.00500  0.449 0.9870 2.201  0.4710690 0.8557466 29       0
## 4   Walking -3.470 -1.05850 -0.026 0.9710 1.833 -0.1503333 1.3868388 27       0

What differences do you notice between the groups in this dataset? Do these differences imply that the activities made a difference? Why or why not?

SOLUTION

A Randomization Procedure

The treatment group could be irrelevant to the change in brain size, and any differences we see in this dataset could be due to chance. The goal of our hypothesis test will be to establish how likely we would be to see a set of means as or more spread out than these, if in fact treatment group is irrelevant.

As with the Penguin study and the LAPD traffic stops study, if the explanatory variable is irrelevant, then each participant’s change in brain volume would have been the same regardless of which group they were assigned to.

Propose a randomization procedure that we could use to simulate datasets based on this null hypothesis. Keep in mind that the null hypothesis does not tell us what value the pMeans should have, only that they are the same for all four groups.

SOLUTION

(Caution: Spoilers for Exercise 4 follow)

One randomization procedure we could use is to take the set of “% Change” values for all four groups, combine them into a single pool, and randomly reassign them to the four groups, preserving the sample sizes for each group that we had in the real data. In other words, randomly choose 24 response values to be assigned to the Control group, 27 for the Social Interaction group, 29 for the TaiChi group, and 27 for the Walking group. Then we could calculate the four sMeans that result from this grouping.

The following code performs random shufflings of the percent change in brain size variable so that the grouping is random. Then, for each random shuffling, it calculates the four means.

RandomizedMeans <- do(5000) *
  mean(BrainChange ~ shuffle(Treatment), data = BrainVolume)

Let’s plot the randomization distributions for each mean separately.

gf_histogram(~None, data = RandomizedMeans) +
  scale_x_continuous(
    name   = "Randomized Mean % Change in Brain Volume (Control)", 
    limits = c(-1.00, 1.00),
    breaks = seq(-1.00, 1.00, by = 0.25))
gf_histogram(~Social, data = RandomizedMeans) +
  scale_x_continuous(
    name   = "Randomized Mean % Change in Brain Volume (Social Interaction)", 
    limits = c(-1.00, 1.00),
    breaks = seq(-1.00, 1.00, by = 0.25))    
gf_histogram(~TaiChi, data = RandomizedMeans) +
  scale_x_continuous(
    name   = "Randomized Mean % Change in Brain Volume (Tai Chi)", 
    limits = c(-1.00, 1.00),
    breaks = seq(-1.00, 1.00, by = 0.25))    
gf_histogram(~Walking, data = RandomizedMeans) +
  scale_x_continuous(
    name   = "Randomized Mean % Change in Brain Volume (Walking)", 
    limits = c(-1.00, 1.00),
    breaks = seq(-1.00, 1.00, by = 0.25))

How do the randomization distributions of the four means compare? Is this surprising?

SOLUTION

A Test Statistic

If we want to measure how likely it is that we would get a set of sMeans as different from each other as the four we saw in the real study, assuming the treatment didn’t matter, we need a way to measure how different the four means are from each other with a single number, so that we can determine which simulated datasets have means that are as or more spread out than the real ones.

Propose a way to measure how spread out a given set of four means is. This measure serves as a test statistic. Calculate this value for the set of four means from the real data. There are many possible answers that would make sense!

SOLUTION

If you simulated several thousand datasets based on the randomization procedure described above, you could calculate your test statistic for each one of those datasets. What range of values of your statistic would count as being “as or more spread out” than the real result?

SOLUTION

Finding a P-value

There are many potential test statistics that we could use to measure how spread out a set of means is. One natural choice is simply the standard deviation of the means.

The following code chunk re-runs the randomization process and calculation of four means and then calculates the standard deviation of each set of four means. We could have used the randomization dataset from above, but it is convenient to have all of the code in one place.

RandomizationSDofMeans <- do(5000) *
    mean(BrainChange ~ shuffle(Treatment), data = BrainVolume) %>%
    sd()

Calculate the standard deviation of the means for the real data. What range of standard deviations of means from the randomization distribution should we count toward our P-value?

SOLUTION

Replace the value of the threshold in the chunk below and then run the code to plot the randomization distribution, highlighting the test statistics that should count toward the P-value, and then calculate the P-value.

SOLUTION

threshold <- 0
gf_histogram(
  ~result,
  data = RandomizationSDofMeans,
  fill = ~(result >= threshold),
  xlab = "Randomized Standard Deviation of Means") +
  scale_fill_discrete(
    name = "Counts Toward P-value?", 
    breaks = c(TRUE, FALSE),
    labels = c("Yes", "No"))

Pvalue <- prop(~(result >= threshold), data = RandomizationSDofMeans)

What is the conclusion of the hypothesis test in context?

STAT 113: Comparing More Than Two Means

Your Name Here

Learning Goals

Exercise and Changes in Brain Size

Cases and Variables

Parameters and Hypotheses

SOLUTION

SOLUTION

The Data

SOLUTION

A Randomization Procedure

SOLUTION

SOLUTION

A Test Statistic

SOLUTION

SOLUTION

Finding a P-value

SOLUTION

SOLUTION

SOLUTION