Researchers in China investigated whether different kinds of exercise/activity might help to prevent brain shrinkage or perhaps even lead to an increase in brain size (Mortimer et al., 2012).
The researchers randomly assigned elderly adult volunteers to one of four activity groups: tai chi, walking, social interaction, and no intervention. Each participant had an MRI to determine brain volume before the study began and again at its end. The researchers measured the percentage change in brain volume during that time (positive values meant brain size increased, negative values meant brain size decreased).
The question of interest was whether one or more of the activities could prevent decreases and possibly even produce increases in brain size.
The cases of interest are the elderly adult volunteers.
The explanatory variable is the activity group a person was assigned to. This variable is categorical with 4 levels.
The response variable was the percent change in brain volume. This variable is quantitative.
Most hypotheses that involve an explanatory variable can be framed in terms of a possible association between the explanatory and response variables. In the context of this study, we might pose the following generic null and alternative hypotheses:
H0: There is no relationship between assigned activity group and change in brain volume (in other words, whatever brain volume changes occur would have occurred regardless of which activity the person was assigned to).
H1: There is a relationship between assigned activity group and change in brain volume.
The researchers likely have more specific hypotheses involving particular differences among groups, but to start with, we should investigate whether there is sufficient evidence for any differences. If there is, follow-up analyses can ask which differences we have significant evidence for.
Here is the data from the study. Since we have a quantitative response variable and a categorical explanatory variable, it is helpful to plot the distribution of the response, grouped by the explanatory variable. Either a set of boxplots or a set of histograms would be a natural choice.
library(mosaic)
BrainVolume <- read.file("http://colindawson.net/data/brain_size.txt")
gf_boxplot(BrainChange ~ Treatment, data = BrainVolume) +
scale_y_continuous(name = "% Change in Brain Volume", breaks = seq(-4, 3, by = 0.5))
gf_histogram(~BrainChange | Treatment, data = BrainVolume, binwidth = 0.5) +
scale_x_continuous(name = "% Change in Brain Volume", breaks = seq(-4,3,by = 0.5))
Here are some summary statistics about the response variable, split by group.
favstats(BrainChange ~ Treatment, data = BrainVolume)
## Treatment min Q1 median Q3 max mean sd n missing
## 1 None -2.034 -1.16875 -0.585 0.9725 2.011 -0.2401250 1.2584309 24 0
## 2 Social -1.359 0.00750 0.596 0.8060 1.796 0.4056296 0.6968969 27 0
## 3 TaiChi -1.829 0.00500 0.449 0.9870 2.201 0.4710690 0.8557466 29 0
## 4 Walking -3.470 -1.05850 -0.026 0.9710 1.833 -0.1503333 1.3868388 27 0
The treatment group could be irrelevant to the change in brain size, and any differences we see in this dataset could be due to chance. The goal of our hypothesis test will be to establish how likely we would be to see a set of means as or more spread out than these, if in fact treatment group is irrelevant.
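To make "a set of group means" concrete, here is a tiny self-contained sketch in base R, using made-up numbers rather than the study data:

```r
# Made-up responses and group labels (not the real study data)
change <- c(-0.5, 0.2, 1.1, -1.3, 0.8, 0.4, -0.1, 0.9)
group  <- c("None", "None", "Social", "Social",
            "TaiChi", "TaiChi", "Walking", "Walking")

# Mean change within each group; with the real data, favstats() above
# reports these same per-group means alongside the other summaries
tapply(change, group, mean)
```

The hypothesis test then asks how surprising the observed spread among these per-group means would be if group assignment were irrelevant.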
As with the Penguin study and the LAPD traffic stops study, if the explanatory variable is irrelevant, then each participant’s change in brain volume would have been the same regardless of which group they were assigned to.
(Caution: Spoilers for Exercise 4 follow)
One randomization procedure we could use is to take the set of "% Change" values for all four groups, combine them into a single pool, and randomly reassign them to the four groups, preserving the sample sizes for each group that we had in the real data. In other words, randomly choose 24 response values to be assigned to the Control ("None") group, 27 for the Social Interaction group, 29 for the TaiChi group, and 27 for the Walking group. Then we could calculate the four sample means that result from this grouping.
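The pooling-and-reassigning idea can be sketched in base R with a toy example (made-up values and tiny group sizes, standing in for the real pool of 24 + 27 + 29 + 27 = 107 responses):

```r
# Toy sketch of one reshuffle (made-up values; the real study pools all
# 107 responses and reassigns 24, 27, 29, and 27 of them to the groups)
pooled <- c(-0.5, 0.2, 1.1, -1.3, 0.8, 0.4, -0.1, 0.9)
sizes  <- c(None = 2, Social = 2, TaiChi = 2, Walking = 2)

set.seed(1)                 # make this sketch reproducible
shuffled <- sample(pooled)  # random permutation of the pooled responses
groups   <- rep(names(sizes), times = sizes)

# The four group means under this one random reassignment
tapply(shuffled, groups, mean)
```

Note that shuffling only rearranges which response goes with which group; the pool of values itself is unchanged, exactly as the null hypothesis assumes.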
The following code performs random shufflings of the percent change in brain size variable so that the grouping is random. Then, for each random shuffling, it calculates the four means.
RandomizedMeans <- do(5000) *
  mean(BrainChange ~ shuffle(Treatment), data = BrainVolume)
Let’s plot the randomization distributions for each mean separately.
gf_histogram(~None, data = RandomizedMeans) +
scale_x_continuous(
name = "Randomized Mean % Change in Brain Volume (Control)",
limits = c(-1.00, 1.00),
breaks = seq(-1.00, 1.00, by = 0.25))
gf_histogram(~Social, data = RandomizedMeans) +
scale_x_continuous(
name = "Randomized Mean % Change in Brain Volume (Social Interaction)",
limits = c(-1.00, 1.00),
breaks = seq(-1.00, 1.00, by = 0.25))
gf_histogram(~TaiChi, data = RandomizedMeans) +
scale_x_continuous(
name = "Randomized Mean % Change in Brain Volume (Tai Chi)",
limits = c(-1.00, 1.00),
breaks = seq(-1.00, 1.00, by = 0.25))
gf_histogram(~Walking, data = RandomizedMeans) +
scale_x_continuous(
name = "Randomized Mean % Change in Brain Volume (Walking)",
limits = c(-1.00, 1.00),
breaks = seq(-1.00, 1.00, by = 0.25))
If we want to measure how likely we would be to get a set of sample means as different from each other as the four we saw in the real study, assuming the treatment didn't matter, we need a way to measure how different the four means are from each other with a single number, so that we can determine which simulated datasets have means that are as or more spread out than the real ones.
There are many potential test statistics that we could use to measure how spread out a set of means is. One natural choice is simply the standard deviation of the means.
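For instance, plugging the four observed group means from the favstats output above into base R's sd() gives the observed value of this test statistic:

```r
# Observed group means from the favstats output (None, Social, TaiChi, Walking)
observed_means <- c(-0.2401250, 0.4056296, 0.4710690, -0.1503333)

# The observed test statistic: how spread out are the four group means?
sd(observed_means)   # roughly 0.369
```

This single number is what we will compare against the randomization distribution of the same statistic.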
The following code chunk re-runs the randomization process and calculation of four means and then calculates the standard deviation of each set of four means. We could have used the randomization dataset from above, but it is convenient to have all of the code in one place.
RandomizationSDofMeans <- do(5000) *
  mean(BrainChange ~ shuffle(Treatment), data = BrainVolume) %>%
  sd()
Fill in the appropriate value for threshold in the chunk below, then run the code to plot the randomization distribution, highlighting the test statistics that should count toward the P-value, and then calculate the P-value.

threshold <- 0

gf_histogram(
~result,
data = RandomizationSDofMeans,
fill = ~(result >= threshold),
xlab = "Randomized Standard Deviation of Means") +
scale_fill_discrete(
name = "Counts Toward P-value?",
breaks = c(TRUE, FALSE),
labels = c("Yes", "No"))
Pvalue <- prop(~(result >= threshold), data = RandomizationSDofMeans)
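The last line computes the P-value as the proportion of shuffled test statistics at least as large as the observed one. The same proportion can be computed in base R; here is a minimal illustration with made-up numbers:

```r
# Made-up simulated test statistics and a made-up observed value
sim_stats <- c(0.10, 0.25, 0.40, 0.05, 0.55)
observed  <- 0.40

# P-value: the proportion of simulated statistics >= the observed statistic
p_value <- mean(sim_stats >= observed)
p_value   # 2 of the 5 simulated values are >= 0.40, so 0.4
```

With 5000 shuffles, the real calculation works the same way, just over a much longer vector of simulated statistics.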