So far, all of the hypothesis tests we have carried out have involved a single proportion, generally corresponding to a “long run success rate” for something with two outcomes.
Quite often we have hypotheses that, in one way or another, involve an association between an explanatory and response variable.
Associations between variables can take several forms, depending on the nature of the explanatory and response variables: are they categorical or quantitative?
For now we will only consider binary categorical variables, and we won’t address the case of a quantitative explanatory variable with a categorical response variable (you’ll need to take STAT 213 for that). But the table below summarizes the common parameters of interest for other combinations.
Explanatory | Response | Parameter Type |
---|---|---|
None | Binary | Single Proportion |
None | Quantitative | Single Mean |
Binary | Binary | Difference of Proportions |
Binary | Quantitative | Difference of Means |
Quantitative | Quantitative | Correlation or Slope |
Are metal bands harmful to penguins (the kind of metal bands used for tagging wildlife, not the kind that inspires head banging)?
Researchers (Saraux et al., 2011) investigated this question with a sample of 20 penguins near Antarctica. All of these penguins had already been tagged with RFID (radio frequency identification) chips, and the researchers randomly assigned 10 of them to receive a metal band on their flippers in addition to the RFID chip. The other 10 penguins did not receive a metal band.
Researchers then kept track of which penguins survived for the 4.5-year study and which did not survive.
Outcome | No band (control) | Metal band (experimental) | Total |
---|---|---|---|
Survived | 6 | 3 | 9 |
Died | 4 | 7 | 11 |
Total | 10 | 10 | 20 |
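As a quick worked calculation from the table above, the observed difference in survival proportions (control minus metal band) can be written out as follows. The \(\hat{p}\) notation is introduced here only for illustration.

\[
\hat{p}_{\text{control}} - \hat{p}_{\text{metal}}
  = \frac{6}{10} - \frac{3}{10}
  = 0.6 - 0.3
  = 0.3
\]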
To assess the strength of the evidence that the difference in the dataset reflects a meaningful difference and not just randomness, we need to ask: how unlikely is it that we would see a difference this large purely by chance if the metal bands are actually harmless?
To answer this question we turn to simulation.
As usual, we simulate outcomes from a process for which the null hypothesis is true; that is, the metal band has no effect on penguin survival.
More precisely, we assume that the 9 penguins that survived would have done so with or without the metal band, and the 11 that did not would not have survived either way.
The random part is which penguins ended up in which group (metal band or no metal band) as a result of random assignment in the experiment.
We can simulate this assignment with a deck of playing cards. Normally we would do this together in person with real, physical cards, but we’ll have to use our imagination.
You might have chosen to let red cards represent the penguins who are destined to live, and black cards to represent the penguins that are destined to die, or vice versa. If we go with “red” = “survives” and “black” = “dies”, then we will have 9 red cards and 11 black ones.
Since in the actual study there are 10 penguins in each group (control vs metal band), we could shuffle the 20 cards and deal them into two piles of 10 each. One pile would represent penguins assigned to the control condition, the other would represent penguins assigned to the metal band condition.
Let’s simulate this using R code.
library(tidyverse)  # tibble(), mutate(), %>%
library(mosaic)     # shuffle()

set.seed(29747)
## 9 penguins destined to survive, 11 destined to die
Penguins <- tibble(
  Outcomes = c(rep("Survived", 9), rep("Died", 11)))
## Shuffle the 20 "cards", then deal the first 10 to Metal and the rest to Control
ShuffledCards <- shuffle(Penguins) %>%
  mutate(AssignedTo = c(rep("Metal", 10), rep("Control", 10)))
head(ShuffledCards, n = 10)
## # A tibble: 10 × 3
## Outcomes orig.id AssignedTo
## <chr> <chr> <chr>
## 1 Survived 1 Metal
## 2 Survived 9 Metal
## 3 Survived 7 Metal
## 4 Died 12 Metal
## 5 Died 17 Metal
## 6 Died 14 Metal
## 7 Survived 3 Metal
## 8 Survived 2 Metal
## 9 Died 19 Metal
## 10 Died 11 Metal
Note: Your results will differ since you set your own random seed. For me, the IDs of the penguins assigned to the metal band group are 1,2,3,7,9,11,12,14,17 and 19. The remaining penguins are assigned to the control group.
Penguin # | Outcome | Group |
---|---|---|
1 | Survived | |
2 | Survived | |
3 | Survived | |
4 | Survived | |
5 | Survived | |
6 | Survived | |
7 | Survived | |
8 | Survived | |
9 | Survived | |
10 | Died | |
11 | Died | |
12 | Died | |
13 | Died | |
14 | Died | |
15 | Died | |
16 | Died | |
17 | Died | |
18 | Died | |
19 | Died | |
20 | Died | |
Outcome | No band (control) | Metal band (experimental) | Total |
---|---|---|---|
Survived | | | 9 |
Died | | | 11 |
Total | 10 | 10 | 20 |
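The card-dealing step can also be tabulated directly rather than counted by hand. The following is a minimal sketch in base R rather than the mosaic functions used above; the names `outcomes`, `dealt`, `group`, `sim_table`, and `sim_diff` are hypothetical, and the seed is arbitrary, so the counts will not match the shuffle shown earlier.

```r
# Base R sketch of one "deal": 9 survivors and 11 non-survivors are shuffled,
# the first 10 cards go to the metal band group and the rest to control.
set.seed(1)  # arbitrary seed, chosen only for reproducibility

outcomes <- c(rep("Survived", 9), rep("Died", 11))
dealt    <- sample(outcomes)                       # shuffle the 20 "cards"
group    <- c(rep("Metal", 10), rep("Control", 10))

# Two-way table of outcome by assigned group, with row and column totals
sim_table <- addmargins(table(Outcome = dealt, Group = group))
print(sim_table)

# Simulated difference in survival proportions (control minus metal band)
sim_diff <- mean(dealt[group == "Control"] == "Survived") -
  mean(dealt[group == "Metal"] == "Survived")
print(sim_diff)
```

Whatever the shuffle, the row totals stay at 9 and 11 and the column totals stay at 10 and 10; only the four interior cells change.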
Since data with a binary explanatory and binary response variable can be reconstructed with just four numbers, as you did by entering the values in StatKey, you might not always have a full data frame to work with.
The code below builds a data frame in R from those four values.
PenguinData <-
  tibble(
    Group = c(rep("Control", 10), rep("Metal", 10)),
    Outcome = c(rep("Survived", 6), rep("Died", 4), rep("Survived", 3), rep("Died", 7)))
First, let’s graph the proportions who died and survived within each group using a bar graph:
gf_props(~Outcome | Group, data = PenguinData, fill = ~Outcome)
You should verify that each pair of proportions sums to 1.
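That check can also be done numerically. Below is a minimal sketch using base R's `table()` and `prop.table()` instead of the mosaic plotting function; the names `counts` and `within_group_props` are hypothetical, and the data are rebuilt with `data.frame()` so the block runs on its own.

```r
# Rebuild the data from the four cell counts (base R data.frame, for self-containment)
PenguinData <- data.frame(
  Group   = c(rep("Control", 10), rep("Metal", 10)),
  Outcome = c(rep("Survived", 6), rep("Died", 4),
              rep("Survived", 3), rep("Died", 7)))

# Counts by group (rows) and outcome (columns)
counts <- table(PenguinData$Group, PenguinData$Outcome)

# Proportions within each group: margin = 1 divides each row by its row total
within_group_props <- prop.table(counts, margin = 1)
print(within_group_props)

# Each group's pair of proportions should sum to 1
print(rowSums(within_group_props))
```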
To compute the difference in proportions from the real data, we can use diffprop():
sDiffProp <- diffprop(Outcome ~ Group, data = PenguinData)
To construct a randomized dataset based on the \(H_0\) that says the two variables are not associated, we can shuffle the order of the entries in the explanatory variable and compute the difference of proportions based on the new assignment of cases to groups. Doing this several thousand times will produce a randomization distribution.
Randomization_DiffProps <- do(5000) *
  diffprop(Outcome ~ shuffle(Group), data = PenguinData)
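To make the mechanics of this step concrete, here is a rough base R equivalent written with an explicit helper function. It is a sketch under stated assumptions, not a description of how mosaic implements `do()`, `shuffle()`, or `diffprop()`: the helper `diff_in_survival()` and the other names are hypothetical, the seed is arbitrary, and the difference is taken as control minus metal band to match the axis label used in the histogram.

```r
set.seed(2024)  # arbitrary seed for reproducibility

group   <- c(rep("Control", 10), rep("Metal", 10))
outcome <- c(rep("Survived", 6), rep("Died", 4),
             rep("Survived", 3), rep("Died", 7))

# Hypothetical helper: survival proportion in control minus survival proportion in metal
diff_in_survival <- function(outcome, group) {
  mean(outcome[group == "Control"] == "Survived") -
    mean(outcome[group == "Metal"] == "Survived")
}

observed_diff <- diff_in_survival(outcome, group)

# Shuffle the group labels 5000 times, recomputing the difference each time
sim_diffs <- replicate(5000, diff_in_survival(outcome, sample(group)))

print(observed_diff)
print(table(round(sim_diffs, 1)))  # how often each simulated difference occurred
```

Because a shuffle only rearranges which penguins carry the control or metal band label, every simulated difference is one of the ten values -0.9, -0.7, ..., 0.7, 0.9.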
The code below plots a histogram of the diffprop variable in the randomization distribution and highlights the simulated values that are at least as favorable to \(H_1\) as the result from the real data (the graph should look similar to the one you made in StatKey). The only thing you should need to change is the boundary value of the difference in proportions where the bars change from pink to blue, defined by the value of minimumHighlightedDifference.
minimumHighlightedDifference <- 0
## The code below shrinks the boundary value by a tiny amount in case of numerical imprecisions,
## so that exact matches are sure to satisfy the condition
gf_histogram(
  ~diffprop,
  data = Randomization_DiffProps,
  fill = ~(diffprop >= minimumHighlightedDifference * 0.9999),
  binwidth = 0.1,
  xlab = "Simulated Difference in Survival Proportions (Control Minus Experimental)",
  ylab = "# of Randomization Datasets") +
  scale_x_continuous(breaks = seq(-1, 1, by = 0.1)) +
  scale_fill_discrete(
    name = "At least as favorable to H1 as real result?",
    breaks = c(FALSE, TRUE),
    labels = c("No", "Yes"))
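The P-value is the proportion of simulated differences that are at least as large as the boundary value (minimumDifferenceCountingForPValue):
minimumDifferenceCountingForPValue <- 0
P_value_Penguins <- prop(~(diffprop >= 0.9999 * minimumDifferenceCountingForPValue), data = Randomization_DiffProps)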
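The factor of 0.9999 matters because differences of decimal proportions are not always stored exactly. The sketch below uses a small hand-picked vector (hypothetical values, not simulation output) to show a comparison that fails without the tolerance; `sim_diffs`, `boundary`, `strict_prop`, and `tolerant_prop` are illustrative names.

```r
# 0.7 - 0.4 is mathematically 0.3, but it is stored as a slightly smaller number
boundary  <- 0.3
sim_diffs <- c(-0.1, 0.1, 0.3, 0.7 - 0.4, 0.5)  # hand-picked illustrative values

strict_prop   <- mean(sim_diffs >= boundary)           # misses the 0.7 - 0.4 entry
tolerant_prop <- mean(sim_diffs >= 0.9999 * boundary)  # counts it, as intended

print(c(strict = strict_prop, tolerant = tolerant_prop))
```

Shrinking the boundary slightly is safe here because neighboring possible differences are 0.2 apart, far more than the adjustment.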
The SleepCaffeine dataset in the Lock5Data package (also available in StatKey as Sleep Caffeine Words) comes from an experiment examining memorization ability for two groups of randomly assigned college students: one group took a 90 minute nap; the other had a pill containing an amount of caffeine comparable to a cup of coffee. The response was the number of words recalled from a list the students were asked to memorize.
The diffmean() function computes a difference in means between two groups; it has the same syntax as cor() and diffprop(). As with the penguins, the idea behind the randomization procedure is to assume that, if condition and outcome are unrelated, individuals would have recalled however many words they would have recalled, regardless of which treatment group they had been assigned to.
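The same shuffle-and-recompute idea carries over to a difference in means. The sketch below uses base R and a tiny made-up set of word counts (hypothetical values, not the SleepCaffeine data) purely to show the mechanics; the names `words`, `condition`, and `diff_in_means()` are illustrative, and the seed is arbitrary.

```r
set.seed(7)  # arbitrary seed for reproducibility

# Hypothetical toy data: NOT taken from the SleepCaffeine dataset
words     <- c(14, 18, 11, 13, 16, 12, 15, 10)
condition <- c(rep("Sleep", 4), rep("Caffeine", 4))

# Hypothetical helper: mean words recalled, sleep minus caffeine
diff_in_means <- function(words, condition) {
  mean(words[condition == "Sleep"]) - mean(words[condition == "Caffeine"])
}

observed <- diff_in_means(words, condition)

# Under the null, each student's word count is fixed; only the labels are reshuffled
sim <- replicate(2000, diff_in_means(words, sample(condition)))

print(observed)
print(mean(sim >= observed - 1e-9))  # proportion of shuffles at least as large
```

With the real data, the mosaic workflow would replace the helper and the loop with diffmean() and do(), in the same pattern used for the penguins.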