STAT 113 Lab 9: Hypothesis Tests to Compare Two Groups

Overview

So far, all of the hypothesis tests we have carried out have involved a single proportion, generally corresponding to a “long-run success rate” for a process with two outcomes.

Quite often, though, our hypotheses involve an association between an explanatory variable and a response variable.

Associations between variables can take several forms, depending on the nature of the explanatory and response variables: are they categorical or quantitative?

For now, the only categorical variables we will consider are binary, and we won’t address the case of a quantitative explanatory variable with a categorical response variable (you’ll need to take STAT 213 for that). The table below summarizes the common parameters of interest for the other combinations.

Parameter Type by Type of Explanatory and Response Variable

Explanatory	Response	Parameter Type
None	Binary	Single Proportion
None	Quantitative	Single Mean
Binary	Binary	Difference of Proportions
Binary	Quantitative	Difference of Means
Quantitative	Quantitative	Correlation or Slope

Learning Goals of This Lab

Carry out hypothesis tests involving both an explanatory and a response variable from start to finish, identifying the components of the test along the way
See how to create a randomization distribution, both in StatKey and in R, when the null hypothesis can be seen as saying that there is no association between the variables

Penguins and Metal Bands

Are metal bands harmful to penguins (the kind of metal bands used for tagging wildlife, not the kind that inspires head banging)?

Researchers (Saraux et al., 2011) investigated this question with a sample of 20 penguins near Antarctica. All of these penguins had already been tagged with RFID (radio frequency identification) chips, and the researchers randomly assigned 10 of them to receive a metal band on their flippers in addition to the RFID chip. The other 10 penguins did not receive a metal band.

Researchers then kept track of which penguins survived for the 4.5-year study and which did not survive.

Identifying Components

Identify the explanatory and response variables and classify each as categorical or quantitative.

Response

Why did the researchers include a comparison group in this study? Why not just see how many penguins survived while wearing a metal band?

Response

Is this an observational study or an experiment? Explain how you know.

Response

Setting up the Hypothesis Test

What parameter(s) are of interest, and what is/are the corresponding statistic(s)? (Hint: if you’re not sure, refer to the table in the Overview section above, which lists typical parameters and statistics based on the types of variables involved.) Make sure you are clear about the scope (set of cases) that the parameter(s) and statistic(s) characterize.

Response

State the null and alternative hypotheses, first as qualitative (verbal) claims and then as equations/inequalities involving the parameter(s).

Response

List the possible values of the statistic for this study, in order from “most supportive of the alternative hypothesis” to “least supportive of the alternative hypothesis”.

Response

The researchers found that 9 of the 20 penguins survived, of whom 3 had a metal band and 6 did not. Organize these results into the following contingency table.

Results (Real Data)

Outcome	No band (control)	Metal band (experimental)	Total
Survived	6	3	9
Died	4	7	11
Total	10	10	20

Calculate the proportion who survived in each group. Also calculate the difference between these proportions (subtracting the “metal band” proportion from the “control” proportion). This difference is the statistic of interest, and serves as an estimate of the increase in the risk of death resulting from the metal bands.
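As a check on a hand calculation, the real-data table can be entered into base R as a matrix and converted to within-group (column) proportions with prop.table(). The object names below (penguin_counts, group_props, stat_diff) are illustrative choices, not part of the lab's code.

```r
# Counts from the real-data contingency table
# (rows = outcome, columns = group; matrix() fills column by column)
penguin_counts <- matrix(
  c(6, 4,   # No band (control): survived, died
    3, 7),  # Metal band: survived, died
  nrow = 2,
  dimnames = list(Outcome = c("Survived", "Died"),
                  Group   = c("Control", "Metal"))
)

# Divide each column by its total (10 penguins per group)
group_props <- prop.table(penguin_counts, margin = 2)
group_props

# Statistic: control survival proportion minus metal band survival proportion
p_control <- group_props["Survived", "Control"]
p_metal   <- group_props["Survived", "Metal"]
stat_diff <- p_control - p_metal
stat_diff
```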

Response

Did the “metal band” group have a smaller proportion that survived than the control group in this dataset?

Response

Could this difference have happened even if the metal band had no effect; that is, simply due to the random assignment of penguins to groups (i.e., the luck of the draw)?

Response

A Randomization Scheme

To assess the strength of the evidence that the difference in the dataset reflects a real effect of the bands and not just randomness, we need to ask: how unlikely is it that we would see a difference this large purely by chance, if the metal bands are actually harmless?

To answer this question we turn to simulation.

As usual, we simulate outcomes from a process for which the null hypothesis is true; that is, the metal band has no effect on penguin survival.

More precisely, we assume that the 9 penguins that survived would have done so with or without the metal band, and the 11 that did not would not have survived either way.

The random part is which penguins ended up in which group (metal band or no metal band) as a result of random assignment in the experiment.

We can simulate this assignment with a deck of playing cards. Normally we would do this together in person with real, physical cards, but we’ll have to use our imagination.

Suppose you had a standard deck of playing cards. Select a group of cards to represent the penguins in this study. We only need to use the colors of the cards, so ignore the values on the cards. How many cards do we need to represent the penguins in the study? What should the colors represent? How many cards will you use of each color? (Hint: If each card is a penguin, then the color is permanently attached to that penguin. Because we are constructing a simulation based on a process in which the null hypothesis holds, we are assuming that some penguins would have survived regardless of group assignment, and others would not have survived regardless of group assignment.)

Response

Describe how you could use these cards to create one simulated run of the experiment where penguins are randomly assigned into groups.

Response

Simulating Dealing Cards Into Piles

You might have chosen to let red cards represent the penguins who are destined to live, and black cards to represent the penguins that are destined to die, or vice versa. If we go with “red” = “survives” and “black” = “dies”, then we will have 9 red cards and 11 black ones.

Since in the actual study there are 10 penguins in each group (control vs metal band), we could shuffle the 20 cards and deal them into two piles of 10 each. One pile would represent penguins assigned to the control condition, the other would represent penguins assigned to the metal band condition.
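The card-dealing procedure just described can also be sketched in base R, without the tibble and mosaic code used later in the lab. This sketch assumes cards numbered 1 to 20, with cards 1 to 9 red (destined to survive) and 10 to 20 black (destined to die); the seed value is arbitrary.

```r
set.seed(113)  # arbitrary seed; any value works

cards <- 1:20  # one card per penguin; cards 1-9 are red (survivors)

# Shuffle the deck, then deal the first 10 cards to the metal band pile
# and the remaining 10 to the control pile
shuffled     <- sample(cards)
metal_pile   <- shuffled[1:10]
control_pile <- shuffled[11:20]

# Count the red cards (survivors) that landed in each pile
red_metal   <- sum(metal_pile <= 9)
red_control <- sum(control_pile <= 9)

# Simulated statistic: control survival proportion minus metal survival proportion
sim_diff <- red_control / 10 - red_metal / 10
c(control = red_control, metal = red_metal, diff = sim_diff)
```

Because the 9 red cards must end up somewhere, the two pile counts always add to 9; only how they split between the piles changes from shuffle to shuffle.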

Let’s simulate this using R code.

Let’s number the 20 penguin cards from 1 to 20, and let 1-9 represent the “red” cards, which are the penguins “destined” to survive, with 10-20 representing the penguins “destined” not to make it. The code chunk below shuffles the 20 cards and displays the first 10, which are the penguins randomly assigned to the metal band group. Set the seed to a value unique to you and then run the code chunk.

Code

library(dplyr)   # tibble(), mutate(), %>%
library(mosaic)  # shuffle()

set.seed(29747)
# One card per penguin: IDs 1-9 are "red" (destined to survive),
# IDs 10-20 are "black" (destined to die)
Penguins <- tibble(
    Outcomes = c(rep("Survived", 9), rep("Died", 11)))
# Shuffle the deck, then deal the first 10 cards to the metal band pile
# and the remaining 10 to the control pile
ShuffledCards <- shuffle(Penguins) %>%
  mutate(AssignedTo = c(rep("Metal", 10), rep("Control", 10)))
head(ShuffledCards, n = 10)

## # A tibble: 10 × 3
##    Outcomes orig.id AssignedTo
##    <chr>    <chr>   <chr>     
##  1 Survived 1       Metal     
##  2 Survived 9       Metal     
##  3 Survived 7       Metal     
##  4 Died     12      Metal     
##  5 Died     17      Metal     
##  6 Died     14      Metal     
##  7 Survived 3       Metal     
##  8 Survived 2       Metal     
##  9 Died     19      Metal     
## 10 Died     11      Metal

Using the assignment of penguins to groups produced above, fill in the Group column of the following table, using “M” for “Metal Band” and “C” for “Control”.

Results (Randomized Data)

Note: Your results will differ, since you set your own random seed. For me, the IDs of the penguins assigned to the metal band group are 1, 2, 3, 7, 9, 11, 12, 14, 17, and 19. The remaining penguins are assigned to the control group.

Penguin #	Group	Outcome
1		Survived
2		Survived
3		Survived
4		Survived
5		Survived
6		Survived
7		Survived
8		Survived
9		Survived
10		Died
11		Died
12		Died
13		Died
14		Died
15		Died
16		Died
17		Died
18		Died
19		Died
20		Died

Now summarize the results above in a contingency table like the one for the real data. This one, however, represents our simulated dataset, generated by a process in which the two variables have nothing to do with each other.

Results (Randomized Data)

Outcome	No band (control)	Metal band (experimental)	Total
Survived			9
Died			11
Total	10	10	20

Calculate the proportion who survived in each group in your simulated dataset as generated by a process where the two variables are unrelated. Also calculate the difference between these proportions (subtracting the “metal band” proportion from the “control” proportion). This difference is one instance of the simulated statistic.

Response

HOMEWORK

Scaling Up the Simulation (StatKey)

Go to StatKey and select “Randomization Test For a Difference in Proportions”.
Select “Edit Data”.
For “Group 1 count”, enter the number of penguins in the control group who survived in the real data.
For “Group 1 sample size”, enter the total number of penguins in the control group in the real data.
Fill in the corresponding values for Group 2, representing the metal band group.
First, click “Generate 1 Sample” to produce a contingency table for one randomization dataset. Verify that StatKey computes the statistic the same way you computed yours (the value itself will differ).
Then, generate 10000 randomization datasets.
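To see roughly what StatKey is doing behind the scenes, here is a minimal base R sketch of the same randomization test. It assumes the real-data counts (6 of 10 control penguins survived, 3 of 10 metal band penguins survived) and a one-sided \(H_1\) that the bands lower survival, so it counts simulated differences at least as large as the observed one. The helper name diff_surv() is hypothetical, and a small tolerance (1e-8) guards against floating-point rounding. Results will vary slightly from StatKey's because the random draws differ.

```r
set.seed(2011)  # arbitrary seed for reproducibility

# Real data: first 10 penguins are controls, last 10 have metal bands
group        <- rep(c("Control", "Metal"), each = 10)
real_outcome <- c(rep("Survived", 6), rep("Died", 4),
                  rep("Survived", 3), rep("Died", 7))

# Hypothetical helper: control survival proportion minus metal survival proportion
diff_surv <- function(outcome, group) {
  mean(outcome[group == "Control"] == "Survived") -
    mean(outcome[group == "Metal"] == "Survived")
}

observed <- diff_surv(real_outcome, group)

# Under H0 each penguin's outcome is fixed; only the group assignment is random.
# Shuffling outcomes across the fixed groups is equivalent to shuffling groups.
sims <- replicate(10000, diff_surv(sample(real_outcome), group))

# P-value: proportion of simulated differences at least as large as observed
p_value <- mean(sims >= observed - 1e-8)
p_value
```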

What proportion of the simulated differences in proportions were at least as favorable to \(H_1\) as the actual difference in proportions from the real data?

Results

Does the evidence that metal bands reduce penguin survival chances rise to the level of statistical significance at a significance level of 0.05?

Results

Repeating the Simulation in R

Data with a binary explanatory variable and a binary response variable can be summarized by just four numbers, as you did when entering the values in StatKey, so you might not always have a full data frame to work with.

The code below builds a data frame in R from those four values.

PenguinData <-
  tibble(
    Group = c(rep("Control", 10), rep("Metal", 10)),
    Outcome = c(rep("Survived", 6), rep("Died", 4), rep("Survived", 3), rep("Died", 7)))
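The same reconstruction can be wrapped in a small base R function so that it works for any two-group table with a binary outcome. The function name expand_two_groups() and its arguments are hypothetical, chosen for this illustration; it returns a plain data.frame rather than a tibble.

```r
# Hypothetical helper: rebuild case-level data from four numbers
# (number of "successes" and sample size in each of two groups)
expand_two_groups <- function(success1, n1, success2, n2,
                              groups   = c("Control", "Metal"),
                              outcomes = c("Survived", "Died")) {
  data.frame(
    Group   = rep(groups, times = c(n1, n2)),
    Outcome = c(rep(outcomes[1], success1), rep(outcomes[2], n1 - success1),
                rep(outcomes[1], success2), rep(outcomes[2], n2 - success2))
  )
}

# Penguin data: 6 of 10 control survivors, 3 of 10 metal band survivors
penguins_df <- expand_two_groups(6, 10, 3, 10)
table(penguins_df$Outcome, penguins_df$Group)
```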

First, let’s graph the proportions who died and survived within each group using a bar graph:

gf_props(~Outcome | Group, data = PenguinData, fill = ~Outcome)

You should verify that each pair of proportions sums to 1.

To compute the difference in proportions from the real data, we can use diffprop():

sDiffProp <- diffprop(Outcome ~ Group, data = PenguinData)

To construct a randomized dataset under \(H_0\), which says that the two variables are not associated, we can shuffle the order of the entries in the explanatory variable and compute the difference in proportions based on the new assignment of cases to groups. Doing this several thousand times produces a randomization distribution.

Randomization_DiffProps <- do(5000) *
  diffprop(Outcome ~ shuffle(Group), data = PenguinData)
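A useful property of this shuffle is that it keeps both margins of the contingency table fixed: each group still has 10 penguins and there are still 9 survivors in total; only the pairing between group and outcome changes. The base R sketch below checks this for a single shuffle, using sample() in place of mosaic's shuffle() (an assumed equivalent for this purpose; the seed is arbitrary).

```r
set.seed(9)  # arbitrary seed

# Same real data as PenguinData, built in base R
group   <- rep(c("Control", "Metal"), each = 10)
outcome <- c(rep("Survived", 6), rep("Died", 4),
             rep("Survived", 3), rep("Died", 7))

# Permute the explanatory variable only; outcomes stay attached to cases
shuffled_group <- sample(group)

real_table     <- table(group, outcome)
shuffled_table <- table(shuffled_group, outcome)

# Row totals (group sizes) and column totals (outcome counts) are unchanged
rowSums(real_table)
rowSums(shuffled_table)
colSums(real_table)
colSums(shuffled_table)
```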

Graph the diffprop variable in the randomization distribution, and highlight the simulated values that are at least as favorable to \(H_1\) as the result from the real data (the graph should look similar to the one you made in StatKey). The only thing you should need to change is the value of minimumHighlightedDifference, which sets the boundary value of the difference in proportions where the bars change from pink to blue.

minimumHighlightedDifference <- 0
## The code below lowers the boundary value by a tiny amount (1e-8) in case of numerical imprecision,
## so that exact matches are sure to satisfy the condition (for positive or negative boundaries)
gf_histogram(
  ~diffprop,
  data     = Randomization_DiffProps,
  fill     = ~(diffprop >= minimumHighlightedDifference - 1e-8),
  binwidth = 0.1,
  xlab     = "Simulated Difference in Survival Proportions (Control Minus Experimental)",
  ylab     = "# of Randomization Datasets") +
  scale_x_continuous(breaks = seq(-1, 1, by = 0.1)) +
  scale_fill_discrete(
    name   = "At least as favorable to H1 as real result?",
    breaks = c(FALSE, TRUE),
    labels = c("No", "Yes"))
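The tolerance adjustment in the code above exists because proportions are stored as binary floating-point numbers, so two differences that are equal on paper can compare as unequal in R. For example, the difference in death proportions (metal band minus control, 0.7 minus 0.4) equals the difference in survival proportions (0.6 minus 0.3) on paper, but when computed in floating point it comes out slightly below 0.3:

```r
# On paper 7/10 - 4/10 equals 3/10, but in floating point it is a hair smaller
d <- 7/10 - 4/10
print(d, digits = 17)        # reveals the tiny shortfall
d >= 3/10                    # FALSE: an exact comparison misses the tie
d >= 3/10 - 1e-8             # TRUE: subtracting a small tolerance catches it
isTRUE(all.equal(d, 3/10))   # TRUE: all.equal() compares up to a tolerance
```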

To calculate the actual P-value, modify the code below by changing the threshold to which the simulated differences are compared (minimumDifferenceCountingForPValue).

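minimumDifferenceCountingForPValue <- 0
## Lower the threshold by a tiny tolerance so that exact ties are counted despite numerical imprecision
P_value_Penguins <- prop(~(diffprop >= minimumDifferenceCountingForPValue - 1e-8), data = Randomization_DiffProps)
P_value_Penguins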

Is the evidence of harm statistically significant at the 0.05 significance level?

Response

Testing a Difference of Means: Sleep vs Caffeine for Short-Term Memory

The SleepCaffeine dataset in the Lock5Data package (also available in StatKey as Sleep Caffeine Words) comes from an experiment examining memorization ability for two groups of randomly assigned college students: one group took a 90-minute nap; the other took a pill containing an amount of caffeine comparable to a cup of coffee. The response was the number of words recalled from a list the students were asked to memorize.

Load the data from the Lock5Data package and plot the numbers of words recalled by each individual, grouped by condition.

Code

Carry out a hypothesis test of whether there is a difference in average recall ability (as summarized by the mean number of words recalled) between college students who take a 90-minute nap and college students who take a cup of coffee’s worth of caffeine in pill form. State the hypotheses, compute the sample statistic, construct and plot the randomization distribution, compute the \(P\)-value, and interpret the results in context. (Hint: the diffmean() function computes a difference in means between two groups; it has the same syntax as cor() and diffprop().) As with the penguins, the randomization procedure assumes that, if condition and outcome are unrelated, each individual would have recalled the same number of words regardless of which treatment group they had been assigned to.
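If it helps to see the logic without mosaic, the sketch below runs a two-sided randomization test for a difference in means in base R. The function name perm_test_diffmean() is hypothetical, and the numbers in the usage example are made-up placeholders, not the SleepCaffeine data; for the lab itself, use the real data and diffmean() as suggested in the hint.

```r
# Hypothetical helper: two-sided randomization test for a difference in means
perm_test_diffmean <- function(response, group, reps = 10000) {
  levels_g <- sort(unique(group))
  stopifnot(length(levels_g) == 2)

  # Statistic: mean of the second group (alphabetically) minus mean of the first
  diff_means <- function(g) {
    mean(response[g == levels_g[2]]) - mean(response[g == levels_g[1]])
  }

  observed <- diff_means(group)
  # Shuffle group labels; each person keeps their own response value
  sims <- replicate(reps, diff_means(sample(group)))

  # Two-sided P-value: simulated differences at least as far from 0 as observed
  p_value <- mean(abs(sims) >= abs(observed) - 1e-8)
  list(observed = observed, sims = sims, p_value = p_value)
}

# Usage with made-up placeholder values (NOT the SleepCaffeine data)
set.seed(1)
toy_words <- c(12, 14, 15, 13, 16, 9, 10, 11, 12, 8)
toy_group <- rep(c("Caffeine", "Sleep"), each = 5)
result <- perm_test_diffmean(toy_words, toy_group, reps = 2000)
result$observed
result$p_value
```

A two-sided P-value fits here because the question asks whether there is any difference, not a difference in a particular direction.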
