Often we are interested in assessing the strength of evidence for or against a claim.
In the null hypothesis testing framework, we adopt a “skeptical stance”, going into the study with the mindset that there’s “nothing interesting to discover”.
The description of the population/process/phenomenon in which there is “nothing interesting to discover” is called the null hypothesis (abbreviated H0).
The description of the population/process/phenomenon in which there is something interesting to discover is called the alternative hypothesis (abbreviated H1).
To assess how strong the evidence is that there is something there (a relationship between two variables, a phenomenon more interesting than “random guessing”, etc.) we do the following:
Decide on a parameter to focus on (such as a mean, a proportion, correlation, difference in means, etc.), which characterizes the population/process/phenomenon of interest
State null and alternative hypotheses about that parameter
Decide on a corresponding statistic that we will calculate from our data
Order the possible values of the statistic from “most supportive of the research claim” (H1) to “least supportive of the research claim”
Assign weights to each possible value which sum to 1, based on the skeptic’s model of the world, in which there is “nothing interesting to discover” (that is, in which H0 is accurate)
Calculate the actual value of the statistic from the data
Find the combined weight of all of the possible values of the statistic which were ranked ahead of or equivalent to the calculated value
This combined weight is called the “P-value” and is one measure of the evidence about the hypotheses
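In symbols, the last two steps can be summarized as follows. The notation here is introduced only for illustration and is not from the study: \(T\) is the statistic, \(t_{\text{obs}}\) is its calculated value, and \(t \succeq t_{\text{obs}}\) means that \(t\) was ranked ahead of or equivalent to \(t_{\text{obs}}\) in the ordering from step four.

```latex
\[
  \text{P-value} \;=\; \sum_{t \,:\, t \,\succeq\, t_{\text{obs}}} \Pr(T = t \mid H_0)
\]
```

Because the weights sum to 1 over all possible values of the statistic, the P-value always falls between 0 and 1.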
Note: In all of the descriptions that follow, I will use the terms “male” and “female” to indicate a gender self-identification. In the study in question, the participants were male-identifying adults (“males”) who were in romantic relationships with a female-identifying adult (“female”).
Eighteen males in romantic relationships with female partners were recruited for a study whose goal was to assess the ability of male partners to recognize their romantic partner by touching their hand, without visual input.
The males were blindfolded, and asked to touch the backs of the hands of each of three female adults, one of whom was the participant’s romantic partner. The two “decoys” were the same age, height, and weight as the participant’s partner.
Each male was then asked to identify which hand belonged to their partner. Each of the 18 responses was coded as either “correct” or “incorrect”.
The question of interest is: How much evidence does the data provide that members of the target population can identify their partner by touching their hand?
The person sharing their screen in your group will play the role of the 18 blindfolded participants for this game.
We are going to set up the game so that there is no way for the blindfolded participants to know the correct answer. That is, we are creating a simulation in which the null hypothesis is definitely correct: each participant is just guessing randomly.
We will label the three “hands” as 1, 2 and 3. For each participant, the partner’s hand (the correct answer) will be chosen randomly from these three individuals.
In the middle column below, write down a list of 18 numbers, each either 1, 2, or 3.
| Trial | Guess | Correct? | 
|---|---|---|
| 1 | ||
| 2 | ||
| 3 | ||
| 4 | ||
| 5 | ||
| 6 | ||
| 7 | ||
| 8 | ||
| 9 | ||
| 10 | ||
| 11 | ||
| 12 | ||
| 13 | ||
| 14 | ||
| 15 | ||
| 16 | ||
| 17 | ||
| 18 | ||
Someone who is not sharing their screen should do this part. If you have three people in the group, one person can do 1-9 and the other 10-18.
Set a unique seed, and then run the code chunk below to create the list of correct answers.
set.seed(1)
sample.int(3, size = 18, replace = TRUE)
Once the guesser is done picking their numbers, tell them which ones were correct, and then both of you should fill in the table.
Now switch roles and repeat the experiment. Whoever is the guesser now should fill in their table above, and then their partner(s) should run the code above (after changing the seed), and let the guesser know which responses were correct.
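If you would rather let R do the scoring, something like the following sketch could work. The `guesses` vector here is made up for illustration, and the names `answers`, `correct`, `n_correct`, and `r_proportion` are my own choices rather than anything required by the activity. The seed must match the one used to create the list of correct answers.

```r
## Hypothetical guesses from the guesser (18 values, each 1, 2, or 3)
guesses <- c(1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3)

## Recreate the list of correct answers using the same seed as before
set.seed(1)
answers <- sample.int(3, size = 18, replace = TRUE)

## Compare each guess to the corresponding correct answer
correct <- guesses == answers

## Count the correct guesses and convert the count to a proportion
n_correct <- sum(correct)
r_proportion <- n_correct / length(guesses)

## Show the completed table
data.frame(Trial = 1:18, Guess = guesses, Answer = answers, Correct = correct)
```

The proportion computed at the end plays the same role as the simulated proportion correct that each group reports.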
Nothing to Write
Change “fill = ~(Name == "Colin")” to put your name instead (as you entered it in the Google Form), and then run the code chunk below to see everyone’s responses. As usual for these Google Form plots, I don’t expect you to know how to write code like what is in this chunk.
library(mosaic)  ## assumed setup: provides read.file(), gf_dotplot(), and %>%
url <- "https://docs.google.com/spreadsheets/d/e/2PACX-1vTX3fxiRcUbZyR5lUzFJQH82yXK6ykwUpQDeG_wil0dUx5nET-ir5fkseM_7f0fC7g1AQgxEVWlEcqM/pub?gid=2039833441&single=true&output=csv"
download.file(url, destfile = "love-is-blind-randomization.csv")
ClassRandomizationDistribution_LoveIsBlind <- read.file("love-is-blind-randomization.csv")
R <- nrow(ClassRandomizationDistribution_LoveIsBlind)
## Replace my name with yours in fill = ~(Name == "Colin"). This
## will color your dot a different color than the others.
gf_dotplot(
    ~rProportion, 
    data = ClassRandomizationDistribution_LoveIsBlind, 
    fill = ~(Name == "Colin"), 
    binwidth = 1/36,
    method = "histodot",
    ylab = "Number of Simulations") +
  scale_x_continuous(
    name   = "Simulated Proportion Correct by Random Guessing (rProportion)",
    limits = c(0,1),
    breaks = seq(from = 0, to = 1, by = 1/18) %>% round(digits = 2),
    labels = seq(from = 0, to = 1, by = 1/18) %>% round(digits = 2)) +
  scale_y_continuous(
    name   = "# of Sets of 18 Simulated Participants",
    breaks = NULL,
    labels = NULL
    )
Nothing to Write
In the real study, 8 out of 18 participants (about 44%) identified their partner correctly.
Note: Many people will have answered this question before all of the simulations from the class had been completed, so answers will vary.
Go to StatKey and select “Sampling Distributions (Proportion)”
Select “Edit Proportion” and enter the hypothetical “long run” proportion of correct responses according to \(H_0\).
Select the appropriate sample size by setting \(n\).
Simulate several thousand datasets based on fully random guessing.
Select either the left-tail, right-tail or two-tail checkbox to highlight the outcomes that would appear to most strongly favor the alternative hypothesis.
Change the value below the x-axis to the actual result from the real study (not your simulation) so that the simulations which produced results at least as favorable to the alternative as the actual result are highlighted in red.
Constructing a randomization distribution in R is similar to constructing a sampling or bootstrap distribution: we generate a few thousand datasets from some process, and for each dataset, compute the statistic of interest.
In the code chunk below, n= and prob= are set based on the data and hypotheses from the tea-tasting experiment. First, replace the seed with a choice of your own. Then, replace the value of the n= argument with the sample size (that is, the number of guesses made in the “Love is Blind” study) associated with each simulated dataset, and replace the value of the prob= argument with the hypothetical long-run success rate in the Love is Blind study according to the skeptic (H0).
set.seed(78)
RandomizationDistribution_LoveIsBlind <- 
  do(5000) *                     ## We simulate 5000 taste tests
      rflip(n = 10, prob = 1/2)  ## each simulated taste test involves 10 coin flips with success chance 1/2
Plot the prop variable in the randomization distribution you just created. In the code chunk below, the bin width and color-coding cutoff are chosen based on the tea-tasting experiment. Change the value of binwidth= to the difference in proportions corresponding to one additional correct guess (this is the smallest possible distinction for this study and sample size), and change the value 9/10 in the fill= argument to the actual value of the statistic in the real study so that the P-value corresponds to the proportion of the area shaded in blue.
gf_histogram(
  ~prop, 
  data     = RandomizationDistribution_LoveIsBlind,
  fill     = ~(prop >= 9/10),
  binwidth = 1/10,
  xlab     = "Simulated Proportion Correct by Random Guessing (rProportion)",
  ylab     = "# of Sets of 18 Simulated Participants")
In the code below, the name prop is used in two ways: the inner one is the variable recording each simulation’s proportion of correct guesses, which is compared against the real result to check whether that simulation did at least as well. The outer one is the prop() function, which finds the proportion of simulations that meet this criterion.
P_value_LoveIsBlind <- prop(~(prop >= 9/10), data = RandomizationDistribution_LoveIsBlind)
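As a cross-check on a simulation-based P-value, the same quantity can be approximated and also computed exactly in base R. This is only a sketch under assumptions taken from the study description rather than from the template code: 18 independent guesses, a success chance of 1/3 under the skeptic’s model (a random pick among three hands), an observed count of 8 correct, and larger counts treated as more favorable to H1. All of the variable names are my own.

```r
## Assumed values, taken from the study description
n_guesses        <- 18    # number of blindfolded participants
null_prob        <- 1/3   # chance of a correct guess if guessing randomly
observed_correct <- 8     # correct identifications in the real study

## Simulated P-value: share of 5000 random-guessing datasets with
## at least as many correct guesses as the real study
set.seed(78)
sim_counts  <- rbinom(5000, size = n_guesses, prob = null_prob)
p_value_sim <- mean(sim_counts >= observed_correct)

## Exact P-value: add up the binomial weights for 8, 9, ..., 18 correct
p_value_exact <- sum(dbinom(observed_correct:n_guesses,
                            size = n_guesses, prob = null_prob))

## Same tail probability from the cumulative distribution function
p_value_tail <- pbinom(observed_correct - 1, size = n_guesses,
                       prob = null_prob, lower.tail = FALSE)

c(simulated = p_value_sim, exact = p_value_exact, tail = p_value_tail)
```

The simulated and exact values should agree closely but not perfectly, since the simulation only approximates the weights that the binomial formula gives exactly.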