In baseball, after a player gets a hit, they need to decide whether to stop at first base or try to stretch the hit from a single into a double. Does the path they take around first base make much of a difference in how quickly they can get to second?

For example, is it better to take a narrow angle (minimizing the distance) or a wide angle (for improved turn speed) around first base? (See Figure 1 for a schematic illustration.)

Figure 1: Two methods of rounding first base when a player plans to run to second base

This exploration is based on an actual study reported in a master’s thesis by W. F. Woodward in 1970.

Systematic and Random Influences on the Statistic

One reasonable experimental design that could be used to explore this question would be to:

  • recruit 20 players at random from the player population of interest (such as the Major Leagues)
  • randomly assign 10 of the players to run with the wide angle and the other 10 to run with the narrow angle
  • record the times for each player
  1. Some runners are faster than others. How does random assignment control for this, so that player speed is not a confounding variable for the purposes of investigating differences in running method?

Response to Exercise 1


In a study like this, the response variable would be the time taken to reach the goal location (such as second base), while the explanatory variable is running method (wide or narrow).

A parameter we might want to focus on is

\[\mu_{wide} - \mu_{narrow}\] the difference between the average time it takes players in the population to get from home plate to second base using the wide method (\(\mu_{wide}\)) and the average time it takes them using the narrow method (\(\mu_{narrow}\)).

The corresponding statistic,

\[\bar{x}_{wide} - \bar{x}_{narrow}\] is the difference in mean times for the specific instances of running the bases by the specific players participating in the experiment.

This sample statistic can be influenced by many factors; some are systematic, some random. A possible systematic factor is any inherent superiority of one method over the other. Other systematic factors are things like player skill at base running.

By using random assignment of players to methods, we can eliminate any systematic factors having to do with the players themselves. Since each player has an equal chance of being assigned to the narrow method or the wide method, regardless of their characteristics, the two groups will be evenly matched on average across all possible random assignments into groups.
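To make the idea concrete, here is a minimal sketch (in base R, with made-up player IDs and an arbitrary seed, purely for illustration) of how such a random assignment could be generated:

set.seed(101)                                 # arbitrary seed so the illustration is reproducible
players <- 1:20                               # hypothetical player IDs
wide_group   <- sample(players, size = 10)    # 10 players drawn at random for the wide angle
narrow_group <- setdiff(players, wide_group)  # the remaining 10 run the narrow angle
wide_group
narrow_group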

However, for any particular way to assign players to groups, we will likely wind up with one group having slightly faster runners than the other — that is, even though we have removed systematic differences between groups, there is still a random influence of player speed.

There are other random factors that might influence the results as well, such as tiredness, wind, basepath condition, etc. These factors aren’t associated with the players themselves, but rather with the situation or the environment.

  2. We can’t eliminate the influence of all possible random factors. However, we can eliminate some of them. Can you think of a different way of conducting the experiment that would remove the random influence of intrinsic player baserunning skill on the difference of means?

Response to Exercise 2


Letting Each Person Serve as Their Own Control Group

(Warning, Spoilers for Exercise 2 follow)

One thing we could do is have each runner use both base running angles.

That way, our two sets of times come from the same set of players, and any difference we see has to be due either to the difference in method, or to other non-player random factors like tiredness, wind speed, etc.

If we wanted to test whether these two methods differ, we would set about asking:

How likely is it that random factors alone would produce a difference as large as the one in our data?

  3. Consider the two designs described: one involving two separate groups of players, with one group assigned to each method, and the other involving a single group, with each player performing one method and then the other. How would you expect the amount of random variation in the difference in means (that is, the variability of the statistic across different possible datasets collected using a given design, i.e., the standard error) to differ between the two designs?

Response to Exercise 3


  4. Suppose the difference in sample means is 0.1 seconds in favor of the wide method. If this result came from the “one group, two ways” design, does it provide stronger, weaker, or the same amount of evidence that the wide method works better on average than if it had come from the two-groups design? Why?

Response to Exercise 4


  5. Suppose we had each player run using the wide angle method first, and then the narrow angle method. What problem does this design have?

Response to Exercise 5


In the real study, each player used both methods, with a rest in between, and the order was randomly assigned separately for each player.
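A minimal sketch of how that per-player order randomization could be carried out in R is below; the seed and the vector name are just for illustration, not the study’s actual procedure.

set.seed(7)                                   # arbitrary seed for a reproducible illustration
first_method <- sample(c("wide", "narrow"), size = 22, replace = TRUE)
first_method                                  # which angle each player runs first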

The Data

The data below are the times, in seconds, that it took each player to get from a point 35 feet past home plate to a point 15 feet before second base using each method.

Player Narrow Time Wide Time
1 5.50 5.55
2 5.70 5.75
3 5.60 5.50
4 5.50 5.40
5 5.85 5.70
6 5.55 5.60
7 5.40 5.35
8 5.50 5.35
9 5.15 5.00
10 5.80 5.70
11 5.20 5.10
12 5.55 5.45
13 5.35 5.45
14 5.00 4.95
15 5.50 5.40
16 5.55 5.50
17 5.55 5.35
18 5.50 5.55
19 5.45 5.25
20 5.60 5.40
21 5.65 5.55
22 6.30 6.25
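If you needed to enter these data by hand, a sketch like the following would work; the lowercase column names narrow and wide match the code chunks used later, and the BaseballDataWide data frame may already be provided for you.

BaseballDataWide <- data.frame(
  player = 1:22,
  narrow = c(5.50, 5.70, 5.60, 5.50, 5.85, 5.55, 5.40, 5.50, 5.15, 5.80, 5.20,
             5.55, 5.35, 5.00, 5.50, 5.55, 5.55, 5.50, 5.45, 5.60, 5.65, 6.30),
  wide   = c(5.55, 5.75, 5.50, 5.40, 5.70, 5.60, 5.35, 5.35, 5.00, 5.70, 5.10,
             5.45, 5.45, 4.95, 5.40, 5.50, 5.35, 5.55, 5.25, 5.40, 5.55, 6.25)
)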

Below is a pair of dotplots showing the times (you do not need to follow the plotting code, just run it).
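A sketch of plotting code that would produce such a display, assuming the long-format BaseballLong data frame (with method and time columns) used in the homework section, might look like this:

library(mosaic)   # loads ggformula, which provides gf_dotplot() (if not already loaded)

gf_dotplot(~time | method, data = BaseballLong, binwidth = 0.05) +
  scale_y_continuous(name = NULL, breaks = NULL) +
  scale_x_continuous(name = "Time (seconds)")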

  6. Based on the plot, does it seem like there is clear evidence that one method is faster than the other? How are you deciding?

Response to Exercise 6


Given the paired nature of these data, it may make more sense to compare the running times in a way that preserves the pairing between data points.

One way to do this would be to compute a difference score for each player, subtracting their individual “narrow” time from their “wide” time.

If we did this, then we could reframe our question to be about

\[\mu_{diff}\]

the mean difference score we would expect to see in the population of players, and estimate this value using

\[\bar{x}_{diff}\]

the mean difference score for the 22 players in the data.

Below, the dot plots are shown again, but now with lines connecting the pairs of times from each player.
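One rough sketch of code for a paired display like this, assuming BaseballLong also contains a player identifier column (called player here; the actual column name may differ), is:

gf_line(time ~ method, data = BaseballLong, group = ~player, color = "gray") %>%
  gf_point(time ~ method, data = BaseballLong)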

  7. Do you notice any pattern to the gray lines? (Hint: Look at the angles.) Does this change your subjective sense of how likely it is that random factors alone would produce this dataset?

Response to Exercise 7


Inferences About a Mean of Differences

By focusing on the sample of differences for each player (taking the time for the wide method minus the time for the narrow method), we switch our focus from a difference of means parameter (\(\mu_{wide} - \mu_{narrow}\)) to a single mean parameter, \(\mu_{diff}\), which is based on a set of difference scores.

  8. State the relevant null and alternative hypotheses in terms of this single mean parameter.

Response to Exercise 8


Since we have reframed the inference question to focus on a single mean, we can apply our techniques for calculating a confidence interval and carrying out a hypothesis test involving one mean.

The code chunk below creates a new column containing the difference scores.

# Compute each player's difference score: wide time minus narrow time
BaseballDataWide <- BaseballDataWide %>%
  mutate(Difference = wide - narrow)

The difference scores are plotted below.

gf_dotplot(~Difference, data = BaseballDataWide, binwidth = 0.025) +
  scale_y_continuous(name = NULL, breaks = NULL) +
  scale_x_continuous(name = "Difference of Times", breaks = seq(-0.2, 0.1, by = 0.05))

Recall that the estimated standard error of a mean is

\[\widehat{SE} = \sqrt{s^2 / n}\]

where \(s\) is the standard deviation of the values of the response variable in the data, and \(n\) is the sample size (the number of values in the data).
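As a quick illustration of the formula, using a small set of made-up values (not the baseball data):

toy_values <- c(0.20, -0.10, 0.00, 0.30, 0.10)   # made-up numbers for illustration only
s <- sd(toy_values)        # standard deviation of the values
n <- length(toy_values)    # sample size
sqrt(s^2 / n)              # estimated standard error of the mean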

The following code chunk calculates and stores the mean (\(\bar{x}_{diff}\)), standard deviation (\(s_{diff}\)), and sample size (\(n_{diff}\)) of the set of difference scores.

sMean_Difference <- 
  mean(~Difference, data = BaseballDataWide) %>% round(digits = 3)
sSD_Difference   <- 
  sd(~Difference, data = BaseballDataWide) %>% round(digits = 3)
n_Differences <- 
  nrow(BaseballDataWide)

  9. Calculate the standard error of the mean difference using the above quantities.

Code and Response


  10. Find the values of the standardized endpoints of a 95% confidence interval of the mean difference. Use a \(t\)-distribution with \(n_{diff} - 1\) degrees of freedom (do we want pdist() or qdist() for this?).

Code and Response


  11. Find and interpret the 95% confidence interval.

Code and Response


  12. Find the test statistic, by converting the sample mean difference into a \(z\)-score in the context of the sampling distribution of mean differences that would exist if \(H_0\) were correct. (What should the mean of this sampling distribution be?)

Code and Response


  13. Use the test statistic to find the \(P\)-value, approximating the sampling distribution of mean differences with a \(t\) distribution with \(n_{diff} - 1\) degrees of freedom. (Do we want pdist() or qdist() here?)

Code and Response


  14. Give the conclusion of the hypothesis test in context.

Response


HOMEWORK: Comparison to Two Groups Design

Suppose we had obtained the same times, but from a two-groups design instead. This is represented by the BaseballLong data frame, where method is the explanatory variable and time is the response. The following are the means and standard deviations of the two sets of times considered as separate groups, as well as the difference in means. Note that the difference of means is identical in value to the mean of differences that we calculated above.

separateMeans_Time <- mean(time ~ method, data = BaseballLong) %>% round(3)
separateMeans_Time
## narrow   wide 
##  5.534  5.459
separateSDs_Time <- sd(time ~ method, data = BaseballLong) %>% round(3)
separateSDs_Time
## narrow   wide 
##  0.260  0.273
diffMeans_Time <- diffmean(time ~ method, data = BaseballLong)
diffMeans_Time
## diffmean 
##   -0.075

There are 22 observations in each group (\(n_{wide} = n_{narrow} = 22\)).

Recall that the standard error formula for a difference in means is

\[\widehat{SE} = \sqrt{s^2_A / n_A + s^2_B / n_B}\]
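As a quick illustration of this formula, again with made-up values rather than the baseball data:

group_A <- c(5.1, 5.3, 5.2)        # made-up times for illustration only
group_B <- c(5.4, 5.6, 5.5, 5.7)
sqrt(sd(group_A)^2 / length(group_A) + sd(group_B)^2 / length(group_B))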

  15. Find the standard error based on this design. How does it compare to the standard error for the paired design? Explain why that makes sense.

Code and Response


  16. Find and interpret a 95% confidence interval for \(\mu_{wide} - \mu_{narrow}\) using the appropriate \(t\)-distribution approximation for the sampling distribution. How does the width compare to the width of the confidence interval for \(\mu_{diff}\)? Explain why that makes sense.

Code and Response


  17. Test the null hypothesis that \(\mu_{wide} - \mu_{narrow} = 0\) against a two-tailed alternative, and interpret the results in context. How does the outcome compare to the test involving the same numbers where we took the paired nature of the data into account? Explain why that makes sense.

Code and Response