Exploring Familywise Error

ANOVA Models and Pairwise Comparisons of Means

Let’s do an experiment to examine what happens when we do lots of comparisons between pairs of means in the setting of a predictor variable with lots of categories.

First, let’s suppose that we have a categorical predictor variable with 12 levels (that is, our data comes from 12 groups), and a quantitative response variable.

In reality, the predictor is unrelated to the response.

In other words, in the population / in the long run, the mean of the response variable is the same regardless of which of the twelve groups the case is in.

However, when conducting a study, we don’t know this reality going in, and want to use the data to learn about possible differences among the groups. That is, we want to test against a null hypothesis that says that the 12 population/long run means are identical.

We collect data from this population, with 20 cases in each of the 12 groups.

Generating Simulated Data from a Known Population Model

The following R code will create a simulated dataset from the reality described above.

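## Load the packages the code in this document relies on
## (tibble, dplyr and purrr via tidyverse; favstats and gf_boxplot via mosaic)
library(tidyverse)
library(mosaic)
## Then we set a random seed for reproducibility
set.seed(47)
## Then we generate 12 sets of 20 observations, all from a common N(100,15) distribution
FakeData <- tibble(
  Group    = rep(c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L"), each = 20),
  Response = rnorm(20 * 12, mean = 100, sd = 15))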

Let’s take a look at the dataset to verify that it looks as expected.

FakeData

## # A tibble: 240 x 2
##    Group Response
##    <chr>    <dbl>
##  1 A        130. 
##  2 A        111. 
##  3 A        103. 
##  4 A         95.8
##  5 A        102. 
##  6 A         83.7
##  7 A         85.2
##  8 A        100. 
##  9 A         96.2
## 10 A         78.0
## # … with 230 more rows

Descriptive Visualization and Statistics

We can plot the data by group using side-by-side box plots, adding the means to the plot, and get quantitative summary statistics from the dataset.

gf_boxplot(Response ~ Group, data = FakeData) +
  stat_summary(fun = mean, color = "darkorange")

We can also examine a table of summary statistics, and arrange the groups in order of the sample means:

favstats(Response ~ Group, data = FakeData) %>% 
  format(digits = 4) %>%
  arrange(mean)

##    Group   min    Q1 median    Q3   max   mean    sd  n missing
## 2      B 65.16 85.48 100.21 106.5 113.3  95.13 14.53 20       0
## 1      A 72.58 88.63  99.81 103.9 129.9  98.45 13.66 20       0
## 12     L 64.89 91.77 100.86 106.5 119.1  98.91 14.38 20       0
## 10     J 70.55 88.83 100.71 108.1 134.6  99.21 15.32 20       0
## 5      E 69.46 91.28 100.25 114.0 121.2 100.16 14.00 20       0
## 7      G 67.53 92.23  98.12 114.0 129.8 100.76 15.58 20       0
## 3      C 72.55 92.44  98.03 114.3 129.8 101.68 14.84 20       0
## 9      I 74.38 88.56 103.26 114.7 134.5 102.54 17.58 20       0
## 11     K 68.91 97.35 104.20 114.2 130.9 104.82 17.32 20       0
## 6      F 85.63 96.75 104.02 111.3 124.9 104.85 11.72 20       0
## 8      H 79.76 99.37 108.24 116.2 132.2 107.59 13.82 20       0
## 4      D 83.68 99.89 111.77 118.9 132.8 108.44 14.68 20       0

It looks like group D had the largest sample mean, at 108.44, and group B had the smallest, at 95.13.

Fitting an ANOVA Model

Since we have a dataset with a categorical predictor (Group) and a quantitative response (Response), it makes sense to fit an ANOVA model to the data. Recall that this model has the form:

\[ Response_i = \mu_{Overall} + \alpha_{Group_i} + \epsilon_i \]

aovModel <- aov(Response ~ Group, data = FakeData)

Here are the estimates of the \(\alpha\) parameters, one per group:

aovModel %>%
  model.tables(type = "effects")

## Tables of effects
## 
##  Group 
## Group
##      A      B      C      D      E      F      G      H      I      J 
## -3.431 -6.744 -0.193  6.561 -1.719  2.970 -1.121  5.708  0.661 -2.666 
##      K      L 
##  2.943 -2.968
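
In a balanced design like this one, each estimated effect is just the group’s sample mean minus the overall mean. As a rough check, the sketch below recomputes the effects from the rounded sample means in the summary table above, so the results should agree with model.tables() only up to rounding. The names group_means, overall_mean and alpha_hat are ours, not part of the original analysis.

```r
# Sample means copied (rounded) from the favstats() table above
group_means <- c(A = 98.45,  B = 95.13,  C = 101.68, D = 108.44,
                 E = 100.16, F = 104.85, G = 100.76, H = 107.59,
                 I = 102.54, J = 99.21,  K = 104.82, L = 98.91)

# With equal group sizes, the overall mean is the mean of the group means
overall_mean <- mean(group_means)

# Estimated effect for each group: group mean minus overall mean
alpha_hat <- group_means - overall_mean
round(alpha_hat, 2)
```

By construction these estimated effects sum to zero, which is the constraint that makes the \(\alpha\) parameters identifiable alongside \(\mu_{Overall}\).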

We can construct the ANOVA table and perform an overall F-test to see how strong the evidence is that any of the group means differ:

summary(aovModel)

##              Df Sum Sq Mean Sq F value Pr(>F)
## Group        11   3419   310.8   1.407  0.171
## Residuals   228  50373   220.9

Pairwise Comparisons of Means

Suppose instead of fitting a model to all the data, we had first looked at the data and then decided to do a hypothesis test to see how strong the evidence was for a difference in population means between groups B and D, since these appeared to be the groups with the most promising evidence for a difference. What would happen?

FakeData %>%
  filter(Group %in% c("B","D")) %>%
  lm(Response ~ Group, data = .) %>%
  summary()

## 
## Call:
## lm(formula = Response ~ Group, data = .)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -29.969  -9.019   4.507  11.132  24.380 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   95.134      3.266   29.12   <2e-16 ***
## GroupD        13.305      4.620    2.88   0.0065 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 14.61 on 38 degrees of freedom
## Multiple R-squared:  0.1792, Adjusted R-squared:  0.1576 
## F-statistic: 8.295 on 1 and 38 DF,  p-value: 0.006499

We appear to have “strong evidence” that the means of the populations that gave us the data in groups B and D are different.

We can actually get \(P\)-values for all pairwise comparisons of group means at once, using pairwise.t.test() with no adjustment for multiple comparisons.

Since there are 12 groups, and each group can be compared to each other group, there are 12 * 11 = 132 ordered pairs of means; but since this counts each pair twice, there are actually 66 unique pairs.
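
As a quick check on that count, here is a small sketch that computes the number of unique pairs three ways; each expression returns 66. The names n_groups and group_pairs are ours.

```r
# Number of unordered pairs among 12 groups
n_groups <- 12
choose(n_groups, 2)

# The same count via the ordered-pairs argument in the text
n_groups * (n_groups - 1) / 2

# Listing the pairs explicitly: combn() returns one column per pair
group_pairs <- combn(LETTERS[1:n_groups], 2)
ncol(group_pairs)
```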

PairwiseTests <- with(
  FakeData, 
  pairwise.t.test(
    x = Response, g = Group, 
    p.adjust.method = 'none'))

PairwiseTests

## 
##  Pairwise comparisons using t tests with pooled SD 
## 
## data:  Response and Group 
## 
##   A      B      C      D      E      F      G      H      I      J     
## B 0.4817 -      -      -      -      -      -      -      -      -     
## C 0.4916 0.1648 -      -      -      -      -      -      -      -     
## D 0.0346 0.0051 0.1521 -      -      -      -      -      -      -     
## E 0.7160 0.2862 0.7458 0.0795 -      -      -      -      -      -     
## F 0.1746 0.0399 0.5017 0.4456 0.3196 -      -      -      -      -     
## G 0.6235 0.2328 0.8438 0.1036 0.8988 0.3851 -      -      -      -     
## H 0.0531 0.0086 0.2106 0.8561 0.1155 0.5608 0.1477 -      -      -     
## I 0.3849 0.1166 0.8560 0.2107 0.6131 0.6238 0.7050 0.2841 -      -     
## J 0.8708 0.3866 0.5993 0.0509 0.8405 0.2318 0.7426 0.0762 0.4798 -     
## K 0.1764 0.0405 0.5053 0.4423 0.3223 0.9955 0.3882 0.5570 0.6278 0.2340
## L 0.9216 0.4226 0.5555 0.0438 0.7907 0.2078 0.6947 0.0662 0.4409 0.9488
##   K     
## B -     
## C -     
## D -     
## E -     
## F -     
## G -     
## H -     
## I -     
## J -     
## K -     
## L 0.2098
## 
## P value adjustment method: none

Let’s make this a bit easier to read by rounding the P-values:

PairwiseTests %>% 
  pluck("p.value") %>%
  round(digits = 3)

##       A     B     C     D     E     F     G     H     I     J    K
## B 0.482    NA    NA    NA    NA    NA    NA    NA    NA    NA   NA
## C 0.492 0.165    NA    NA    NA    NA    NA    NA    NA    NA   NA
## D 0.035 0.005 0.152    NA    NA    NA    NA    NA    NA    NA   NA
## E 0.716 0.286 0.746 0.079    NA    NA    NA    NA    NA    NA   NA
## F 0.175 0.040 0.502 0.446 0.320    NA    NA    NA    NA    NA   NA
## G 0.623 0.233 0.844 0.104 0.899 0.385    NA    NA    NA    NA   NA
## H 0.053 0.009 0.211 0.856 0.115 0.561 0.148    NA    NA    NA   NA
## I 0.385 0.117 0.856 0.211 0.613 0.624 0.705 0.284    NA    NA   NA
## J 0.871 0.387 0.599 0.051 0.840 0.232 0.743 0.076 0.480    NA   NA
## K 0.176 0.040 0.505 0.442 0.322 0.995 0.388 0.557 0.628 0.234   NA
## L 0.922 0.423 0.556 0.044 0.791 0.208 0.695 0.066 0.441 0.949 0.21

Looking over the P-values for these pairwise tests, six pairs of groups show “significant differences” at the 5% significance level:

A and D, B and D, B and F, B and H, B and K, D and L

with several more pairs that have P-values just a bit above 0.05 (A and H, D and E, D and J, H and J, H and L).

So we have two results: an overall F test that says there isn’t good evidence that the population means are any different (which is a desirable result in this case, since in fact the process producing the data did not differ at all across groups), and a collection of t-tests, six of which yielded significant evidence of a difference between means.

Pairs:

How should we make sense of this discrepancy?

linearModel <- lm(Response ~ Group, data = FakeData)

summary(linearModel)

## 
## Call:
## lm(formula = Response ~ Group, data = FakeData)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -35.910  -9.422   0.549  11.132  35.370 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  98.4464     3.3236  29.620   <2e-16 ***
## GroupB       -3.3126     4.7003  -0.705   0.4817    
## GroupC        3.2380     4.7003   0.689   0.4916    
## GroupD        9.9920     4.7003   2.126   0.0346 *  
## GroupE        1.7123     4.7003   0.364   0.7160    
## GroupF        6.4008     4.7003   1.362   0.1746    
## GroupG        2.3106     4.7003   0.492   0.6235    
## GroupH        9.1387     4.7003   1.944   0.0531 .  
## GroupI        4.0922     4.7003   0.871   0.3849    
## GroupJ        0.7651     4.7003   0.163   0.8708    
## GroupK        6.3741     4.7003   1.356   0.1764    
## GroupL        0.4632     4.7003   0.099   0.9216    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 14.86 on 228 degrees of freedom
## Multiple R-squared:  0.06356,    Adjusted R-squared:  0.01838 
## F-statistic: 1.407 on 11 and 228 DF,  p-value: 0.1707

Given that we know exactly the “population” model that generated the data, what can we say about the number of Type I and Type II errors (false discoveries and missed discoveries) in the results?

There are 12 * 11 / 2 = 66 pairs of groups and therefore 66 different t-tests being performed here. If you had a dataset like this handed to you and did all pairwise tests, how many times would you expect to reject \(H_0\) mistakenly?
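
One way to explore this question is by simulation. Because every null hypothesis is true in this setup, each test has a 5% chance of a false rejection, so the expected number of false rejections is 66 * 0.05 = 3.3. How often at least one false rejection occurs (the familywise error rate) is harder to work out by hand, because the 66 tests share data and are not independent. The sketch below estimates both quantities by repeating the whole experiment many times. The helper count_false_rejections() and the choice of 500 repetitions are ours, not part of the original analysis.

```r
# Simulate one dataset with no true group differences and count how many
# of the unadjusted pairwise tests reject at the 5% level
count_false_rejections <- function(n_groups = 12, n_per_group = 20) {
  group    <- rep(LETTERS[1:n_groups], each = n_per_group)
  response <- rnorm(n_groups * n_per_group, mean = 100, sd = 15)
  p_values <- pairwise.t.test(response, group,
                              p.adjust.method = "none")$p.value
  sum(p_values < 0.05, na.rm = TRUE)
}

set.seed(47)
n_rejections <- replicate(500, count_false_rejections())

# Average number of false rejections per dataset (expected value is 3.3)
mean(n_rejections)

# Estimated familywise error rate:
# share of simulated datasets with at least one false rejection
mean(n_rejections > 0)
```

Every rejection counted here is a Type I error, since all twelve population means are equal in the simulation.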

Adding a Genuinely Different Mean

Suppose we add a 13th group whose population mean really is different from the others: 110 instead of 100.

NewData <- tibble(
  Group    = rep("M", 20),
  Response = rnorm(20, mean = 110, sd = 15))
NewData

## # A tibble: 20 x 2
##    Group Response
##    <chr>    <dbl>
##  1 M        115. 
##  2 M         99.3
##  3 M        112. 
##  4 M        125. 
##  5 M        141. 
##  6 M         96.9
##  7 M         88.5
##  8 M         99.8
##  9 M        120. 
## 10 M        128. 
## 11 M         97.1
## 12 M        123. 
## 13 M        137. 
## 14 M        126. 
## 15 M        118. 
## 16 M        112. 
## 17 M        103. 
## 18 M        119. 
## 19 M         96.3
## 20 M        110.

Let’s join this dataset with the original one.

CombinedData <- FakeData %>%
  bind_rows(NewData)

Repeating the boxplot code:

gf_boxplot(Response ~ Group, data = CombinedData) +
  stat_summary(fun = mean, color = "darkorange")

Let’s refit the ANOVA model on this new dataset.

aovModel2 <- aov(Response ~ Group, data = CombinedData)

summary(aovModel2)

##              Df Sum Sq Mean Sq F value Pr(>F)  
## Group        12   5868   489.0   2.221 0.0114 *
## Residuals   247  54370   220.1                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Now we have significant evidence that at least one population mean differs from the others. And this time, it’s true that one of the population means is different!

So suppose we had this data and did this test. What would we do next? How would we know which means differ from which other means?

If we repeat our collection of pairwise tests from before:

PairwisePValues <- with(
  CombinedData, 
  pairwise.t.test(
    x = Response, g = Group, 
    p.adjust.method = 'none')) %>%
  pluck("p.value") %>%
  round(digits = 3) 
PairwisePValues

##       A     B     C     D     E     F     G     H     I     J     K     L
## B 0.481    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
## C 0.491 0.164    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
## D 0.034 0.005 0.151    NA    NA    NA    NA    NA    NA    NA    NA    NA
## E 0.715 0.285 0.745 0.079    NA    NA    NA    NA    NA    NA    NA    NA
## F 0.174 0.039 0.501 0.445 0.319    NA    NA    NA    NA    NA    NA    NA
## G 0.623 0.232 0.843 0.103 0.899 0.384    NA    NA    NA    NA    NA    NA
## H 0.053 0.008 0.210 0.856 0.115 0.560 0.147    NA    NA    NA    NA    NA
## I 0.384 0.116 0.856 0.210 0.612 0.623 0.704 0.283    NA    NA    NA    NA
## J 0.871 0.386 0.599 0.050 0.840 0.231 0.742 0.076 0.479    NA    NA    NA
## K 0.176 0.040 0.504 0.441 0.321 0.995 0.387 0.556 0.627 0.233    NA    NA
## L 0.921 0.422 0.555 0.043 0.790 0.207 0.694 0.066 0.440 0.949 0.209    NA
## M 0.002 0.000 0.013 0.292 0.005 0.070 0.008 0.217 0.021 0.003 0.069 0.002

ifelse(PairwisePValues < 0.05, "*", "")

##   A   B   C   D   E   F  G   H  I   J   K  L  
## B ""  NA  NA  NA  NA  NA NA  NA NA  NA  NA NA 
## C ""  ""  NA  NA  NA  NA NA  NA NA  NA  NA NA 
## D "*" "*" ""  NA  NA  NA NA  NA NA  NA  NA NA 
## E ""  ""  ""  ""  NA  NA NA  NA NA  NA  NA NA 
## F ""  "*" ""  ""  ""  NA NA  NA NA  NA  NA NA 
## G ""  ""  ""  ""  ""  "" NA  NA NA  NA  NA NA 
## H ""  "*" ""  ""  ""  "" ""  NA NA  NA  NA NA 
## I ""  ""  ""  ""  ""  "" ""  "" NA  NA  NA NA 
## J ""  ""  ""  ""  ""  "" ""  "" ""  NA  NA NA 
## K ""  "*" ""  ""  ""  "" ""  "" ""  ""  NA NA 
## L ""  ""  ""  "*" ""  "" ""  "" ""  ""  "" NA 
## M "*" "*" "*" ""  "*" "" "*" "" "*" "*" "" "*"

The same six comparisons that were significant before still are. In addition, we have significant evidence of a difference between the population mean for group M and the population means for groups A, B, C, E, G, I, J, and L (but not D, F, H, or K).

Pairs:

How many Type I and Type II Errors are there here?

Using Adjusted P-values and Confidence Levels

If we employ a Bonferroni correction to the P-values and confidence levels…

library(DescTools)
PostHocTest(aovModel2, method = "bonferroni")

## 
##   Posthoc multiple comparisons of means : Bonferroni 
##     95% family-wise confidence level
## 
## $Group
##            diff     lwr.ci    upr.ci   pval    
## B-A -3.31262933 -19.535971 12.910712 1.0000    
## C-A  3.23800837 -12.985333 19.461350 1.0000    
## D-A  9.99196481  -6.231377 26.215307 1.0000    
## E-A  1.71225106 -14.511091 17.935593 1.0000    
## F-A  6.40081688  -9.822525 22.624159 1.0000    
## G-A  2.31058283 -13.912759 18.533925 1.0000    
## H-A  9.13872130  -7.084621 25.362063 1.0000    
## I-A  4.09215759 -12.131184 20.315499 1.0000    
## J-A  0.76510194 -15.458240 16.988444 1.0000    
## K-A  6.37414955  -9.849192 22.597491 1.0000    
## L-A  0.46318441 -15.760157 16.686526 1.0000    
## M-A 14.94851502  -1.274827 31.171857 0.1269    
## C-B  6.55063769  -9.672704 22.773980 1.0000    
## D-B 13.30459413  -2.918748 29.527936 0.3861    
## E-B  5.02488038 -11.198461 21.248222 1.0000    
## F-B  9.71344620  -6.509896 25.936788 1.0000    
## G-B  5.62321215 -10.600130 21.846554 1.0000    
## H-B 12.45135063  -3.771991 28.674692 0.6609    
## I-B  7.40478691  -8.818555 23.628129 1.0000    
## J-B  4.07773127 -12.145611 20.301073 1.0000    
## K-B  9.68677887  -6.536563 25.910121 1.0000    
## L-B  3.77581374 -12.447528 19.999156 1.0000    
## M-B 18.26114435   2.037803 34.484486 0.0100 ** 
## D-C  6.75395644  -9.469385 22.977298 1.0000    
## E-C -1.52575731 -17.749099 14.697585 1.0000    
## F-C  3.16280851 -13.060533 19.386150 1.0000    
## G-C -0.92742554 -17.150767 15.295916 1.0000    
## H-C  5.90071293 -10.322629 22.124055 1.0000    
## I-C  0.85414922 -15.369193 17.077491 1.0000    
## J-C -2.47290643 -18.696248 13.750435 1.0000    
## K-C  3.13614118 -13.087201 19.359483 1.0000    
## L-C -2.77482395 -18.998166 13.448518 1.0000    
## M-C 11.71050665  -4.512835 27.933848 1.0000    
## E-D -8.27971375 -24.503056  7.943628 1.0000    
## F-D -3.59114793 -19.814490 12.632194 1.0000    
## G-D -7.68138198 -23.904724  8.541960 1.0000    
## H-D -0.85324351 -17.076585 15.370098 1.0000    
## I-D -5.89980722 -22.123149 10.323535 1.0000    
## J-D -9.22686287 -25.450205  6.996479 1.0000    
## K-D -3.61781526 -19.841157 12.605527 1.0000    
## L-D -9.52878039 -25.752122  6.694561 1.0000    
## M-D  4.95655021 -11.266792 21.179892 1.0000    
## F-E  4.68856582 -11.534776 20.911908 1.0000    
## G-E  0.59833177 -15.625010 16.821674 1.0000    
## H-E  7.42647024  -8.796872 23.649812 1.0000    
## I-E  2.37990653 -13.843435 18.603248 1.0000    
## J-E -0.94714912 -17.170491 15.276193 1.0000    
## K-E  4.66189849 -11.561443 20.885240 1.0000    
## L-E -1.24906664 -17.472408 14.974275 1.0000    
## M-E 13.23626396  -2.987078 29.459606 0.4035    
## G-F -4.09023405 -20.313576 12.133108 1.0000    
## H-F  2.73790442 -13.485437 18.961246 1.0000    
## I-F -2.30865929 -18.532001 13.914683 1.0000    
## J-F -5.63571494 -21.859057 10.587627 1.0000    
## K-F -0.02666733 -16.250009 16.196674 1.0000    
## L-F -5.93763246 -22.160974 10.285709 1.0000    
## M-F  8.54769814  -7.675644 24.771040 1.0000    
## H-G  6.82813847  -9.395203 23.051480 1.0000    
## I-G  1.78157476 -14.441767 18.004917 1.0000    
## J-G -1.54548089 -17.768823 14.677861 1.0000    
## K-G  4.06356672 -12.159775 20.286909 1.0000    
## L-G -1.84739841 -18.070740 14.375943 1.0000    
## M-G 12.63793219  -3.585410 28.861274 0.5890    
## I-H -5.04656371 -21.269906 11.176778 1.0000    
## J-H -8.37361936 -24.596961  7.849722 1.0000    
## K-H -2.76457175 -18.987914 13.458770 1.0000    
## L-H -8.67553688 -24.898879  7.547805 1.0000    
## M-H  5.80979372 -10.413548 22.033136 1.0000    
## J-I -3.32705565 -19.550397 12.896286 1.0000    
## K-I  2.28199196 -13.941350 18.505334 1.0000    
## L-I -3.62897317 -19.852315 12.594369 1.0000    
## M-I 10.85635743  -5.366984 27.079699 1.0000    
## K-J  5.60904761 -10.614294 21.832389 1.0000    
## L-J -0.30191753 -16.525259 15.921424 1.0000    
## M-J 14.18341308  -2.039929 30.406755 0.2157    
## L-K -5.91096513 -22.134307 10.312377 1.0000    
## M-K  8.57436547  -7.648976 24.797707 1.0000    
## M-L 14.48533060  -1.738011 30.708672 0.1755    
## 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

none of the spurious differences come out significant, but only one of the 12 real differences does.
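
The Bonferroni adjustment itself is simple: with 13 groups there are 13 * 12 / 2 = 78 comparisons, so each unadjusted P-value is multiplied by 78 and capped at 1. As a sketch, here it is applied to three unadjusted P-values taken from the Fisher LSD output for the same model later in this section (M-B, M-A, and D-B). Since those inputs are rounded, the results match the Bonferroni table only approximately. The variable names are ours.

```r
# Unadjusted P-values for three comparisons (rounded, from the Fisher LSD output)
p_unadjusted <- c("M-B" = 0.00013, "M-A" = 0.00163, "D-B" = 0.00495)

# Number of pairwise comparisons among 13 groups
n_comparisons <- choose(13, 2)

# Bonferroni: multiply by the number of comparisons, capping at 1
p_bonferroni <- p.adjust(p_unadjusted, method = "bonferroni", n = n_comparisons)
p_bonferroni

# Only comparisons with an adjusted P-value below 0.05 remain significant
p_bonferroni < 0.05
```

Of these three, only M-B stays below 0.05 after adjustment, in line with the Bonferroni table above.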

Using Fisher’s LSD, we get the same results we got before by just doing the t-tests, since our only “protection” was the overall F-test, which was significant.

PostHocTest(aovModel2, method = "lsd")

## 
##   Posthoc multiple comparisons of means : Fisher LSD 
##     95% family-wise confidence level
## 
## $Group
##            diff      lwr.ci      upr.ci    pval    
## B-A -3.31262933 -12.5535230  5.92826430 0.48082    
## C-A  3.23800837  -6.0028853 12.47890200 0.49075    
## D-A  9.99196481   0.7510712 19.23285843 0.03419 *  
## E-A  1.71225106  -7.5286426 10.95314469 0.71546    
## F-A  6.40081688  -2.8400768 15.64171051 0.17372    
## G-A  2.31058283  -6.9303108 11.55147646 0.62282    
## H-A  9.13872130  -0.1021723 18.37961493 0.05257 .  
## I-A  4.09215759  -5.1487360 13.33305122 0.38394    
## J-A  0.76510194  -8.4757917 10.00599557 0.87059    
## K-A  6.37414955  -2.8667441 15.61504318 0.17551    
## L-A  0.46318441  -8.7777092  9.70407804 0.92144    
## M-A 14.94851502   5.7076214 24.18940865 0.00163 ** 
## C-B  6.55063769  -2.6902559 15.79153132 0.16390    
## D-B 13.30459413   4.0637005 22.54548776 0.00495 ** 
## E-B  5.02488038  -4.2160132 14.26577401 0.28521    
## F-B  9.71344620   0.4725526 18.95433983 0.03946 *  
## G-B  5.62321215  -3.6176815 14.86410578 0.23186    
## H-B 12.45135063   3.2104570 21.69224426 0.00847 ** 
## I-B  7.40478691  -1.8361067 16.64568054 0.11578    
## J-B  4.07773127  -5.1631624 13.31862490 0.38562    
## K-B  9.68677887   0.4458852 18.92767250 0.04000 *  
## L-B  3.77581374  -5.4650799 13.01670737 0.42172    
## M-B 18.26114435   9.0202507 27.50203798 0.00013 ***
## D-C  6.75395644  -2.4869372 15.99485007 0.15126    
## E-C -1.52575731 -10.7666509  7.71513632 0.74530    
## F-C  3.16280851  -6.0780851 12.40370214 0.50086    
## G-C -0.92742554 -10.1683192  8.31346809 0.84346    
## H-C  5.90071293  -3.3401807 15.14160656 0.20969    
## I-C  0.85414922  -8.3867444 10.09504285 0.85569    
## J-C -2.47290643 -11.7138001  6.76798720 0.59861    
## K-C  3.13614118  -6.1047524 12.37703481 0.50448    
## L-C -2.77482395 -12.0157176  6.46606968 0.55477    
## M-C 11.71050665   2.4696130 20.95140028 0.01321 *  
## E-D -8.27971375 -17.5206074  0.96117988 0.07884 .  
## F-D -3.59114793 -12.8320416  5.64974570 0.44475    
## G-D -7.68138198 -16.9222756  1.55951165 0.10286    
## H-D -0.85324351 -10.0941371  8.38765012 0.85584    
## I-D -5.89980722 -15.1407008  3.34108641 0.20976    
## J-D -9.22686287 -18.4677565  0.01403076 0.05035 .  
## K-D -3.61781526 -12.8587089  5.62307837 0.44138    
## L-D -9.52878039 -18.7696740 -0.28788676 0.04333 *  
## M-D  4.95655021  -4.2843434 14.19744384 0.29180    
## F-E  4.68856582  -4.5523278 13.92945945 0.31861    
## G-E  0.59833177  -8.6425619  9.83922540 0.89863    
## H-E  7.42647024  -1.8144234 16.66736387 0.11473    
## I-E  2.37990653  -6.8609871 11.62080016 0.61243    
## J-E -0.94714912 -10.1880427  8.29374451 0.84018    
## K-E  4.66189849  -4.5789951 13.90279212 0.32137    
## L-E -1.24906664 -10.4899603  7.99182699 0.79029    
## M-E 13.23626396   3.9953703 22.47715759 0.00517 ** 
## G-F -4.09023405 -13.3311277  5.15065958 0.38417    
## H-F  2.73790442  -6.5029892 11.97879805 0.56005    
## I-F -2.30865929 -11.5495529  6.93223434 0.62311    
## J-F -5.63571494 -14.8766086  3.60517869 0.23082    
## K-F -0.02666733  -9.2675610  9.21422630 0.99547    
## L-F -5.93763246 -15.1785261  3.30326117 0.20687    
## M-F  8.54769814  -0.6931955 17.78859177 0.06968 .  
## H-G  6.82813847  -2.4127552 16.06903210 0.14684    
## I-G  1.78157476  -7.4593189 11.02246839 0.70447    
## J-G -1.54548089 -10.7863745  7.69541274 0.74213    
## K-G  4.06356672  -5.1773269 13.30446035 0.38727    
## L-G -1.84739841 -11.0882920  7.39349522 0.69410    
## M-G 12.63793219   3.3970386 21.87882582 0.00755 ** 
## I-H -5.04656371 -14.2874573  4.19432992 0.28314    
## J-H -8.37361936 -17.6145130  0.86727427 0.07553 .  
## K-H -2.76457175 -12.0054654  6.47632188 0.55624    
## L-H -8.67553688 -17.9164305  0.56535675 0.06564 .  
## M-H  5.80979372  -3.4310999 15.05068735 0.21678    
## J-I -3.32705565 -12.5679493  5.91383798 0.47891    
## K-I  2.28199196  -6.9589017 11.52288559 0.62712    
## L-I -3.62897317 -12.8698668  5.61192046 0.43998    
## M-I 10.85635743   1.6154638 20.09725106 0.02149 *  
## K-J  5.60904761  -3.6318460 14.84994124 0.23303    
## L-J -0.30191753  -9.5428112  8.93897610 0.94874    
## M-J 14.18341308   4.9425194 23.42430671 0.00277 ** 
## L-K -5.91096513 -15.1518588  3.32992850 0.20891    
## M-K  8.57436547  -0.6665282 17.81525910 0.06882 .  
## M-L 14.48533060   5.2444370 23.72622423 0.00225 ** 
## 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Finally, using Tukey’s HSD:

PostHocTest(aovModel2, method = "hsd")

## 
##   Posthoc multiple comparisons of means : Tukey HSD 
##     95% family-wise confidence level
## 
## $Group
##            diff      lwr.ci    upr.ci   pval    
## B-A -3.31262933 -19.0093111 12.384052 1.0000    
## C-A  3.23800837 -12.4586734 18.934690 1.0000    
## D-A  9.99196481  -5.7047170 25.688647 0.6429    
## E-A  1.71225106 -13.9844307 17.408933 1.0000    
## F-A  6.40081688  -9.2958649 22.097499 0.9785    
## G-A  2.31058283 -13.3860989 18.007265 1.0000    
## H-A  9.13872130  -6.5579605 24.835403 0.7652    
## I-A  4.09215759 -11.6045242 19.788839 0.9997    
## J-A  0.76510194 -14.9315798 16.461784 1.0000    
## K-A  6.37414955  -9.3225322 22.070831 0.9792    
## L-A  0.46318441 -15.2334974 16.159866 1.0000    
## M-A 14.94851502  -0.7481668 30.645197 0.0793 .  
## C-B  6.55063769  -9.1460441 22.247319 0.9741    
## D-B 13.30459413  -2.3920876 29.001276 0.1928    
## E-B  5.02488038 -10.6718014 20.721562 0.9974    
## F-B  9.71344620  -5.9832356 25.410128 0.6846    
## G-B  5.62321215 -10.0734696 21.319894 0.9928    
## H-B 12.45135063  -3.2453311 28.148032 0.2845    
## I-B  7.40478691  -8.2918949 23.101469 0.9350    
## J-B  4.07773127 -11.6189505 19.774413 0.9997    
## K-B  9.68677887  -6.0099029 25.383461 0.6885    
## L-B  3.77581374 -11.9208680 19.472496 0.9999    
## M-B 18.26114435   2.5644626 33.957826 0.0081 ** 
## D-C  6.75395644  -8.9427253 22.450638 0.9670    
## E-C -1.52575731 -17.2224391 14.170924 1.0000    
## F-C  3.16280851 -12.5338733 18.859490 1.0000    
## G-C -0.92742554 -16.6241073 14.769256 1.0000    
## H-C  5.90071293  -9.7959688 21.597395 0.9891    
## I-C  0.85414922 -14.8425326 16.550831 1.0000    
## J-C -2.47290643 -18.1695882 13.223775 1.0000    
## K-C  3.13614118 -12.5605406 18.832823 1.0000    
## L-C -2.77482395 -18.4715057 12.921858 1.0000    
## M-C 11.71050665  -3.9861751 27.407188 0.3819    
## E-D -8.27971375 -23.9763955  7.416968 0.8650    
## F-D -3.59114793 -19.2878297 12.105534 0.9999    
## G-D -7.68138198 -23.3780638  8.015300 0.9164    
## H-D -0.85324351 -16.5499253 14.843438 1.0000    
## I-D -5.89980722 -21.5964890  9.796875 0.9891    
## J-D -9.22686287 -24.9235446  6.469819 0.7535    
## K-D -3.61781526 -19.3144970 12.078867 0.9999    
## L-D -9.52878039 -25.2254622  6.167901 0.7115    
## M-D  4.95655021 -10.7401316 20.653232 0.9978    
## F-E  4.68856582 -11.0081160 20.385248 0.9987    
## G-E  0.59833177 -15.0983500 16.295014 1.0000    
## H-E  7.42647024  -8.2702115 23.123152 0.9337    
## I-E  2.37990653 -13.3167752 18.076588 1.0000    
## J-E -0.94714912 -16.6438309 14.749533 1.0000    
## K-E  4.66189849 -11.0347833 20.358580 0.9988    
## L-E -1.24906664 -16.9457484 14.447615 1.0000    
## M-E 13.23626396  -2.4604178 28.932946 0.1993    
## G-F -4.09023405 -19.7869158 11.606448 0.9997    
## H-F  2.73790442 -12.9587774 18.434586 1.0000    
## I-F -2.30865929 -18.0053411 13.388022 1.0000    
## J-F -5.63571494 -21.3323967 10.060967 0.9927    
## K-F -0.02666733 -15.7233491 15.670014 1.0000    
## L-F -5.93763246 -21.6343142  9.759049 0.9885    
## M-F  8.54769814  -7.1489836 24.244380 0.8370    
## H-G  6.82813847  -8.8685433 22.524820 0.9642    
## I-G  1.78157476 -13.9151070 17.478257 1.0000    
## J-G -1.54548089 -17.2421627 14.151201 1.0000    
## K-G  4.06356672 -11.6331151 19.760248 0.9997    
## L-G -1.84739841 -17.5440802 13.849283 1.0000    
## M-G 12.63793219  -3.0587496 28.334614 0.2625    
## I-H -5.04656371 -20.7432455 10.650118 0.9973    
## J-H -8.37361936 -24.0703011  7.323062 0.8555    
## K-H -2.76457175 -18.4612535 12.932110 1.0000    
## L-H -8.67553688 -24.3722187  7.021145 0.8226    
## M-H  5.80979372  -9.8868881 21.506475 0.9905    
## J-I -3.32705565 -19.0237374 12.369626 1.0000    
## K-I  2.28199196 -13.4146898 17.978674 1.0000    
## L-I -3.62897317 -19.3256549 12.067709 0.9999    
## M-I 10.85635743  -4.8403243 26.553039 0.5091    
## K-J  5.60904761 -10.0876342 21.305729 0.9930    
## L-J -0.30191753 -15.9985993 15.394764 1.0000    
## M-J 14.18341308  -1.5132687 29.880095 0.1226    
## L-K -5.91096513 -21.6076469  9.785717 0.9889    
## M-K  8.57436547  -7.1223163 24.271047 0.8340    
## M-L 14.48533060  -1.2113512 30.182012 0.1037    
## 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

we get the same results in this case as Bonferroni (although the P-values are smaller in general).
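
The three methods differ only in the critical value that multiplies the standard error of a difference in means. As a sketch, the half-widths of the confidence intervals in the three outputs can be approximately reproduced from the ANOVA table alone (residual mean square 220.1 on 247 degrees of freedom, with 20 cases per group). The variable names are ours, and because the mean square is rounded the results will match the outputs only approximately.

```r
mse      <- 220.1          # residual mean square from summary(aovModel2)
df_resid <- 247            # residual degrees of freedom
n        <- 20             # cases per group
k        <- 13             # number of groups
m        <- choose(k, 2)   # number of pairwise comparisons (78)

# Standard error of a difference between two group means
se_diff <- sqrt(mse * (1 / n + 1 / n))

# Half-width of a 95% interval under each method
half_width <- c(
  LSD        = qt(1 - 0.05 / 2, df_resid) * se_diff,
  Bonferroni = qt(1 - 0.05 / (2 * m), df_resid) * se_diff,
  TukeyHSD   = qtukey(0.95, nmeans = k, df = df_resid) / sqrt(2) * se_diff
)
round(half_width, 2)
```

In the outputs above the half-widths are about 9.24 (LSD), 16.22 (Bonferroni), and 15.70 (Tukey). Tukey’s interval is a little narrower than Bonferroni’s, which is why its P-values are somewhat smaller, while both are far wider than the unprotected LSD interval.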