
7. ANOVA in R with post-hoc pairwise t-tests using egg width dataset


Analysis of Variance (ANOVA)

This test is similar to a t-test in that a statistical comparison of the values of a continuous variable is made between groups (samples) of a categorical variable.

Unlike a t-test, which only lets you compare two samples, ANOVA lets you compare more than two groups. The data need to be arranged in long format for R: one row per observation, with one column holding the measured value and another holding the group label (please Google "long versus wide data formatting" if you are confused by this).
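If the long versus wide distinction is new to you, here is a minimal sketch in base R. The data frame "wide" and its numbers are made up purely for illustration and are not the egg width dataset.

```r
# Hypothetical wide-format data: one column per project (values are made up)
wide <- data.frame(
  A = c(40.1, 42.3, 41.7),
  B = c(36.0, 35.2, 37.4),
  C = c(44.8, 45.1, 43.9)
)

# stack() gathers the columns into two: "values" (the measurements)
# and "ind" (the name of the column each measurement came from)
long <- stack(wide)
names(long) <- c("Width", "Project")

long  # 9 rows: one egg per row, with its project label alongside
```

The long version is the shape that aov() expects: one column for the continuous variable and one for the grouping variable.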

Once you have imported the data into R (refer to the previous post), you can conduct the ANOVA using the following command:

eggANOVA<-aov(Width~Project, data = Eggwidth)

The above line of code tells R to create a new object, "eggANOVA", which will hold the results of the ANOVA. The formula Width~Project asks R whether the values of the dependent variable in column "Width" are "explained by" the values of the independent variable in column "Project". The data argument tells R that both columns are in the dataframe "Eggwidth".
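One thing worth checking before you run the ANOVA is that R is treating the grouping column as categories rather than numbers. The sketch below uses a simulated stand-in dataframe, "Eggdemo" (a hypothetical name with made-up values, not the real Eggwidth data), with the same layout of 5 projects and 10 eggs each.

```r
set.seed(1)  # makes the simulated numbers reproducible

# Simulated stand-in for the real data: 5 projects x 10 eggs (values are made up)
Eggdemo <- data.frame(
  Project = rep(c("A", "B", "C", "D", "E"), each = 10),
  Width   = rnorm(50, mean = rep(c(40, 36, 44, 41, 48), each = 10), sd = 4)
)

# Make the grouping column an explicit factor and confirm the structure
Eggdemo$Project <- factor(Eggdemo$Project)
str(Eggdemo)

# Same model formula as in the post, fitted to the stand-in data
demoANOVA <- aov(Width ~ Project, data = Eggdemo)
```

If your groups are coded as numbers (1, 2, 3...) and not converted with factor(), R will treat the column as a continuous predictor, which is not the test you want here.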

In executing this command you are asking R to see whether differences in egg widths are statistically associated with differences in kiwi projects. To see the results of the test enter the following code:

summary(eggANOVA)

This will generate the following output:

            Df Sum Sq Mean Sq F value Pr(>F)
Project      4   3139   784.8   54.24 <2e-16 ***
Residuals   45    651    14.5
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Your P value (<2e-16) is far below 0.05! That means at least one project mean differs significantly from the others. But which groups differ from which?
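If you are curious where the F value comes from, you can rebuild it from the Df and Sum Sq columns of the output. This is only a quick check using the rounded numbers printed above, so the results match to rounding rather than exactly. The variable names are mine.

```r
# Numbers copied from the ANOVA output above (already rounded by R)
ss_project <- 3139
df_project <- 4
ss_resid   <- 651
df_resid   <- 45

# Mean Sq = Sum Sq / Df
ms_project <- ss_project / df_project   # 784.75
ms_resid   <- ss_resid / df_resid       # about 14.47

# F value = between-group mean square / residual mean square
f_value <- ms_project / ms_resid        # about 54.2

# Upper-tail probability of an F this large with (4, 45) degrees of freedom
p_value <- pf(f_value, df_project, df_resid, lower.tail = FALSE)
p_value
```

A large F means the variation between project means is large relative to the variation among eggs within a project.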

You will now need to perform a series of pairwise tests to work out which of these group means is higher or lower than the others. You will also need to plot the means (and a measure of the data “spread” around each of these means) in a graph so you can see which is higher/lower, or just calculate the means and look at the values (please see posts below for help with this).
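For the "just calculate the means" route, a minimal sketch with base R's tapply() might look like this. Again "Eggdemo" is a simulated stand-in with made-up values; swap in your own dataframe and column names.

```r
set.seed(1)  # makes the simulated numbers reproducible

# Simulated stand-in for the real data: 5 projects x 10 eggs (values are made up)
Eggdemo <- data.frame(
  Project = rep(c("A", "B", "C", "D", "E"), each = 10),
  Width   = rnorm(50, mean = rep(c(40, 36, 44, 41, 48), each = 10), sd = 4)
)

# Mean and standard deviation of Width within each project
group_means <- tapply(Eggdemo$Width, Eggdemo$Project, mean)
group_sds   <- tapply(Eggdemo$Width, Eggdemo$Project, sd)

# Side-by-side table, rounded for reading
round(cbind(mean = group_means, sd = group_sds), 2)

# Projects ordered from widest to narrowest mean egg width
sort(group_means, decreasing = TRUE)
```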

The easiest way to conduct pairwise tests of group means in R is to use the function “pairwise.t.test” from the “stats” package (which is loaded by default). You may recall from doing statistics at high-school or uni that when you run many tests on the same data, you increase the chance that at least one of them comes out "significant" purely by chance (a false positive).

For example, each test run at alpha = 0.05 carries a 5% risk of a false positive, and that risk accumulates across the whole set of comparisons. Therefore, when you make multiple comparisons on related data you have to correct for it, either by using a stricter alpha for each test or, equivalently, by adjusting the P values upwards.
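To put a rough number on the problem: with five projects there are ten possible pairs to compare. The sketch below assumes the ten tests are independent, which pairwise comparisons on the same data are not, so treat the result as a ballpark figure only.

```r
n_groups <- 5
n_tests  <- choose(n_groups, 2)   # number of distinct pairs: 10
alpha    <- 0.05

# Chance of at least one false positive across all tests,
# under the simplifying assumption that the tests are independent
fwer <- 1 - (1 - alpha)^n_tests
fwer   # about 0.40
```

So without any correction, you would have roughly a 40% chance of at least one spurious "significant" difference even if all five projects were really the same.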

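In the “pairwise.t.test” function, you can ask R to do this automatically for you: it adjusts the P values themselves, so you can still compare them against 0.05. There are several adjustment methods you can use (type p.adjust.methods to see the list). I like using Holm at the moment (for no good reason), so let’s go with that!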
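To see what an adjustment actually does, you can run the base R function p.adjust() on a handful of P values. The four raw values below are made up for illustration.

```r
# Hypothetical raw P values from four comparisons
raw_p <- c(0.01, 0.02, 0.03, 0.04)

# Holm: multiply the smallest P by 4, the next by 3, and so on,
# never letting an adjusted value drop below the one before it
holm_p <- p.adjust(raw_p, method = "holm")        # 0.04 0.06 0.06 0.06

# Bonferroni: multiply every P by the number of tests (more conservative)
bonf_p <- p.adjust(raw_p, method = "bonferroni")  # 0.04 0.08 0.12 0.16

rbind(raw = raw_p, holm = holm_p, bonferroni = bonf_p)
```

Holm never gives larger adjusted P values than Bonferroni, which is one reasonable argument for preferring it.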

Let’s say you have run your ANOVA as above, and need to see which kiwi project has the chicks with the widest eggs. I am going to call the “object” I create (which will be the pairwise t-test) “Eggpairwise”. Use the following code to run the pairwise tests.

Eggpairwise<-pairwise.t.test(Eggwidth$Width, Eggwidth$Project, p.adjust.method="holm")

To see the results just type in:

Eggpairwise

It shows the following results:

> Eggpairwise

	Pairwise comparisons using t tests with pooled SD

data:  Eggwidth$Width and Eggwidth$Project

  A       B       C       D
B 4.6e-07 -       -       -
C 0.00276 8.3e-12 -       -
D 0.05866 0.00021 1.4e-05 -
E 2.7e-08 < 2e-16 0.00101 4.8e-11

P value adjustment method: holm

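Each cell gives the adjusted P value for the comparison between the project in that row and the project in that column. These results show that only Kiwi Projects A and D did not differ significantly from one another (P = 0.059). All other comparisons have P < 0.05. Cool!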
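With more groups the table gets hard to read by eye, so you may want R to pick out the non-significant pairs for you. The object returned by pairwise.t.test stores the adjusted P values as a matrix in its p.value element. The sketch below uses a simulated stand-in dataframe, "Eggdemo" (made-up values), so its results will not match the table above.

```r
set.seed(1)  # makes the simulated numbers reproducible

# Simulated stand-in for the real data: 5 projects x 10 eggs (values are made up)
Eggdemo <- data.frame(
  Project = rep(c("A", "B", "C", "D", "E"), each = 10),
  Width   = rnorm(50, mean = rep(c(40, 36, 44, 41, 48), each = 10), sd = 4)
)

demoPairwise <- pairwise.t.test(Eggdemo$Width, Eggdemo$Project,
                                p.adjust.method = "holm")

# Lower-triangular matrix of adjusted P values (rows B-E, columns A-D);
# the unused upper cells are NA
p_matrix <- demoPairwise$p.value

# Row and column positions of any pairs that are NOT significant at 0.05
# (which() skips the NA cells)
which(p_matrix >= 0.05, arr.ind = TRUE)
```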

In a report or paper, you would present these results in a little table after your figure showing the means and their SD or SE.

You would report the results in the text as follows: “ANOVA showed egg width differed significantly between Kiwi Projects (F(4, 45) = 54.24, P < 0.001), with Holm-corrected pairwise comparisons showing that only Projects A and D did not differ significantly in egg width; all other Projects differed significantly (Table 1).”

(The F statistic and its degrees of freedom come from the "F value" and "Df" columns of the R ANOVA output above.)
