12. Summarise a large data set in R into mean, SD, and SE using R package "plyr"
- Lindsey Gray
- Aug 14, 2018

Summarising a large dataset
You can use the function “ddply” from the R package “plyr” to summarise a large data set by group when it has several factors, each with multiple levels. This is especially useful if you intend to go on to use ggplot to make figures from your data (instructions for ggplot coming soonish).
Start by reading your data frame into R.
Then load “plyr”.
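As a rough sketch of those two steps: the file name "nocontrol.csv" below is only a placeholder (the post does not say how the data were stored), so the example reads a few rows of the table from inline text instead, which lets it run on its own.

```r
library(plyr)  # provides ddply()

# With your own data you would read a file, for example:
# nocontrol <- read.csv("nocontrol.csv")   # hypothetical file name
# Here a few rows are read from inline text so the sketch is self-contained.
nocontrol <- read.table(header = TRUE, text = "
gen regime weight tag1
4 1to5 1.326300 0.6780195
4 1to5 1.050000 0.7563832
4 1to5 1.381700 0.6497953
8 6to1 1.379 0.137012317
8 6to1 1.3342 0.425495298
8 6to1 1.1775 0.288524176
")

str(nocontrol)  # check the column names and types before summarising
```

Checking the structure first is worthwhile, because ddply groups by whatever values it finds in the grouping columns, so a mistyped level shows up as an extra group.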
If you had a data frame (called "nocontrol" in this example) with the following variables (column headings in the first row)
gen regime weight tag1
4 1to5 1.326300 0.6780195
4 1to5 1.050000 0.7563832
4 1to5 1.381700 0.6497953
8 1to9 1.2745 0.540150407
8 1to9 1.385 0.602213122
8 1to9 1.3559 0.475923168
8 1to9 1.4296 0.328257718
8 1to9 1.0098 0.471577574
8 1to9 1.2138 0.817707813
8 1to9 0.9837 0.560899818
8 1to9 1.3693 0.339046903
8 1to9 1.471 0.469454579
8 1to9 1.4916 0.350061025
8 1to9 1.2114 0.642312816
8 1to9 1.2897 0.47414425
8 1to9 1.4016 0.842972917
8 6to1 1.379 0.137012317
8 6to1 1.3342 0.425495298
8 6to1 1.1775 0.288524176
16 1to5 1.0472 0.30005540
16 1to5 0.9083 0.49911470
16 1to5 1.3334 0.53220660
16 1to5 1.0093 0.41224520
… (which continues for hundreds of rows of cases) …
you would use the following code to make a new dataframe “summaryfat”:
summaryfat <- ddply(nocontrol, c("regime", "gen"), summarise,
                    N    = length(tag1),
                    mean = mean(tag1),
                    sd   = sd(tag1),
                    se   = sd / sqrt(N))
You then type in “summaryfat” and it will show the values for each statistic requested in the above code:
summaryfat
regime gen N mean sd se
1 1to5 1 15 0.4001375 0.1898660 0.04902318
2 1to5 4 15 0.4866688 0.1962543 0.05067264
3 1to5 8 15 0.4550373 0.1515650 0.03913391
4 1to5 16 15 0.4633875 0.1063558 0.02746095
5 1to9 1 19 0.6963143 0.2435680 0.05587833
6 1to9 4 19 0.6310593 0.2117431 0.04857721
7 1to9 8 19 0.6296398 0.2520157 0.05781637
8 1to9 16 19 0.5503357 0.1521410 0.03490353
9 6to1 1 23 0.2712649 0.1092061 0.02277104
10 6to1 4 23 0.2508258 0.1465883 0.03056578
11 6to1 8 23 0.3519646 0.1379969 0.02877434
12 6to1 16 23 0.4196955 0.2537789 0.05291656
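If you want to convince yourself what each column holds, here is a small sketch that runs the same kind of ddply call on six rows taken from the table above and then recomputes one group by hand. The names "toy" and "toy_summary" are made up for this illustration and are not part of the original analysis.

```r
library(plyr)

# Six rows copied from the example table: two regime/gen groups of three
toy <- data.frame(
  gen    = c(4, 4, 4, 8, 8, 8),
  regime = c("1to5", "1to5", "1to5", "6to1", "6to1", "6to1"),
  tag1   = c(0.6780195, 0.7563832, 0.6497953,
             0.137012317, 0.425495298, 0.288524176)
)

# Same summary as in the post: one row per regime/gen combination
toy_summary <- ddply(toy, c("regime", "gen"), summarise,
                     N    = length(tag1),   # number of cases in the group
                     mean = mean(tag1),
                     sd   = sd(tag1),
                     se   = sd / sqrt(N))   # uses the sd and N columns defined just above
print(toy_summary)

# Recompute the 1to5 / generation 4 group by hand for comparison
x <- toy$tag1[toy$regime == "1to5" & toy$gen == 4]
print(c(N = length(x), mean = mean(x), sd = sd(x), se = sd(x) / sqrt(length(x))))
```

The hand-computed values should match the first row of "toy_summary". Note that "se" relies on "N" and "sd" being defined earlier in the same call, so the order of the arguments matters.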
We will call upon this new dataframe when making figures in ggplot.