12. Summarise a large data set in R into mean, SD, and SE using R package "plyr"

  • Lindsey Gray
  • Aug 14, 2018

Guinea pigs

Summarising a large dataset

You can use the function “ddply” from the R package “plyr” to summarise a large data set that has multiple factors, each with several levels. This is especially useful if you intend to go on to use ggplot to make figures from your data (instructions for ggplot coming soonish).

Start by reading your data frame into R.

Then load “plyr”.
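As a minimal sketch of these two steps, the block below loads “plyr” and builds a small data frame from a handful of the rows shown in the example table. Typing the rows in with read.table() is only there so the example runs on its own; for a real analysis you would more likely read your full file, and the file name in the comment is a hypothetical placeholder.

```r
# Load plyr (install it once with install.packages("plyr") if needed)
library(plyr)

# A few rows copied from the example table so this runs on its own.
# With a real file you might instead use something like:
#   nocontrol <- read.csv("nocontrol.csv")   # hypothetical file name
nocontrol <- read.table(header = TRUE, text = "
gen regime weight tag1
4   1to5   1.3263 0.6780195
4   1to5   1.0500 0.7563832
4   1to5   1.3817 0.6497953
8   6to1   1.3790 0.137012317
8   6to1   1.3342 0.425495298
8   6to1   1.1775 0.288524176
")

# Check that the columns came in with the expected names and types
str(nocontrol)
```

It is worth running str() before summarising, because a numeric column that was read as text will make mean() and sd() fail or return NA.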

If you had a data frame (called “nocontrol” in this example) with the following variables (column headings in the first row):

gen  regime  weight    tag1
4    1to5    1.326300  0.6780195
4    1to5    1.050000  0.7563832
4    1to5    1.381700  0.6497953
8    1to9    1.2745    0.540150407
8    1to9    1.385     0.602213122
8    1to9    1.3559    0.475923168
8    1to9    1.4296    0.328257718
8    1to9    1.0098    0.471577574
8    1to9    1.2138    0.817707813
8    1to9    0.9837    0.560899818
8    1to9    1.3693    0.339046903
8    1to9    1.471     0.469454579
8    1to9    1.4916    0.350061025
8    1to9    1.2114    0.642312816
8    1to9    1.2897    0.47414425
8    1to9    1.4016    0.842972917
8    6to1    1.379     0.137012317
8    6to1    1.3342    0.425495298
8    6to1    1.1775    0.288524176
16   1to5    1.0472    0.30005540
16   1to5    0.9083    0.49911470
16   1to5    1.3334    0.53220660
16   1to5    1.0093    0.41224520

…(which continues for hundreds of rows of cases)…

you would use the following code to make a new dataframe “summaryfat”:

summaryfat <- ddply(nocontrol, c("regime", "gen"), summarise, N = length(tag1), mean = mean(tag1), sd = sd(tag1), se = sd / sqrt(N))
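To see what this call does, here is a small self-contained sketch that runs the same ddply() call on six rows taken from the example table and then repeats the calculation by hand for one group. The six-row data frame is a cut-down stand-in for the full data, and the names x and manual are introduced only for this illustration. Within each regime and gen combination, N is the number of rows, and se is the standard deviation divided by the square root of N.

```r
library(plyr)

# Six rows from the example table (a stand-in for the full data set)
nocontrol <- data.frame(
  gen    = c(4, 4, 4, 8, 8, 8),
  regime = c("1to5", "1to5", "1to5", "6to1", "6to1", "6to1"),
  tag1   = c(0.6780195, 0.7563832, 0.6497953,
             0.137012317, 0.425495298, 0.288524176)
)

# One output row per regime x gen combination
summaryfat <- ddply(nocontrol, c("regime", "gen"), summarise,
                    N    = length(tag1),
                    mean = mean(tag1),
                    sd   = sd(tag1),
                    se   = sd / sqrt(N))
print(summaryfat)

# Hand check for the regime "1to5", gen 4 group
x <- nocontrol$tag1[nocontrol$regime == "1to5" & nocontrol$gen == 4]
manual <- c(N = length(x), mean = mean(x), sd = sd(x),
            se = sd(x) / sqrt(length(x)))
print(manual)
```

The se column can reuse sd and N because plyr's summarise evaluates its arguments in order, so columns defined earlier in the call are available to later ones.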

You then type in “summaryfat” and it will show the values for each statistic requested in the above code:

summaryfat

   regime gen  N      mean        sd         se
1    1to5   1 15 0.4001375 0.1898660 0.04902318
2    1to5   4 15 0.4866688 0.1962543 0.05067264
3    1to5   8 15 0.4550373 0.1515650 0.03913391
4    1to5  16 15 0.4633875 0.1063558 0.02746095
5    1to9   1 19 0.6963143 0.2435680 0.05587833
6    1to9   4 19 0.6310593 0.2117431 0.04857721
7    1to9   8 19 0.6296398 0.2520157 0.05781637
8    1to9  16 19 0.5503357 0.1521410 0.03490353
9    6to1   1 23 0.2712649 0.1092061 0.02277104
10   6to1   4 23 0.2508258 0.1465883 0.03056578
11   6to1   8 23 0.3519646 0.1379969 0.02877434
12   6to1  16 23 0.4196955 0.2537789 0.05291656
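One thing to watch for with this kind of summary: length() counts missing values, and mean() and sd() return NA if any value in a group is missing. If your data might contain NAs, a variant along the following lines may be safer. This is a sketch, not part of the original analysis; the toy data frame and the NA in it are made up for illustration.

```r
library(plyr)

# Toy data: three values from the example table plus one NA added for illustration
toy <- data.frame(
  regime = c("1to5", "1to5", "1to5", "1to5"),
  gen    = c(4, 4, 4, 4),
  tag1   = c(0.6780195, 0.7563832, 0.6497953, NA)
)

# Count only non-missing values and drop NAs when computing mean and sd
summary_na <- ddply(toy, c("regime", "gen"), summarise,
                    N    = sum(!is.na(tag1)),
                    mean = mean(tag1, na.rm = TRUE),
                    sd   = sd(tag1, na.rm = TRUE),
                    se   = sd / sqrt(N))
print(summary_na)
```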

We will call upon this new dataframe when making figures in ggplot.
