10. Chi-square analysis in R using "below25%line" dataset

Lindsey Gray
Aug 17, 2018

Chi-square

Chi-square is the test I use the least. Even though it is one of the simpler tests, I always get a little confused by it.

It is used to test for statistical relationships between mixed variables that don’t neatly fall into any of the variable groupings we have discussed above.

You use it to test for a significant relationship between two categorical variables, or when your response variable is count data or a percentage/proportion of something that isn’t really continuous and can’t be expressed as a mean ± the mean’s SE or SD (see below for more on these).
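To make the idea concrete, here is a minimal sketch of what the test calculates, using a made-up 2 x 2 table of counts (the numbers and the group/outcome labels are assumptions for illustration only). It compares the counts you observed with the counts you would expect if the two variables were unrelated.

```r
# Hypothetical counts (made-up numbers, not from any real dataset)
counts <- matrix(c(15, 10,
                    9, 16),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(group   = c("A", "B"),
                                 outcome = c("yes", "no")))

# Expected count for each cell if the variables were unrelated:
# (row total x column total) / grand total
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)

# Chi-square statistic: sum over cells of (observed - expected)^2 / expected
chi_sq <- sum((counts - expected)^2 / expected)
chi_sq

# R's built-in test gives the same statistic when the
# continuity correction is switched off
builtin <- chisq.test(counts, correct = FALSE)
all.equal(chi_sq, unname(builtin$statistic))
```

The bigger the gap between observed and expected counts, the bigger the chi-square value and the smaller the p-value.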

Let’s say you are interested in seeing whether there is a significant relationship between going under the "25% weight loss line" in chicks and chick “early death”.

Here you might have a dataset made up of two columns, “under25” and “earlyd” (see the “Below25%line” Excel file). “under25” is a categorical variable with only two options, or levels, “y” or “no”, and “earlyd” is also a categorical variable with only two levels, “1” or “0” (you could have coded these values as “y” and “n” if you wanted to, or “A” and “B”, whatever you like!). You can read this into R and call the resulting dataframe “below25” using this command:

below25<-read.table(file.choose(), header=TRUE)
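If you don't have the file to hand, the sketch below shows the kind of layout read.table() expects with header=TRUE: plain text, with the column names on the first line and one row per chick. The values and the name "below25_demo" are made up for illustration and are not the real dataset.

```r
# Made-up lines standing in for the contents of a plain-text data file
demo_text <- "under25 earlyd
y 1
y 0
n 0
n 1
n 0"

# header = TRUE tells R that the first line holds the column names
below25_demo <- read.table(text = demo_text, header = TRUE)

head(below25_demo)  # look at the first few rows
str(below25_demo)   # see how R has stored each column
```

Running str() straight after reading a file in is a quick way to check that the columns and the number of rows are what you expected.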

You now need to make sure R will treat both these variables as factors (remember R calls categorical variables “factors”) by entering:

below25$under25<-as.factor(below25$under25)

and then:

below25$earlyd<-as.factor(below25$earlyd)
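A quick way to check the conversion worked, sketched here on a small made-up dataframe called "demo" (hypothetical values, not the real data):

```r
# Made-up data for illustration only
demo <- data.frame(
  under25 = c("y", "y", "n", "n", "y", "n"),
  earlyd  = c(1, 0, 0, 1, 1, 0)
)

# Same conversion as above, applied to the demo columns
demo$under25 <- as.factor(demo$under25)
demo$earlyd  <- as.factor(demo$earlyd)

is.factor(demo$under25)           # TRUE once the column is a factor
levels(demo$earlyd)               # the categories R found in the column
table(demo$under25, demo$earlyd)  # counts for each combination of levels
```

The table() output is the grid of counts that the chi-square test works from, so it is worth a glance before running the test.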

To conduct the chi-square test use the following command:

chitest<-chisq.test(below25$under25, below25$earlyd)

If you type in the name you gave to your Chi-squared test object “chitest”, R will show you the results:

Pearson's Chi-squared test with Yates' continuity correction

data: below25$under25 and below25$earlyd

X-squared = 1.4815, df = 1, p-value = 0.2235
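The first line of the output mentions Yates' continuity correction. chisq.test() applies this by default when the table is 2 x 2. The sketch below, using made-up counts (not the real dataset), shows how to run the test with and without the correction so you can compare the two.

```r
# Hypothetical 2 x 2 table of counts (made-up numbers)
counts <- matrix(c(12,  8,
                    6, 14),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(under25 = c("y", "n"),
                                 earlyd  = c("1", "0")))

with_yates    <- chisq.test(counts)                   # default for 2 x 2 tables
without_yates <- chisq.test(counts, correct = FALSE)  # plain Pearson chi-square

# Chi-square values side by side
c(with_yates    = unname(with_yates$statistic),
  without_yates = unname(without_yates$statistic))

# p-values side by side
c(with_yates    = with_yates$p.value,
  without_yates = without_yates$p.value)
```

The correction makes the test a little more cautious, so the corrected chi-square value is never larger than the uncorrected one.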

This indicates that, according to the made-up data, there is no significant relationship between going under the 25% line and dying early.

You would report these results as follows: “A Chi-squared test showed there was no significant relationship between a chick’s weight going below the 25% weight loss line and early death (χ² = 1.48, df = 1, P = 0.22).”
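The numbers for a sentence like this can be pulled straight out of the test object rather than copied by hand. Here is a minimal sketch using made-up data for 40 hypothetical chicks (the names "demo" and "chitest_demo" are just for illustration):

```r
# Made-up data, 40 hypothetical chicks (not the dataset from this post)
demo <- data.frame(
  under25 = factor(rep(c("y", "n"), times = c(20, 20))),
  earlyd  = factor(c(rep(c(1, 0), times = c(12, 8)),
                     rep(c(1, 0), times = c(6, 14))))
)

chitest_demo <- chisq.test(demo$under25, demo$earlyd)

chitest_demo$observed   # counts in each combination of levels
chitest_demo$expected   # counts expected if there were no relationship

# Rounded values for a written report
round(unname(chitest_demo$statistic), 2)  # chi-square value
unname(chitest_demo$parameter)            # degrees of freedom
round(chitest_demo$p.value, 2)            # p-value
```

Looking at the expected counts is also a useful check: R warns that the chi-square approximation may be incorrect when they are very small.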

Another example could be testing for a relationship between Kiwi Project and the percentage of chicks with early death in a given season.

Here project would be a categorical independent variable, and early death would act as a sort of count variable: a percentage dependent variable. You just have one percentage per project (no mean and variance can be calculated).

You would calculate what percentage of the total number of chicks from each project died early. As in the above example, you would create a dataset with two columns, “project” and “percentage”, and conduct the Chi-square test as above.
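One thing to keep in mind is that chisq.test() works from counts, and a percentage on its own doesn't tell R how many chicks it was based on. A minimal sketch of one way to set this up, assuming you have the number of chicks that did and did not die early in each project (the project names and numbers here are made up):

```r
# Hypothetical counts per project (made-up names and numbers)
project_counts <- matrix(c(10, 40,
                            8, 32,
                           15, 35),
                         nrow = 3, byrow = TRUE,
                         dimnames = list(project = c("A", "B", "C"),
                                         outcome = c("early_death", "survived")))

# Percentage of chicks with early death in each project
pct_early <- 100 * project_counts[, "early_death"] / rowSums(project_counts)
pct_early

# Chi-square test on the counts
# (no Yates correction is applied here because the table is 3 x 2, not 2 x 2)
project_test <- chisq.test(project_counts)
project_test
```

This way you can still report the percentages, while the test itself uses the counts behind them.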

EASY-R

Step-by-step instructions on how to use the free statistics program R for absolute beginners, by Biologists Lindsey Gray and Brittany Mitchell.

Download R: https://cran.r-project.org
