Chapter 12 Tests for categorical variables
A categorical variable can take one of a fixed number of possible values, assigning each individual or other unit of observation to a particular group or nominal category on the basis of some qualitative property.
12.1 Chi-squared tests
The chi-squared test is best suited to large datasets. As a general rule, it is appropriate if at least 80% of the cells have an expected frequency of 5 or greater and no cell has an expected frequency below 1. If the expected frequencies are very small, categories may be combined (where it makes sense to do so) to create fewer, larger categories. Alternatively, Fisher’s exact test can be used.
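The rule of thumb above can be checked directly from the observed counts. The following is a minimal sketch, assuming a helper named `check_expected` (a hypothetical name, not a base R function) and using the 2 x 2 counts analysed in this section:

```r
# Hypothetical helper (not part of base R): checks the expected-frequency
# rule of thumb for a contingency table of observed counts.
check_expected <- function(tab) {
  # Expected count per cell under independence:
  # row total * column total / grand total
  expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
  prop_at_least_5 <- mean(expected >= 5)  # share of cells with expected >= 5
  min_expected <- min(expected)           # smallest expected count
  list(
    expected = expected,
    prop_at_least_5 = prop_at_least_5,
    min_expected = min_expected,
    # Rule of thumb: at least 80% of cells >= 5 and no cell below 1
    ok = prop_at_least_5 >= 0.8 && min_expected >= 1
  )
}

tab <- rbind(c(83, 35), c(92, 43))
res <- check_expected(tab)
res
```

If `ok` is `FALSE`, combining categories or switching to Fisher’s exact test may be worth considering.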
data = rbind(c(83,35), c(92,43))
data
## [,1] [,2]
## [1,] 83 35
## [2,] 92 43
chisq.test(data, correct = FALSE)
##
## Pearson's Chi-squared test
##
## data: data
## X-squared = 0.14172, df = 1, p-value = 0.7066
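To make the reported statistic less of a black box, the sketch below recomputes it by hand from the same counts. The variable names (`observed`, `expected`, `x_squared`) are illustrative choices, and the calculation follows the standard Pearson formula without continuity correction:

```r
# Observed counts, as in the example above
observed <- rbind(c(83, 35), c(92, 43))

# Expected counts under the null hypothesis of no association
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Pearson statistic: sum over cells of (observed - expected)^2 / expected
x_squared <- sum((observed - expected)^2 / expected)

# Degrees of freedom: (rows - 1) * (columns - 1)
df <- (nrow(observed) - 1) * (ncol(observed) - 1)

# Upper-tail probability of the chi-squared distribution
p_value <- pchisq(x_squared, df = df, lower.tail = FALSE)

c(X_squared = x_squared, df = df, p_value = p_value)
```

The values should agree with the `chisq.test` output above (X-squared = 0.14172, df = 1, p-value = 0.7066).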
Fisher’s exact test
R example:
| Group | Tumour shrinkage: No | Tumour shrinkage: Yes | Total |
|---|---|---|---|
| Treatment | 8 | 3 | 11 |
| Placebo | 9 | 4 | 13 |
| Total | 17 | 7 | 24 |
The null hypothesis is that there is no association between treatment and tumour shrinkage.
The alternative hypothesis is that there is some association between treatment group and tumour shrinkage.
data = rbind(c(8,3), c(9,4))
data
## [,1] [,2]
## [1,] 8 3
## [2,] 9 4
fisher.test(data)
##
## Fisher's Exact Test for Count Data
##
## data: data
## p-value = 1
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
## 0.1456912 10.6433317
## sample estimates:
## odds ratio
## 1.176844
The output of Fisher’s exact test tells us that, if the null hypothesis were true, the probability of observing a combination of frequencies at least as extreme as this one would be high: the p-value is 1, which is clearly greater than 0.05. In this case, there is no evidence of an association between treatment group and tumour shrinkage.
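The p-value of 1 can be reproduced from the hypergeometric distribution, which gives the probability of each possible table when the row and column totals are held fixed. The following is a sketch under that assumption; the variable names are illustrative, and the small relative tolerance is included only as a guard against floating-point ties:

```r
# Observed counts from the tumour shrinkage example
tab <- rbind(c(8, 3), c(9, 4))

m <- sum(tab[, 1])   # column 1 total (no shrinkage): 17
n <- sum(tab[, 2])   # column 2 total (shrinkage): 7
k <- sum(tab[1, ])   # row 1 total (treatment): 11
x_obs <- tab[1, 1]   # observed count in the top-left cell: 8

# All values the top-left cell can take given the fixed margins
support <- max(0, k - n):min(k, m)

# Probability of each possible table under the null hypothesis
probs <- dhyper(support, m, n, k)
p_obs <- dhyper(x_obs, m, n, k)

# Two-sided p-value: total probability of tables no more likely
# than the observed one
p_value <- sum(probs[probs <= p_obs * (1 + 1e-7)])
p_value
```

Here the observed table is the single most likely table given the margins, so every possible table counts as at least as extreme and the probabilities sum to 1.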