Chapter 18 Chi-squared test
18.1 Multinomial Goodness of Fit
A population is called multinomial if its data is categorical and belongs to a collection of discrete non-overlapping classes.
The null hypothesis of the goodness of fit test for a multinomial distribution is that the observed frequency \(f_i\) equals the expected count \(e_i\) in each category. It is rejected if the p-value of the following Chi-squared test statistic is less than a given significance level α.
\[\chi^2 = \sum_i \frac{(f_i - e_i)^2}{e_i}\]
Example: In the survey data set from the MASS package, the Smoke column records each student’s smoking habit as “Heavy,” “Regul” (regularly), “Occas” (occasionally) or “Never.” The Smoke data is therefore multinomial.
library(MASS)
levels(survey$Smoke)
## [1] "Heavy" "Never" "Occas" "Regul"
smoke_freq = table(survey$Smoke)
smoke_freq
##
## Heavy Never Occas Regul
##    11   189    19    17
# estimated probabilities
smoke_prob = c(heavy = .045,
               never = .795,
               occas = .085,
               regul = .075)
Determine whether the sample data in smoke_freq supports the estimated probabilities in smoke_prob at the .05 significance level.
chisq.test(smoke_freq, p=smoke_prob)
##
## Chi-squared test for given probabilities
##
## data: smoke_freq
## X-squared = 0.10744, df = 3, p-value = 0.9909
As the p-value 0.991 is greater than the .05 significance level, we do not reject the null hypothesis that the smoking frequencies in survey are consistent with the estimated probabilities in smoke_prob.
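The reported statistic can be checked by hand. The following is a minimal sketch, not part of the original example: it re-enters the counts and probabilities shown above, and the names observed, prob, expected, stat, df and p_value are chosen here for illustration only. It sums the squared differences between observed and expected counts, each divided by the expected count, and compares the result with a Chi-squared distribution with (number of categories minus 1) degrees of freedom.

```r
# Observed counts, copied from the smoke_freq table above
observed = c(Heavy = 11, Never = 189, Occas = 19, Regul = 17)

# Hypothesized probabilities, in the same order as the categories
prob = c(.045, .795, .085, .075)

# Expected count in each category under the null hypothesis
expected = sum(observed) * prob

# Chi-squared statistic: sum of (observed - expected)^2 / expected
stat = sum((observed - expected)^2 / expected)

# Degrees of freedom: number of categories minus one
df = length(observed) - 1

# Upper-tail probability of the Chi-squared distribution
p_value = pchisq(stat, df = df, lower.tail = FALSE)

round(stat, 5)
df
round(p_value, 4)
```

The rounded values should agree with the X-squared, df and p-value lines printed by chisq.test above.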
Sources