Chapter 18 Chi-squared test

18.1 Multinomial Goodness of Fit

A population is called multinomial if its data is categorical and belongs to a collection of discrete non-overlapping classes.

The null hypothesis of the goodness-of-fit test for a multinomial distribution is that the observed frequency \[f_i\] equals the expected count \[e_i\] in each category. It is rejected if the p-value of the following chi-squared test statistic is less than a given significance level α.

\[\chi^2 = \sum_i \frac{(f_i - e_i)^2}{e_i}\]

Example: The survey data set in the MASS package records each student's smoking habit as one of four responses: “Heavy,” “Regul” (regularly), “Occas” (occasionally) and “Never.” The Smoke data is therefore multinomial.

library(MASS)          # provides the survey data set
levels(survey$Smoke)   # response categories, in factor-level order

## [1] "Heavy" "Never" "Occas" "Regul"

# observed frequency of each response category
smoke_freq = table(survey$Smoke)
smoke_freq

## 
## Heavy Never Occas Regul 
##    11   189    19    17

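# hypothesized category probabilities, listed in the same order as
# levels(survey$Smoke); chisq.test() matches p to the counts by position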
smoke_prob = c(heavy = .045, 
               never = .795, 
               occas = .085, 
               regul = .075)
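
Before running the test, it can help to check that the probability vector is usable. The following is a minimal sketch, not part of the original example: the names counts and probs are hypothetical, and the counts are copied by hand from the smoke_freq output above so that the block runs without the MASS package. The threshold of 5 for expected counts is the usual rule of thumb for the chi-squared approximation, taken here as an assumption.

```r
# Hypothetical stand-ins for smoke_freq and smoke_prob (values copied from above)
counts <- c(Heavy = 11, Never = 189, Occas = 19, Regul = 17)
probs  <- c(Heavy = .045, Never = .795, Occas = .085, Regul = .075)

# There must be exactly one probability per category
same_length <- length(probs) == length(counts)

# chisq.test() stops with an error unless the probabilities sum to 1
# (or rescale.p = TRUE is passed); all.equal() tolerates rounding error
sums_to_one <- isTRUE(all.equal(sum(probs), 1))

# Expected counts under the null hypothesis: sample size times probability
expected <- sum(counts) * probs

# Rule of thumb (assumption): every expected count should be at least 5
large_enough <- all(expected >= 5)

print(c(same_length = same_length,
        sums_to_one = sums_to_one,
        large_enough = large_enough))
```

All three checks should come back TRUE for this data, so the chi-squared approximation is reasonable here.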

Determine whether the sample data in smoke_freq supports the estimated probabilities in smoke_prob at the .05 significance level.

chisq.test(smoke_freq, p=smoke_prob)

## 
##  Chi-squared test for given probabilities
## 
## data:  smoke_freq
## X-squared = 0.10744, df = 3, p-value = 0.9909

As the p-value 0.991 is greater than the .05 significance level, we do not reject the null hypothesis that the smoking habits in the survey sample follow the probabilities in smoke_prob.
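
To see where the reported numbers come from, the statistic can be recomputed by hand. This is a sketch under stated assumptions rather than a description of what chisq.test does internally: the variable names are hypothetical, and the counts and probabilities are copied from the output above so that the block is self-contained.

```r
# Observed counts and hypothesized probabilities, copied from the example
observed     <- c(Heavy = 11, Never = 189, Occas = 19, Regul = 17)
hypothesized <- c(Heavy = .045, Never = .795, Occas = .085, Regul = .075)

# Expected count in each category under the null hypothesis
expected <- sum(observed) * hypothesized

# Chi-squared statistic: sum of (observed - expected)^2 / expected
chi_sq <- sum((observed - expected)^2 / expected)

# Degrees of freedom: number of categories minus one
deg_free <- length(observed) - 1

# p-value: upper tail of the chi-squared distribution
p_value <- pchisq(chi_sq, df = deg_free, lower.tail = FALSE)

# Decision at the .05 significance level
reject_null <- p_value < 0.05

print(round(chi_sq, 5))
print(round(p_value, 4))
print(reject_null)
```

The statistic and p-value should agree with the X-squared and p-value lines reported by chisq.test above, with 3 degrees of freedom.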
