Chapter 16 Analysis of Variance (ANOVA)
16.1 One-way ANOVA
variance = SS/df, where SS is the sum of squares and df is the degrees of freedom
\(SS = \displaystyle\sum_{i=1}^{n}{(x_i - \mu)^2}\), where
\(\mu\) is the sample mean
n is the sample size
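\(var(x) = \frac{SS}{n-1} = \frac{1}{n-1}{\displaystyle\sum_{i=1}^{n}{(x_i - \mu)^2}}\), since a sample has df = n-1
(dividing by n instead of n-1 gives the population variance)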
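A quick check in base R, using the first group of student scores from the example later in this section: `var()` divides the sum of squares by n - 1 (the degrees of freedom), while dividing by n gives a smaller, population-style value. The names `x`, `ss`, `var_df` and `var_n` are only illustrative.

```r
# Scores of the first student group (same values as vector `a` in the example below)
x <- c(82, 93, 61, 74, 69, 70, 53)
n <- length(x)

ss     <- sum((x - mean(x))^2)  # sum of squares around the sample mean
var_df <- ss / (n - 1)          # variance = SS / df, with df = n - 1
var_n  <- ss / n                # dividing by n gives the population (biased) form

all.equal(var_df, var(x))       # base R's var() uses the n - 1 denominator
```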
SST = SSE + SSC = W + B, where
SST - Total Sum of Squares
SSE - Error Sum of Squares - within (W)
SSC - Sum of Squares for Columns (treatments) - between (B)
C - number of columns (treatments)
N - total number of observations
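The decomposition SST = SSE + SSC can be checked numerically. A minimal sketch using the three student groups from the example later in this section; the list `groups` and the variable `grand_mean` are hypothetical names chosen for illustration.

```r
# Three treatment groups (the student scores used later in this chapter)
groups <- list(
  a = c(82, 93, 61, 74, 69, 70, 53),
  b = c(71, 62, 85, 94, 78, 66, 71),
  c = c(64, 73, 87, 91, 56, 78, 87)
)
all_x      <- unlist(groups)
grand_mean <- mean(all_x)

# Total: squared deviations of every observation from the grand mean
SST <- sum((all_x - grand_mean)^2)
# Within (W): squared deviations of each observation from its own group mean
SSE <- sum(sapply(groups, function(g) sum((g - mean(g))^2)))
# Between (B): group size times squared deviation of the group mean from the grand mean
SSC <- sum(sapply(groups, function(g) length(g) * (mean(g) - grand_mean)^2))

c(SST = SST, SSE = SSE, SSC = SSC)
all.equal(SST, SSE + SSC)
```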
Mean square for columns - MSC = SSC/df_columns, where df_columns = C-1
Mean square for error - MSE = SSE/df_error, where df_error = N-C
Total sum of squares - SST, where df_total = N-1
F-statistic - F = MSC/MSE
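Once SSC and SSE are known, the rest of the ANOVA table follows mechanically from the definitions above. A minimal sketch; the helper name `anova_from_ss` is hypothetical, and the p-value is taken as the upper tail of the F distribution via base R's `pf()`.

```r
# Build a one-way ANOVA table from SSC, SSE, the number of columns C and total N
anova_from_ss <- function(SSC, SSE, C, N) {
  df_columns <- C - 1
  df_error   <- N - C
  MSC    <- SSC / df_columns
  MSE    <- SSE / df_error
  F_stat <- MSC / MSE
  p      <- pf(F_stat, df_columns, df_error, lower.tail = FALSE)  # P(F > F_stat)
  data.frame(
    source  = c("columns", "error", "total"),
    df      = c(df_columns, df_error, N - 1),
    SS      = c(SSC, SSE, SSC + SSE),
    MS      = c(MSC, MSE, NA),
    F       = c(F_stat, NA, NA),
    p.value = c(p, NA, NA)
  )
}

# Usage with the calorie-data sums of squares that aov() reports later in this chapter
tab <- anova_from_ss(SSC = 174664.1, SSE = 586719.6, C = 3, N = 15)
tab
```

With these inputs the F value and p-value agree (up to rounding of the sums of squares) with the `oneway.test()` and `summary(aov)` output shown later.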
Let’s calculate the degrees of freedom for the example below (C = 3 groups of 7 students, N = 21):
df_columns = 3-1 = 2, MSC = SSC/2
df_error = 21-3 = 18, MSE = SSE/18
df_total = 21-1 = 20
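With these degrees of freedom, the critical value of the F distribution can be computed with `qf()` instead of being read from an F table; equal group means are rejected when the observed F exceeds it. A sketch, assuming a significance level of 0.05 (the variable names are illustrative):

```r
alpha      <- 0.05      # assumed significance level
df_columns <- 3 - 1     # C - 1
df_error   <- 21 - 3    # N - C
F_crit <- qf(1 - alpha, df1 = df_columns, df2 = df_error)
F_crit                  # about 3.55
```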
# 3 groups of students with scores (1-100):
a = c(82,93,61,74,69,70,53)
b = c(71,62,85,94,78,66,71)
c = c(64,73,87,91,56,78,87)
# sum of squared deviations from the group mean
sq = function(x) { sum((x - mean(x))^2) }
sq(a)
## [1] 1039.429
sq(b)
## [1] 751.4286
sq(c)
## [1] 1021.714
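The three `sq()` values are the within-group pieces, so SSE is their sum. A sketch that finishes the student example with the degrees of freedom computed above and cross-checks the result against base R's `oneway.test()`; it redefines the vectors so it runs on its own, and `scores`, `F_stat`, `p_value` are illustrative names.

```r
a <- c(82, 93, 61, 74, 69, 70, 53)
b <- c(71, 62, 85, 94, 78, 66, 71)
c <- c(64, 73, 87, 91, 56, 78, 87)
sq <- function(x) sum((x - mean(x))^2)

SSE <- sq(a) + sq(b) + sq(c)   # within-group sum of squares
SST <- sq(c(a, b, c))          # total; R still finds the c() function although c is also a vector
SSC <- SST - SSE               # between-group sum of squares

MSC     <- SSC / (3 - 1)       # df_columns = 2
MSE     <- SSE / (21 - 3)      # df_error = 18
F_stat  <- MSC / MSE
p_value <- pf(F_stat, 2, 18, lower.tail = FALSE)
c(SSE = SSE, SSC = SSC, F = F_stat, p = p_value)

# Cross-check with base R on long-format data
scores <- stack(list(a = a, b = b, c = c))
oneway.test(values ~ ind, data = scores, var.equal = TRUE)
```

Here F is about 0.28 (p about 0.76), well below the 5% critical value of about 3.55, so these scores give no evidence that the three group means differ.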
Using built-in R functions (stats package):
# data
# Number of calories consumed by month:
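may <- c(2166, 1568, 2233, 1882, 2019)
sep <- c(2279, 2075, 2131, 2009, 1793)
dec <- c(2226, 2154, 2583, 2010, 2190)
# stack() reshapes the named list into long format: a "values" column and a factor "ind" of group labels
d <- stack(list(may=may, sep=sep, dec=dec))
d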
## values ind
## 1 2166 may
## 2 1568 may
## 3 2233 may
## 4 1882 may
## 5 2019 may
## 6 2279 sep
## 7 2075 sep
## 8 2131 sep
## 9 2009 sep
## 10 1793 sep
## 11 2226 dec
## 12 2154 dec
## 13 2583 dec
## 14 2010 dec
## 15 2190 dec
names(d)
## [1] "values" "ind"
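`stack()` is one convenient way to get long format. A sketch of an equivalent data frame built by hand with `rep()` and `factor()`; the name `d2` is illustrative, and the explicit `levels` keep may, sep, dec in calendar order rather than alphabetical order.

```r
may <- c(2166, 1568, 2233, 1882, 2019)
sep <- c(2279, 2075, 2131, 2009, 1793)
dec <- c(2226, 2154, 2583, 2010, 2190)

d2 <- data.frame(
  values = c(may, sep, dec),                        # all observations in one column
  ind    = factor(rep(c("may", "sep", "dec"), each = 5),
                  levels = c("may", "sep", "dec"))  # group label for each observation
)
str(d2)
```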
oneway.test(values ~ ind, data=d, var.equal=TRUE)
##
## One-way analysis of means
##
## data: values and ind
## F = 1.7862, num df = 2, denom df = 12, p-value = 0.2094
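`oneway.test()` returns an `htest` object, so the F value, degrees of freedom and p-value can be extracted programmatically instead of read off the printout. A sketch that rebuilds `d` so it runs standalone; `res_ow` is an illustrative name and the 5% level is an assumption.

```r
may <- c(2166, 1568, 2233, 1882, 2019)
sep <- c(2279, 2075, 2131, 2009, 1793)
dec <- c(2226, 2154, 2583, 2010, 2190)
d <- stack(list(may = may, sep = sep, dec = dec))

res_ow <- oneway.test(values ~ ind, data = d, var.equal = TRUE)
res_ow$statistic   # F value
res_ow$parameter   # numerator and denominator degrees of freedom
res_ow$p.value

# Decision at an assumed 5% significance level (FALSE here: means not shown to differ)
res_ow$p.value < 0.05
```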
# alternative using aov
res <- aov(values ~ ind, data = d)
res
## Call:
## aov(formula = values ~ ind, data = d)
##
## Terms:
## ind Residuals
## Sum of Squares 174664.1 586719.6
## Deg. of Freedom 2 12
##
## Residual standard error: 221.1183
## Estimated effects may be unbalanced
summary(res)
## Df Sum Sq Mean Sq F value Pr(>F)
## ind 2 174664 87332 1.786 0.209
## Residuals 12 586720 48893
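The `summary()` table is a list whose first element is a data frame, so its columns can be accessed by name. A sketch showing that the "Residual standard error" printed by `aov()` is simply the square root of MSE (the residual Mean Sq); `tab` and `MSE` are illustrative names.

```r
may <- c(2166, 1568, 2233, 1882, 2019)
sep <- c(2279, 2075, 2131, 2009, 1793)
dec <- c(2226, 2154, 2583, 2010, 2190)
d   <- stack(list(may = may, sep = sep, dec = dec))
res <- aov(values ~ ind, data = d)

tab <- summary(res)[[1]]     # the ANOVA table as a data frame
tab[["F value"]][1]          # F statistic for ind
tab[["Pr(>F)"]][1]           # its p-value
MSE <- tab[["Mean Sq"]][2]   # residual mean square
sqrt(MSE)                    # matches "Residual standard error" printed by res
```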
16.2 Sources
Example for one-way ANOVA: YouTube video by Brandon Foltz