Chapter 16 Analysis of Variance (ANOVA)

16.1 One-way ANOVA

variance = SS/df, where SS is the sum of squares and df is the degrees of freedom
\(SS = \displaystyle\sum_{i=1}^{n}{(x_i - \mu)^2}\), where
\(\mu\) is the sample mean
n is the sample size

For a sample, df = n - 1, so
\(var(x) = \frac{1}{n-1}{\displaystyle\sum_{i=1}^{n}{(x_i - \mu)^2}}\)
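
R's built-in var() divides the sum of squares by n - 1, the sample degrees of freedom. A minimal sketch to check the SS/df definition against it, using a made-up vector x (hypothetical values, not from the text):

```r
# Illustrative data (hypothetical values chosen for easy arithmetic)
x <- c(4, 8, 6, 5, 7)

n  <- length(x)
ss <- sum((x - mean(x))^2)  # sum of squared deviations from the sample mean
df <- n - 1                 # degrees of freedom for a sample

ss / df   # variance computed as SS/df
var(x)    # built-in sample variance, should agree with the line above
```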

SST = SSE + SSC = W + B, where
SST - Total Sum of Squares
SSE - Error Sum of Squares - within (W)
SSC - Sum of Squares Columns (treatments) - between (B)

C - columns (treatments)
N - total number of observations
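
Written out, the three sums of squares take the following form. The notation is assumed here for illustration: \(x_{ij}\) is the i-th observation in column j, \(n_j\) is the number of observations in column j, \(\mu_j\) is the mean of column j, and \(\mu\) is the mean of all N observations.
\(SSE = \displaystyle\sum_{j=1}^{C}\sum_{i=1}^{n_j}{(x_{ij} - \mu_j)^2}\)
\(SSC = \displaystyle\sum_{j=1}^{C}{n_j(\mu_j - \mu)^2}\)
\(SST = \displaystyle\sum_{j=1}^{C}\sum_{i=1}^{n_j}{(x_{ij} - \mu)^2}\)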

Mean square of columns - MSC = SSC/df_columns, where df_columns = C-1
Mean square of error - MSE = SSE/df_error, where df_error = N-C
Total sum of squares - SST, where df_total = N-1
F-statistic - F = MSC/MSE

Let’s calculate the degrees of freedom for the example below (3 groups of 7 students, so C = 3 and N = 21):
df_columns = 3-1 = 2, MSC = SSC/2
df_error = 21-3 = 18, MSE = SSE/18
df_total = 21-1 = 20

# 3 groups of students with scores (1-100):
a = c(82,93,61,74,69,70,53)
b = c(71,62,85,94,78,66,71)
c = c(64,73,87,91,56,78,87)

# sum of squared deviations from the group mean (within-group sum of squares)
sq = function(x) { sum((x - mean(x))^2) }

sq(a)
## [1] 1039.429
sq(b)
## [1] 751.4286
sq(c)
## [1] 1021.714
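
The three values above are the within-group sums of squares, and together they add up to SSE. The remaining steps (SSC, SST, the mean squares and F) can be sketched as follows, under the definitions given earlier. The names `groups`, `sse`, `ssc`, `f_stat` and so on are chosen here for illustration, and `pf()` is used to turn F into a p-value.

```r
# Scores for the 3 groups of students (same data as above)
groups <- list(
  a = c(82, 93, 61, 74, 69, 70, 53),
  b = c(71, 62, 85, 94, 78, 66, 71),
  c = c(64, 73, 87, 91, 56, 78, 87)
)

# Sum of squared deviations from the mean
sq <- function(x) sum((x - mean(x))^2)

all_scores <- unlist(groups)
n_cols     <- length(groups)      # C, number of columns (treatments)
n_total    <- length(all_scores)  # N, total number of observations

sse <- sum(sapply(groups, sq))    # within: deviations from each group mean
sst <- sq(all_scores)             # total: deviations from the grand mean
ssc <- sum(sapply(groups, function(g) {
  length(g) * (mean(g) - mean(all_scores))^2
}))                               # between: group means vs. grand mean

msc <- ssc / (n_cols - 1)         # df_columns = C - 1
mse <- sse / (n_total - n_cols)   # df_error   = N - C
f_stat  <- msc / mse
p_value <- pf(f_stat, n_cols - 1, n_total - n_cols, lower.tail = FALSE)

c(SSE = sse, SSC = ssc, SST = sst, F = f_stat, p = p_value)
```

If the decomposition holds, `sst` equals `sse + ssc` up to floating-point error, which gives a quick sanity check on the hand calculation.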

Using R's built-in functions (stats package):

# data
# Number of calories consumed by month:
may <- c(2166, 1568, 2233, 1882, 2019)
sep <- c(2279, 2075, 2131, 2009, 1793)
dec <- c(2226, 2154, 2583, 2010, 2190)

d <- stack(list(may=may, sep=sep, dec=dec))
d
##    values ind
## 1    2166 may
## 2    1568 may
## 3    2233 may
## 4    1882 may
## 5    2019 may
## 6    2279 sep
## 7    2075 sep
## 8    2131 sep
## 9    2009 sep
## 10   1793 sep
## 11   2226 dec
## 12   2154 dec
## 13   2583 dec
## 14   2010 dec
## 15   2190 dec
names(d)
## [1] "values" "ind"
oneway.test(values ~ ind, data=d, var.equal=TRUE)
## 
##  One-way analysis of means
## 
## data:  values and ind
## F = 1.7862, num df = 2, denom df = 12, p-value = 0.2094
# alternative using aov
res <- aov(values ~ ind, data = d)
res
## Call:
##    aov(formula = values ~ ind, data = d)
## 
## Terms:
##                      ind Residuals
## Sum of Squares  174664.1  586719.6
## Deg. of Freedom        2        12
## 
## Residual standard error: 221.1183
## Estimated effects may be unbalanced
summary(res)
##             Df Sum Sq Mean Sq F value Pr(>F)
## ind          2 174664   87332   1.786  0.209
## Residuals   12 586720   48893
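
The F value and p-value reported by both functions can be reproduced from the sums of squares. A minimal sketch reusing the calorie data above (the names `ssc`, `sse`, `f_stat` and `p_value` are illustrative):

```r
# Number of calories consumed by month (same data as above)
may <- c(2166, 1568, 2233, 1882, 2019)
sep <- c(2279, 2075, 2131, 2009, 1793)
dec <- c(2226, 2154, 2583, 2010, 2190)
d <- stack(list(may = may, sep = sep, dec = dec))

grand_mean  <- mean(d$values)
group_means <- tapply(d$values, d$ind, mean)
group_sizes <- tapply(d$values, d$ind, length)

# Between-group (columns) and within-group (error) sums of squares
ssc <- sum(group_sizes * (group_means - grand_mean)^2)
sse <- sum((d$values - ave(d$values, d$ind))^2)  # ave() gives each row its group mean

df_columns <- nlevels(d$ind) - 1          # C - 1
df_error   <- nrow(d) - nlevels(d$ind)    # N - C

f_stat  <- (ssc / df_columns) / (sse / df_error)
p_value <- pf(f_stat, df_columns, df_error, lower.tail = FALSE)

c(SSC = ssc, SSE = sse, F = f_stat, p = p_value)
```

These values should match the `ind` and `Residuals` rows of the `aov` table and the F and p-value printed by `oneway.test`.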

16.2 Sources

Example for one-way ANOVA: YouTube video by Brandon Foltz