Chapter 24 Model evaluation
bias and variance are two sources of error in Machine Learning.
Bias: error from incorrect model assumptions.
Err(Training)
High bias means underfitting
Variance: error from sensitivity to small fluctuations in the training set
Err(Testing) - Err(Training)
High variance means overfitting
Bias-variance tradeoff - Finding an adequate balance between model learning and model generalization.
To reduce model bias:
1. Increase the model size.
2. Modify input features using error analysis.
3. Reduce or eliminate regularization.
4. Modify model architecture.
To reduce model variance:
1. Add more training data.
2. Add regularization (this reduce variance but increase bias).
3. Peform feature selection.
4. Decrease model size.
Strategies to build ensemble model:
- Bagging (Bootstrap AGGregatING) (Random Forests)
- Boosting (AdaBoost, Gradient Boosted Trees)
- Stacking (Linear regression, elastic net regression)
library(caret)
# Calculates performance metrics across all resamples
resamples()
# Correlation among the base learners' predictions
modelCor()