R statistics
Chapter 42 Gradient boosted trees