Boosting Techniques for Machine Learning — Gradient Boosting for Regression and Classification
In contrast to AdaBoost, Gradient Boost starts with a single leaf instead of a tree or stump. This leaf represents an initial prediction of the target value for all of the samples.
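For regression, this first leaf is the average value of the variable we want to predict. The trees built after it are usually larger than stumps, with the number of leaves typically limited to between 8 and 32. Unlike AdaBoost, which gives each stump a different amount of say in the final prediction, Gradient Boost scales all trees by the same amount.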
1. Initialize the model with a single leaf (the initial prediction).
2. Build a tree based on the errors (residuals) of the previous prediction.
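The two steps above can be sketched in a few lines of Python. This is a minimal illustration, assuming a regression setting and a small made-up list of target values `y` (hypothetical data, not from the text); it computes the single initial leaf and the residuals that the first tree would be built on.

```python
# Hypothetical target values for four training samples.
y = [1.5, 2.0, 3.5, 5.0]

# Step 1: the single initial leaf predicts the average target for every sample.
initial_leaf = sum(y) / len(y)

# Step 2: the next tree is built on the errors (residuals) of that prediction.
residuals = [yi - initial_leaf for yi in y]

print(initial_leaf)
print(residuals)
```

Because the initial leaf is the mean, these residuals always sum to zero.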
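Each leaf of the new tree outputs a single value: when several residuals end up in the same leaf, we replace them with their average. On its own, such a tree usually overfits the training data (low bias but high variance), so Gradient Boost uses a learning rate to scale down the contribution from each new tree.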
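A sketch of how one tree's leaves and the learning rate combine, assuming a hypothetical tree that has already sorted the first two samples into leaf 0 and the last two into leaf 1 (the split is assumed, not learned here). Each leaf outputs the average of its residuals, and `learning_rate` (an assumed value of 0.1) shrinks that output before it is added to the previous prediction.

```python
# Residuals from the previous prediction (hypothetical example values).
residuals = [-1.5, -1.0, 0.5, 2.0]
# Hypothetical assignment of each sample to a leaf of the new tree.
leaf_of_sample = [0, 0, 1, 1]

# Each leaf outputs the average of the residuals that end up in it.
leaf_values = {}
for leaf in sorted(set(leaf_of_sample)):
    members = [r for r, leaf_id in zip(residuals, leaf_of_sample) if leaf_id == leaf]
    leaf_values[leaf] = sum(members) / len(members)

# The learning rate scales down the new tree's contribution.
learning_rate = 0.1
previous_prediction = 3.0  # the initial leaf (average target value)
new_predictions = [
    previous_prediction + learning_rate * leaf_values[leaf] for leaf in leaf_of_sample
]

print(leaf_values)
print(new_predictions)
```

With a learning rate of 1, each prediction would move all the way to the average target of its leaf; the small learning rate takes a smaller step, which limits how much any single overfit tree can affect the model.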
Step 1: Define the loss function and initialize the model with a constant value
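For regression, a common choice is the squared-error loss $L(y_i, F(x_i)) = \frac{1}{2}(y_i - F(x_i))^2$. The model is initialized with a constant value:
$F_0(x) = \arg\min_{\gamma} \sum_{i=1}^{n} L(y_i, \gamma)$
Here gamma ($\gamma$) is the predicted value, and $\arg\min_{\gamma}$ means we need to find the value of gamma that minimizes the sum of the loss function values over all n samples.
For the squared-error loss, this minimizer is the average of the observed values, which is exactly the single initial leaf described above.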
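As a worked check of why the squared-error loss leads to the average, assuming the loss $\frac{1}{2}(y_i - \gamma)^2$ for each sample, setting the derivative of the summed loss to zero gives:

```latex
\frac{d}{d\gamma} \sum_{i=1}^{n} \frac{1}{2}(y_i - \gamma)^2
  = -\sum_{i=1}^{n} (y_i - \gamma) = 0
\quad\Longrightarrow\quad
\gamma = \frac{1}{n}\sum_{i=1}^{n} y_i
```

The second derivative is $n > 0$, so this stationary point is a minimum.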
Step 2: Build the trees and update the predictions (for m = 1 to M)
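M is the number of trees, which is normally 100 or more. For each tree m, we first compute the pseudo-residual of every sample i:
$r_{im} = -\left[\frac{\partial L(y_i, F(x_i))}{\partial F(x_i)}\right]_{F(x) = F_{m-1}(x)}$
The expression inside the square brackets is the gradient (derivative) of the loss function with respect to the predicted value, which is where Gradient Boost gets its name. For the squared-error loss, its negative is simply $y_i - F_{m-1}(x_i)$, so this step just computes the ordinary residuals.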
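To see that the negative gradient is the ordinary residual for the squared-error loss, the sketch below estimates the negative derivative of the loss numerically with a central finite difference and compares it with the observed value minus the previous prediction. The helper names `squared_error` and `negative_gradient`, the step size `eps`, and the sample values are hypothetical, chosen for illustration.

```python
def squared_error(y, f):
    # Squared-error loss for one sample, with the conventional 1/2 factor.
    return 0.5 * (y - f) ** 2


def negative_gradient(loss, y, f, eps=1e-6):
    # Central finite-difference estimate of -dL/dF at the current prediction f.
    return -(loss(y, f + eps) - loss(y, f - eps)) / (2 * eps)


y = [1.5, 2.0, 3.5, 5.0]
f_prev = [3.0] * len(y)  # predictions from the previous model (here: the initial leaf)

pseudo_residuals = [negative_gradient(squared_error, yi, fi) for yi, fi in zip(y, f_prev)]
ordinary_residuals = [yi - fi for yi, fi in zip(y, f_prev)]

print(pseudo_residuals)
print(ordinary_residuals)
```

Replacing `squared_error` with another differentiable loss would give different pseudo-residuals, which is what lets the same procedure handle other loss functions.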
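We then fit a regression tree to the pseudo-residuals and give each leaf $R_{jm}$ the output value $\gamma_{jm} = \arg\min_{\gamma} \sum_{x_i \in R_{jm}} L(y_i, F_{m-1}(x_i) + \gamma)$, which for the squared-error loss is the average of the residuals in that leaf. At last, we make a new prediction for every example:
$F_m(x) = F_{m-1}(x) + \nu \sum_{j=1}^{J_m} \gamma_{jm} I(x \in R_{jm})$
The $\nu$ (“nu”, which looks like a “v”) here is the learning rate, a value between 0 and 1. Then we calculate the new residuals and repeat the steps above until all M trees have been built.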
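Putting the steps together, here is a minimal sketch of Gradient Boost for regression in plain Python. It assumes the squared-error loss, a single numeric feature, and, to keep it short, one-split trees (stumps) instead of the 8 to 32 leaf trees mentioned earlier. `fit_stump` and `gradient_boost` are hypothetical helper names, and the defaults `n_trees=100` and `learning_rate=0.1` are illustrative choices; this is not a production implementation.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression tree to the residuals.

    Each leaf outputs the mean residual of the samples in it. Stumps are a
    simplification; Gradient Boost trees usually have more leaves.
    """
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_value = sum(left) / len(left)
        right_value = sum(right) / len(right)
        # Choose the split with the smallest sum of squared errors.
        sse = sum((r - left_value) ** 2 for r in left) + sum(
            (r - right_value) ** 2 for r in right
        )
        if best is None or sse < best[0]:
            best = (sse, threshold, left_value, right_value)
    if best is None:
        # All inputs are identical, so no split is possible: use a single leaf.
        mean = sum(residuals) / len(residuals)
        return lambda x: mean
    _, threshold, left_value, right_value = best
    return lambda x: left_value if x <= threshold else right_value


def gradient_boost(xs, ys, n_trees=100, learning_rate=0.1):
    # Step 1: the initial leaf is the average target value.
    f0 = sum(ys) / len(ys)
    predictions = [f0] * len(ys)
    trees = []
    for _ in range(n_trees):
        # Pseudo-residuals: the negative gradient of 1/2 (y - F)^2 is y - F.
        residuals = [y - p for y, p in zip(ys, predictions)]
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        # New prediction: previous prediction plus the scaled tree output.
        predictions = [p + learning_rate * tree(x) for x, p in zip(xs, predictions)]

    def predict(x):
        return f0 + learning_rate * sum(tree(x) for tree in trees)

    return predict


xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.2, 0.9, 3.0, 3.1, 2.9]
model = gradient_boost(xs, ys)
print([round(model(x), 2) for x in xs])
```

Each iteration follows the text: compute the residuals, fit a tree to them, and add the tree's output, scaled by the learning rate, to the previous prediction. Swapping in another differentiable loss would change the initial leaf, the pseudo-residuals, and the leaf outputs.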