Boosting Techniques for Machine Learning — XGBoost Optimization and Hyperparameter Tuning
Optimization
1. Approximate Greedy Algorithm
XGBoost always chooses the split with the largest gain at each node, so it is a greedy algorithm: it does not guarantee the best tree in the long run. With many features and many distinct feature values, testing every possible threshold (the exact greedy algorithm) takes far too long. To deal with this, we can split each feature into quantiles and only test the quantile boundaries as candidate thresholds. The more quantiles we use, the closer the chosen threshold is to the exact one.
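Below is a small illustrative sketch (plain NumPy, not XGBoost's internal code) of why quantiles help: instead of evaluating every unique feature value as a candidate split, we only evaluate a few dozen quantile boundaries. In the library itself this idea corresponds to the approximate/histogram tree methods (tree_method="approx" or "hist"), with max_bin controlling how many buckets are used.

import numpy as np

# Exact greedy: every unique value of the feature is a candidate threshold.
feature = np.random.default_rng(42).normal(size=100_000)
exact_candidates = np.unique(feature)                                   # ~100,000 candidates

# Approximate greedy: only the quantile boundaries are candidates.
approx_candidates = np.quantile(feature, np.linspace(0, 1, 33)[1:-1])   # 31 candidates

print(len(exact_candidates), len(approx_candidates))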
2. Parallel Learning & Weighted Quantile Sketch
When the data is spread across several machines, the Quantile Sketch Algorithm combines the summaries from each machine into an approximate histogram of the feature values.
In an ordinary quantile sketch, the number of observations in each quantile is the same.
In a weighted quantile sketch, the sum of the weights in each quantile is the same.
In a regression problem the weights are just the Hessians, which are all equal, so the weighted sketch behaves like the ordinary one. In a classification problem, the weight of each observation is previous probability * (1 - previous probability). For a binary classification problem, this weight is the opposite of confidence: it is largest when the previous prediction was close to 0.5, i.e. when we were least sure whether the observation belongs to class 1 or 0.
The larger the weight, the more important the observation is for improving accuracy, so the quantiles become finer exactly where we need them.
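As a rough illustration (a minimal NumPy sketch, not the actual weighted quantile sketch data structure), the weights p * (1 - p) can be accumulated so that each bucket holds the same total weight rather than the same number of observations:

import numpy as np

rng = np.random.default_rng(0)
feature = np.sort(rng.normal(size=1000))         # one feature, sorted by value
prev_prob = rng.uniform(0.01, 0.99, size=1000)   # previous predicted probabilities, aligned with the sorted rows
weights = prev_prob * (1.0 - prev_prob)          # Hessian of the log-loss = weight of each observation

def weighted_quantile_thresholds(x, w, n_quantiles):
    # Choose thresholds so that each bucket holds roughly the same total weight.
    cum_w = np.cumsum(w)
    targets = np.linspace(0, cum_w[-1], n_quantiles + 1)[1:-1]
    return x[np.searchsorted(cum_w, targets)]

thresholds = weighted_quantile_thresholds(feature, weights, n_quantiles=10)
print(thresholds)   # candidate split points, denser where the weights (uncertainty) are large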
3. Sparsity-Aware Split Finding
When a feature has missing values, XGBoost separates the rows with missing values from the rest. After finding the candidate splits on the non-missing rows, it sends the residuals of the missing-value rows first to the left branch and then to the right branch and computes the gain for each choice. In the end, it keeps the direction that gives the largest gain; this becomes the default direction for missing values at that split.
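In practice this means XGBoost can consume data containing NaN directly, with no imputation step; a quick sketch:

import numpy as np
import xgboost

# np.nan is treated as "missing" by default; a default direction (left or right)
# is learned for missing values at every split.
X = np.array([[1.0,    np.nan],
              [2.0,    0.5  ],
              [np.nan, 1.5  ],
              [4.0,    2.0  ]])
y = np.array([0, 0, 1, 1])

model = xgboost.XGBClassifier(n_estimators=10, max_depth=2)
model.fit(X, y)             # no imputation needed
print(model.predict(X))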
4. Cache-Aware Access
Cache memory on the CPU is much faster than the computer's main memory. XGBoost keeps the gradients and Hessians in small internal buffers that fit in the CPU cache, so the split statistics can be accumulated without repeatedly reaching back to main memory.
5. Blocks for out-of-core computation
For data that does not fit in memory, XGBoost splits it into blocks stored on disk, optionally sharded across multiple drives, so that when the blocks are needed they can be read from several drives in parallel while the CPU keeps computing.
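From the user's side this shows up as the external-memory mode. A rough sketch, assuming a LIBSVM-format training file on disk (the file path is hypothetical, and the exact URI syntax depends on the XGBoost version; newer releases also offer an iterator-based API for the same purpose):

import xgboost

# The "#dtrain.cache" suffix asks XGBoost to stream the file through an
# on-disk cache instead of loading the whole dataset into RAM.
dtrain = xgboost.DMatrix("train.libsvm#dtrain.cache")
booster = xgboost.train({"objective": "binary:logistic", "tree_method": "hist"},
                        dtrain, num_boost_round=50)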
Hyperparameter Tuning for XGBoost
A big grid search certainly works, but it is also very time-consuming.
An approach I have found efficient is RandomizedSearchCV, which samples a fixed number of parameter combinations instead of trying them all.
There are roughly 5 important parameters we need to tune in XGBoost:
- Learning Rate (learning_rate)
- Max Depth (max_depth)
- Min_Child_Weight (min_child_weight) # If this minimum is set high, the algorithm will be more conservative.
- Gamma (gamma). #Minimum loss reduction required to make a further partition
- Colsample by Tree (colsample_bytree) # Fraction of features sampled per tree; there are also colsample_bylevel and colsample_bynode.
params = {
    "learning_rate": [0.01, 0.05, 0.1, 0.3, 1.0],   # example values; typically from 0.01 to 1
    "max_depth": [3, 4, 5, 6, 8, 10],               # default=6
    "min_child_weight": [1, 3, 5, 7],               # default=1
    "gamma": [0, 0.1, 0.2, 0.4],                    # default=0
    "colsample_bytree": [0.3, 0.5, 0.7, 1.0],       # range (0, 1], default=1
}

from sklearn.model_selection import RandomizedSearchCV, GridSearchCV
import xgboost

classifier = xgboost.XGBClassifier()
random_search = RandomizedSearchCV(classifier, param_distributions=params,
                                   n_iter=5, scoring="accuracy")
random_search.fit(X_train, y_train)   # X_train, y_train: your training data

random_search.best_estimator_
random_search.best_params_