Boosting Techniques for Machine Learning — XGBoost Optimization and Hyperparameter Tuning

Optimization

1. Approximate Greedy Algorithm

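XGBoost always chooses the split point with the best gain, so it is a greedy algorithm: each split is the best one at that moment, which does not guarantee the best tree in the long run. The exact version also has to test every possible threshold of every feature, so with a large dataset and a lot of features it becomes very slow. To deal with this, we can divide each feature into quantiles and only test the quantile boundaries as candidate thresholds. The more quantiles we set, the more accurate the threshold, but the more candidates we have to evaluate.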
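To make the trade-off concrete, here is a minimal sketch in plain Python. It is not XGBoost's actual code: I assume a squared-error loss, so the similarity score of a leaf is (sum of residuals)^2 / (number of residuals + lambda), and the names `similarity`, `split_gain`, `best_split` and the toy data are made up for illustration.

```python
import statistics


def similarity(residuals, lam=1.0):
    """Similarity score of a leaf for squared-error loss (assumed formula)."""
    return sum(residuals) ** 2 / (len(residuals) + lam)


def split_gain(x, residuals, threshold, lam=1.0):
    """Gain of splitting at `threshold`: left + right - parent similarity."""
    left = [r for xi, r in zip(x, residuals) if xi < threshold]
    right = [r for xi, r in zip(x, residuals) if xi >= threshold]
    if not left or not right:
        return 0.0
    return similarity(left, lam) + similarity(right, lam) - similarity(residuals, lam)


def best_split(x, residuals, candidates, lam=1.0):
    """Greedy choice: the candidate threshold with the largest gain."""
    return max(candidates, key=lambda t: split_gain(x, residuals, t, lam))


# Toy feature with 20 values; the residuals change sign between x=12 and x=13.
x = [float(v) for v in range(1, 21)]
residuals = [-1.0] * 12 + [1.0] * 8

# Exact greedy: test the midpoint between every pair of neighbouring values.
sorted_x = sorted(set(x))
exact_candidates = [(a + b) / 2 for a, b in zip(sorted_x, sorted_x[1:])]
exact_t = best_split(x, residuals, exact_candidates)

# Approximate greedy: only test quantile boundaries.
coarse_t = best_split(x, residuals, statistics.quantiles(x, n=4))   # 3 candidates
fine_t = best_split(x, residuals, statistics.quantiles(x, n=10))    # 9 candidates

for name, t in [("exact", exact_t), ("4 quantiles", coarse_t), ("10 quantiles", fine_t)]:
    print(name, t, round(split_gain(x, residuals, t), 3))
```

On this toy data, 4 quantiles miss the true jump in the residuals, while 10 quantiles land in the same gap as the exact search with less than half of its candidates.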

2. Parallel Learning & Weighted Quantile Sketch

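When the dataset is split across several computers so they can work in parallel, the Quantile Sketch Algorithm combines the values from each computer to make an approximate histogram, and the quantiles are then taken from that histogram.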

With ordinary quantiles, the number of observations in each quantile is the same.

With a weighted quantile sketch, the sum of the weights in each quantile is the same.

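In a regression problem, the weights are just the hessians, which are the same for every observation (1 for squared-error loss), so the weighted quantiles are just ordinary quantiles. In a classification problem, however, the weights are previous probability * (1 - previous probability). For a binary classification problem, this weight is largest when the previous probability is close to 0.5 and smallest when it is close to 0 or 1, so it indicates the opposite of the confidence in classifying the observation as 1 or 0.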

The larger the weight, the more important the observation is for improving accuracy. So we get smaller quantiles where we need them: observations with low-confidence predictions end up in narrower bins.
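The idea can be sketched on a small in-memory list. This is only an illustration of weighted quantiles, not the mergeable sketch data structure XGBoost really uses, and the function name `weighted_quantile_cuts` and the toy probabilities are my own assumptions.

```python
from itertools import accumulate


def weighted_quantile_cuts(values, weights, n_bins):
    """Upper edges of the first n_bins - 1 bins, so that each bin holds
    roughly the same total weight."""
    pairs = sorted(zip(values, weights))
    cumulative = list(accumulate(w for _, w in pairs))
    total = cumulative[-1]
    cuts = []
    k = 1  # index of the next weight target, k * total / n_bins
    for (value, _), cum in zip(pairs, cumulative):
        if k < n_bins and cum >= k * total / n_bins:
            cuts.append(value)
            # Skip any further targets this single observation already passed.
            while k < n_bins and cum >= k * total / n_bins:
                k += 1
    return cuts


x = list(range(1, 13))

# Regression with squared-error loss: every hessian is 1, so the weighted
# quantiles are just ordinary quantiles.
regression_cuts = weighted_quantile_cuts(x, [1.0] * 12, n_bins=4)

# Binary classification: weight = previous probability * (1 - previous probability).
# The model is confident about the first 8 rows and unsure about the last 4.
prev_prob = [0.1] * 8 + [0.5] * 4
hessians = [p * (1 - p) for p in prev_prob]
classification_cuts = weighted_quantile_cuts(x, hessians, n_bins=4)

print(regression_cuts)
print(classification_cuts)
```

With equal weights the cuts are evenly spaced. With the classification weights they move toward the uncertain rows, so those rows get narrower bins and therefore more candidate thresholds.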

3. Sparsity-Aware Split Finding

When a feature has missing values, we set the rows with missing values aside and find the candidate thresholds from the non-null rows only. Then, for each candidate threshold, we put the residuals of the missing-value rows first into the left leaf and then into the right leaf, and calculate the gain both ways. In the end, we choose the threshold and the default direction for missing values that give us the largest gain.
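Here is a rough sketch of that comparison for a single threshold. As before, I assume a squared-error similarity score, (sum of residuals)^2 / (count + lambda); the function names and the toy rows are hypothetical, and missing values are represented by `None`.

```python
def similarity(residuals, lam=1.0):
    """Similarity score of a leaf for squared-error loss (assumed formula)."""
    return sum(residuals) ** 2 / (len(residuals) + lam)


def gain_with_default(x, residuals, threshold, lam=1.0):
    """Try the missing rows in each leaf; return (best gain, default direction)."""
    left = [r for xi, r in zip(x, residuals) if xi is not None and xi < threshold]
    right = [r for xi, r in zip(x, residuals) if xi is not None and xi >= threshold]
    missing = [r for xi, r in zip(x, residuals) if xi is None]
    parent = similarity(residuals, lam)

    # Option 1: rows with a missing value follow the left branch.
    gain_left = similarity(left + missing, lam) + similarity(right, lam) - parent
    # Option 2: rows with a missing value follow the right branch.
    gain_right = similarity(left, lam) + similarity(right + missing, lam) - parent

    return max((gain_left, "left"), (gain_right, "right"))


# Two rows have no value for this feature; their residuals look like the low-x rows.
x = [1.0, 2.0, None, 3.0, 4.0, None]
residuals = [-2.0, -1.5, -1.8, 1.5, 2.0, -2.2]

gain, direction = gain_with_default(x, residuals, threshold=2.5)
print(round(gain, 3), direction)
```

The missing rows go to the side whose residuals they resemble, and that side becomes the default direction for any future row with a missing value in this feature.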

4. Cache-Aware Access

The cache memory in the CPU is faster than the main memory of our computers. XGBoost puts the gradients and hessians in the cache so that they can be read quickly when the gains are calculated.

5. Blocks for out-of-core computation

When the dataset is too large to fit in memory, XGBoost splits the data into blocks that can be stored on different drives, so when it needs them it can read the data from multiple drives in parallel.

Hyperparameter Tuning for XGBoost

A big grid search definitely works, but it is also very time-consuming.

A more efficient way I found is to use RandomizedSearchCV.

There are roughly 5 important parameters we need to tune in XGBoost:

  1. Learning Rate (learning_rate)
  2. Max Depth (max_depth)
  3. Min Child Weight (min_child_weight) # If this minimum is set high, the algorithm will be more conservative.
  4. Gamma (gamma) # Minimum loss reduction required to make a further partition.
  5. Colsample by Tree (colsample_bytree) # There are also colsample_bylevel and colsample_bynode.

Sean Zhang

Data Science | Machine Learning| Data Engineer