xgboost vs LightGBM Benchmarks

May 2017 Edition

Machines Used

Server 1

  • i7-7700K 4c/8t @4.5/4.4GHz
  • KVM topology: 4S/2C/1T
  • Server: 64GB RAM 1600MHz
  • Virtual Machine: 54GB RAM + 128GB swap
  • RAID 0 2Gbps NVMe drives
  • Operating System: Windows Server 2012 R2 Datacenter
  • R flavor: version 3.4 (self-compiled with MinGW 7.1)
  • Rtools flavor: version 34 (customized for MinGW 7.1)
  • Visual Studio flavor: 2017
  • Cinebench R15 score: 193 (single) / 953 (multi)
  • Geekbench 4.1: 5647 (single) / 18881 (multi) https://browser.geekbench.com/v4/cpu/3150525

Server 2

  • Dual Xeon Quanta Freedom Ivy Bridge 2x 10c/20t @3.1/2.7GHz
  • KVM topology: 2S/10C/2T
  • Server: 96GB RAM 1600MHz
  • Virtual Machine: 80GB RAM + 128GB swap
  • RAID 0 2Gbps NVMe drives
  • Operating System: Windows Server 2012 R2 Datacenter
  • R flavor: version 3.4 (self-compiled with MinGW 7.1)
  • Rtools flavor: version 34 (customized for MinGW 7.1)
  • Visual Studio flavor: 2017
  • Cinebench R15 score: 110 (single) / 2256 (multi)
  • Geekbench 4.1: 2776 (single) / 23682 (multi) https://browser.geekbench.com/v4/cpu/3150537

Datasets Used

Properties reported for each dataset (a measurement sketch in R follows this list):

  1. Purpose
  2. Training observations
  3. Testing observations
  4. Features
  5. Train:
    1. Sparse elements
    2. Dense elements (if stored densely)
    3. Sparsity
    4. Class unbalance
    5. In-memory size
    6. RDS compressed size
    7. Checksum (CRC-32)
  6. Test:
    1. Sparse elements
    2. Dense elements (if stored densely)
    3. Sparsity
    4. Class unbalance
    5. In-memory size
    6. RDS compressed size
    7. Checksum (CRC-32)
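
As a hedged illustration of how these per-split numbers can be measured in R (assuming each train/test split is a dgCMatrix sparse matrix saved as an .rds file; this is a sketch, not necessarily the script used to produce the figures below):

```r
library(Matrix)   # sparse dgCMatrix support
library(digest)   # digest() with algo = "crc32"

# Sketch: measure the properties listed above for one split.
# `x` is a dgCMatrix, `label` a 0/1 vector, `rds_file` the saved .rds path.
describe_split <- function(x, label, rds_file) {
  dense_elems  <- prod(dim(x))   # rows * columns if stored densely
  sparse_elems <- length(x@x)    # stored (non-zero) elements
  list(
    sparse_elements = sparse_elems,
    dense_elements  = dense_elems,
    sparsity        = 1 - sparse_elems / dense_elems,
    class_unbalance = sum(label == 0) / sum(label == 1),
    in_memory_bytes = as.numeric(object.size(x)),
    rds_size_bytes  = file.size(rds_file),
    crc32           = digest(rds_file, algo = "crc32", file = TRUE)
  )
}
```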

Bosch Dataset

  1. Unbalanced & Noisy
  2. 1,000,000 (first)
  3. 183,747 (last)
  4. 969 (custom)
  5. Train:
    1. 184,130,362 elems
    2. 969,000,000 elems
    3. 81.00% sparsity
    4. 171.59x unbalance
    5. 2,209,631,688 bytes
    6. 686MB (train + test)
    7. 954d39db (train + test)
  6. Test:
    1. 33,795,315 elems
    2. 178,050,843 elems
    3. 81.02% sparsity
    4. 174.83x unbalance
    5. 405,611,128 bytes
    6. 686MB (train + test)
    7. 954d39db (train + test)

Higgs Dataset

  1. Many obs. & Synthetic
  2. 10,000,000 (first)
  3. 100,000 (last)
  4. 29 (custom)
  5. Train:
    1. 267,896,154 elems
    2. 290,000,000 elems
    3. 7.62% sparsity
    4. 1.89x unbalance
    5. 3,214,757,056 bytes
    6. 1.99GB (train + test)
    7. cbf55fd3 (train + test)
  6. Test:
    1. 2,679,009 elems
    2. 2,900,000 elems
    3. 7.62% sparsity
    4. 1.89x unbalance
    5. 32,151,320 bytes
    6. 1.99GB (train + test)
    7. cbf55fd3 (train + test)

Reputation Dataset

  1. Noisy & Big (3.3TB dense)
  2. 2,250,000 (first)
  3. 146,130 (last)
  4. 23,636 (custom)
  5. Train:
    1. 715,463,655 elems
    2. 53,181,000,000 elems
    3. 98.65% sparsity
    4. 3.03x unbalance
    5. 8,585,659,832 bytes
    6. 5.35GB
    7. b7065332
  6. Test:
    1. 46,520,952 elems
    2. 3,453,928,680 elems
    3. 98.65% sparsity
    4. 2.93% positive
    5. 558,131,392 bytes
    6. 354MB
    7. 2523e636

Metric Used

The only metric used was AUC, chosen for the following properties (a small rank-based sketch in R follows this list):

  • Not a calibratable (i.e. post-optimizable) metric, unlike logarithmic loss (logloss), RMSE, etc.
  • Does not rely on absolute output values (based on relative ranking only)
  • Rare classes cannot be ignored by the classifier to score high (it cannot optimize for a single class)
  • Not threshold-based (there are as many thresholds as observations, unlike Accuracy, etc.)
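
Because AUC is purely rank-based, it can be written as a Mann-Whitney statistic. The minimal R sketch below only illustrates this property; the benchmark itself relies on the AUC reported by xgboost and LightGBM during training.

```r
# Rank-based (Mann-Whitney) AUC: only the ordering of the scores matters,
# so any monotonic transformation of the predictions leaves AUC unchanged.
auc_rank <- function(labels, scores) {
  r     <- rank(scores)          # average ranks in case of ties
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

set.seed(1)
y <- rbinom(10000, 1, 0.005)     # heavily unbalanced, Bosch-like labels
p <- runif(10000) + 0.5 * y      # mildly informative scores
auc_rank(y, p)
auc_rank(y, p^3)                 # identical value: the ranking is unchanged
```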

Gradient Boosted Algorithms Used

xgboost

  • Commit used: ccccf8a (Pull Request 2244, May 02 2017)
  • Flags used: -O2 -mtune=core2
  • Install with: devtools::install_github("Laurae2/ez_xgb/R-package@2017-05-02-v2", force = TRUE)

LightGBM

  • Commit used: a8673bd (Pull Request 609, June 09 2017)
  • Flags used: Visual Studio project's default (no CPU tuning)
  • Install with: devtools::install_github("Microsoft/LightGBM/R-package@a8673bd", force = TRUE)

Hyperparameters

Bosch

Leaves Hyperparameters:

  • Depth: ∞
  • Leaves: {7, 15, 31, 63, 127, 255, 511, 1023, 2047, 4095}
  • Hessian: 1
  • Column Sampling: 100%
  • Row Sampling: 100%
  • Iterations: {2000, 1500, 750, 500, 400, 350, 325, 300, 200, 200}
  • Learning Rate: 0.02


Depth Hyperparameters:

  • Depth: {3, 4, 5, 6, 7, 8, 9, 10, 11, 12}
  • Leaves: {7, 15, 31, 63, 127, 255, 511, 1023, 2047, 4095}
  • Hessian: 1
  • Column Sampling: 100%
  • Row Sampling: 100%
  • Iterations: {2000, 1500, 1000, 800, 500, 400, 400, 400, 400, 400}
  • Learning Rate: 0.02


Pruning Hyperparameters:

  • Depth: 10
  • Leaves: 1023
  • Hessian: {1, 5, 25, 125}
  • Column Sampling: 100%
  • Row Sampling: 100%
  • Iterations: 400
  • Learning Rate: 0.02


Sampling Hyperparameters:

  • Depth: 6
  • Leaves: 63
  • Hessian: 1
  • Column Sampling: 100%
  • Row Sampling: {100%, 80%, 60%, 40%}
  • Iterations: 750
  • Learning Rate: 0.04

Higgs

Leaves Hyperparameters:

  • Depth: ∞
  • Leaves: {7, 15, 31, 63, 127, 255, 511, 1023, 2047, 4095}
  • Hessian: 1
  • Column Sampling: 100%
  • Row Sampling: 100%
  • Iterations: {2000, 1500, 1000, 500, 400, 350, 300, 250, 200, 150}
  • Learning Rate: 0.25


Depth Hyperparameters:

  • Depth: {3, 4, 5, 6, 7, 8, 9, 10, 11, 12}
  • Leaves: {7, 15, 31, 63, 127, 255, 511, 1023, 2047, 4095}
  • Hessian: 1
  • Column Sampling: 100%
  • Row Sampling: 100%
  • Iterations: {2000, 1500, 1000, 500, 400, 350, 300, 250, 200, 150}
  • Learning Rate: 0.25


Pruning Hyperparameters:

  • Depth: 10
  • Leaves: 1023
  • Hessian: {1, 5, 25, 125}
  • Column Sampling: 100%
  • Row Sampling: 100%
  • Iterations: 200
  • Learning Rate: 0.25


Sampling Hyperparameters:

  • Depth: 6
  • Leaves: 63
  • Hessian: 1
  • Column Sampling: 100%
  • Row Sampling: {100%, 80%, 60%, 40%}
  • Iterations: 400
  • Learning Rate: 0.25

Reputation

Leaves Hyperparameters:

  • Depth: ∞
  • Leaves: {7, 15, 31, 63, 127}
  • Hessian: 1
  • Column Sampling: 100%
  • Row Sampling: 100%
  • Iterations: {2000, 1250, 1000, 400, 200}
  • Learning Rate: 0.25


Depth Hyperparameters:

  • Depth: {3, 4, 5, 6, 7}
  • Leaves: {7, 15, 31, 63, 127}
  • Hessian: 1
  • Column Sampling: 100%
  • Row Sampling: 100%
  • Iterations: {2000, 1250, 1100, 1000, 900}
  • Learning Rate: 0.25


Pruning Hyperparameters:

  • Depth: 6
  • Leaves: 63
  • Hessian: {1, 5, 25, 125}
  • Column Sampling: 100%
  • Row Sampling: 100%
  • Iterations: 1000
  • Learning Rate: 0.25


Sampling Hyperparameters:

  • Depth: 6
  • Leaves: 63
  • Hessian: 1
  • Column Sampling: 100%
  • Row Sampling: {100%, 80%, 60%, 40%}
  • Iterations: 1000
  • Learning Rate: 0.25
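
For readers less familiar with the two R APIs, here is a rough sketch of how one configuration from the grids above (Bosch depth grid: depth 6, 63 leaves, 800 iterations, learning rate 0.02) could map onto xgboost and LightGBM parameter names. This is an illustrative mapping only, not the benchmark script itself (see the reproducibility repository in the notes below); dtrain and dtrain_lgb are assumed to be pre-built xgb.DMatrix / lgb.Dataset objects.

```r
library(xgboost)
library(lightgbm)

# Illustrative mapping of the generic hyperparameter names above onto
# xgboost (fast histogram) and LightGBM parameters.
xgb_params <- list(
  objective        = "binary:logistic",
  eval_metric      = "auc",
  tree_method      = "hist",   # fast histogram method benchmarked here
  max_depth        = 6,        # Depth
  max_leaves       = 63,       # Leaves (effective with grow_policy = "lossguide")
  min_child_weight = 1,        # Hessian
  colsample_bytree = 1.0,      # Column Sampling
  subsample        = 1.0,      # Row Sampling
  eta              = 0.02      # Learning Rate
)
# For the "Leaves" grids (Depth: infinite), xgboost would instead use
# grow_policy = "lossguide" with max_depth = 0 (no depth limit).

lgb_params <- list(
  objective               = "binary",
  metric                  = "auc",
  max_depth               = 6,    # Depth (-1 for the "Leaves" grids)
  num_leaves              = 63,   # Leaves
  min_sum_hessian_in_leaf = 1,    # Hessian
  feature_fraction        = 1.0,  # Column Sampling
  bagging_fraction        = 1.0,  # Row Sampling (bagging_freq > 0 needed if < 1)
  learning_rate           = 0.02  # Learning Rate
)

# model_xgb <- xgb.train(params = xgb_params, data = dtrain, nrounds = 800)
# model_lgb <- lgb.train(params = lgb_params, data = dtrain_lgb, nrounds = 800)
```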

Some Notes

  • Source code for reproducibility: https://github.com/Laurae2/gbt_benchmarks
  • LightGBM bagging depends on the number of threads: for instance, results produced with 8 threads are reproducible only when rerun with 8 threads. Both the AUC and the final model are affected.
  • LightGBM has major performance issues when bagging with less than 50% of the samples; see Microsoft/LightGBM#628 for the known issue.
  • LightGBM is known to be faster when compiled with Visual Studio than with MinGW on Windows, as shown in Microsoft/LightGBM#542. Pull Request Microsoft/LightGBM#584 allows using a custom-made DLL built with Visual Studio, which is the method used for LightGBM in this benchmark.
  • xgboost fast histogram could not scale properly beyond 10 threads on the Reputation dataset, so more than 10 threads was not tested for the speed benchmark.
  • xgboost fast histogram cannot be used with MinGW 4.9 since Pull Request dmlc/xgboost#2104. See dmlc/xgboost#2165: the issue remains unsolved other than switching to a newer compiler, which leaves all Windows R users who cannot compile R themselves in limbo.
  • We did not use all the features of the Reputation dataset: doing so would have blown up the dataset construction time for xgboost and LightGBM, and may also blow up RAM and CPU usage. See Microsoft/LightGBM#536 and dmlc/xgboost#2326; the latter issue (xgboost) has no known solution.
  • There were no negative numbers after preprocessing.
  • xgboost is known to "cheat" by multithreading the construction of its binary dataset in RAM. To neutralize this, the dataset construction time was removed from the timings; it is reported as 0 for every first iteration (see the sketch after this list).
  • For reproducibility, the user is left to compile R 3.4 with MinGW 7.1, obtain the appropriate Windows Server 2012 R2 license, and set up a proper KVM virtualization environment on Linux. NUMA nodes were tuned (pinning, sockets, topology, etc.) for maximum performance on the dual Xeon server.
  • The Reputation dataset was heavily modified to add synthetic noise, so the algorithm must find the proper synthetic signal to generalize correctly. This means two kinds of synthetic noise are mixed in: random noise that carries no signal, and random noise that does carry signal (but is rarer, by up to a factor of 100).
  • To avoid overfitting, the number of iterations was dramatically reduced in certain cases compared to the previous mega benchmark; as a result, the reported AUC underestimates the maximum achievable AUC.
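
As a minimal illustration of the timing note above (dataset construction excluded from the reported timings), the split looks roughly like this; train_matrix, train_label and xgb_params are assumed to exist, and the real harness lives in the reproducibility repository.

```r
library(xgboost)

# Sketch of the timing split described above: the (multithreaded) binary
# dataset construction is timed separately and excluded from the per-model
# timings, which start the clock only at xgb.train().
construction_time <- system.time(
  dtrain <- xgb.DMatrix(data = train_matrix, label = train_label)
)[["elapsed"]]   # measured, but reported as 0 for the first iteration

training_time <- system.time(
  model <- xgb.train(params = xgb_params, data = dtrain, nrounds = 800)
)[["elapsed"]]   # only this part enters the speed comparison
```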

Data Analysis

The Global Analysis allows interactively checking AUC and Timing globally, locally, or as an average from the first non-filtered iteration. The Tabular Analysis allows the same checks in a table, laid out either horizontally or vertically. Every view can be filtered and sliced by the following dimensions (a hypothetical slicing example follows the list):

  • Dataset
  • Algorithm
  • CPU
  • Number of Threads
  • Variable
  • Hyperparameters
  • Iteration
  • Iteration Percentage
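
As a purely hypothetical illustration of the slicing these views provide (the file name and column names below are assumptions, not the repository's actual schema), the same questions can be asked of a long-format results table in plain R:

```r
# Hypothetical slicing example; the actual interactive views live in the
# reproducibility repository. Assumed columns: dataset, algorithm, cpu,
# threads, leaves, iteration, auc, time (one row per logged iteration).
results <- read.csv("results.csv")

# Best AUC and total training time per algorithm and thread count,
# for one dataset and one leaf setting:
bosch_63 <- subset(results, dataset == "Bosch" & leaves == 63)
aggregate(auc  ~ algorithm + threads, data = bosch_63, FUN = max)   # best AUC
aggregate(time ~ algorithm + threads, data = bosch_63, FUN = sum)   # total time
```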