Intel Compiler Benchmarks

May 2018 Edition

Datasets Used

Higgs

  • 11,000,000 training observations (all)
  • 406 features (interaction (multiplication) of all pairwise combinations of the 28 features)
  • Metric: time taken to finish 10 boosting iterations

The resulting dense training matrix:

  • 4,466,000,000 elements
  • 665,073,059 zero values (14.89% sparsity)
  • 35,728,026,424 bytes (33.3 GiB)
  • Peak RAM requirement: 82GB
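For illustration, the interaction construction can be sketched in R. This is only one plausible reading of how the 406 columns are obtained (28 × 29 / 2 = 406 products of all unordered feature pairs, squares included); the file path, the use of data.table, and the loop below are assumptions, not the benchmark's actual preprocessing script:

library(data.table)

# Hypothetical path to the UCI HIGGS CSV (label in column 1, 28 features after it).
higgs <- fread("HIGGS.csv", header = FALSE)
label <- higgs[[1L]]
X     <- as.matrix(higgs[, -1L, with = FALSE])   # 11,000,000 x 28

# All unordered pairs (i, j) with i <= j: 28 * 29 / 2 = 406 combinations.
pairs <- which(upper.tri(diag(ncol(X)), diag = TRUE), arr.ind = TRUE)
X406  <- matrix(0, nrow = nrow(X), ncol = nrow(pairs))
for (k in seq_len(nrow(pairs))) {
  X406[, k] <- X[, pairs[k, "row"]] * X[, pairs[k, "col"]]
}
dim(X406)   # 11,000,000 x 406 doubles, about 33.3 GiB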

Servers Used (Hardware / Software)

1-64 thread runs

  • CPU: Dual Intel Xeon Gold 6130 (3.7 GHz turbo / 2.8 GHz all-core, 32 cores / 64 threads total)
  • RAM: 384GB 2666MHz
  • OS: Windows 10 Enterprise + Windows Subsystem for Linux, Ubuntu 16.04
  • Virtualization: None
  • R 3.5, compiled from source
  • gcc: 5.4 and 8.1
  • icc: 2018.2.199 (from Intel Parallel Studio XE 2018 Update 2)

Gradient Boosted Trees Algorithms Used

xgboost

  • Version used: commit 8f6aadd
  • Flags used:
    • gcc: -O3 -mtune=native
    • icc: -O3 -ipo -qopenmp -xHost -fPIC

LightGBM

  • Version used: commit 3f54429
  • Flags used:
    • gcc: -O3 -mtune=native
    • icc: -O3 -ipo -qopenmp -xHost -fPIC

Installation of Gradient Boosted Trees Algorithms

xgboost

Installing xgboost requires specific steps, done from bash:

git clone --recursive https://github.com/dmlc/xgboost
cd xgboost
git checkout 8f6aadd
cd R-package

In src/Makevars.in, add -DUSE_AVX=ON at line 11.

Then, launch R from the R-package directory and compile xgboost:

R
install.packages('.', repos = NULL, type = "source")
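A quick sanity check (not part of the benchmark) to confirm that the freshly compiled package is the one R loads:

# Load the icc- or gcc-built xgboost and report the installed version.
library(xgboost)
packageVersion("xgboost")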

LightGBM

LightGBM also requires a specific installation, done from bash:

git clone --recursive https://github.com/Microsoft/LightGBM
cd LightGBM
git checkout 3f54429
cd R-package

In src/Makevars.in, replace the cmake_cmd content at line 50 with "cmake -DCMAKE_C_COMPILER=icc -DCMAKE_CXX_COMPILER=icpc ".

Then compile LightGBM:

R CMD INSTALL --build . --no-multiarch
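As with xgboost, a short smoke test (again not part of the benchmark; the data and parameters below are arbitrary) can confirm the compiled library trains correctly:

library(lightgbm)

# Tiny random dataset, just to exercise the compiled library.
set.seed(1)
x <- matrix(rnorm(1000 * 20), ncol = 20)
y <- as.integer(rowSums(x[, 1:3]) > 0)
dtrain <- lgb.Dataset(x, label = y)
model  <- lgb.train(params = list(objective = "binary"), data = dtrain, nrounds = 2)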

Hyperparameters Used (Full list)

xgboost

Hyperparameters used (timings are the average of 5 runs; the full xgboost benchmark took approximately 48 hours):

  • Depth: 8
  • Leaves: 255
  • Hessian: 1
  • Minimum Loss to split: 0
  • Column Sampling: 100%
  • Row Sampling: 100%
  • Iterations: 10
  • Learning Rate: 0.25
  • Boosting method: gbdt, fast histogram
  • Bins: 255
  • Loss function: binary:logistic


Note: the timing includes the bin construction time, which accounts for approximately 50% to 70% of the xgboost timing.

One run takes approximately 13 minutes with 1 thread and 2 minutes with 64 threads.
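The list above maps directly onto standard xgb.train parameter names. The sketch below shows one plausible way the runs could have been timed; the xgb.DMatrix construction, the 5-run loop, and the reuse of X406/label from the dataset sketch earlier are assumptions, not the actual benchmark script:

library(xgboost)

params <- list(
  max_depth        = 8,
  max_leaves       = 255,
  min_child_weight = 1,      # "Hessian"
  gamma            = 0,      # minimum loss to split
  colsample_bytree = 1,      # 100% column sampling
  subsample        = 1,      # 100% row sampling
  eta              = 0.25,
  tree_method      = "hist", # gbdt, fast histogram
  max_bin          = 255,
  objective        = "binary:logistic",
  nthread          = 64      # varied from 1 to 64 across runs
)

# With tree_method = "hist", the bin construction happens inside xgb.train,
# which is why it is included in the timing reported above.
dtrain  <- xgb.DMatrix(X406, label = label)
timings <- replicate(5, system.time(
  xgb.train(params = params, data = dtrain, nrounds = 10)
)[["elapsed"]])
mean(timings)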

LightGBM

Hyperparameters used (timings are the average of 5 runs; the full LightGBM benchmark took approximately 14 hours):

  • Depth: 8
  • Leaves: 255
  • Hessian: 1
  • Minimum Loss to split: 0
  • Column Sampling: 100%
  • Row Sampling: 100%
  • Iterations: 10
  • Learning Rate: 0.25
  • Boosting method: gbdt
  • Bins: 255
  • Loss function: binary


Note: the timing does not include the bin construction time.

One run takes approximately 16 minutes with 1 thread and 23 seconds with 64 threads.
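Similarly, the LightGBM list maps onto standard lgb.train parameter names. The sketch below is again an assumption of how the timing could have been collected, with the bin construction (lgb.Dataset) deliberately done outside the timed section, consistent with the note above; X406/label are reused from the dataset sketch earlier:

library(lightgbm)

params <- list(
  max_depth               = 8,
  num_leaves              = 255,
  min_sum_hessian_in_leaf = 1,
  min_gain_to_split       = 0,
  feature_fraction        = 1,    # 100% column sampling
  bagging_fraction        = 1,    # 100% row sampling
  learning_rate           = 0.25,
  boosting                = "gbdt",
  objective               = "binary",
  num_threads             = 64    # varied from 1 to 64 across runs
)

# In LightGBM, the number of bins is a Dataset parameter; constructing the
# Dataset up front keeps the binning out of the timed training calls.
dtrain <- lgb.Dataset(X406, label = label, params = list(max_bin = 255))
lgb.Dataset.construct(dtrain)
timings <- replicate(5, system.time(
  lgb.train(params = params, data = dtrain, nrounds = 10)
)[["elapsed"]])
mean(timings)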

Performance Analysis (gcc 5.4 vs gcc 8.1 vs icc)

Use this Performance Analysis if you want to compare timing data.

Check interactively on Tableau Public:

https://public.tableau.com/views/IntelCompilervsgcc5_4and8_1v1/Dashboard?:bootstrapWhenNotified=true&:display_count=y&:display_static_image=y&:embed=y&:showVizHome=no&publish=yes

Provided dynamic and interactive filters:

  • Threads

Performance Analysis (gcc 5.4 vs icc)

Use this Performance Analysis if you want to compare timing data.

Check interactively on Tableau Public:

https://public.tableau.com/views/IntelCompilervsgccv1/Dashboard?:bootstrapWhenNotified=true&:display_count=y&:display_static_image=y&:embed=y&:showVizHome=no&publish=yes

Provided dynamic and interactive filters:

  • Threads