Windows vs WSL Benchmarks

May 2018 Edition

WSL, what the hell?

The Windows Subsystem for Linux (WSL) is a (partially) native Linux environment running right inside Windows:

  • Sort-of virtualization, even though it is not true virtualization.
  • Forking is available! When creating parallel R processes, the workers share the parent's main memory as it was at fork time; new objects created inside the parallel processes are not shared and must be gathered manually.
  • It incurs a small performance cost because Linux system calls are intercepted and translated for Windows; in addition, not every Linux feature is available (there is no real Linux kernel, for instance).
  • You can install R and Python inside without any issues.
  • You can even install RStudio Server, but it must be restarted after every reboot (this is very simple: open a Bash shell, then run sudo rstudio-server restart).
  • You can also run Jupyter Notebook / Jupyter Lab, no questions asked.
  • WSL obeys the Windows Firewall: you can use the Advanced settings graphical interface to set up the protection of your Linux processes.
  • WSL cannot yet pass the GPU through: no GPU access in WSL!


You can find the installation steps for a full R and Python setup in the following link: https://github.com/Laurae2/R_Installation

Try to run the following in R to test whether forking is available:

library(parallel)

# Create a ~0.75 GB numeric vector in the parent R process
set.seed(11111)
y <- runif(n = 100000000)
format(object.size(y), units = "Gb")

# Forked workers inherit the parent's memory as it was at fork time,
# so "y" is visible inside them without being exported explicitly
cl <- makeForkCluster(detectCores())
parSapply(cl, X = seq_len(detectCores()), function(x) {c(exists("X"), exists("y"))})
parSapply(cl, X = seq_len(detectCores()), function(x) {length(y)})
parSapply(cl, X = seq_len(detectCores()), function(x) {head(y)})
stopCluster(cl)


If forking works, the last printed result is a matrix with one column per core (each column containing the first six values of y as seen from inside a forked worker).

Datasets Used

Higgs

  • 11,000,000 training observations (all)
  • 406 features (multiplicative interactions of all pairwise combinations of the 28 original features; one possible construction is sketched after the statistics below)
  • Metric: time taken to finish 10 boosting iterations


  • 4,466,000,000 elements
  • 665,073,059 zero values (14.89% sparsity)
  • 35,728,026,424 bytes (33.3GB)
  • Peak RAM requirement: 82GB
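
For illustration, here is a minimal R sketch of one way to obtain 406 such features, assuming the interactions are all products x_i * x_j with i <= j (28 * 29 / 2 = 406, squares included); the exact construction used for the benchmark is not spelled out here, so treat the details as an assumption.

pairwise_products <- function(X) {
  # All index pairs (i, j) with i <= j over the 28 columns: 406 pairs
  idx <- which(upper.tri(diag(ncol(X)), diag = TRUE), arr.ind = TRUE)
  # Multiply the matching columns elementwise to form the products
  out <- X[, idx[, "row"]] * X[, idx[, "col"]]
  colnames(out) <- paste0("f", idx[, "row"], "_x_f", idx[, "col"])
  out
}

# Example with random data shaped like the 28 Higgs features
X <- matrix(runif(1000 * 28), ncol = 28)
dim(pairwise_products(X))  # 1000 x 406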

Servers Used (Hardware / Software)

1-64 thread runs

  • CPU: Dual Xeon Gold 6130 (3.7/2.8 GHz, 32c/64t)
  • RAM: 384GB 2666MHz RAM
  • OS: Windows 10 Enterprise + Windows Subsystem for Linux, Ubuntu 16.04
  • Virtualization: None
  • R 3.5, compiled
  • Windows: Visual Studio 2017
  • WSL: gcc 5.4

Gradient Boosted Trees Algorithms Used

xgboost

  • Versions used: commit 8f6aadd
  • Flags used:
    • gcc: -O3 -mtune=native

LightGBM

  • Versions used: commit 3f54429
  • Flags used:
    • gcc: -O3 -mtune=native

Installation of Gradient Boosted Trees Algorithms

xgboost

Installing xgboost directly from R:

devtools::install_github("Laurae2/xgbdl")
xgbdl::xgb.dl(compiler = "gcc", commit = "8f6aadd", use_avx = TRUE, use_gpu = FALSE)
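
On the Windows side, the benchmarks were compiled with Visual Studio 2017 instead of gcc; a hedged sketch of the corresponding call, assuming xgbdl accepts a Visual Studio generator string as its compiler argument (verify the exact string against the xgbdl README):

# Assumed Windows variant; the WSL runs use the gcc call above
xgbdl::xgb.dl(compiler = "Visual Studio 15 2017 Win64", commit = "8f6aadd", use_avx = TRUE, use_gpu = FALSE)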

LightGBM

Installing LightGBM directly from R:

devtools::install_github("Laurae2/lgbdl")
lgbdl::lgb.dl(commit = "3f54429", compiler = "gcc")
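
As above, the Windows runs were compiled with Visual Studio 2017; a hedged sketch, assuming lgbdl accepts "vs" as a compiler value (check the lgbdl README to confirm):

# Assumed Windows variant; the WSL runs use the gcc call above
lgbdl::lgb.dl(commit = "3f54429", compiler = "vs")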

Hyperparameters Used (Full list)

xgboost

Hyperparameters used; timings are averaged over 5 runs (approximately 48h):

  • Depth: 8
  • Leaves: 255
  • Hessian: 1
  • Minimum Loss to split: 0
  • Column Sampling: 100%
  • Row Sampling: 100%
  • Iterations: 10
  • Learning Rate: 0.25
  • Boosting method: gbdt, fast histogram
  • Bins: 255
  • Loss function: binary:logistic


Note: the timing includes the binning construction time, which represents approximately 50% to 70% of the total xgboost timing.

It takes 13 minutes with 1 thread, 2 minutes with 64 threads.
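
For reference, a minimal sketch of how these settings map onto xgboost's R API, assuming the data is already loaded into an xgb.DMatrix named dtrain (the actual benchmark harness is not reproduced in this document):

library(xgboost)

# Hypothetical training call mirroring the hyperparameters above;
# "dtrain" is assumed to hold the 11M x 406 Higgs matrix and labels
params <- list(tree_method = "hist",          # gbdt, fast histogram
               max_depth = 8,
               max_leaves = 255,
               min_child_weight = 1,          # Hessian
               gamma = 0,                     # minimum loss to split
               colsample_bytree = 1,          # column sampling: 100%
               subsample = 1,                 # row sampling: 100%
               eta = 0.25,
               max_bin = 255,
               objective = "binary:logistic",
               nthread = 64)
model <- xgb.train(params = params, data = dtrain, nrounds = 10)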

LightGBM

Hyperparameters used; timings are averaged over 5 runs (approximately 14h):

  • Depth: 8
  • Leaves: 255
  • Hessian: 1
  • Minimum Loss to split: 0
  • Column Sampling: 100%
  • Row Sampling: 100%
  • Iterations: 10
  • Learning Rate: 0.25
  • Boosting method: gbdt
  • Bins: 255
  • Loss function: binary


Note: the timing does not include the binning construction time.

It takes 16 minutes using 1 thread, 23 seconds using 64 threads.
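
Similarly, a hedged sketch of the equivalent lightgbm call in R, assuming the data sits in an lgb.Dataset named dtrain (again, the exact benchmark script is not shown here):

library(lightgbm)

# Hypothetical training call mirroring the hyperparameters above
params <- list(objective = "binary",
               boosting = "gbdt",
               max_depth = 8,
               num_leaves = 255,
               min_sum_hessian_in_leaf = 1,   # Hessian
               min_gain_to_split = 0,         # minimum loss to split
               feature_fraction = 1,          # column sampling: 100%
               bagging_fraction = 1,          # row sampling: 100%
               learning_rate = 0.25,
               max_bin = 255,
               num_threads = 64)
model <- lgb.train(params = params, data = dtrain, nrounds = 10)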

Performance Analysis (Windows vs WSL)

Use the Performance Analysis if you want to compare timing data.

Check interactively on Tableau Public:

https://public.tableau.com/views/WindowsvsWSLv1/Dashboard?:bootstrapWhenNotified=true&:display_count=y&:display_static_image=y&:embed=y&:showVizHome=no&publish=yes

The dashboard provides dynamic and interactive filters:

  • Threads