The Windows Subsystem for Linux (WSL) is a (partially) native Linux environment running directly inside Windows:
sudo rstudio-server restart
You can find the installation steps for a full R and Python setup at the following link: https://github.com/Laurae2/R_Installation
Try to run the following in R to test whether forking is available:
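library(parallel)
set.seed(11111)
# Allocate a large vector (100 million doubles) in the parent process
y <- runif(n = 100000000)
format(object.size(y), units = "Gb")
# Start one forked worker per core
cl <- makeForkCluster(detectCores())
# Check which objects each worker can see; y should be visible without any explicit export
parSapply(cl, X = seq_len(detectCores()), function(x) {c(exists("X"), exists("y"))})
parSapply(cl, X = seq_len(detectCores()), function(x) {length(y)})
parSapply(cl, X = seq_len(detectCores()), function(x) {head(y)})
# Always release the workers when done
stopCluster(cl)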
If forking works, the last call prints a matrix with one column per core, each column holding the first values of y.
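If you want the test above to fail gracefully on a machine without forking, here is a minimal sketch of one way to do it. It assumes forking is available whenever R reports a Unix-alike platform (which is the case under WSL), and it falls back to a PSOCK cluster otherwise. The names `fork_available` and `n_workers`, and the small vector size, are only for illustration and are not part of the original test.

```r
library(parallel)

# Assumption: fork clusters are usable whenever R reports a Unix-alike
# platform (Linux, macOS, WSL). Native Windows R reports "windows".
fork_available <- .Platform$OS.type == "unix"

n_workers <- 2L        # hypothetical small worker count for a quick check
y <- runif(n = 1000)   # small vector, for illustration only

if (fork_available) {
  # Forked workers inherit y from the parent process
  cl <- makeForkCluster(n_workers)
} else {
  # PSOCK workers start empty, so y must be copied to them explicitly
  cl <- makePSOCKcluster(n_workers)
  clusterExport(cl, "y", envir = environment())
}

# Every worker should report the length of y
lengths_seen <- parSapply(cl, X = seq_len(n_workers), function(x) length(y))
stopCluster(cl)

print(fork_available)
print(lengths_seen)
```

With a fork cluster nothing is copied up front, which is why the original test can use a very large `y` without multiplying memory usage by the number of cores. The PSOCK fallback sends a full copy of `y` to each worker, so keep the vector small if you end up on that path.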
xgboost: commit 8f6aadd, compilation flags -O3 -mtune=native
LightGBM: commit 3f54429, compilation flags -O3 -mtune=native
Installing xgboost directly from R:
devtools::install_github("Laurae2/xgbdl")
xgbdl::xgb.dl(compiler = "gcc", commit = "8f6aadd", use_avx = TRUE, use_gpu = FALSE)
Installing LightGBM directly from R:
devtools::install_github("Laurae2/lgbdl")
lgbdl::lgb.dl(commit = "3f54429", compiler = "gcc")
Hyperparameters, average of 5 runs (approximately 48h):
Note: the timing takes into account the binning construction time, which is approximately 50% to 70% of the xgboost timing.
It takes 13 minutes with 1 thread, 2 minutes with 64 threads.
Hyperparameters, average of 5 runs (approximately 14h):
Note: the timing does not take into account the binning construction time.
It takes 16 minutes using 1 thread, 23 seconds using 64 threads.
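To put the two pairs of timings quoted above on a common scale, here is a small sketch that converts them to a speedup and a parallel efficiency (speedup divided by the number of threads). The quoted timings are rounded, so the results are approximate, and the column names are only illustrative. Keep in mind that the first pair includes the binning construction time while the second does not, so the two rows are not strictly comparable.

```r
# Timings quoted in the text, converted to seconds
timings <- data.frame(
  run          = c("first (13 min -> 2 min)", "second (16 min -> 23 s)"),
  t_1_thread   = c(13 * 60, 16 * 60),
  t_64_threads = c(2 * 60, 23)
)

threads <- 64

# Speedup: how many times faster 64 threads are than 1 thread
timings$speedup <- timings$t_1_thread / timings$t_64_threads

# Parallel efficiency: fraction of the ideal 64x speedup actually reached
timings$efficiency <- timings$speedup / threads

print(timings)
```

The first pair works out to a speedup of 6.5x, the second to roughly 42x, which corresponds to about 10% and 65% of the ideal linear scaling on 64 threads.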
Use the Performance Analysis if you want to compare timing data.
Check interactively on Tableau Public:
Provided dynamic and interactive filters: