Articles about:
... can be found below, in a list ordered by date.
They are organized as follows:
Date: Jun 10, 2017
Are you thinking about using LightGBM on Windows?
If yes, should you choose Visual Studio or MinGW as the compiler? We are checking here the impact of the compiler on the performance of LightGBM!
In addition, a juicy xgboost comparison: it bridged the gap it had versus LightGBM!
Date: May 25, 2017
Thinking about Intel vs AMD Ryzen?
What about picking both, and putting them in the ring for a round of xgboost benchmarks? This is what we are doing here!
We are also looking indirectly at Linux vs Windows, and bare-metal vs virtualized servers. It turns out virtualized servers and Windows servers actually hold up very well against bare-metal Linux servers.
Date: May 14, 2017
Using xgboost fast histogram?
Ever heard about old and new fast histogram?
This is the comparison between the old and the new xgboost fast histogram! Get ready to see... juicy 75% improvements!
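For orientation, a minimal sketch of what "fast histogram" means in the Python package, assuming a recent xgboost build (the data and parameter values below are invented for illustration): tree_method selects the algorithm, max_bin controls the histogram resolution.

```python
import numpy as np
import xgboost as xgb

# Synthetic data, purely illustrative.
X = np.random.rand(100_000, 50)
y = (X[:, 0] + np.random.rand(100_000) > 1.0).astype(int)
dtrain = xgb.DMatrix(X, label=y)

# Fast histogram is selected through 'tree_method'; 'max_bin' sets the
# number of histogram bins (255 is a common default).
params = {"objective": "binary:logistic",
          "tree_method": "hist",
          "max_bin": 255}
model = xgb.train(params, dtrain, num_boost_round=50)
```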
Date: Apr 30, 2017
Remember the comparison between exact and fast histogram xgboost? Here they are both together!
Date: Apr 29, 2017
Using fast histogram xgboost? You are going to get served with benchmarks using:
Best practice to remember: fast histogram xgboost scales very well with frequency (GHz). Using too many cores will heavily degrade your training speed.
Date: Apr 27, 2017
Using exact xgboost? You are going to get served with benchmarks using:
Best practice to remember: exact xgboost scales very well with the number of cores. Frequency is secondary.
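Both best practices above boil down to the nthread parameter, so they are easy to check on your own machine. A hedged sketch (synthetic data and illustrative sizes; exact timings depend entirely on your CPU):

```python
import time
import numpy as np
import xgboost as xgb

# Synthetic data, only to make the loop runnable.
X = np.random.rand(200_000, 50)
y = np.random.randint(0, 2, 200_000)
dtrain = xgb.DMatrix(X, label=y)

# Expectation from the articles: exact keeps improving with more threads,
# hist flattens out (or degrades) past a few threads.
for method in ("exact", "hist"):
    for nthread in (1, 2, 4, 8):
        params = {"objective": "binary:logistic",
                  "tree_method": method,
                  "nthread": nthread}
        start = time.time()
        xgb.train(params, dtrain, num_boost_round=20)
        print(f"{method:>5} threads={nthread}: {time.time() - start:.2f}s")
```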
Date: Apr 23, 2017
Using decision trees and using categorical features?
Should you use...:
We will show that one-hot encoding is the worst choice, while native categorical features are the best, if and only if the supervised machine learning implementation can handle them.
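As a concrete illustration (not the article's benchmark; the toy frame below is invented), here is what the two routes look like with pandas and LightGBM, which supports native categorical splits:

```python
import numpy as np
import pandas as pd
import lightgbm as lgb

# Invented toy data with one categorical column.
df = pd.DataFrame({"city": np.random.choice(["NY", "SF", "LA"], 10_000),
                   "x": np.random.rand(10_000)})
y = np.random.randint(0, 2, 10_000)

# Route 1, one-hot encoding: 'city' explodes into dummy columns.
X_onehot = pd.get_dummies(df, columns=["city"])

# Route 2, native handling: LightGBM splits directly on category codes.
df["city"] = df["city"].astype("category")
train = lgb.Dataset(df, label=y, categorical_feature=["city"])
model = lgb.train({"objective": "binary"}, train, num_boost_round=50)
```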
Date: Apr 16, 2017
When you have a CPU with hyperthreading, make sure you are using all its available performance.
Do not believe the myth "number of threads = number of physical cores" anymore.
We are no longer in the 2000 era, when multithreading was horribly done.
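If you want to check this on your own box, a small sketch (assuming the third-party psutil package is available to count physical cores):

```python
import os
import psutil  # assumed available; only used to count physical cores

print("logical threads:", os.cpu_count())
print("physical cores :", psutil.cpu_count(logical=False))
# Benchmark your training with both values as the thread count;
# the point of the article is that the logical count usually wins today.
```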
Date: Jan 10, 2017
Programming practices: is there a noticeable difference between floats and doubles when it comes to speed?
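A minimal way to get a feel for the answer (a sketch, not the article's benchmark; on most x86 CPUs the difference comes from memory bandwidth and SIMD width rather than per-operation cost):

```python
import time
import numpy as np

# Same matrix product in single and double precision.
for dtype in (np.float32, np.float64):
    a = np.random.rand(4000, 4000).astype(dtype)
    b = np.random.rand(4000, 4000).astype(dtype)
    start = time.time()
    a @ b
    print(dtype.__name__, f"{time.time() - start:.2f}s")
```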
Date: Jan 09, 2017
We are comparing here xgboost (exact) and LightGBM.
Computation is 10x faster using LightGBM.
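The shape of such a comparison, as a hedged sketch (synthetic data and sizes invented here; the 10x figure is the article's measurement, not what this toy run will necessarily show):

```python
import time
import numpy as np
import xgboost as xgb
import lightgbm as lgb

X = np.random.rand(500_000, 50)
y = np.random.randint(0, 2, 500_000)

start = time.time()
xgb.train({"objective": "binary:logistic", "tree_method": "exact"},
          xgb.DMatrix(X, label=y), num_boost_round=50)
print(f"xgboost exact: {time.time() - start:.1f}s")

start = time.time()
lgb.train({"objective": "binary"}, lgb.Dataset(X, label=y),
          num_boost_round=50)
print(f"LightGBM:      {time.time() - start:.1f}s")
```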
Date: Jan 07, 2017
xgboost has a new method for boosting, providing excellent performance: fast histogram.
Date: Dec 07, 2016
Think you don't understand xgboost's gblinear? Think again. That's just a generalized linear model.
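To see it as a GLM, a sketch (illustrative data; "reg:squarederror" is the modern name of the objective, older versions call it "reg:linear"):

```python
import numpy as np
import xgboost as xgb

X = np.random.rand(10_000, 20)
y = X @ np.random.rand(20) + 0.1 * np.random.rand(10_000)
dtrain = xgb.DMatrix(X, label=y)

# 'gblinear' boosts the weights of a linear model instead of growing
# trees; 'alpha' and 'lambda' are L1/L2 penalties, so the result is an
# elastic-net style generalized linear model.
params = {"booster": "gblinear", "objective": "reg:squarederror",
          "alpha": 0.0, "lambda": 1.0}
model = xgb.train(params, dtrain, num_boost_round=100)
```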
Date: Nov 25, 2016
Using virtualization? The CPU topology you are passing to your virtual machine matters. But by how much? (xgboost exact)
This time, we are looking at an increasing number of sockets.
Date: Nov 14, 2016
Using virtualization? The CPU topology you are passing to your virtual machine matters. But by how much? (xgboost exact)
We will look at the number of cores passed to the virtual machine.
Date: Nov 08, 2016
Statistical tests are not statistical tests anymore when using large amounts of data.
They were just not made for that.
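The effect is easy to reproduce (a sketch with invented numbers, not the article's data): give a t-test a practically meaningless 0.01 shift and watch the p-value collapse as rows accumulate.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
for n in (100, 10_000, 1_000_000):
    a = rng.normal(0.00, 1.0, n)
    b = rng.normal(0.01, 1.0, n)   # negligible true difference
    _, p = stats.ttest_ind(a, b)
    print(f"n={n:>9}: p={p:.3g}")
# With enough rows, even the meaningless shift becomes "significant".
```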
Date: Nov 06, 2016
Have a metric which is quadratic?
Then improving it becomes quadratic too; that much is easy as pie to understand. Explaining the phenomenon is something different.
Date: Oct 15, 2016
Do you have many features?
Are you lost in all these features?
Think you can go through all of them one by one?
A tableplot solves your problem.
Date: Sep 03, 2016
When you have row ID leakage and a not-too-large sample size (fewer than 100,000 rows), what does machine learning say?
Date: Sep 03, 2016
When you have row ID leakage and a not-too-large sample size (fewer than 100,000 rows), what does statistics say?
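A quick first probe for such leakage, as a hedged sketch (the target array below is a random stand-in; a rank correlation is only one of many possible checks):

```python
import numpy as np
from scipy import stats

y = np.random.rand(50_000)     # stand-in for your target column
row_id = np.arange(len(y))

# If row order leaks information, target and ID are not independent.
rho, p = stats.spearmanr(row_id, y)
print(f"Spearman rho={rho:.3f}, p={p:.3g}")
```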
Date: Sep 01, 2016
What do you have to say about hierarchical supervised machine learning?
The answer: it depends.
Date: Aug 26, 2016
Explains why a generalized linear model is a boosted model, and why having many features does not matter for training speed when they are sparse.
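On the sparsity point, a sketch (invented shapes and density): a very wide but very sparse matrix stays fast because only the non-zero entries are visited.

```python
import numpy as np
import scipy.sparse as sp
import xgboost as xgb

# 100,000 rows x 50,000 columns, but only ~0.1% of cells filled.
X = sp.random(100_000, 50_000, density=0.001, format="csr")
y = np.random.randint(0, 2, 100_000)
dtrain = xgb.DMatrix(X, label=y)

# Width is nearly free at this sparsity: training touches ~5M non-zeros,
# not the 5 billion cells of the dense equivalent.
xgb.train({"booster": "gblinear", "objective": "binary:logistic"},
          dtrain, num_boost_round=10)
```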
Date: Aug 03, 2016
We all know xgboost is a nightmare to compile if you are a total beginner. Here is an example of compiling xgboost for both CLI (command line interface) and R!
Date: Jun 06, 2016
Do you know how to use PCA? Yes...
But do you know when to use it?
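One practical "when" check, sketched with scikit-learn (random stand-in data, on which PCA should buy you little):

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(1_000, 100)     # stand-in data
pca = PCA().fit(X)

# If a few components explain most of the variance, PCA compresses well;
# a flat curve (as on uniform noise here) means PCA buys you little.
explained = np.cumsum(pca.explained_variance_ratio_)
print(explained[:10])
```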
Date: May 02, 2016
Why should you not post-process rankings when dealing with noisy data?
We take Santander Customer Satisfaction as an example to show the irrationality of hard rules.
Date: Apr 18, 2016
Still not understanding the basics of gradient descent shrinkage?
The learning rate in gradient boosted trees is explained here using an analogy with a pedestrian.
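In symbols, the step the pedestrian takes each round is the new tree scaled by the learning rate $\eta$; smaller steps need more rounds but overshoot less:

$$F_m(x) = F_{m-1}(x) + \eta \, h_m(x), \qquad 0 < \eta \le 1$$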
Date: Apr 05, 2016
Did you know you can use t-SNE on features instead of on observations?
Did you ever want to visually map the information relationship between features?
Here you have it: t-SNE on features.
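The whole trick fits in one line with scikit-learn (a sketch on invented data): transpose the matrix so each feature becomes a point to embed.

```python
import numpy as np
from sklearn.manifold import TSNE

X = np.random.rand(5_000, 60)   # rows = observations, columns = features

# Transposing makes each *feature* a sample for t-SNE to place in 2-D.
emb = TSNE(n_components=2, perplexity=10).fit_transform(X.T)
print(emb.shape)                # (60, 2): one point per feature
```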
Date: Mar 12, 2016
How can you prove whether a machine learning problem requires a linear solution or a non-linear solution?
We will be using BNP Paribas Cardif Claims Management as an example.
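One way such a proof can be sketched (not necessarily the article's exact protocol; the target below is deliberately non-linear and invented): compare cross-validated scores of a linear and a non-linear model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X = np.random.rand(5_000, 20)
y = (X[:, 0] * X[:, 1] > 0.25).astype(int)   # deliberately non-linear

for model in (LogisticRegression(max_iter=1000),
              GradientBoostingClassifier()):
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    print(type(model).__name__, round(auc, 3))
# A large gap in favor of the tree model is evidence that the problem
# has no good linear solution in the raw features.
```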
Date: Mar 06, 2016
Thinking about NAs? Why do NAs matter in tree-based models?
We are using xgboost as an example.
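The mechanics in the Python package, as a minimal sketch (invented data): no imputation, the NAs go straight into the DMatrix.

```python
import numpy as np
import xgboost as xgb

X = np.random.rand(10_000, 5)
X[np.random.rand(10_000, 5) < 0.2] = np.nan   # punch 20% holes
y = np.random.randint(0, 2, 10_000)

# xgboost learns, per split, which branch missing values should follow;
# 'missing' marks the sentinel value standing for NA.
dtrain = xgb.DMatrix(X, label=y, missing=np.nan)
xgb.train({"objective": "binary:logistic"}, dtrain, num_boost_round=20)
```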
Date: May 04, 2016
Need to understand why Gamma will help you squeeze even better performance out of xgboost?
Here you are served.
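For reference, where the knob sits (a sketch with invented data and values): gamma is the minimum loss reduction a split must achieve to be kept, so larger values prune harder.

```python
import numpy as np
import xgboost as xgb

X = np.random.rand(10_000, 20)
y = np.random.randint(0, 2, 10_000)
dtrain = xgb.DMatrix(X, label=y)

# gamma = 0 accepts any improving split; larger values demand a bigger
# loss reduction before a split is kept, pruning the trees harder.
for gamma in (0, 1, 10):
    xgb.train({"objective": "binary:logistic", "gamma": gamma},
              dtrain, num_boost_round=20)
```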