XGBoost is an optimized distributed gradient boosting library designed to be highly efficient, flexible, and portable [Chen 2016] [DMLC] [Mitchell 2017] [XGBoost]. It is an open-source software library that provides a gradient boosting framework for C++, Java, Python, R, and Julia and works on Linux, Windows, and macOS.
It also supports the distributed processing frameworks Apache Hadoop, Spark, Flink, and DataFlow, and has GPU support. The XGBoost library implements the gradient boosted decision tree algorithm under the gradient boosting framework. It has gained much popularity and attention recently, as it was the algorithm of choice for many winning teams in a number of ML competitions. XGBoost provides parallel tree boosting (also known as GBDT or GBM), which solves many data science problems quickly and accurately. The same code runs on major distributed environments (Hadoop, SGE, MPI) and can scale to problems with billions of examples. The term "gradient boosting" comes from the idea of boosting, or improving, a single weak model by combining it with a number of other weak models to generate a collectively strong model: XGBoost builds a strong model by iteratively adding weak learners.
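The iterative idea described above can be sketched in a few lines of pure Python. This is a minimal illustration of gradient boosting with squared-error loss and decision stumps as the weak learners, not XGBoost's actual implementation (which adds regularization, second-order gradients, and optimized tree construction); the `stump_fit`, `boost`, and `predict` helpers are hypothetical names introduced here for illustration.

```python
# Minimal gradient boosting sketch: each round fits a decision stump to the
# residuals of the current ensemble, then adds it with a small learning rate.
# Illustrative only -- not XGBoost's actual algorithm or API.

def stump_fit(xs, residuals):
    """Fit a one-split decision stump (on a 1-D feature) to the residuals."""
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not left or not right:
            continue
        lmean = sum(left) / len(left)    # prediction for x <= t
        rmean = sum(right) / len(right)  # prediction for x > t
        sse = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, t, lmean, rmean)
    _, t, lmean, rmean = best
    return lambda x: lmean if x <= t else rmean

def predict(base, stumps, lr, x):
    """Ensemble prediction: base value plus the shrunken sum of all stumps."""
    return base + lr * sum(h(x) for h in stumps)

def boost(xs, ys, rounds=200, lr=0.1):
    """Iteratively add stumps, each one fit to the current residuals."""
    base = sum(ys) / len(ys)  # start from the mean (best constant model)
    stumps = []
    for _ in range(rounds):
        preds = [predict(base, stumps, lr, x) for x in xs]
        residuals = [y - p for y, p in zip(ys, preds)]
        stumps.append(stump_fit(xs, residuals))
    return base, stumps
```

For example, fitting `boost([0, 1, 2, 3, 4], [0, 1, 4, 9, 16])` drives the training residuals toward zero round by round, even though each individual stump is a very weak model. XGBoost applies the same principle at scale, with regularized trees in place of stumps.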
Strong points
Weak points