The gradient is used to minimize the loss function (the error, i.e. the difference between the actual and predicted values). It is the vector of partial derivatives of the loss function, so it describes the slope, or steepness, of the error surface at the current predictions.
In each round of training, a weak learner is built and its predicted values are compared to the actual values. The difference between prediction and reality represents the error of our model.
Take the derivative (gradient) of the loss function with respect to each parameter. Multiply the gradient by the learning rate to get the step size, and update the parameters accordingly. In this way, you create a new weak learner. Keep repeating these steps (descending the gradient) and generating new learners until the step size becomes very small or the maximum number of steps is reached.
By using gradient descent and updating our predictions with a learning rate (which scales the "step size" with which we descend the gradient), we can find the values where the loss function is at its minimum. In other words, we keep updating the predictions so that the residuals shrink toward zero and the predicted values get sufficiently close to the actual values.
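To make these steps concrete, here is a minimal sketch of gradient boosting with squared-error loss in R: shallow regression trees are fitted to the residuals (the negative gradient of squared-error loss) and the predictions are nudged along them with a learning rate. The toy data, the `rpart` weak learner, and the hyperparameter values are illustrative assumptions, not part of the original example.

```r
library(rpart)  # small regression trees as weak learners

# Illustrative data (assumed): predict y from a single feature x
set.seed(42)
train_df <- data.frame(x = runif(200, 0, 10))
train_df$y <- sin(train_df$x) + rnorm(200, sd = 0.2)

learning_rate <- 0.1   # step size with which we descend the gradient
n_rounds      <- 100   # maximum number of boosting steps

# Start from a constant prediction (the mean minimizes squared-error loss)
pred   <- rep(mean(train_df$y), nrow(train_df))
models <- list()

for (m in seq_len(n_rounds)) {
  # For squared-error loss, the negative gradient is simply the residual
  residual <- train_df$y - pred

  # Fit a shallow tree (weak learner) to the residuals
  fit <- rpart(residual ~ x,
               data = data.frame(x = train_df$x, residual = residual),
               control = rpart.control(maxdepth = 2))
  models[[m]] <- fit

  # Descend the gradient: move predictions a small step toward the residuals
  step <- learning_rate * predict(fit, newdata = train_df)
  pred <- pred + step

  # Stop early once the step size becomes very small
  if (max(abs(step)) < 1e-4) break
}

mean((train_df$y - pred)^2)  # training loss after boosting
```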
XGBoost stands for Extreme Gradient Boosting. XGBoost is a specific implementation of the gradient boosting method that delivers more accurate approximations by using the second-order derivative of the loss function, L1 and L2 regularization, and parallel computing.
XGBoost is a more regularized form of gradient boosting. It uses advanced regularization (L1 and L2), which improves the model's ability to generalize.
XGBoost delivers high performance compared to standard gradient boosting. Its training is very fast and can be parallelized or distributed across clusters.
XGBoost computes second-order gradients, i.e. the second partial derivatives of the loss function, which provide more information about the curvature of the loss surface and how to reach its minimum.
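As an illustration of these points in the R interface, the sketch below trains an xgboost model with explicit L1/L2 regularization (`alpha`, `lambda`) and multi-threaded training (`nthread`). The feature matrix, label, and parameter values are placeholders assumed for illustration; they are not tuned for any particular dataset.

```r
library(xgboost)

# Illustrative data (assumed): a numeric feature matrix and a 0/1 label
set.seed(1)
X <- matrix(rnorm(500 * 4), ncol = 4)
y <- as.numeric(X[, 1] + 0.5 * X[, 2] + rnorm(500) > 0)

dtrain <- xgb.DMatrix(data = X, label = y)

params <- list(
  objective = "binary:logistic",
  eta       = 0.1,   # learning rate (step-size shrinkage)
  max_depth = 3,     # depth of each weak learner
  lambda    = 1.0,   # L2 regularization on leaf weights
  alpha     = 0.5,   # L1 regularization on leaf weights
  nthread   = 4      # parallel tree construction across CPU threads
)

bst <- xgb.train(params = params, data = dtrain, nrounds = 100)

# Predicted probabilities for the training matrix
head(predict(bst, dtrain))
```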
XGBoost also handles missing values in the dataset. So, during data wrangling, you may or may not need a separate treatment for missing values, because XGBoost can handle them internally: at each tree split it learns a default direction for observations whose value is missing.
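For example (an illustrative sketch continuing the assumed matrix and parameters from above), rows with NA entries can be passed to xgboost directly; no imputation step is required.

```r
# Introduce some missing values into the illustrative feature matrix
X_na <- X
X_na[sample(length(X_na), 50)] <- NA

# NA is treated as "missing" by default; xgboost learns how to route it at each split
dtrain_na <- xgb.DMatrix(data = X_na, label = y, missing = NA)
bst_na    <- xgb.train(params = params, data = dtrain_na, nrounds = 100)
```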
Source Code: GRE Data
R Code: