The gradient is used to minimize the loss function (the error, i.e. the difference between the actual and predicted values). It is the vector of partial derivatives of the loss function, so it describes the slope, or steepness, of the error surface at the current predictions.
In each round of training, a weak learner is built and its predicted values are compared to the actual values. The difference between prediction and reality represents the error of our model.
Take the derivative (gradient) of the loss function with respect to each parameter. Multiply the gradient by the learning rate to get the step size, and update the parameters accordingly. In this way, you create a new weak learner. Keep repeating these steps (descending the gradient) and generating new learners until the step size becomes very small or the maximum number of steps is reached.
By using gradient descent and updating our predictions with a learning rate (which scales the "step size" with which we descend the gradient), we can find the values where the loss function is at its minimum. In other words, we keep updating the predictions so that the residuals shrink toward zero and the predicted values get sufficiently close to the actual values.
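To make these steps concrete, here is a minimal sketch of gradient boosting with squared-error loss in R: shallow regression trees are fitted to the residuals (the negative gradient of squared-error loss) and the predictions are nudged along them with a learning rate. The toy data, the `rpart` weak learner, and the hyperparameter values are illustrative assumptions, not part of the original example.

```r
library(rpart)  # small regression trees as weak learners

# Illustrative data (assumed): predict y from a single feature x
set.seed(42)
train_df <- data.frame(x = runif(200, 0, 10))
train_df$y <- sin(train_df$x) + rnorm(200, sd = 0.2)

learning_rate <- 0.1   # step size with which we descend the gradient
n_rounds      <- 100   # maximum number of boosting steps

# Start from a constant prediction (the mean minimizes squared-error loss)
pred   <- rep(mean(train_df$y), nrow(train_df))
models <- list()

for (m in seq_len(n_rounds)) {
  # For squared-error loss, the negative gradient is simply the residual
  residual <- train_df$y - pred

  # Fit a shallow tree (weak learner) to the residuals
  fit <- rpart(residual ~ x,
               data = data.frame(x = train_df$x, residual = residual),
               control = rpart.control(maxdepth = 2))
  models[[m]] <- fit

  # Descend the gradient: move predictions a small step toward the residuals
  step <- learning_rate * predict(fit, newdata = train_df)
  pred <- pred + step

  # Stop early once the step size becomes very small
  if (max(abs(step)) < 1e-4) break
}

mean((train_df$y - pred)^2)  # training loss after boosting
```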
XGBoost stands for Extreme Gradient Boosting. XGBoost is a specific implementation of the gradient boosting method that delivers more accurate approximations by using the second-order derivative of the loss function, L1 and L2 regularization, and parallel computing.
XGBoost is a more regularized form of gradient boosting. It uses advanced regularization (L1 and L2), which improves the model's ability to generalize.
XGBoost delivers high performance compared to standard gradient boosting. Its training is very fast and can be parallelized or distributed across clusters.
XGBoost computes second-order gradients, i.e. the second partial derivatives of the loss function, which provide more information about the curvature of the loss surface and how to reach its minimum.
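As an illustration of these points in the R interface, the sketch below trains an xgboost model with explicit L1/L2 regularization (`alpha`, `lambda`) and multi-threaded training (`nthread`). The feature matrix, label, and parameter values are placeholders assumed for illustration; they are not tuned for any particular dataset.

```r
library(xgboost)

# Illustrative data (assumed): a numeric feature matrix and a 0/1 label
set.seed(1)
X <- matrix(rnorm(500 * 4), ncol = 4)
y <- as.numeric(X[, 1] + 0.5 * X[, 2] + rnorm(500) > 0)

dtrain <- xgb.DMatrix(data = X, label = y)

params <- list(
  objective = "binary:logistic",
  eta       = 0.1,   # learning rate (step-size shrinkage)
  max_depth = 3,     # depth of each weak learner
  lambda    = 1.0,   # L2 regularization on leaf weights
  alpha     = 0.5,   # L1 regularization on leaf weights
  nthread   = 4      # parallel tree construction across CPU threads
)

bst <- xgb.train(params = params, data = dtrain, nrounds = 100)

# Predicted probabilities for the training matrix
head(predict(bst, dtrain))
```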
XGBoost also handles missing values in the dataset. So, during data wrangling, you may or may not need a separate treatment for missing values, because XGBoost can handle them internally: at each tree split it learns a default direction for observations whose value is missing.
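For example (an illustrative sketch continuing the assumed matrix and parameters from above), rows with NA entries can be passed to xgboost directly; no imputation step is required.

```r
# Introduce some missing values into the illustrative feature matrix
X_na <- X
X_na[sample(length(X_na), 50)] <- NA

# NA is treated as "missing" by default; xgboost learns how to route it at each split
dtrain_na <- xgb.DMatrix(data = X_na, label = y, missing = NA)
bst_na    <- xgb.train(params = params, data = dtrain_na, nrounds = 100)
```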
Source Code: GRE Data
R Code: