Models and Results

Approach

We first constructed a set of models that could be used as an investment strategy focused on profit alone. We built models with a variety of methods (linear regression with different forms of regularization, random forest, XGBoost regression) and determined which model best maximized profit. Next, we compared the profit of this model to the profit produced by random investment. Finally, we analyzed the weights from the best performing linear model to interpret which variables predicted a profitable investment.

After constructing the models that maximized profit alone, we constructed a set of models that also optimized for fairness. We did this by adding a regularization term that penalizes models with low predictive parity. We implemented this novel regularization approach in PyTorch, which allowed us to construct both linear models and neural network models.

Data Prep

Most of our data cleaning occurs alongside our EDA. After cleaning, we randomly split the dataset, which contains an equal number of defaulted and fully paid loans, into a training set (n = 37500) and a test set (n = 12500).


from sklearn.model_selection import train_test_split

# Hold out 25% of the data as a test set; profit is the target
X_train, X_test, y_train, y_test = train_test_split(
    df.drop('profit', axis=1), df['profit'],
    test_size=0.25,
    random_state=42
)

We then standardized our features, since some of our methods, particularly ridge and lasso regularization, are sensitive to feature scale (their penalties act on coefficient magnitudes, which depend on that scale).

from sklearn.preprocessing import StandardScaler

def preprocess(X_train, X_test):
    # Fit the scaler on the training data only, then apply it to both splits
    scaler_X = StandardScaler().fit(X_train)
    X_train = scaler_X.transform(X_train)
    X_test = scaler_X.transform(X_test)
    return X_train, X_test, scaler_X

X_train, X_test, scaler = preprocess(X_train, X_test)


Profit-Only Models

Model Selection

For the 5 models we tried, we compared R^2 scores using cross validation.

Linear Regression

The linear regression acted as a simple baseline against which we could benchmark more complex modelling attempts.

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
model_linear = LinearRegression()
scores_linear = cross_val_score(model_linear, X_train, y_train, cv=5, scoring='r2')

Linear Regression with 2nd Degree Polynomial Terms + Lasso Regression

We then made our model more complicated by adding 2nd degree polynomial terms, transforming our original 56 features into 1596 features. This model severely overfit our training data (all cross validation R^2 scores < 0), so we decided to add regularization. First, we tried lasso regression. To find the regularization coefficient, we performed 5-fold cross validation. Notice that this means we're actually performing 25 model fits: 5 fits of the LassoCV model, each of which in turn performs its own 5-fold cross validation on the 4 folds of data used for training.
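A sketch of the feature expansion, assuming scikit-learn's PolynomialFeatures (the exact feature count depends on whether the bias column and original linear terms are included):

from sklearn.preprocessing import PolynomialFeatures

# Expand the standardized features with squares and pairwise interactions
poly = PolynomialFeatures(degree=2, include_bias=False)
X_train_poly = poly.fit_transform(X_train)
X_test_poly = poly.transform(X_test)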

# Lasso with polynomial features; LassoCV tunes its own alpha via internal 5-fold CV
from sklearn.linear_model import LassoCV

model_lasso = LassoCV(cv=5)
scores_lasso = cross_val_score(model_lasso, X_train_poly, y_train, cv=5)

Linear Regression with 2nd Degree Polynomial Terms + Ridge Regression

Next, we performed the same procedure, but using a ridge regularization term instead of lasso.

from sklearn.linear_model import RidgeCV

model_ridge = RidgeCV(cv=5)
scores_ridge = cross_val_score(model_ridge, X_train_poly, y_train, cv=5)
scores_ridge  # display the cross validation scores

Random Forest

After fitting these 3 linear models, we decided to try a more complex nonlinear model: a random forest. The additional complexity means that we also have more hyperparameters to tune, so a brute-force grid search with cross validation is too computationally expensive. To handle this challenge, we select hyperparameters in two steps. First, we specify a relatively large search space and randomly select 100 hyperparameter sets, which we assess with 3-fold cross validation (300 fits in total). Second, we take the best performing set from the random search and perform a cross validated grid search over the surrounding region.

Our random search space is as follows.

{'n_estimators': [50, 100, 150, 200, 250, 300, 350, 400, 450, 500],
 'max_features': [3, 8, 14, 20, 26, 31, 37, 43, 49, 55],
 'max_depth': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, None],
 'min_samples_split': [2, 5, 10],
 'min_samples_leaf': [1, 2, 4],
 'bootstrap': [True, False]}

Assessing all possible combinations with 3-fold cross validation would require 64,800 fits (21,600 combinations × 3 folds), which would take too long given our computational resources. After evaluating 100 random combinations of these hyperparameters, the best performing set was:

{'n_estimators': 350,
 'min_samples_split': 2,
 'min_samples_leaf': 2,
 'max_features': 15,
 'max_depth': 10,
 'bootstrap': False}
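A minimal sketch of the random search step that produced this set, assuming scikit-learn's RandomizedSearchCV (random_grid is our name for the search space listed above):

from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

# 100 random draws from the search space, each scored with 3-fold CV (300 fits)
rf_random = RandomizedSearchCV(
    estimator=RandomForestRegressor(),
    param_distributions=random_grid,
    n_iter=100,
    cv=3,
    random_state=42,
    n_jobs=-1,
)
rf_random.fit(X_train, y_train)
print(rf_random.best_params_)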

We explored the surrounding region with the following refined hyperparameter grid.

{'bootstrap': [True],
 'max_depth': [8, 10, 12],
 'max_features': [13, 15, 17],
 'min_samples_leaf': [2],
 'min_samples_split': [2],
 'n_estimators': [330, 350, 370]}
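A matching sketch of the refinement step, assuming scikit-learn's GridSearchCV (param_grid is our name for the refined grid above, and the 3-fold setting is an assumption carried over from the random search):

from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Exhaustive search over the refined grid: 3 * 3 * 3 = 27 combinations
rf_grid = GridSearchCV(
    estimator=RandomForestRegressor(),
    param_grid=param_grid,
    cv=3,
    n_jobs=-1,
)
rf_grid.fit(X_train, y_train)
print(rf_grid.best_params_)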

After validating every combination of this smaller set of hyperparameters, we found the best performing selection to be:

{'bootstrap': True,
 'max_depth': 12,
 'max_features': 17,
 'min_samples_leaf': 2,
 'min_samples_split': 2,
 'n_estimators': 350}

With this set of hyperparameters, we assessed the random forest's performance using 5-fold cross validation.

from sklearn.ensemble import RandomForestRegressor

# Best hyperparameters from the refined grid search (bootstrap=True is the default)
model_rf = RandomForestRegressor(max_depth=12,
                                 max_features=17,
                                 min_samples_leaf=2,
                                 min_samples_split=2,
                                 n_estimators=350)

scores_rf = cross_val_score(model_rf, X_train, y_train, cv=5)

XGBoost

We used the same two-step methodology to tune the XGBoost hyperparameters as we did for the random forest. Our initial random search space was as follows:

{'n_estimators': [100, 311, 522, 733, 944, 1155, 1366, 1577, 1788, 2000],
 'max_features': [3, 8, 14, 20, 26, 31, 37, 43, 49, 55],
 'max_depth': [3, 4, 5, 6, 7, 8, 10],
 'min_samples_split': [2, 5, 10],
 'min_samples_leaf': [1, 2, 4]}

The best performing selection was:

{'n_estimators': 100,
 'min_samples_split': 5,
 'min_samples_leaf': 2,
 'max_features': 39,
 'max_depth': 4}

Hence, we specified our refined grid to be:

{'max_depth': [3, 4, 5],
 'max_features': [35, 40, 45],
 'min_samples_leaf': [2],
 'min_samples_split': [5],
 'n_estimators': [100, 200, 300]}

After performing a cross validated grid search, we found the best hyperparameter set to be:

{'max_depth': 3,
 'max_features': 35,
 'min_samples_leaf': 2,
 'min_samples_split': 5,
 'n_estimators': 100}

Finally, we evaluated the performance of an XGBoost model with these hyperparameters using 5-fold cross validation.
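A sketch of that final evaluation, assuming the scikit-learn-compatible xgboost.XGBRegressor wrapper. Only n_estimators and max_depth map directly onto the XGBoost API; the other tuned names (max_features, min_samples_split, min_samples_leaf) come from scikit-learn's tree interface and would need to be translated into XGBoost equivalents (e.g., colsample_bytree, min_child_weight), so the block below shows only the two directly transferable settings:

from xgboost import XGBRegressor
from sklearn.model_selection import cross_val_score

# Evaluate the tuned model with 5-fold cross validation on the training set
model_xgb = XGBRegressor(n_estimators=100, max_depth=3)
scores_xgb = cross_val_score(model_xgb, X_train, y_train, cv=5, scoring='r2')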

Profit-Only Modelling Results

Below is a plot of the R^2 score of each model in 5-fold cross validation. We found that the random forest and XGBoost models produced the best R^2 scores, meaning they explained the variance in profitability best. Surprisingly, none of our models performed especially well: most of the variance in profitability is left unexplained. It seems that distinguishing which accepted loans will be more profitable is a hard problem to solve.

Next, we determined whether our best performing model, the XGBoost model, could improve an investor's profits. We identified the loans that the XGBoost model predicted would be profitable, then calculated the actual profit from those loans. As a baseline, we considered funding every loan accepted by the LendingClub system, which we call the "Existing strategy".
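A hedged sketch of this comparison, assuming the XGBoost model from the block above (model_xgb) and a hypothetical loan_amounts_test series holding each test loan's unscaled principal (used for the return on investment discussed below):

# Fit on the training data, then "fund" only the loans predicted to be profitable
model_xgb.fit(X_train, y_train)
predicted_profit = model_xgb.predict(X_test)
funded = predicted_profit > 0

# Average profit per funded loan vs. the existing strategy of funding everything
avg_profit_model = y_test[funded].mean()
avg_profit_existing = y_test.mean()

# Return on investment: realized profit relative to the principal loaned out
roi_model = y_test[funded].sum() / loan_amounts_test[funded].sum()
roi_existing = y_test.sum() / loan_amounts_test.sum()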


On the left is a plot of the return on investment from our XGBoost strategy compared with the existing LendingClub loan acceptances. Return on investment measures the profit from a loan relative to the size of the loan, and the XGBoost model produced a larger gain relative to the amount loaned out. Moreover, the XGBoost strategy earned an average profit of $474.90 per loan, while the existing LendingClub acceptances earned an average profit of $277.50 per loan. Thus, our investment strategy improves profit over the existing system.

While the lasso model did not perform quite as well as the XGBoost model, it has the advantage of being more interpretable. We therefore analyzed its coefficient values to get a sense of which variables predicted profitability. Below is a plot showing the value of each coefficient in the model; variables whose coefficients have larger magnitudes played a larger role in predicting profitability.
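A sketch of how the coefficients could be pulled out for plotting, assuming the LassoCV model and the poly transformer from the earlier sketch, plus pandas for the bar chart:

import pandas as pd

# Fit the lasso on the polynomial features and collect its nonzero coefficients
model_lasso.fit(X_train_poly, y_train)
feature_names = poly.get_feature_names_out(df.drop('profit', axis=1).columns)
coefs = pd.Series(model_lasso.coef_, index=feature_names)
coefs[coefs != 0].sort_values().plot(kind='barh', figsize=(8, 12))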

This model suggests that a loan is more likely to be profitable if:

  • The loan term is three years rather than five years
  • The applicant has a mortgage on their home
  • The applicant has a high income
  • The loan is fractional rather than whole

This model suggests that a loan is less likely to be profitable if:

  • The requested loan amount is large
  • The interest rate is high (which explains why "expected_profit" has a large negative coefficient)
  • The applicant has had a high credit line for a long time


Fairness Analysis

Now we will explore trade-offs between profitability and fairness. We will use a predictive parity approach to fairness. We choose predictive parity over statistical parity because achieving statistical parity requires giving out loans that are expected to be unprofitable. In other words, some groups would receive loans that they then default on, which can lead to long-term financial struggles and increased stress.

Predictive parity ensures that error rates for protected classes and for the general population are equivalent. In our setting, this means we want to be just as good at identifying safe loans when disadvantaged individuals request them.

Defining a Fair Regularizer

To make sure that our model respects predictive parity, we add a regularizer to our loss function. Bechavod and Ligett take this approach, but in a binary classification context. Berk et al. extend this method to regression, but they only consider group (statistical) fairness and individual fairness. From our literature review, we appear to be the first to define a predictive parity regularizer in a regression context.

Let y_hat_all be our profit predictions for everyone, and let y_hat_protected be our profit predictions for the protected class. Likewise, y_all and y_protected are the respective ground truth profits. Our "fair" loss function is

loss = MSE(y_hat_all, y_all) + alpha * (MSE(y_hat_protected, y_protected) - MSE(y_hat_all, y_all))^2

where alpha is a tuning parameter for the regularizer. The first term in the loss function is the usual mean squared error. The second term is the predictive parity regularizer, which penalizes the model when the mean squared error for the protected group differs from the mean squared error for the population overall.

Identifying a Protected Class

We take a one vs. rest approach to defining a protected class and say that anyone who is not a young white male is "protected". There are pros and cons to this approach:

Pros:

  • Allows us to measure overall profit-fairness tradeoff
  • If someone were to operationalize this work, they would only have to adjust a single fairness parameter instead of making tradeoffs between different groups
  • Allows us to account for intersectional class membership

Cons:

  • The error rates might be higher for some groups within the overall protected class
  • Ignores historic factors that might make "fairness" more important for some groups

Implementing the Fair Loss Function in PyTorch

Since we are using a custom loss function, we use PyTorch to optimize our models. The code for implementing the loss function is as follows:

import torch
import torch.nn as nn

class FairMSE(nn.Module):
    def __init__(self, alpha):
        super().__init__()
        self.alpha = alpha
        self.mse = nn.MSELoss(reduction='mean')

    def forward(self, outputs, labels, classes):
        # classes is a binary vector; convert it to a boolean mask over the batch
        protected = classes.bool()
        total_mse = self.mse(outputs, labels)
        if protected.any():
            protected_mse = self.mse(outputs[protected], labels[protected])
            # Penalize any gap between protected-group MSE and overall MSE
            return total_mse + self.alpha * (protected_mse - total_mse) ** 2
        return total_mse

where alpha is the regularization parameter, outputs are the profit predictions, labels are the true profits, and classes is a binary vector indicating membership in the protected class.
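A quick, hypothetical smoke test of the loss on random tensors (the shapes and class labels are made up purely for illustration):

criterion = FairMSE(alpha=1.0)
outputs = torch.randn(8, 1)                        # predicted profits
labels = torch.randn(8, 1)                         # true profits
classes = torch.tensor([1, 0, 1, 0, 0, 1, 0, 0])   # protected-class membership
loss = criterion(outputs, labels, classes)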

The Models

We chose to fit a linear model and a multilayer perceptron, since these models can be optimized using gradient-based methods.

The code for our linear model is:

class LinearRegressionPytorch(nn.Module):
    def __init__(self, input_size):
        super().__init__()
        self.linear = nn.Linear(input_size, 1)
    
    def forward(self, x):
        out = self.linear(x)
        return out

Our neural network has three hidden layers and a linear output layer. Layer 1 has 20 hidden nodes, layer 2 has 10, and layer 3 has 5; the linear output layer produces a single profit prediction. We use a ReLU activation function for the hidden layers.

The code for the neural network is:

import torch.nn.functional as F

class MLP(nn.Module):
    def __init__(self, input_dim):
        super().__init__()
        self.hidden = nn.Linear(input_dim, 20)
        self.hidden1 = nn.Linear(20, 10)
        self.hidden2 = nn.Linear(10, 5)
        self.out = nn.Linear(5, 1)

    def forward(self, x):
        x = F.relu(self.hidden(x))
        x = F.relu(self.hidden1(x))
        x = F.relu(self.hidden2(x))
        x = self.out(x)
        return x

Assessing Fairness-Profitability Trade Off

To assess the profitability-fairness trade off, we vary the alpha parameter from 0 (no fairness regularization) to 10 (strong fairness regularization). For each trial, we train our model for 25 epochs using the Adam optimizer.
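A condensed sketch of one trial, assuming full-batch gradient steps, a learning rate of 1e-3, and a hypothetical protected vector marking protected-class membership for each training row (apart from the 25 epochs and Adam, these details are our assumptions, not from the original experiment):

import torch

# Convert the numpy arrays to tensors; 'protected' is the hypothetical 0/1 vector
X_t = torch.tensor(X_train, dtype=torch.float32)
y_t = torch.tensor(y_train.values, dtype=torch.float32).reshape(-1, 1)
c_t = torch.tensor(protected, dtype=torch.long)

def run_trial(model, alpha, epochs=25, lr=1e-3):
    criterion = FairMSE(alpha)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = criterion(model(X_t), y_t, c_t)
        loss.backward()
        optimizer.step()
    return model

# Illustrative sweep of the fairness weight alpha from 0 to 10
for alpha in [0, 0.1, 1, 5, 10]:
    linear = run_trial(LinearRegressionPytorch(X_t.shape[1]), alpha)
    mlp = run_trial(MLP(X_t.shape[1]), alpha)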

Fairness-Profitability Trade Off Results

The results of our experiment can be seen in the plot below. On the left, we plot the return on investment of models trained with different fairness regularization weights. For both the linear models and the MLP models, return on investment did not change appreciably as the regularization weight increased, which implies that there is not a strong trade-off between profitability and fairness. On the right, we plot the predictive parity of models trained with different fairness regularization weights: with higher regularization weights, the difference between the MSE for the protected class and the overall MSE shrank, confirming that our regularizer does improve predictive parity. Thus, we can improve predictive parity through fairness regularization without significantly diminishing our return on investment.