Conclusions and Future Work

Conclusions

We set out to create an investment strategy that is both profitable and fair in the loan decisions it makes. We built a model that allows an investor to profit from their lending decisions on LendingClub, improving the return on investment relative to investing in all loans accepted on LendingClub. We also implemented a new regularizer, which penalizes models based on the degree to which they violate predictive parity. Taken together, these results suggest that when building investment strategies, it is feasible to optimize for profit and fairness simultaneously.
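
As a rough illustration of the quantity such a regularizer targets, the sketch below measures a predictive parity violation as the largest difference in positive predictive value (the share of funded loans that were repaid) between demographic groups, and subtracts a weighted multiple of it from profit. This is a minimal sketch under our own assumptions, not the exact regularizer used in this project: the function names, the max-minus-min form of the gap, and the weight `lam` are all hypothetical.

```python
from collections import defaultdict


def predictive_parity_gap(repaid, funded, group):
    """Largest difference in positive predictive value (PPV) between groups.

    PPV of a group = fraction of that group's funded loans that were repaid.
    Hypothetical helper for illustration; groups with no funded loans are
    ignored, and fewer than two groups yields a gap of 0.
    """
    funded_count = defaultdict(int)
    repaid_count = defaultdict(int)
    for r, f, g in zip(repaid, funded, group):
        if f:
            funded_count[g] += 1
            repaid_count[g] += int(r)
    ppv = {g: repaid_count[g] / n for g, n in funded_count.items()}
    if len(ppv) < 2:
        return 0.0
    return max(ppv.values()) - min(ppv.values())


def penalized_score(profit, gap, lam):
    """Profit minus a fairness penalty; `lam` is an assumed trade-off weight."""
    return profit - lam * gap


# Toy data (made up for illustration): 1 = repaid / funded, 0 = otherwise.
repaid = [1, 1, 0, 1, 0, 1]
funded = [1, 1, 1, 1, 1, 0]
group = ["a", "a", "a", "b", "b", "b"]
gap = predictive_parity_gap(repaid, funded, group)
```

Increasing `lam` corresponds to optimizing more heavily for fairness; setting it to zero recovers a profit-only objective. A regularizer used during training would need a differentiable surrogate of this gap, which this sketch does not attempt.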

While we began our analysis with the assumption that some degree of fairness would need to be sacrificed in order to increase profitability, our results lead us to believe that there is not necessarily a tension between these two goals. When we increasingly optimized for fairness, the profitability of our model was not significantly diminished. This project has left us optimistic that lending institutions can improve both fairness and profitability, and that these goals are not necessarily at odds.

Future Work

With respect to the profit-only model, our investment strategy is to invest in all loans that our model predicts will yield a positive return. Given that lenders do not have an unlimited budget, future work should focus on finding optimal investment strategies with a constrained budget.
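
One simple baseline for the budget-constrained setting is a greedy rule: rank the loans with a positive predicted return by predicted return rate and fund them in that order until the budget runs out. The sketch below illustrates this idea only. It assumes each loan must be funded in full, the tuple layout and the name `select_loans` are hypothetical, and a greedy rule is a heuristic rather than an optimal solution to the underlying knapsack problem.

```python
def select_loans(loans, budget):
    """Greedy budget-constrained selection (illustrative heuristic).

    loans: list of (loan_id, amount, predicted_return_rate) tuples.
    Returns the ids of funded loans and the unspent budget.
    """
    # Keep only loans the model predicts will yield a positive return.
    candidates = [loan for loan in loans if loan[2] > 0]
    # Fund the highest predicted return rates first.
    candidates.sort(key=lambda loan: loan[2], reverse=True)

    chosen, remaining = [], budget
    for loan_id, amount, _rate in candidates:
        if amount <= remaining:  # skip loans that no longer fit
            chosen.append(loan_id)
            remaining -= amount
    return chosen, remaining


# Toy example with made-up numbers.
loans = [("A", 5000, 0.08), ("B", 3000, 0.12), ("C", 4000, -0.02), ("D", 2500, 0.05)]
chosen, remaining = select_loans(loans, budget=8000)
```

With an unlimited budget, this rule reduces to the current strategy of investing in every loan with a positive predicted return.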

Another avenue to explore is creating a meaningful representation of the loan description column, which contains unstructured text in which applicants describe their reason for applying for the loan. We removed this variable during data cleaning because we could not find an easy way to categorize responses that were so unstructured and variable. However, it seems likely that the reason someone applies for a loan would help predict whether they will repay it. An improved model would therefore find a way to categorize this unstructured text and include it as an additional variable.

In terms of fairness, we have found that there is a tension between interpretability and fairness with respect to their effect on profit. Future research should investigate technical strategies to create interpretable, fair, and profitable models and also whether a fair model needs to be interpretable.

Finally, in order to evaluate the fairness of our lending strategies, we inferred each loan applicant's demographic identity from their 3-digit ZIP code. While this method gave us some leverage in assessing fairness, averaging over large geographical regions introduces a large amount of noise into these demographic inferences. Ideally, we would have demographic data for each loan applicant, so that we could more accurately assess the fairness of our lending strategy. Currently, lending institutions seem to hope that by willfully remaining ignorant of demographic information, they can make decisions without regard to loan applicants' identities, leading to fair outcomes. As we reviewed in the Motivation section, this state of affairs is not actually leading to fair outcomes. We recommend that lending institutions collect more demographic data on their applicants, so that the fairness of these institutions' policies can be properly assessed.