Literature Review

CS 109A | Lending Club | Group # 26 | Fall 2018

Papers/Articles:

[1] P2P loan selection. Andy Feis, Viraj Mehta, Scott Morris, John Solitario, Cameron Van De Graaf (2016). Stanford University.

This paper describes how to build a continuous model of loan risk and use it to identify the safest loans within each grade bucket. Choosing those loans keeps the interest rate of the bucket while lowering default risk, which raises returns relative to the basket approach provided by Lending Club. It helped us design our investment strategy.
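
The selection step can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes each loan already carries a predicted default probability from some continuous risk model, and the field names (`grade`, `p_default`) and sample values are hypothetical.

```python
from collections import defaultdict


def safest_per_grade(loans, k=1):
    """Return the k loans with the lowest predicted default probability per grade."""
    by_grade = defaultdict(list)
    for loan in loans:
        by_grade[loan["grade"]].append(loan)
    # Within each grade the interest rate is similar, so rank by risk only.
    return {
        grade: sorted(group, key=lambda loan: loan["p_default"])[:k]
        for grade, group in by_grade.items()
    }


# Hypothetical loans; p_default would come from a fitted risk model.
loans = [
    {"id": 1, "grade": "A", "p_default": 0.04},
    {"id": 2, "grade": "A", "p_default": 0.02},
    {"id": 3, "grade": "B", "p_default": 0.09},
    {"id": 4, "grade": "B", "p_default": 0.12},
]
picks = safest_per_grade(loans, k=1)
```

Here `picks` maps each grade to its lowest-risk loan, so an investor would fund loan 2 in grade A and loan 3 in grade B.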

[2] For People of Color, Banks Are Shutting the Door to Homeownership. Aaron Glantz and Emmanuel Martinez (2018).

This article discusses discrimination in how banks lend to individuals. It cites several examples that show a troubling pattern of loan denials for people of color across the country. It motivated us to constrain our model with respect to fairness and interpretability.

[3] Financial Data Analysis. Sabber Ahamed (2018).

This paper describes how to approach loan selection from a data science perspective. We used it as our guide for data cleanup and exploratory data analysis.

[4] Introduction to Feature Selection. Jason Brownlee (2014).

This article outlines the different methods of feature selection and when to use each, and explains the difference between feature selection and feature engineering. It helped us choose the feature selection algorithms that considerably reduced our set of predictors before we built a model.
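
As an illustration of one simple kind of feature selection, the sketch below ranks predictors by the absolute Pearson correlation with the target and keeps the top k. It is not necessarily the algorithm we used; the feature names and values are hypothetical toy data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def select_top_k(features, target, k):
    """Keep the k feature names most correlated (in absolute value) with the target."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy data: target is 1 for a charged-off loan, 0 otherwise.
features = {
    "dti": [10, 20, 30, 40, 50, 60],
    "constant_col": [5, 5, 5, 5, 5, 5],
    "noise": [3, 1, 4, 1, 5, 9],
}
target = [0, 0, 0, 1, 1, 1]
selected = select_top_k(features, target, k=2)
```

A filter like this scores each predictor independently of any model, which makes it cheap but blind to interactions between predictors.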

[5] How to Predict If a Borrower Will Pay You Back. Seth Stephens-Davidowitz (2017).

Some studies suggest that the words used in loan applications can predict the likelihood of charge-off. In this section we use natural language processing algorithms to extract features from the loan title and description that the borrower fills in when requesting the loan. We then use a Naive Bayes classifier and a random forest to predict charge-off from these features.
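
The text-classification idea can be sketched with a small bag-of-words multinomial Naive Bayes classifier with add-one smoothing. This is a minimal standard-library illustration, not our actual pipeline: the class name `NaiveBayesText`, the labels, and the loan descriptions are hypothetical toy data, and the random forest is omitted.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesText:
    """Multinomial Naive Bayes over word counts with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            score = math.log(n_docs / total_docs)  # log prior
            total_words = sum(self.word_counts[label].values())
            for tok in tokenize(text):
                if tok not in self.vocab:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][tok]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up descriptions and labels, for illustration only.
texts = [
    "consolidate credit card debt",
    "pay off credit card",
    "need money please help god bless",
    "promise will pay please help hospital",
]
labels = ["paid", "paid", "charged_off", "charged_off"]
model = NaiveBayesText().fit(texts, labels)
```

Calling `model.predict("please help")` then returns the label whose training descriptions make those words most likely. With real data the same word-count features could also be passed to a random forest.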