Literature Review
CS 109A | Lending Club | Group # 26 | Fall 2018
[1] P2P loan selection. Andy Feis, Viraj Mehta, Scott Morris, John Solitario, Cameron Van De Graaf (2016). Stanford University.
This paper describes how to build a continuous model for assessing loan risk in order to identify the safest loans in each bucket. Selecting those loans keeps the interest rate the same while lowering default risk, which increases returns over the basket approach provided by Lending Club. It helped us design our investment strategy.
[2] For People of Color, Banks Are Shutting the Door to Homeownership. Aaron Glantz and Emmanuel Martinez (2018).
This article discusses discrimination in bank lending to individuals. It cites a few examples showing a troubling pattern of loan denials for people of color across the country. It was useful for constraining our model with respect to fairness and interpretability.
[3] Financial Data Analysis. Sabber Ahamed (2018).
This paper describes how to approach loan selection from a data science perspective. We used it as our guide for data cleaning and exploratory data analysis.
[4] Introduction to Feature Selection. Jason Brownlee (2014).
This article outlines the different feature selection methods and when to use them, as well as the difference between feature selection and feature engineering. It helped us choose the feature selection algorithms that considerably reduced our number of predictors before we built a model.
[5] How to Predict If a Borrower Will Pay You Back. Seth Stephens-Davidowitz (2017).
Some studies suggest that the words used on loan applications can predict the likelihood of charge-off. Following this idea, we use natural language processing algorithms to extract features from the loan title and description that the borrower fills in when requesting the loan, and then apply a Naive Bayes classifier and a random forest to predict charge-off.
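The text-based idea in [5] can be illustrated with a minimal sketch, assuming a bag-of-words representation and a multinomial Naive Bayes classifier with add-one smoothing written from scratch. The function names (`tokenize`, `train_naive_bayes`, `predict`) and the four toy descriptions are invented for illustration only; they are not Lending Club data and not necessarily the pipeline used in this project. The random forest step is not shown.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep alphabetic runs (a crude bag-of-words tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def train_naive_bayes(texts, labels):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for text, label in zip(texts, labels):
        word_counts[label].update(tokenize(text))
    vocab = {word for counts in word_counts.values() for word in counts}
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the class with the highest log-posterior under add-one smoothing."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label, n_docs in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)  # log prior
        for word in tokenize(text):
            if word not in vocab:
                continue  # ignore words never seen in training
            likelihood = (word_counts[label][word] + 1) / (total_words + len(vocab))
            score += math.log(likelihood)
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical toy data, invented purely to exercise the code.
toy_texts = [
    "consolidate credit card debt",
    "need cash urgently promise to pay",
    "home improvement project",
    "please help medical bills promise",
]
toy_labels = ["paid", "charged_off", "paid", "charged_off"]

model = train_naive_bayes(toy_texts, toy_labels)
print(predict(model, "promise to pay medical bills"))
print(predict(model, "home improvement project"))
```

With real data, the same word counts could be passed as features to a tree-based model as well; a library implementation would normally replace this hand-written version.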
[1] Lending Club. (Company website with the Lending Club data used for analysis)
[2] FDIC Equal Housing Lender. (Law, Regulations and Related Acts)
[3] Predicting Loan Repayment. Imad Dubbura (2018).
[4] An introduction to feature extraction. I. Guyon et al. (p.10, figure 2 (e))
[5] Drop Highly Correlated Features. Chris Albon (2018).
[6] Resampling strategies for imbalanced datasets. Rafael Alencar (2017).
[7] Imbalanced-learn package documentation (2018).
[8] Choosing the Right Metric for Evaluating Machine Learning Models. Alvira Swalin (2018).
[9] Fine tuning a classifier in scikit-learn. Kevin Arvai (2018).
[10] Credit Risk Modelling. Rafael Pierre (2018).
[11] Ultimate guide to deal with Text Data (using Python). Shubham Jain (2018).
[12] Pubs-Stats-Loan Data Analysis. Ryan Speed.
[13] An investigation of the false discovery rate and the misinterpretation of p-values. R Soc Open Sci. (2014 Nov); 1(3): 140216.
[14] What does it mean for an algorithm to be fair? Jeremy Kun (2015).
[15] Learning Fair Representations. Richard S. Zemel et al. (2013).
[16] Statistical Parity. Jeremy Kun (2015)
[17] Disparate impact. Wikipedia (2018).