Lending Club
Data-driven predictive model that powers an investment strategy
CS 109A | Lending Club | Group # 26 | Fall 2018
This website summarizes our coursework for CS 109A: Introduction to Data Science on predicting whether loans will be fully paid or charged off, with fairness in selection. We used the publicly available Lending Club data set to build data-driven predictive models of loan outcomes. These models power a strategy that is designed for investors and aims to outperform the Lending Club strategy.
Lending Club is a US peer-to-peer lending company headquartered in San Francisco, California. It was the first peer-to-peer lender to register its offerings as securities with the Securities and Exchange Commission and to offer loan trading on a secondary market. The platform connects investors with borrowers. Lending Club enables borrowers to create unsecured personal loans with a standard loan period of three years. Investors can search and browse the loan listings on the Lending Club website and select the loans they want to invest in based on the information supplied about the borrower, loan amount, loan grade, and loan purpose. Investors make money from the interest that borrowers pay on their loans. Lending Club makes money by charging borrowers an origination fee and investors a service fee.
Lending Club also makes traditional direct-to-consumer loans, including automobile refinance transactions, through WebBank, an FDIC-insured, state-chartered industrial bank. These loans are not funded by investors but are assigned to other financial institutions.
The peer-to-peer lending industry has grown significantly since its inception in 2007. With billions in annual loans, there are significant opportunities to capitalize on this alternative investment instrument. We have developed an investment strategy that uses Lending Club's large historical data sets to identify which features best predict whether a loan will be fully paid or charged off. Using these features, we built and tested many machine learning models, including logistic regression, random forest, quadratic discriminant analysis, and other ensemble methods. Our final model is a random forest, which produced accurate results. Its output is the loan status, with values "Charged Off" or "Fully Paid".
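The kind of model comparison described above can be sketched as follows. This is a minimal illustration, not our actual pipeline: it assumes scikit-learn is installed, uses synthetic data from `make_classification` in place of the Lending Club data set, and every feature count, class weight, and hyperparameter shown is a hypothetical placeholder.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for loan features (assumption: NOT Lending Club data).
X, y_int = make_classification(
    n_samples=2000,
    n_features=10,
    n_informative=5,
    n_redundant=0,        # avoid collinear columns, which QDA handles poorly
    weights=[0.2, 0.8],   # hypothetical class imbalance
    random_state=0,
)

# Map the integer classes onto the two loan status values used on this site.
LABELS = np.array(["Charged Off", "Fully Paid"])
y = LABELS[y_int]

# Hold out a stratified test set so both classes appear in each split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Candidate models named in the text; hyperparameters are illustrative only.
models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "QDA": QuadraticDiscriminantAnalysis(),
}

# Fit each model and record its accuracy on the held-out set.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)

# The random forest predicts a loan status string for each test row.
predictions = models["random forest"].predict(X_test)

for name, accuracy in scores.items():
    print(f"{name}: test accuracy = {accuracy:.3f}")
```

Accuracy alone can be misleading when one class dominates, so a comparison on real loan data would likely also examine per-class metrics before choosing a final model.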
Our website content is spread across the 6 headers. For a deeper understanding of the motivation and problem statement for our project, please click on "Problem Statement" above. To explore our data sources, click on "Data", and for exploratory data analysis, click on "EDA". To learn more about our baseline model, base learners, and ensemble models, please navigate to "Models" above. We consulted a large amount of information from various websites and explored multiple research papers to understand the work already done on Lending Club. These sources are listed under "Literature Review". Next, to view the final results of our analysis, please click on "Results". Last, please navigate to "Conclusion" to view our findings and proposals for future work. To return to this page from any of the above pages, please click "Home" on our header.
Thank you for visiting our project page! For further information about Harvard's CS 109A course, please contact Kevin A. Rader, PhD, or Pavlos Protopapas, PhD. Special thanks to our project mentor and TF, Jerry Peng.