Motivation and Literature Review

Motivation

As data science tools are increasingly adopted by lending agencies, notions of fair and equal lending practices, as defined by US legal and historical precedent, must be revisited: statistical modeling and algorithms add another layer of complexity to anti-discrimination and diversity measures. In embarking on this project, we sought to create a model that both adheres to normative-historical and nascent machine learning concepts of fairness and maximizes profits for lending agencies.

Due to both the legacy of redlining and the scholarship on the relationship between access to capital and economic development, discussions of fair lending practices cannot ignore equity considerations. Although it is now illegal to base lending decisions on membership in the protected classes specified by civil rights law, such as gender, race, age, and disability, decision practices based on other factors have still been shown to deny loans to minorities at higher rates than to non-protected groups. In fact, several recent studies show that predictive models adopted not only by lending agencies but also by institutions such as the US criminal justice system may be worsening inequality in decision practices while taking on the guise of neutrality. [3]

We turned to LendingClub’s publicly available data set to explore the possibility of basing loan decisions on more nuanced understandings of fairness. LendingClub data is particularly useful for addressing these questions because of its wide range, covering loans of $1,000 to $40,000 to individuals throughout the US, and because LendingClub is considered an “Equal Housing Lender,” meaning that loan decisions are made without regard to “race, color, religion, national origin, sex, handicap, and familial status.” Such data gives us the potential to observe biases and explore alternative loan approval models.

Literature Review

Legal Definitions of Fairness

Disparate Treatment. The Fair Housing Act is relevant to fairness in lending not only because there is a clear relationship between loans and place-based forms of access such as housing and economic development, but also because redlining, which the Act sought to end, is a clear case of “redundant encoding”: using an unprotected attribute (zip codes) to bar a protected class (black people) from receiving loans. Because zip codes were a proxy for race, lenders were able to discriminate indirectly against black loan applicants. To end and prevent further discrimination, the Fair Housing Act designated race, color, religion, national origin, sex, handicap, and familial status as protected classes that may not be considered in determining whether a loan applicant receives services. The Fair Housing Act is one example of a policy that prohibits “disparate treatment,” that is, making decisions on the basis of membership in the groups protected under the Civil Rights Act of 1968. [2] [5]

Disparate Impact. Another legal framework to which LendingClub must adhere, and one frequently used to measure fairness, is “disparate impact”: “settings where a penalty policy has unintended disproportionate adverse impact on members of a protected group.” The measure was born from the 1971 Supreme Court case Griggs v. Duke Power Co., which ruled that using intelligence tests and high school diplomas to determine whether job applicants receive employment was a form of discrimination: although such determinants avoided the protected classes, they disproportionately favored white applicants. In Texas Department of Housing and Community Affairs v. Inclusive Communities Project, disparate impact was again enforced, as the Inclusive Communities Project had found that the statistical measures the Texas Department of Housing and Community Affairs used to award tax credits led to racial segregation. The Court affirmed the principle that “practices, procedures, or tests neutral on their face, and even neutral in terms of intent, cannot be maintained if they operate to ‘freeze’ the status quo of prior discriminatory employment practices.”

Algorithmic Definitions of Fairness

In addition to understanding the legal framework that shapes LendingClub’s existing practices, we considered three leading definitions of fairness that pertain to algorithmic decision-making, such as determining whether a person is approved for a loan: anti-classification, statistical parity, and predictive parity. [11]

Statistical Parity or Group Fairness is similar to the legal definition of “disparate impact.” It measures the difference in the probability that a member of the minority group versus a member of the majority group is assigned a certain outcome: a classifier satisfies this definition if subjects in the protected and unprotected groups have equal probability of being assigned to the positive predicted class. This approach prioritizes group fairness over individual fairness, since it works to equalize decision outcomes across protected and non-protected groups. Statistical parity has been studied in settings such as university admissions and recidivism prediction. It may also lead to negative outcomes such as self-fulfilling prophecies (choosing random members of a protected group S in order to achieve statistical parity rather than the most “deserving” individuals) and reverse tokenism (rejecting highly qualified applicants from the non-protected group in order to be able to deny discrimination charges). [1]
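
To make this concrete, the sketch below (our illustration, not LendingClub’s procedure) computes the gap in approval rates between a protected and a non-protected group on a toy pandas DataFrame with hypothetical protected and approved columns; statistical parity asks that this gap be (near) zero.

import pandas as pd

# Toy data: 'protected' marks group membership (1 = protected group),
# 'approved' holds a model's binary loan decision (1 = approved).
df = pd.DataFrame({
    "protected": [1, 1, 0, 0, 0, 1, 0, 1],
    "approved":  [0, 1, 1, 1, 0, 0, 1, 1],
})

# Approval (positive-outcome) rate within each group.
rates = df.groupby("protected")["approved"].mean()

# Statistical parity is satisfied when this gap is (close to) zero.
parity_gap = rates.loc[0] - rates.loc[1]
print(rates)
print(f"statistical parity gap: {parity_gap:.2f}")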

Anti-classification is essentially an extension of disparate treatment: decisions should not be made on the basis of the attributes of protected groups. In embarking on our project, we sought to incorporate algorithmic understandings of fairness that extend beyond “anti-classification, meaning that protected attributes – like race, gender, and their proxies – are not explicitly used to make decisions,” because reports such as ProPublica’s Machine Bias show that predictive modeling can still lead to racial discrimination even when protected classes are left out. [4] [6]
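
As a minimal sketch of why anti-classification alone can fall short, consider the toy example below (hypothetical columns race, zip_code, and income, not the LendingClub schema): dropping the protected attribute still leaves zip code in the feature set, and zip code can redundantly encode race.

import pandas as pd

# Toy applicant data: 'race' is the protected attribute and 'zip_code'
# acts as a redundant encoding (proxy) for it in this example.
df = pd.DataFrame({
    "race":     ["A", "A", "B", "B", "A", "B"],
    "zip_code": ["10001", "10001", "60601", "60601", "10001", "60601"],
    "income":   [45000, 52000, 48000, 61000, 39000, 55000],
})

# Anti-classification: exclude the protected attribute from the model's features.
features = df.drop(columns=["race"])

# Proxy check: each zip code maps to exactly one race in this toy data,
# so a model trained on 'features' could still discriminate indirectly.
print(features.columns.tolist())
print(df.groupby("zip_code")["race"].nunique())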

Predictive Parity or Calibration is an approach to fairness built around the scores a model assigns in a binary decision, such as whether a loan application is denied (negative) or approved (positive). A classifier is calibrated if, among all instances assigned a predicted probability x, approximately an x fraction are actually positive, and this holds within each sub-population as well as in the overall population. Predictive parity is criticized for how it behaves at the group level: when groups are affected at different base rates for a given phenomenon, it is impossible for a calibrated classifier to also achieve equal false positive rates and equal false negative rates across those groups.
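
The sketch below shows how these two notions can be checked on held-out predictions; it uses simulated scores and hypothetical column names rather than our actual model outputs.

import numpy as np
import pandas as pd

# Simulated risk scores, repayment outcomes, and group labels for illustration.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "group": rng.choice(["protected", "non_protected"], size=1000),
    "score": rng.uniform(0, 1, size=1000),
})
df["repaid"] = rng.uniform(0, 1, size=1000) < df["score"]

# Calibration: within each score bin, the observed repayment rate should be
# close to the bin's mean score, for each group as well as overall.
df["bin"] = pd.cut(df["score"], bins=np.linspace(0, 1, 6))
print(df.groupby(["group", "bin"], observed=True)
        .agg(mean_score=("score", "mean"), repay_rate=("repaid", "mean")))

# Predictive parity at a fixed approval threshold: the precision of the
# "approve" decision should be equal across groups.
approved = df[df["score"] >= 0.5]
print(approved.groupby("group")["repaid"].mean())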

Transparency

Transparency is also a dimension of fairness we considered in developing our models. We adopted the three characteristics of good explanations outlined in LIME: 1) interpretable, providing reasoning that is easily understood by humans; 2) faithful, accurately describing how the model actually behaves, at least locally; and 3) model-agnostic, usable with any model across various scenarios. [10] [9]
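
The sketch below illustrates how LIME might be applied to a single loan decision. The data and feature names (annual_inc, dti) are stand-ins, and it assumes the lime and scikit-learn packages are available; it shows the idea rather than our exact pipeline.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from lime.lime_tabular import LimeTabularExplainer

# Stand-in for loan data: two hypothetical features and an approval label.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 2))
y_train = (X_train[:, 0] + 0.5 * X_train[:, 1] > 0).astype(int)
feature_names = ["annual_inc", "dti"]

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# LIME fits a local, interpretable surrogate model around one prediction;
# it is model-agnostic because it only needs access to predict_proba.
explainer = LimeTabularExplainer(
    X_train,
    feature_names=feature_names,
    class_names=["denied", "approved"],
    mode="classification",
)
explanation = explainer.explain_instance(X_train[0], clf.predict_proba, num_features=2)
print(explanation.as_list())  # per-feature contributions for this one decision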

Datasets

To build our models we used the LendingClub dataset, which includes information on accepted and rejected loans from 2011 to 2017. We also used the 2016 US Census American Community Survey data to understand how LendingClub’s decisions to approve loan applications affected protected groups. [7]
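
A sketch of how the two sources can be combined, with hypothetical file and column names; the public LendingClub export reports only the first three digits of each applicant’s zip code (e.g. “100xx”), so Census demographics are joined at that coarser level.

import pandas as pd

# Hypothetical file names; the actual LendingClub and ACS downloads may differ.
accepted = pd.read_csv("accepted_2011_2017.csv", low_memory=False)
acs = pd.read_csv("acs_2016_profiles.csv", dtype={"zip3": str})  # assumed 3-digit zip column

# Join Census demographics to loans at the three-digit zip level.
accepted["zip3"] = accepted["zip_code"].str[:3]
merged = accepted.merge(acs, on="zip3", how="left")
print(merged.shape)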

Works Cited

[1] Dwork, Cynthia, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Rich Zemel. “Fairness Through Awareness.” ArXiv:1104.3913 [Cs], April 19, 2011. http://arxiv.org/abs/1104.3913.

[2] “FDIC Law, Regulations, Related Acts - Rules and Regulations.” Accessed December 12, 2018. https://www.fdic.gov/regulations/laws/rules/2000-6000.html.

[3] “For People of Color, Banks Are Shutting the Door to Homeownership.” Reveal (blog), February 15, 2018. https://www.revealnews.org/article/for-people-of-color-banks-are-shutting-the-door-to-homeownership/.

[4] Friedler, Sorelle A., Carlos Scheidegger, and Suresh Venkatasubramanian. “On the (Im)Possibility of Fairness.” ArXiv:1609.07236 [Cs, Stat], September 23, 2016. http://arxiv.org/abs/1609.07236.

[5] “HUD.Gov / U.S. Department of Housing and Urban Development (HUD).” Accessed December 12, 2018. https://www.hud.gov/program_offices/fair_housing_equal_opp/fair_housing_act_overview.

[6] Angwin, Julia, and Jeff Larson. “Machine Bias.” ProPublica, May 23, 2016. https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing.

[7] American Community Survey Office. “2016 Data Profiles.” Accessed December 12, 2018. https://www.census.gov/acs/www/data/data-tables-and-tools/data-profiles/2016/.

[8] “Racial Discrimination: Banks Are Shutting Door to Homeownership.” Reveal. Accessed December 12, 2018. https://www.revealnews.org/article/for-people-of-color-banks-are-shutting-the-door-to-homeownership/.

[9] Ribeiro, Marco Tulio, Sameer Singh, and Carlos Guestrin. “‘Why Should I Trust You?’: Explaining the Predictions of Any Classifier.” In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD ’16, 1135–44. San Francisco, California, USA: ACM Press, 2016. https://doi.org/10.1145/2939672.2939778.

[10] Stoyanovich, Julia, Bill Howe, Hv Jagadish, and Gerome Miklau. “Panel: A Debate on Data and Algorithmic Ethics.” Proceedings of the VLDB Endowment 11, no. 12 (August 1, 2018): 2165–67. https://doi.org/10.14778/3229863.3240494.

[11] Verma, Sahil, and Julia Rubin. “Fairness Definitions Explained.” In Proceedings of the International Workshop on Software Fairness - FairWare ’18, 1–7. Gothenburg, Sweden: ACM Press, 2018. https://doi.org/10.1145/3194770.3194776.