Boston HMDA: A Tidyverse tale
A modern parable of race, banking and a digital perusal of discrimination. Put the numbers to the test and uncover how home ownership can be influenced by institutional lending practices.
Varian (2014) revisits the classic mortgage lending discrimination study introduced by Munnell, Tootell, Browne, and McEneaney (1996) of the Federal Reserve Bank of Boston. Varian, chief economist at Google, applied machine learning algorithms to model lending behaviour using the same dataset. Munnell et al. (1996) were motivated by newly available data on mortgage applications, which showed that African American and Hispanic applicants (labelled "black" in what follows) were two to three times as likely to be turned down for mortgages as all other applicants. To determine whether economic factors explained this disparity in denial rates, Munnell et al. (1996) gathered the variables known to be missing from the usual HMDA analysis, such as applicants' debt burdens and credit histories. Collating the additional information came at considerable cost, and the additional data did explain some of the difference. Even so, Munnell et al. (1996) found that, after taking account of economic factors, the applicant's race still very significantly affected the probability of getting a mortgage.
Machine learning gives computers the capability to learn without being explicitly programmed. The conditional tree (ctree) mapping just below reveals the power of artificial intelligence to capture nuance. In a Fintech age, artificial intelligence may well offer a road map for tackling discrimination. Machine learning automates and expedites analysis, frequently classifying and predicting target variables, with the algorithm performing the grunt work. We benefit here from the R code of Varian (2014), which applies the party package to the Boston HMDA dataset. The black bars at the leaves indicate the fraction of each group who were denied mortgages. The most glaring determinant of denial or approval is the variable "dmi", or "denied mortgage insurance". The tree fits pretty well, misclassifying only 228 of the 2,380 observations for an error rate of 9.6 percent. Credit scores are also prominent throughout. The schema below was generated automatically by the machine learning algorithm. See also a similar analysis developed for Titanic passenger survival. To really understand our data, however, we should engage in exploratory data analysis, or pre-modelling. This can be expensive in terms of time and effort, and normally we perform that exercise before progressing fully into modelling or machine learning.
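The error rate quoted above follows directly from the two counts given for the tree. A minimal check in Python, using only the figures stated in the text (the function name is ours, chosen for illustration):

```python
# Counts quoted in the text for the conditional tree fit.
misclassified = 228
observations = 2380


def error_rate(errors: int, total: int) -> float:
    """Share of observations that the classifier got wrong."""
    if total <= 0:
        raise ValueError("total must be positive")
    return errors / total


rate = error_rate(misclassified, observations)
print(f"error rate: {rate:.1%}")      # 228 / 2380 rounds to 9.6%
print(f"accuracy:   {1 - rate:.1%}")  # the complement, about 90.4%
```

Note that this is an in-sample figure: it says how well the tree reproduces the decisions it was fitted on, not how it would perform on new applications.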
According to Ladd (1998), the Home Mortgage Disclosure Act (HMDA) was enacted to monitor minority and low-income access to the mortgage market. Starting in 1975, HMDA required lenders to report information on their mortgage lending by Census tract. The data collected in 1990 for this purpose show that minorities were more than twice as likely to be denied a mortgage as whites. Yet variables correlated with both race and creditworthiness were omitted from these data, making any conclusion about the role of race in mortgage lending impossible. To supplement the HMDA data, Munnell, Tootell, Browne, and McEneaney (1996) at the Boston Fed sought the cooperation of lenders throughout the Boston metropolitan area. They examined 1990 loan applications from minorities in the Boston area, plus a random sample of applications from whites. For each application, the researchers asked lenders to provide an additional set of 38 pieces of information. With these additional variables, the Federal Reserve Bank of Boston found that race continued to play an important, though significantly diminished, role in the decision to grant a mortgage. The study was originally circulated in 1992, then revised in response to some of the early criticisms and published in the March 1996 issue of the American Economic Review. For some insight into the urban demographics, see below the distribution of population in the Boston metropolitan area based on "Whiteness". Please follow the link to an earlier 1992 draft of Munnell, Tootell, Browne, and McEneaney, which was initially heavily challenged in the literature. The earlier draft (Munnell et al., 1992) was influential: it prompted many financial institutions to review their lending practices and supervisory agencies to alter their examination procedures.
The study attracted severe criticism from outside and inside the Federal Reserve, with critics claiming that key variables had been omitted, that the model was mis-specified, that errors had been made in the data, and that information relating to differences among ethnicities in foreclosures had been ignored. The 1996 version published in the American Economic Review was developed in part to refute this earlier criticism. Below, we engage in a deep dive using the Tidyverse R suite and Python to craft rich visualizations, pivot tables and data summaries. We will set up the analysis in Google Colab and apply mainly R and some Python code. You can download the HMDA dataset from the Ecdat package available on the CRAN repository, or otherwise download the same dataset using this hyperlink. We trawl through the numbers here to get a sense and feel for the metrics. This is useful before leaning into the machine learning techniques, and completely necessary if we are really to understand the output from the machine learning algorithms.
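As a first taste of the exploratory step described above, denial rates by group can be tabulated with a few lines of Python. This is a minimal sketch, not the analysis used in the cited studies. It assumes the data arrive as rows with a `black` column and a `deny` column coded `yes`/`no`; those column names and codings are assumptions about the Ecdat export. The six-row sample is made up purely to show the input shape and contains no real HMDA records.

```python
import csv
import io
from collections import defaultdict


def denial_rates(rows, group_col="black", outcome_col="deny", denied_value="yes"):
    """Return {group: (denied, total, rate)} for an iterable of dict rows.

    Column names and the "yes" coding are assumptions about the export.
    """
    counts = defaultdict(lambda: [0, 0])  # group -> [denied, total]
    for row in rows:
        tally = counts[row[group_col]]
        tally[1] += 1
        if row[outcome_col] == denied_value:
            tally[0] += 1
    return {group: (d, n, d / n) for group, (d, n) in counts.items()}


# Tiny made-up sample, NOT real HMDA records; it only shows the input shape.
SAMPLE_CSV = """black,deny,dmi
no,no,no
no,no,no
no,yes,yes
no,no,no
yes,yes,no
yes,no,no
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
rates = denial_rates(rows)
for group, (denied, total, rate) in sorted(rates.items()):
    print(f"black={group}: {denied}/{total} denied ({rate:.0%})")
```

With a real export, the sample string would be replaced by a file opened with `open(path, newline="")`, where `path` is a hypothetical local copy of the dataset. The same function then yields the raw denial gap that the regression and tree models go on to explain.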
Please see also the RStudio Cloud implementation.