To understand the historical context and mechanics of credit card transactions, we turn to the video "A Brief History of Credit Cards (or What Happens When You Swipe)" by a16z general partner Alex Rampell.
This video takes us on a journey from the inception of credit cards in the mid-20th century to their ubiquitous presence today. Starting from the early days of charge cards like Diners Club, which catered to wealthy New Yorkers, Rampell explains the evolution to mass-market credit cards pioneered by Bank of America.
Our dataset, sourced from Kaggle, comprises demographic, socioeconomic, and employment-related attributes of individuals, along with a binary target variable. Specifically, the dataset includes the following attributes:
ID: Unique identifier for each individual.
Gender: Binary representation of gender.
Own_car: Ownership status of a car.
Own_property: Ownership status of property.
Work_phone: Availability of a work phone.
Phone: Availability of a personal phone.
Email: Availability of an email.
Unemployed: Whether the individual is unemployed.
Num_children: Number of children.
Num_family: Number of family members.
Account_length: Duration of account in months.
Total_income: Total income of the individual.
Age: Age of the individual.
Years_employed: Duration of employment in years.
Income_type: Category of income (e.g., Working, Commercial associate, Pensioner).
Education_type: Level of education attained (e.g., Higher education, Secondary / secondary special).
Family_status: Marital status (e.g., Civil marriage, Married, Single / not married, Separated).
Housing_type: Type of housing arrangement (e.g., Rented apartment, House / apartment).
Occupation_type: Type of occupation (e.g., Other, Security staff, Sales staff, Accountants).
Target: Binary outcome variable indicating whether the person is eligible for a credit card.
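As a minimal sketch of how this schema could be checked before any analysis, the snippet below parses CSV text and confirms that every attribute listed above is present. It assumes the Kaggle file is a CSV whose header matches these names exactly. The helper names (missing_columns, read_rows) are hypothetical, and the two sample rows are invented for illustration rather than taken from the real data.

```python
import csv
import io

# Column names as listed above; the real file's header is assumed to match.
EXPECTED_COLUMNS = [
    "ID", "Gender", "Own_car", "Own_property", "Work_phone", "Phone", "Email",
    "Unemployed", "Num_children", "Num_family", "Account_length", "Total_income",
    "Age", "Years_employed", "Income_type", "Education_type", "Family_status",
    "Housing_type", "Occupation_type", "Target",
]


def missing_columns(header):
    """Return the expected columns that are absent from a CSV header."""
    present = set(header)
    return [name for name in EXPECTED_COLUMNS if name not in present]


def read_rows(text):
    """Parse CSV text into a list of dicts after checking the header."""
    reader = csv.DictReader(io.StringIO(text))
    missing = missing_columns(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)


# Invented two-row sample, not taken from the real dataset.
SAMPLE = (
    ",".join(EXPECTED_COLUMNS) + "\n"
    "1,1,1,1,0,0,0,0,0,2,15,427500.0,32.9,12.4,Working,Higher education,"
    "Civil marriage,Rented apartment,Other,1\n"
    "2,0,0,1,0,1,1,0,1,3,29,112500.0,58.8,8.4,Working,"
    "Secondary / secondary special,Married,House / apartment,Security staff,0\n"
)

rows = read_rows(SAMPLE)
print(len(rows), "rows with", len(rows[0]), "columns each")
```

Note that csv.DictReader returns every value as a string, so numeric columns such as Total_income or Target still need converting before they are used in arithmetic.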
Information Illuminated by the Dataset:
This dataset allows researchers to compare how economic factors, such as income, employment status, and property ownership, and social factors, such as family status, education type, and number of children, relate to credit card eligibility. This can reveal which socioeconomic factors matter most for credit card applicants, information that can later be used to help educate the general public. Researchers can also examine correlations between credit card eligibility and individual attributes such as total income, employment status, age, education, account length, car or property ownership, and family or housing situation.
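One simple way to make such comparisons concrete is to compute the share of eligible individuals within each category of an attribute. The sketch below is illustrative only: eligibility_rate_by is a hypothetical helper, Target is assumed to be coded as 0 or 1, and the records are invented rather than drawn from the dataset.

```python
from collections import defaultdict


def eligibility_rate_by(rows, column):
    """Return {category: share of rows with Target == 1} for one column."""
    totals = defaultdict(int)
    eligible = defaultdict(int)
    for row in rows:
        key = row[column]
        totals[key] += 1
        eligible[key] += int(row["Target"])
    return {key: eligible[key] / totals[key] for key in totals}


# Invented records for illustration only.
records = [
    {"Own_property": 1, "Family_status": "Married", "Target": 1},
    {"Own_property": 1, "Family_status": "Single / not married", "Target": 0},
    {"Own_property": 0, "Family_status": "Married", "Target": 0},
    {"Own_property": 0, "Family_status": "Married", "Target": 1},
]

property_rates = eligibility_rate_by(records, "Own_property")
family_rates = eligibility_rate_by(records, "Family_status")
print(property_rates)
print(family_rates)
```

Comparing these rates across an economic attribute and a social attribute gives a first, rough indication of which one separates eligible from ineligible individuals more strongly. It does not control for the other attributes.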
Limitations of the Dataset:
While the dataset provides valuable insights, it also has several limitations:
Data Granularity
Some attributes, such as income type, occupation type, and education level, are categorized broadly. More granular data would provide deeper insights and more accurate analyses.
Temporal Context
The dataset does not include a timestamp or date range for each unique ID, which limits the ability to analyze trends over time or understand the temporal dynamics of the observed phenomena.
Missing Contextual Data
Certain socioeconomic factors, such as education quality, health status, or geographic location, are not included. These factors can significantly influence the observed variables and outcomes.
Cleaning:
One advantage of choosing a dataset from Kaggle was that the data was already clean and ready to use. At times, however, we had to collapse categorical data into broader groups in order to analyze a general trend. For example, when creating the pie chart showing the relationship between education and card eligibility, we folded categories such as incomplete higher education and "Academic degree" into "Secondary education or below". It wasn't clear to us what "Academic degree" means, and it was a rare category. Additionally, in our own analysis of marital status and home ownership, we discarded entries of "Widowed" and "Separated", as it is unclear to us how (or if) banks would group these with the married and single categories.
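The two cleaning steps described above can be sketched roughly as follows. This is an illustration, not the exact code we ran: the helper names are hypothetical, the label strings ("Incomplete higher", "Widowed", and so on) are assumed spellings that may differ from those in the Kaggle file, and the sample rows are invented.

```python
# Education labels folded into the broad group (assumed label spellings).
BROAD_EDUCATION = {
    "Incomplete higher": "Secondary education or below",
    "Academic degree": "Secondary education or below",
    "Secondary / secondary special": "Secondary education or below",
}

# Family statuses excluded from the marital status analysis (assumed spellings).
DROPPED_FAMILY_STATUS = {"Widowed", "Separated"}


def with_education_group(rows):
    """Add an Education_group field; unmapped labels are kept unchanged."""
    return [
        {**row, "Education_group": BROAD_EDUCATION.get(row["Education_type"],
                                                       row["Education_type"])}
        for row in rows
    ]


def drop_ambiguous_family_status(rows):
    """Keep only rows whose family status is not in the dropped set."""
    return [row for row in rows
            if row["Family_status"] not in DROPPED_FAMILY_STATUS]


# Invented rows for illustration only.
sample = [
    {"Education_type": "Academic degree", "Family_status": "Married"},
    {"Education_type": "Higher education", "Family_status": "Widowed"},
    {"Education_type": "Incomplete higher", "Family_status": "Separated"},
]

grouped = with_education_group(sample)
kept = drop_ambiguous_family_status(sample)
print([row["Education_group"] for row in grouped])
print(len(kept), "row(s) kept for the marital status analysis")
```

The two steps are kept as separate functions because they served different analyses: the grouping fed the education chart, while the filtering applied only to the marital status comparison.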
Assumptions:
Income as a Proxy for Creditworthiness
The assumption that higher income directly correlates with better creditworthiness might overlook other crucial factors such as spending habits, existing debts, and financial responsibilities.
Employment Stability
It is tempting to assume that a longer employment duration equates to financial stability. However, the dataset does not capture the quality or security of the job, which can significantly affect financial behavior and creditworthiness.
Family Size and Financial Burden
There is an inherent assumption that larger family size increases financial burden, affecting credit card eligibility. This might not always hold true, as family financial dynamics vary widely.
Property and Car Ownership
Ownership of property or a car is easily read as a sign of financial stability and creditworthiness. However, these assets can also represent financial liabilities, depending on the individual's overall financial situation.
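Assumptions like these can be checked against the data rather than taken on faith. As a rough sketch, the snippet below averages a numeric attribute separately for eligible and ineligible individuals. The helper name mean_by_target is hypothetical, Target is assumed to be coded as 0 or 1, and the records are invented so that income is uninformative while employment duration is not.

```python
from statistics import mean


def mean_by_target(rows, column):
    """Average a numeric column separately for Target == 0 and Target == 1."""
    groups = {0: [], 1: []}
    for row in rows:
        groups[int(row["Target"])].append(float(row[column]))
    # Skip a group entirely if no rows fall into it.
    return {target: mean(values) for target, values in groups.items() if values}


# Invented records for illustration only.
records = [
    {"Total_income": 100000, "Years_employed": 2.0, "Target": 0},
    {"Total_income": 300000, "Years_employed": 4.0, "Target": 0},
    {"Total_income": 150000, "Years_employed": 10.0, "Target": 1},
    {"Total_income": 250000, "Years_employed": 6.0, "Target": 1},
]

income_means = mean_by_target(records, "Total_income")
employment_means = mean_by_target(records, "Years_employed")
print(income_means)
print(employment_means)
```

In this invented sample the two groups have identical average income but different average employment duration. That is exactly the kind of pattern that would contradict the income assumption, although a difference in means alone says nothing about causation.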
Biases:
Self-Reporting and Sampling
If the dataset relies on self-reported information, there could be inaccuracies due to respondents' tendencies to misreport or omit information. This can especially impact variables like income and employment status. Moreover, if the data was collected from a specific region or demographic group, the findings may not be generalizable to a broader population.