Overview of Datasets:
OECD Income Inequality Dataset (SWIID) - United States:
Source: Standardized World Income Inequality Database (SWIID)
Link: https://data.oecd.org/inequality/income-inequality.htm
Raw Data: https://drive.google.com/file/d/1UHu6hxO28nXmpJtALa-WguU2Mq39kx9v/view?usp=share_link
Content: This dataset provides measures of income inequality. It includes:
gini_disp: Gini coefficient for disposable income, which measures income inequality after taxes and transfers.
gini_disp_se: Standard error for the disposable income Gini coefficient.
gini_mkt: Gini coefficient for market income, which measures income inequality before taxes and transfers.
gini_mkt_se: Standard error for the market income Gini coefficient.
Temporal Coverage: The dataset spans several decades, offering a longitudinal view of income inequality trends in the US.
S&P 500 Data from Yahoo Finance:
Source: Yahoo Finance API
Content: Historical data on the S&P 500 index, including:
Date: The date of the trading day.
Close: The closing price of the S&P 500 index.
Other variables such as opening price, highest price, lowest price, and trading volume are also included but not used.
Temporal Coverage: The dataset provides a comprehensive view of the S&P 500 index performance over time, reflecting broader trends in the US stock market.
Number in Poverty Rate Using the Official Poverty Measure: 1959-2022
Source: United States Census Bureau
Content: Historical data on the number and the percentage of Americans living in poverty.
Number in Millions: The number of Americans living in poverty.
Percentage: The percentage of Americans living in poverty, which is the poverty rate.
Temporal Coverage: The survey was conducted from 1959-2022 and was published in 2024.
Median Household Income in the United States from 1970-2020, by income tier (in U.S. dollars)
Source: Statista Research Department
Content: This data shows how much income it takes to fall into a given financial class (upper, middle, or lower). It gives the specific income needed to qualify for each class.
Temporal Coverage: This data was collected from 1970-2020 and was released in 2022.
Focus on US Data:
For this analysis, we will focus on the United States data to examine the relationship between stock market performance (as measured by the S&P 500 index) and income inequality (as measured by the Gini coefficients provided in the SWIID dataset).
We are focusing on the US because it has one of the largest and most influential stock markets in the world, and understanding the dynamics within this market can provide insights into how financial markets impact economic inequality. Additionally, the US has well-documented data on both stock market performance and income inequality, making it a suitable case study for this analysis.
Insights Provided by the Data:
OECD Income Inequality Dataset - United States:
Income Inequality Measures: The dataset includes both market and disposable income Gini coefficients, providing insights into the effects of fiscal policies on income distribution.
Comparative Analysis: By focusing on US-specific data, we can perform a detailed examination of how stock market trends correlate with changes in income inequality over time.
Temporal Analysis: The dataset’s extensive temporal coverage allows for a detailed analysis of income inequality trends in the US.
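One way to read the two Gini measures together is to look at the gap between them. The sketch below is only an illustration: the function name and the input values are hypothetical, not figures from the SWIID, and treating the market-minus-disposable gap as a rough gauge of redistribution is an interpretive assumption rather than something the dataset itself states.

```python
def redistribution(gini_mkt: float, gini_disp: float):
    """Return the absolute and relative gap between market and disposable Gini.

    absolute: Gini points removed by taxes and transfers.
    relative: that gap as a percentage of the market Gini.
    """
    absolute = gini_mkt - gini_disp
    relative = absolute / gini_mkt * 100.0
    return absolute, relative


# Hypothetical inputs chosen for easy arithmetic, not real SWIID values.
abs_red, rel_red = redistribution(gini_mkt=50.0, gini_disp=40.0)
print(abs_red, rel_red)
```

A larger gap would suggest that taxes and transfers reduce inequality more in that year, subject to the measurement uncertainty captured by the standard errors.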
S&P 500 Data:
Market Performance: Offers a detailed view of the S&P 500 index performance over time, reflecting broader trends in the US stock market.
Economic Indicators: The stock market data can serve as an economic indicator, often correlating with overall economic performance and investor sentiment.
Number in Poverty Rate Using the Official Poverty Measure: 1959-2022
Americans living in Poverty: The number of Americans living in poverty has risen since the 1970s. The trend is not linear, because there are periods of small downturns, but overall it is still increasing.
Percentage of Americans living in Poverty: The percentage of Americans living in poverty follows a much steadier line than the raw count.
Median Household Income in the United States from 1970-2020, by income tier (in U.S. dollars)
Moving the goal post: This data shows that the income needed to qualify as middle or upper class keeps increasing. In 1970, an upper-class income was $118,617; in 2020, it was $219,572, an increase of approximately 85% over 50 years. Middle-class income increased by 65% from 1970 to 2020.
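The upper-class figure can be checked with a simple percent-change calculation. This is a minimal sketch; the helper name percent_increase is ours, and only the two upper-class dollar amounts quoted in the text are used, since the middle-class dollar amounts are not given here.

```python
def percent_increase(start: float, end: float) -> float:
    """Percentage change from start to end, relative to start."""
    return (end - start) / start * 100.0


upper_1970 = 118_617  # upper-class income in 1970, from the text
upper_2020 = 219_572  # upper-class income in 2020, from the text

upper_change = percent_increase(upper_1970, upper_2020)
print(f"{upper_change:.1f}%")  # prints 85.1%
```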
Limitations and Challenges:
OECD Income Inequality Dataset - United States:
Data Consistency: Variations in data collection methods and definitions over time may affect comparability.
Standard Errors: The inclusion of standard errors for the Gini coefficients highlights potential uncertainties in the measurements.
Income vs. Consumption: The dataset focuses on income-based measures, which may differ from consumption-based measures of inequality.
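To make the uncertainty from the standard errors concrete, a Gini estimate and its standard error can be turned into an approximate interval. The sketch below assumes a normal approximation with z = 1.96 (roughly a 95% interval); that assumption is ours and is not stated by the dataset. The input values are hypothetical.

```python
from typing import Tuple


def gini_interval(gini: float, se: float, z: float = 1.96) -> Tuple[float, float]:
    """Approximate interval: estimate plus or minus z standard errors."""
    return gini - z * se, gini + z * se


# Hypothetical values for illustration, not real SWIID figures.
low, high = gini_interval(gini=38.0, se=0.5)
print(low, high)
```

If the intervals for two years overlap heavily, a small year-to-year change in the Gini coefficient should be interpreted with caution.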
S&P 500 Data:
Market Specificity: The data is specific to the US stock market, which may not fully capture all aspects of economic inequality.
Volatility: Stock market data is highly volatile and influenced by numerous factors, including political events, economic policies, and investor behavior, complicating the analysis.
Economic Representation: Assuming the S&P 500 index as a proxy for overall economic health may not fully capture the spectrum of economic activities and disparities.
Number in Poverty Rate Using the Official Poverty Measure: 1959-2022
Poverty Rate: One limitation is that the data does not show how much the U.S. population grew over the same period. In addition, although the U.S. population has been steadily increasing, the data does not account for people who did not migrate to the U.S. legally but have remained in the country.
Median Household Income in the United States from 1970-2020, by income tier (in U.S. dollars)
Pay To Access: One limitation of this dataset is that accessing more than the base information requires payment.
Sample Size: The sample size for this dataset was 75,000. While this is large for a survey, the overall U.S. population is in the hundreds of millions, so there is a small chance the data is skewed.
Assumptions and Biases:
OECD Income Inequality Dataset - United States:
Data Harmonization: Standardizing data over time may introduce biases or overlook local economic nuances.
S&P 500 Data:
Economic Representation: The S&P 500 index is often used as a proxy for overall economic health, but it may not fully capture all economic activities and disparities.
Number in Poverty Rate Using the Official Poverty Measure: 1959-2022
Common Survey Problems: Because the data was collected by survey, its integrity is difficult to verify. We assume the United States Census Bureau conducted the survey correctly, including random sampling. However, respondents could have misreported their circumstances.
Median Household Income in the United States from 1970-2020, by income tier (in U.S. dollars)
Inflation: The Statista Research Department does not mention inflation in its graph, so we assume the dollar amounts have been adjusted for inflation.
Survey Bias: The Americans who completed the survey are all labeled as 18 years and older, but there is no way to distinguish incomes across age groups.
Data Selection and Cleaning Process
Selection:
OECD Income Inequality Dataset (SWIID): This preexisting dataset was selected due to its comprehensive coverage of income inequality measures across multiple decades, specifically for the United States.
S&P 500 Data: This data was obtained using the Yahoo Finance API, and it provides reliable and consistent historical stock market data.
Number in Poverty Rate Using the Official Poverty Measure: 1959-2022: This data was collected by survey over a period of six decades.
Median Household Income in the United States from 1970-2020, by income tier (in U.S. dollars): This data was collected by survey from 75,000 people over 50 years.
Cleaning Process:
OECD Income Inequality Data:
Filtered to include only US-specific data.
Focused on the relevant columns: 'gini_disp', 'gini_disp_se', 'gini_mkt', 'gini_mkt_se', and 'year'.
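The two steps above can be sketched with pandas as follows. This is only an illustration under stated assumptions: the toy table stands in for the real file, its values are made up, and the 'country' column name and the "United States" label are assumptions about the file layout. Only the five kept column names come from the list above.

```python
import pandas as pd

# Toy stand-in for the inequality file; values are made up and the
# 'country' column name is an assumption about the real layout.
swiid = pd.DataFrame({
    "country": ["United States", "United States", "Canada"],
    "year": [2000, 2001, 2000],
    "gini_disp": [36.0, 36.2, 31.0],
    "gini_disp_se": [0.5, 0.5, 0.6],
    "gini_mkt": [48.0, 48.3, 44.0],
    "gini_mkt_se": [0.6, 0.6, 0.7],
    "extra_col": [1, 2, 3],
})

# Columns named in the cleaning step.
KEEP = ["year", "gini_disp", "gini_disp_se", "gini_mkt", "gini_mkt_se"]

# Keep only US rows and only the relevant columns.
us_gini = (
    swiid.loc[swiid["country"] == "United States", KEEP]
    .reset_index(drop=True)
)
print(us_gini)
```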
S&P 500 Data:
Downloaded historical data from Yahoo Finance.
Selected the 'Date' and 'Close' columns and extracted the 'year' from the 'Date'.
Aggregated daily closing prices to yearly averages to match the temporal resolution of the Gini coefficients.
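The S&P 500 steps can be sketched in pandas as shown below. To keep the example self-contained, a small made-up table replaces the Yahoo Finance download. The output column name 'sp500_close_avg' is a hypothetical choice of ours, and the prices are not real index values.

```python
import pandas as pd

# Made-up daily rows standing in for the downloaded history.
daily = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2000-01-03", "2000-12-29", "2001-01-02", "2001-12-31"]
    ),
    "Close": [100.0, 110.0, 120.0, 140.0],
    "Volume": [1, 2, 3, 4],  # present in the data but not used
})

# Keep 'Date' and 'Close', then derive the calendar year.
sp500 = daily[["Date", "Close"]].copy()
sp500["year"] = sp500["Date"].dt.year

# Average the daily closes within each year.
sp500_yearly = (
    sp500.groupby("year", as_index=False)["Close"].mean()
    .rename(columns={"Close": "sp500_close_avg"})
)
print(sp500_yearly)
```

Averaging over the year smooths out daily volatility, at the cost of hiding swings within the year. A year-end close would be an alternative design choice.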
Number in Poverty Rate Using the Official Poverty Measure: 1959-2022
The data was already cleaned by the United States Census Bureau (USCB).
The USCB has a long and strict methodology for evaluating and cleaning data so that it is of high quality.
Median Household Income in the United States from 1970-2020, by income tier (in U.S. dollars)
This data was cleaned by the Statista Research Department. Statista is an accredited organization that focuses on collecting information.
Integration:
Merged the OECD and S&P 500 datasets on the 'year' variable to create an integrated dataset for analysis.
Poverty Rate and Median Household income were used in the Data Narrative to showcase the widening of financial classes.
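The merge on 'year' can be sketched as below. The two small tables are made up, the 'sp500_close_avg' column name is hypothetical, and the inner join is our assumption: the text does not say which join type was used.

```python
import pandas as pd

# Made-up yearly tables standing in for the two cleaned datasets.
us_gini = pd.DataFrame({
    "year": [2000, 2001, 2002],
    "gini_disp": [36.0, 36.2, 36.4],
    "gini_mkt": [48.0, 48.3, 48.5],
})
sp500_yearly = pd.DataFrame({
    "year": [2001, 2002, 2003],
    "sp500_close_avg": [130.0, 125.0, 140.0],
})

# An inner join keeps only the years present in both tables.
merged = us_gini.merge(sp500_yearly, on="year", how="inner")
print(merged)
```

With an inner join, years covered by only one dataset are dropped, so the merged table spans only the overlapping period.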
By focusing on the United States data from the OECD Income Inequality Dataset and the S&P 500 historical data, this analysis aims to understand the relationship between stock market performance and income inequality. The Gini coefficients for market and disposable income serve as key indicators of how financial markets impact income distribution. While the datasets provide valuable insights, it is crucial to account for their limitations and potential biases to ensure accurate and reliable conclusions. This US-specific analysis contributes to a deeper understanding of the interplay between stock market trends and economic inequality, offering insights that can help shape economic policy and financial market strategies. By examining the world's largest and most influential stock market, we can gain valuable perspectives on the broader implications of financial market performance for economic inequality, which may be relevant to other economies with similar characteristics.
Similarly, by analyzing the poverty rate, both as a number and as a percentage of Americans, together with the income thresholds for financial classes, it appears that the U.S. economy has never fully adjusted since the 1970s and that the gap between financial classes is widening at an alarming rate. However, the biggest issue with both of these surveys is that they do not include people who lack American citizenship. As a result, an entire demographic may be excluded, which could skew the results. The data for these surveys was collected, cleaned, and processed by the United States Census Bureau and the Statista Research Department. These two datasets in particular served as a foundation to help support our thesis that we are not far from the point where the classes become divided, similar to what Karl Marx describes in Das Kapital.