Female Labor Participation

Emalee Andrews, Tori Sierant, and Shawna Shipley-Gates

Spring 2021 PSYC 500 Final Project

Picture Source: https://www.pngjoy.com/preview/h6i7e4b4p1m2v3_women-day-vector-graphics-png-download/

Section I: Data Curation and Ethics

Background

  • Subverting traditional gender roles by encouraging women to work outside the home is an important issue not only in the United States but around the world.

  • One way to assess women’s equality in the workforce in a particular country is to evaluate its female labor participation.

  • Ideally, a country’s income would benefit from female labor participation, especially over time.

  • As an all-female data science team that plans to join the workforce upon graduation, we aim to analyze the relationship between female labor participation, income, and year.

Source of Data

The data that Team 3 selected covers female labor force participation rates by national per capita income from 1990 to 2017 and is provided by Our World in Data. As an all-female team planning to enter the workforce upon graduation, we are interested in the relationship between a country's female labor force participation and its national income.

Data Source Link

Ethics

Ethical Issue #1: The first ethical issue is to ensure that the dataset is supplied by a reputable source that holds a license and provides valid open access data to the public. This is why we selected a data set from Our World in Data.


Ethical Issue #2: The second ethical issue is to confirm that sensitive data about individuals or groups of people are not included in the open access data. It is vital that open access datasets include neither sensitive nor identifiable data.


Ethical Issue #3: The third ethical issue is to ensure that the actual data within the datasets cannot be altered by open access users. While users may reshape and combine datasets, it is unethical to change any of the actual data. All open access datasets should forbid public users from changing any obtained data without the permission of the dataset owners.

Section II: Data Preparation

Description of Variables

Entity*

Country/Continent/Province of Record


Female Labor Participation Rate (Continuous)

The labor force participation rate is the share of a population that is in the labor force. The female labor participation rate is the percentage of women ages 15 and older who are in the labor force: the female labor force divided by the female population ages 15 and older, multiplied by 100.


GDP per Capita (Continuous)

GDP per capita stands for gross domestic product per person. GDP is the total value of goods and services produced in a country during a year; dividing it by the country's population gives GDP per capita. Higher values generally indicate a higher standard of living and a wealthier country.


Year (Discrete)

Year of record (1997 and 2017)

*Disclaimer: Even though the entity variable was included in the dataframes, we did not use this variable in any analyses.

Size and Shape of Dataframes

Our data science team reshaped the data and created 3 different dataframes: original dataframe (df), 1997 dataframe (df_1997), and 2017 dataframe (df_2017).

Section III: Exploratory Data Analysis

Distribution of Variables

Rationale: This analysis was selected to visualize the distribution of each variable.

1997

2017

Interpretation: The distribution of female labor participation rates is slightly more right-skewed in 1997 than in 2017, and fewer countries fall near the median in 1997. The distribution of GDP per capita is extremely right-skewed in 1997 compared with 2017, when values were more evenly distributed.

Descriptive Statistics of Variables

Rationale: This analysis was selected to explore the descriptive statistics that are associated with each variable.

  • Medians

    • Female Labor Rate

      • 1997: 49.98

      • 2017: 52.54

    • GDP per Capita

      • 1997: 6,620.82

      • 2017: 12,525.09

  • Ranges

    • Female Labor Rate

      • 1997: 78.28

      • 2017: 72.08

    • GDP per Capita

      • 1997: 104,333.61

      • 2017: 116,274.36

Interpretation:

For female labor participation rate, the mean increased by about two percentage points from 1997 to 2017. Because the rate describes the share of women ages 15 and older who are in the labor force, this means that a larger share of women participated in 2017; it does not mean that most workers were women. Three more countries reported their female labor participation in 2017 than in 1997. The standard deviation was smaller in 2017 than in 1997, indicating less variation around the mean. The minimum rate increased by about 5 percentage points, suggesting that it is more common for women to work than previously, even in the countries with the lowest participation. The maximum rate fell, but only by about 0.1 percentage points and from an already high level, so it is hard to draw any conclusion from this change. The median rate increased by roughly 2.5 percentage points and rose above 50%, so in 2017 more than half of women participated in the labor force in at least half of the reporting countries. The range decreased in 2017, suggesting that countries' rates moved closer to the median.


For GDP per capita, we found that the mean increased from 1997 to 2017, showing an overall increase in the value of goods and services produced per person. Three more countries reported data in 2017 than in 1997. The standard deviation also increased from 1997 to 2017, meaning that countries' GDP per capita values were more spread out around the mean in 2017. The minimum GDP per capita increased in 2017, and the maximum increased substantially. The median GDP per capita almost doubled from 1997 to 2017. The range widened from 1997 to 2017, indicating a larger gap between the lowest and highest values.
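The summary statistics discussed above (count, mean, standard deviation, minimum, median, maximum, and range) can be computed along the following lines. This is a minimal sketch assuming NumPy; the describe() helper and the toy values are hypothetical and are not taken from the dataset.

```python
import numpy as np

def describe(values):
    """Summarize a 1-D sample, ignoring missing (NaN) entries."""
    x = np.asarray(values, dtype=float)
    x = x[~np.isnan(x)]  # drop missing values before summarizing
    return {
        "count": int(x.size),
        "mean": float(np.mean(x)),
        "std": float(np.std(x, ddof=1)),  # sample standard deviation
        "min": float(np.min(x)),
        "median": float(np.median(x)),
        "max": float(np.max(x)),
        "range": float(np.max(x) - np.min(x)),  # max minus min
    }

# Hypothetical participation rates (percent), NOT values from the dataset
toy_rates = [10.0, 45.0, 50.0, 55.0, 80.0, float("nan")]
summary = describe(toy_rates)
print(summary)
```

Calling the same helper once on the 1997 values and once on the 2017 values would give the two sets of statistics that the interpretation compares.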

Research Question

What is the relationship between female labor participation rates, GDP per capita, and year?

Hypotheses

Hypothesis #1: The probability distribution of female labor participation rates in 2017 is normally distributed.


Hypothesis #2: There is a linear relationship between female labor participation rates and national GDP per capita in 2017.


Hypothesis #3: The mean female participation rates in 1997 and 2017 are not identical.

Needed Functions

CDF

Difference of Means

A Bootstrap Replicate

Draw Bootstrap Replicates

Draw Pairs Bootstrap
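A minimal sketch of helpers matching the five headings above, assuming NumPy. Only the name draw_bootstrap_replicates() appears in this report; every other function name, signature, and default below is a hypothetical stand-in rather than the team's actual code.

```python
import numpy as np

def ecdf(data):
    """Empirical CDF: sorted values and the fraction of points <= each value."""
    x = np.sort(np.asarray(data, dtype=float))
    y = np.arange(1, x.size + 1) / x.size
    return x, y

def diff_of_means(data_1, data_2):
    """Difference of means: mean(data_1) - mean(data_2)."""
    return np.mean(data_1) - np.mean(data_2)

def bootstrap_replicate_1d(data, func, rng):
    """Resample data with replacement and apply func to the resample."""
    sample = rng.choice(data, size=len(data), replace=True)
    return func(sample)

def draw_bootstrap_replicates(data, func, size=1, rng=None):
    """Draw `size` bootstrap replicates of the statistic func."""
    rng = np.random.default_rng() if rng is None else rng
    return np.array([bootstrap_replicate_1d(data, func, rng) for _ in range(size)])

def draw_bs_pairs_linreg(x, y, size=1, rng=None):
    """Pairs bootstrap: resample (x, y) pairs together and refit a line each time."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    inds = np.arange(x.size)
    slopes = np.empty(size)
    intercepts = np.empty(size)
    for i in range(size):
        bs_inds = rng.choice(inds, size=inds.size, replace=True)
        # np.polyfit returns coefficients from highest degree down: slope, intercept
        slopes[i], intercepts[i] = np.polyfit(x[bs_inds], y[bs_inds], 1)
    return slopes, intercepts
```

Passing an explicit random generator (for example np.random.default_rng(0)) makes the resampling reproducible, which helps when results are reported in a write-up.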

Section IV: Model Building/Validation

Normality of the Probability Distribution (Female Labor Participation)

Rationale: This analysis is being conducted to test if the probability distribution of female labor participation rates in 2017 is normally distributed.

Interpretation: The theoretical and empirical CDFs do not match. Therefore, female labor participation rates in 2017 (mean = 51.69%, standard deviation = 14.44%) are not normally distributed.
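The comparison above is visual. As a rough numeric stand-in, the sketch below measures the largest vertical gap between a sample's empirical CDF and the CDF of a normal distribution with the same mean and standard deviation. It assumes NumPy and Python's statistics.NormalDist; the max_cdf_gap() helper is hypothetical, and the data are synthetic (the reported mean and standard deviation are used only to parameterize one synthetic sample), so the output says nothing about the actual dataset.

```python
import numpy as np
from statistics import NormalDist

def ecdf(data):
    """Empirical CDF: sorted values and the fraction of points <= each value."""
    x = np.sort(np.asarray(data, dtype=float))
    y = np.arange(1, x.size + 1) / x.size
    return x, y

def max_cdf_gap(data):
    """Largest gap between the ECDF and a normal CDF fitted by mean and std."""
    x, y = ecdf(data)
    normal = NormalDist(mu=float(np.mean(x)), sigma=float(np.std(x)))
    theoretical = np.array([normal.cdf(v) for v in x])
    return float(np.max(np.abs(y - theoretical)))

rng = np.random.default_rng(42)
# Synthetic samples, NOT the dataset: one normal, one strongly right-skewed
normal_like = rng.normal(loc=51.69, scale=14.44, size=2000)
skewed = rng.exponential(scale=15.0, size=2000)

gap_normal = max_cdf_gap(normal_like)
gap_skewed = max_cdf_gap(skewed)
print(f"normal-like gap: {gap_normal:.3f}, skewed gap: {gap_skewed:.3f}")
```

A small gap means the two curves nearly overlap, while a large gap corresponds to the kind of visible mismatch described in the interpretation. This simplified gap is not a formal normality test.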

Linear Regression Analysis (Female Labor Participation & GDP per Capita)

Rationale: This analysis was selected to explore the relationship between female labor participation rate and GDP per capita in 2017.

Interpretation:

Intercept: When female labor participation is zero, the predicted GDP per capita is 18,450.


Slope: GDP per capita is expected to increase by 0.06 units for each one-percentage-point increase in female labor participation.


This relationship was not statistically significant, based on the 95% CI for the slope parameter [-142.60, 154.19]. Because this confidence interval includes zero, the null hypothesis of no linear relationship between female labor participation and GDP per capita is not rejected.
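The confidence interval logic above can be sketched as follows: fit a line, resample (rate, GDP) pairs with replacement, refit each time, and take the 2.5th and 97.5th percentiles of the resampled slopes. This is a minimal sketch assuming NumPy and 10,000 resamples, the number mentioned in the Summary; the function name draw_bs_pairs_linreg and the synthetic data are hypothetical, so the printed interval will not match the reported one.

```python
import numpy as np

def draw_bs_pairs_linreg(x, y, size=1, rng=None):
    """Pairs bootstrap: resample (x, y) pairs together and refit a line each time."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    inds = np.arange(x.size)
    slopes = np.empty(size)
    intercepts = np.empty(size)
    for i in range(size):
        bs_inds = rng.choice(inds, size=inds.size, replace=True)
        slopes[i], intercepts[i] = np.polyfit(x[bs_inds], y[bs_inds], 1)
    return slopes, intercepts

rng = np.random.default_rng(0)
# Synthetic stand-in data, NOT the dataset: GDP is unrelated to rate by construction
rate = rng.uniform(20.0, 80.0, size=150)
gdp = rng.normal(18000.0, 15000.0, size=150)

# Point estimates from an ordinary least-squares line
slope, intercept = np.polyfit(rate, gdp, 1)

# 95% percentile confidence interval for the slope
bs_slopes, bs_intercepts = draw_bs_pairs_linreg(rate, gdp, size=10_000, rng=rng)
ci_low, ci_high = np.percentile(bs_slopes, [2.5, 97.5])
includes_zero = bool(ci_low <= 0.0 <= ci_high)

print(f"slope = {slope:.2f}, 95% CI = [{ci_low:.2f}, {ci_high:.2f}]")
print("CI includes zero:", includes_zero)
```

When the interval includes zero, a slope of zero (no linear relationship) is compatible with the data, which is the reasoning used in the interpretation.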

Bootstrap Hypothesis Testing (Year and Female Participation Rate)

Rationale: This analysis was selected to test if the mean female participation rates in 1997 and 2017 are identical.

Step 1: EDA - plot CDFs of 1997 and 2017

Step 2: Choose a test statistic - "difference of means"

Step 3: State H0

H0: The mean female labor participation rates in 1997 and 2017 are identical.

Step 4: Simulate data assuming H0 is true - a bootstrap sample

Steps 5-6: Calculate a replicate from a simulated dataset (a bootstrap replicate) and repeat Steps 4 and 5 - draw_bootstrap_replicates()

Step 7: Decision

Interpretation: The p-value tells us that there is about a 0.00% chance of observing a difference of means at least as extreme as the one in our data if the mean female labor participation rates in 1997 and 2017 were identical. Indeed, none of the bootstrap replicates are equal to or more extreme than the observed value (empirical mean difference = 2.17 percentage points). The null hypothesis that the mean female labor participation rates in 1997 and 2017 are identical can therefore be rejected.
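Steps 2 through 7 can be sketched as one function. This is a minimal sketch assuming NumPy and one common way to simulate H0 for a difference of means: shift both samples so that they share the combined mean, resample each with replacement, and count how often the simulated difference is at least as extreme as the observed one. The function name bootstrap_mean_test, the two-sided comparison, and the synthetic data are assumptions, not the team's actual code or results.

```python
import numpy as np

def bootstrap_mean_test(data_a, data_b, size=10_000, seed=0):
    """Two-sample bootstrap test of H0: the two population means are identical.

    Returns the observed difference mean(b) - mean(a) and a two-sided p-value.
    """
    a = np.asarray(data_a, dtype=float)
    b = np.asarray(data_b, dtype=float)
    rng = np.random.default_rng(seed)

    # Test statistic on the real samples
    observed = np.mean(b) - np.mean(a)

    # Simulate H0: shift both samples so they share the combined mean
    combined_mean = np.mean(np.concatenate([a, b]))
    a_shifted = a - np.mean(a) + combined_mean
    b_shifted = b - np.mean(b) + combined_mean

    # Each row is one bootstrap sample; the row mean is one replicate
    reps_a = rng.choice(a_shifted, size=(size, a.size)).mean(axis=1)
    reps_b = rng.choice(b_shifted, size=(size, b.size)).mean(axis=1)
    reps = reps_b - reps_a

    # Fraction of replicates at least as extreme as the observed difference
    p_value = float(np.mean(np.abs(reps) >= abs(observed)))
    return float(observed), p_value

# Synthetic stand-in samples (percent), NOT the dataset
demo_rng = np.random.default_rng(1)
rates_1997 = demo_rng.normal(50.0, 15.0, size=180)
rates_2017 = demo_rng.normal(60.0, 15.0, size=180)

observed_diff, p_value = bootstrap_mean_test(rates_1997, rates_2017)
print(f"observed difference = {observed_diff:.2f}, p = {p_value:.4f}")
```

A p-value of zero from such a test only means that none of the simulated differences reached the observed one; with 10,000 replicates it is better read as p < 0.0001 than as exactly zero.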

Section V: Discussion

Summary

The purpose of the current report is to explore the relationship between female labor participation rate, GDP per capita, and year. Data were curated from the Our World in Data website. The data that our team selected covers female labor force participation rates by national per capita income from 1990 to 2017. Female labor participation rate was measured as the percentage of women ages 15 and older who are in the labor force. GDP per capita was measured as the total value of goods and services produced in a country during the year, divided by the country's population. Year was the year of record. Data were reshaped to include only entity, female labor participation rate, and GDP per capita for the years 1997 and 2017.


A linear regression analysis was used to test whether the female labor participation rate predicts GDP per capita in 2017. A pairs bootstrap was conducted to generate a confidence interval for the slope of the linear regression model (with 10,000 resamples using Python). The estimated intercept parameter was 18,450, which indicates that when female labor participation is zero, the predicted GDP per capita is 18,450. The estimated slope parameter was 0.06, which indicates that GDP per capita is expected to increase by 0.06 units for each one-percentage-point increase in female labor participation. This relationship was not statistically significant, based on the 95% CI for the slope parameter [-142.60, 154.19]. Because this confidence interval includes zero, the null hypothesis of no linear relationship between female labor participation and GDP per capita is not rejected.


Bootstrap hypothesis testing was conducted to test whether the mean female labor participation rates in 1997 and 2017 were identical. The p-value tells us that there is about a 0.00% chance of observing a difference of means at least as extreme as the one in our data if the mean female labor participation rates in 1997 and 2017 were identical. Indeed, none of the bootstrap replicates are equal to or more extreme than the observed value (empirical mean difference = 2.17 percentage points). The null hypothesis that the mean female labor participation rates in 1997 and 2017 are identical can therefore be rejected.

Limitations

Limitation #1: Some countries had missing data, so not every country is represented in the dataframes. Ideally, Our World in Data would ensure that every country is able to report its GDP per capita and female labor participation rate.

Limitation #2: Data were only available from 1990 to 2017, which did not allow bootstrap hypothesis testing across a larger gap in years.

Limitation #3: Female labor participation rates only count women who are at least 15 years of age. This excludes girls younger than 15 who work in countries and entities that allow them to enter the workforce.

Implications/Recommendations

Due to our lack of statistically significant findings regarding the relationship between female labor participation and GDP per capita, our data science team is unable to provide solid implications or recommendations for policy and community interventions at this time. Instead, we recommend that further research be conducted to investigate the relationship between female labor participation and GDP per capita by focusing on inequality indexes. For example, future data teams could combine this project's Our World in Data information with existing data focused on inequality, particularly in the labor force. This might better capture the relationship between female labor force inequality and GDP per capita.

On the other hand, our data science team found a statistically significant difference between female labor force participation in 1997 and 2017. This suggests that female labor participation changed over time. Based on these findings, we have three recommendations:

Recommendation #1: We recommend that community interventions and policies pursue equality in the workforce by focusing on female labor participation. Specifically, we recommend that further research be conducted to find out which types of labor are the most popular among women. As time goes by, labor type popularity might change, so it would be in the best interest of policy makers and community interventions to follow labor trends. Based on these trends, there should be systems in place to prepare women for that type of labor and increase female participation. Additionally, it would be worth investing in systems that encourage female labor participation in less popular areas in order to reach equality in all types of labor.

Recommendation #2: Furthermore, these policies and community interventions need to be adapted to each country because of sociocultural, economic, and political differences that affect female labor participation, such as traditional gender roles, age of first job, gender-based violence, child labor laws, governmental changes, and poverty levels.

Recommendation #3: Lastly, we recommend that further research be conducted to evaluate major events during the period of interest that could have contributed to the difference in female labor participation, including wars, economic depressions, pandemics, and women's empowerment movements. Understanding these events could help predict similar trends in the future.