Women's Health
Logan Devlin, Lindsey Hohn, Sophia Peterson, and Paige Ritter
PSYC 500 Final Project
PART I: DATA CURATION AND ETHICS
Background
The National Women’s Health Network is a not-for-profit organization based in Washington, D.C., made up of activists working to change policy and raise money for women’s health issues.
They have asked us to analyze data from countries across the world on different indicators of maternal health, to determine what they should advocate for to improve women’s health globally.
Because of gender inequality and the additional medical needs that come with reproductive and maternal health, women’s health requires more attention and focus. Health equality is a prerequisite for women to achieve economic, political, and social equality.
Source of Data
Each dataset was sourced through Gapminder, an independent foundation based in Sweden with no political, religious, or economic affiliations. Gapminder works to identify systematic misconceptions about global trends and produces accessible, understandable material representing global data trends. They cite sources for all of their data, mostly international organizations such as the World Health Organization and the World Bank.
Data Curation and Ethics
Many of the rules regarding assumptions and perspectives that apply when analyzing general data also apply here. An important ethical consideration when analyzing global data is globalization and the impact different countries have had on others' socio-economic and political realities today. Countries like the United States and the United Kingdom have politically and economically imperialized many countries in the past 100 years. This significantly changed the trajectory of these countries in terms of wealth, resources, politics, and more, all of which contribute to maternal health and must be considered when evaluating countries.
Much global data comes from the same primary sources, like the UN, World Economic Institute, WHO, etc. While this helps with consistency in counting criteria, it also poses ethical concerns, as these common primary sources are mostly funded and run by individuals from more developed, higher-income countries. This introduces systematic bias throughout a majority of global data, which should always be considered during data analysis.
One way to reduce some of these biases is to use the same primary source, or multiple sources that are held to the same criteria and standards. Focusing on individual countries can help identify specific biases to eliminate or account for. Alternatively, creating discrete variables can generalize global patterns and reduce disparities in data collection between individual countries.
PART II: DATA PREPARATION
Variable Description and Reasoning
Variable Info
PART III: EXPLORATORY DATA ANALYSIS
Distribution Histograms of Each Variable
Through the Maternal Mortality Ratio histogram, we can see that the mean MMR is 116.63, which falls in the Medium level of MMR. This histogram is also very positively skewed, indicating that the majority of countries have a low MMR. Higher MMR indicates more maternal mortalities.
Through the Universal Healthcare Coverage histogram, we can see that the mean UHC is 62.6. This histogram is slightly negatively skewed, indicating that the majority of countries have a higher UHC. Higher UHC numbers indicate better healthcare coverage.
Through the Income Equality Index histogram, we can see that the mean index is 38.93. This histogram is slightly positively skewed, indicating that the majority of countries have lower index values. This is a good thing, as zero represents perfect equality and higher numbers represent greater inequality.
Through the Gender Equality Ratio histogram, we can see that the mean Gender Equality Ratio is 3.31. This histogram is slightly negatively skewed, indicating that the majority of countries have a higher Gender Equality Ratio. Higher Gender Equality Ratios indicate better gender equality. Also to be noted: the Gender Equality Ratio is recorded only at specific values: 0, 0.5, 1, 1.5, 2, 2.5...4.5. This is why the histogram has bins that are centered at those numbers.
Descriptive Statistics of Each Variable
Maternal Mortality Ratio
MEAN = 116.6 MEDIAN = 60.4 MAX = 561.0 MIN = 1.3
GINI Income Inequality Index
MEAN = 38.9 MEDIAN = 39.1 MAX = 63.1 MIN = 24.8
Gender Equality Ratio
MEAN = 3.3 MEDIAN = 3.5 MAX = 4.5 MIN = 1.5
Universal Healthcare Coverage
MEAN = 62.6 MEDIAN = 67.0 MAX = 88 MIN = 22
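As a minimal sketch of how summary statistics like these could be computed, assuming the values for one variable are held in a NumPy array with missing countries stored as NaN. The helper name describe and the sample values are hypothetical and are not the project data.

```python
import numpy as np

def describe(values):
    """Return mean, median, max, and min, ignoring missing values (NaN)."""
    arr = np.asarray(values, dtype=float)
    return {
        "mean": float(np.nanmean(arr)),
        "median": float(np.nanmedian(arr)),
        "max": float(np.nanmax(arr)),
        "min": float(np.nanmin(arr)),
    }

# Hypothetical values for illustration only (one country is missing).
sample = [1.0, 2.0, 4.0, np.nan, 13.0]
stats = describe(sample)
print(stats)
```

In this toy sample the mean sits above the median, the same pattern the Maternal Mortality Ratio shows (116.6 versus 60.4), which is consistent with a long right tail.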
PART IV: STATISTICAL MODELING
Scatter Plots of Variables with Maternal Mortality Ratio
A scatterplot was created with two continuous variables, maternal mortality ratio as the x-variable and gender equality ratio as the y-variable. The data for the x-variable was drawn from the 2015 Maternal Mortality Ratio dataframe. The data for the y-variable was drawn from the 2015 Gender Equality Ratio dataframe. Although the gender equality ratio is treated as continuous, it is recorded only in half-point steps, so it behaves like a discrete variable in the plot. This visual exploratory analysis demonstrated that the gender equality ratio would not be a good option to explore further for linear regression analysis.
A scatterplot was created with two continuous variables, maternal mortality ratio as the x-variable and universal healthcare coverage as the y-variable. The data for the x-variable was drawn from the 2015 Maternal Mortality Ratio dataframe. The data for the y-variable was drawn from the 2015 Universal Healthcare Coverage dataframe. There is a cluster of points in the region 0 < x < 100 (MMR) and 60 < y < 90 (UHC), and a general negative linear trend. This visual exploratory analysis demonstrated that universal healthcare coverage would be a good option to explore further for linear regression analysis.
A scatterplot was created with two continuous variables, maternal mortality ratio as the x-variable and Income Equality Index as the y-variable. The data for the x-variable was drawn from the 2015 Maternal Mortality Ratio dataframe. The data for the y-variable was drawn from the 2015 Income Equality Index dataframe. There is a cluster of points in the region 0 < x < 100 (MMR) and 30 < y < 40 (index values), with no clear linear trend. This visual exploratory analysis demonstrated that the Income Equality Index would not be a good option to explore further for linear regression analysis.
Linear Regression Analysis of Maternal Mortality Ratio and Universal Healthcare Coverage
Linear Regression Line
The slope and y-intercept were obtained from a least-squares polynomial fit function, which accepts the data set and a polynomial degree (here, degree 1). The returned regression line has a negative slope and minimizes the sum of squared errors.
Intercept: The model predicts an average UHC of about 74 for a country with a maternal mortality ratio of zero.
Slope: A country's UHC is expected to decrease by about 0.1 on average for each 1-unit increase in the maternal mortality ratio. Equivalently, a 100-unit increase in MMR corresponds to a predicted decrease of roughly 10 in UHC. This suggests a possible negative linear relationship between UHC and MMR.
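The fit described above can be sketched with NumPy's polyfit, which performs a least-squares polynomial fit of a chosen degree. This is an illustration under assumptions: the five (MMR, UHC) pairs are hypothetical values built to give a slope of -0.1 and an intercept of 74, not the project data, and the report does not confirm which fitting function was used.

```python
import numpy as np

# Hypothetical (MMR, UHC) pairs for illustration only; not the project data.
mmr = np.array([0.0, 100.0, 200.0, 300.0, 400.0])
uhc = np.array([75.0, 62.0, 55.0, 44.0, 34.0])

# Degree-1 least-squares fit: returns the slope first, then the intercept.
slope, intercept = np.polyfit(mmr, uhc, 1)

# Predicted UHC for a country with an MMR of 100.
predicted_uhc = slope * 100.0 + intercept

print(f"slope = {slope:.3f}, intercept = {intercept:.1f}")
print(f"predicted UHC at MMR 100 = {predicted_uhc:.1f}")
```

The intercept is the predicted UHC at an MMR of zero, and the slope is the predicted change in UHC per one-unit increase in MMR.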
Sum of Square of Residuals
How optimal is a parameter estimate? How can we figure out which slope and intercept best match the empirical data? The residual of a data point is the vertical distance between the data point and the regression line. Least squares is the process of finding the parameters for which the sum of the squares of the residuals is minimal. If that minimal sum is small, the regression line fits the empirical data well. The minimum on the plot occurs at a slope of -0.099, which is the same slope returned by the regression. The sum of squared residuals at this minimum is small, indicating that the regression line fits the empirical data well.
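One way to reproduce this kind of check is to hold the intercept at its fitted value, scan a grid of candidate slopes, and record the sum of squared residuals for each. The sketch below assumes hypothetical (MMR, UHC) pairs and a hypothetical slope grid; neither comes from the project data.

```python
import numpy as np

# Hypothetical (MMR, UHC) pairs for illustration only; not the project data.
mmr = np.array([0.0, 100.0, 200.0, 300.0, 400.0])
uhc = np.array([75.0, 62.0, 55.0, 44.0, 34.0])

# Fitted parameters from a degree-1 least-squares fit.
fit_slope, fit_intercept = np.polyfit(mmr, uhc, 1)

# Candidate slopes to test, with the intercept held at its fitted value.
slopes = np.linspace(-0.2, 0.0, 201)
rss = np.empty_like(slopes)
for i, a in enumerate(slopes):
    residuals = uhc - (a * mmr + fit_intercept)
    rss[i] = np.sum(residuals ** 2)

# The slope with the smallest sum of squared residuals.
best_slope = slopes[np.argmin(rss)]
best_rss = rss.min()
print(f"best slope on grid = {best_slope:.3f}, RSS = {best_rss:.2f}")
```

Plotting rss against slopes gives a parabola whose minimum sits at the fitted slope, matching the agreement described above.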
Normality of the UHC Probability Distribution
From the Cumulative Distribution Function on the left, we can see that the 2015 Universal Healthcare Coverage (orange) is approximately normally distributed when compared to the theoretical distribution (blue). The UHC values in the data are capped at 88, which is why the UHC CDF does not continue past 88 as the theoretical CDF does.
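A comparison like this can be sketched by computing an empirical CDF for the observed values and another for samples drawn from a normal distribution with the same mean and standard deviation. The function name ecdf, the seed, and the UHC values below are hypothetical choices for illustration, not the project data.

```python
import numpy as np

def ecdf(data):
    """Return sorted values and the cumulative fraction at each value."""
    x = np.sort(np.asarray(data, dtype=float))
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y

rng = np.random.default_rng(1)

# Hypothetical UHC values for illustration only; not the project data.
uhc = np.array([22.0, 45.0, 58.0, 63.0, 67.0, 70.0, 74.0, 80.0, 88.0])
x_obs, y_obs = ecdf(uhc)

# Theoretical comparison: normal samples with the observed mean and std.
theoretical = rng.normal(np.mean(uhc), np.std(uhc), size=10000)
x_theor, y_theor = ecdf(theoretical)

print(x_obs[-1], y_obs[-1])
```

Plotting (x_obs, y_obs) over (x_theor, y_theor) shows how closely the observed distribution follows a normal curve. The theoretical samples are unbounded, so their curve can extend beyond the largest observed value.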
Permutation Hypothesis Testing
To visualize the data, we created a swarm plot. The plot shows that countries at the low Maternal Mortality Ratio level have the highest 2015 Universal Healthcare Coverage values, and that coverage appears to decrease as the MMR level increases.
The low MMR level has a mean 2015 Universal Healthcare Coverage of about 73.7, the medium level 69.6, the high level 55.9, and the extreme level 41. This shows a numerical decrease in 2015 Universal Healthcare Coverage as the MMR level increases. The difference in means between the low and extreme MMR levels was then calculated to use for simulation.
H0: The distributions between the different MMR levels are identical.
Simulate data assuming the H0 is true.
Given the difference of means, determine whether the observed difference could plausibly have occurred by chance.
The permutation hypothesis test yielded a p-value of 1.0. The null hypothesis cannot be rejected.
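The steps above can be sketched as a permutation test on the difference of means between two MMR levels. Everything named here is an assumption for illustration: the function names, the seed, and the UHC values for the low and extreme groups are hypothetical and are not the project data.

```python
import numpy as np

def diff_of_means(a, b):
    """Test statistic: mean of the first group minus mean of the second."""
    return np.mean(a) - np.mean(b)

def permutation_replicates(a, b, func, size, rng):
    """Pool both groups, shuffle, re-split, and recompute the statistic."""
    pooled = np.concatenate((a, b))
    reps = np.empty(size)
    for i in range(size):
        shuffled = rng.permutation(pooled)
        reps[i] = func(shuffled[:len(a)], shuffled[len(a):])
    return reps

rng = np.random.default_rng(42)

# Hypothetical UHC values for two MMR levels; not the project data.
uhc_low = np.array([70.0, 72.0, 74.0, 76.0, 78.0, 80.0])
uhc_extreme = np.array([36.0, 38.0, 40.0, 42.0, 44.0, 46.0])

observed = diff_of_means(uhc_low, uhc_extreme)
reps = permutation_replicates(uhc_low, uhc_extreme, diff_of_means, 10000, rng)

# One-sided p-value: share of shuffles at least as large as the observed gap.
p_value = np.sum(reps >= observed) / len(reps)

# The opposite comparison counts shuffles no larger than the observed gap.
p_opposite = np.sum(reps <= observed) / len(reps)
print(observed, p_value, p_opposite)
```

A one-sided p-value depends on which direction counts as "at least as extreme". In this toy data the observed gap is the largest possible, so the comparison in the opposite direction returns exactly 1.0 while the intended comparison returns a value near zero.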
Bootstrap Hypothesis Testing
First, we plotted the CDF for each MMR level to see the differences between the levels; none of the distributions are equal to each other.
Summary statistics were also computed to determine the "difference of means", which is used to simulate H0.
H0: The mean Universal Healthcare Coverage is identical for all MMR levels.
We then simulated the data assuming that H0 is true. The data marked with an 'x' is the simulated data; these points lie closer together, as expected if H0 were true.
The bootstrap hypothesis test yielded a p-value of 0.00. The null hypothesis can be rejected.
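A common way to run a bootstrap test of equal means is to shift each group so that both share the combined mean, resample each shifted group with replacement, and compare the resampled difference of means to the observed one. The sketch below follows that approach under assumptions: the function name, the seed, and the UHC values are hypothetical, and the report does not state exactly how its simulation was built.

```python
import numpy as np

def bootstrap_replicates(data, func, size, rng):
    """Resample data with replacement and apply func to each resample."""
    reps = np.empty(size)
    for i in range(size):
        reps[i] = func(rng.choice(data, size=len(data), replace=True))
    return reps

rng = np.random.default_rng(0)

# Hypothetical UHC values for two MMR levels; not the project data.
uhc_low = np.array([70.0, 72.0, 74.0, 76.0, 78.0, 80.0])
uhc_extreme = np.array([36.0, 38.0, 40.0, 42.0, 44.0, 46.0])
observed = np.mean(uhc_low) - np.mean(uhc_extreme)

# Simulate H0 by shifting both groups to the combined mean.
combined_mean = np.mean(np.concatenate((uhc_low, uhc_extreme)))
low_shifted = uhc_low - np.mean(uhc_low) + combined_mean
extreme_shifted = uhc_extreme - np.mean(uhc_extreme) + combined_mean

# Difference of means under H0 for each pair of bootstrap resamples.
reps = (bootstrap_replicates(low_shifted, np.mean, 10000, rng)
        - bootstrap_replicates(extreme_shifted, np.mean, 10000, rng))

p_value = np.sum(reps >= observed) / len(reps)
print(observed, p_value)
```

Because shifting preserves each group's spread but removes the gap between the means, no resample in this toy data can reach the observed difference, so the p-value is 0.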
DISCUSSION
Future Directions:
The current null hypothesis for a permutation hypothesis test is that the continuous variables (Universal Healthcare Index, Gini coefficient, and CPIA gender equality rating) have no statistically significant relationship with the discrete variable, Maternal Mortality Ratio (IHME). Variables to consider in future exploration of indicators for women's health include gender education or literacy equality rates or ratios, total healthcare spending as a percentage of a country's GDP, and the number of women in parliament or government offices.
Implications:
Based on the statistical analysis of global women's health indicators, we encourage the National Women's Health Network to invest money and resources into promoting Universal Healthcare Coverage in countries across the globe in order to increase the quality and standard of women's health.
Limitations:
One limitation of the data and analysis is that not every country appears in every dataset, which led to missing values for some variables. Another limitation is that countries change their borders or cease to exist over time, so a country found in a dataset from previous years may no longer match current records, which can lead to missing values and miscounts.
All of the datasets are sourced from the World Health Organization or the World Bank, both of which combine reported data on a variable for each country into one dataset. However, the national data that WHO or the World Bank uses for the global data are often self-reported by each country. This means that, depending on the variable, there might be major over- or under-reporting.