Women's Health

Logan Devlin, Lindsey Hohn, Sophia Peterson, and Paige Ritter

PSYC 500 Final Project

PART I: DATA CURATION AND ETHICS

Background

The National Women’s Health Network is a not-for-profit organization based in Washington, D.C., made up of activists working to change policy and raise money for women’s health issues.

They have asked us to analyze data on different indicators of maternal health from countries across the world to determine what they should focus their advocacy on to improve women’s health globally.


Because of gender inequality and the additional medical needs that reproductive and maternal health place on women, women’s health requires more attention and focus to achieve health equality, which is in turn required for women to achieve economic, political, and social equality.

Source of Data

Each dataset we curated was found and sourced through Gapminder, an independent Swedish foundation with no political, religious, or economic affiliations. Gapminder works to identify systematic misconceptions about global trends and to produce accessible, understandable material that represents global data trends. It cites the sources for all of its data, which come mostly from international organizations such as the World Health Organization and the World Bank.

Data Curation and Ethics

Many of the rules regarding assumptions and perspectives that apply when analyzing data in general apply here as well. An important ethical consideration when analyzing global data is globalization and the impact that some countries have had on other countries' socio-economic and political realities today. Countries like the United States and the United Kingdom have exerted imperial political and economic influence over many countries in the past 100 years, significantly changing those countries' trajectories in terms of wealth, resources, politics, and more. All of these factors contribute to maternal health and must be considered when evaluating countries.


Much global data comes from the same primary sources, such as the UN, the World Economic Institute, and the WHO. While this helps with consistency in counting criteria, it also raises ethical concerns, because these common primary sources are mostly funded and run by people from more developed, higher-income countries. This introduces systematic bias into much of the global data, which should always be considered during data analysis.


One way to reduce some of this bias is to use the same primary source, or multiple sources held to the same criteria and standards. Focusing on individual countries can help identify specific biases to eliminate or account for, while creating discrete variables that generalize global patterns can reduce the effect of disparities in data collection among individual countries.

PART II: DATA PREPARATION

Variable Description and Reasoning

Maternal Mortality Ratio

The discrete variable for each country is derived from the Maternal Mortality Ratio (MMR) reported by the Institute for Health Metrics and Evaluation (IHME). The ratio is the number of maternal deaths per 100,000 live births. A maternal death is any death of a woman while pregnant or within a year of pregnancy, regardless of the duration and site of the pregnancy, from any cause related to or aggravated by the pregnancy or its management. The count covers women aged 15-49 and excludes accidental or incidental causes. The WHO considers 11 indicators of maternal, newborn, and child health, and the MMR is one of the most widely accepted indicators of overall women's health.

To determine which factors contribute most to women's health within a country, the MMRs of the world's countries were divided into four ranges (low, medium, high, and extreme), with the low range containing the lowest MMRs and therefore associated with better women's health. This lets us analyze which variables have the biggest impact on lowering MMR. The low range includes MMRs of 0-50, the medium range 50-100, the high range 100-200, and the extreme range 200-600. These ranges were chosen based on Gapminder's visual division of the MMR (IHME) dataset in its frequency/count graph.
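
The binning step can be sketched in Python with only the standard library. The text does not say which end of each range is inclusive, so this sketch assumes each range includes its lower bound and excludes its upper bound (an MMR of exactly 50 counts as medium). The function `mmr_level` and the constant names are hypothetical; pandas users would typically reach for `pd.cut` with `right=False` for left-closed bins like these.

```python
from bisect import bisect_right

# Boundaries from the text: low 0-50, medium 50-100, high 100-200, extreme 200-600.
# ASSUMPTION: each range includes its lower bound and excludes its upper bound.
INNER_EDGES = [50, 100, 200]
LEVELS = ["low", "medium", "high", "extreme"]


def mmr_level(mmr: float) -> str:
    """Map a maternal mortality ratio (deaths per 100,000 live births) to a level."""
    if not 0 <= mmr <= 600:
        raise ValueError(f"MMR {mmr} is outside the 0-600 range used for binning")
    # bisect_right returns how many inner edges are <= mmr, i.e. the bin index.
    return LEVELS[bisect_right(INNER_EDGES, mmr)]


if __name__ == "__main__":
    # Minimum, median, mean, and maximum MMR from the descriptive statistics.
    for value in [1.3, 60.4, 116.6, 561.0]:
        print(value, mmr_level(value))
```

Applied to the minimum, median, mean, and maximum MMR reported in Part III, this assigns the low, medium, high, and extreme levels respectively.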

GINI coefficient

The variable for each country that indicates income inequality is the Gini coefficient, developed by the Italian statistician Corrado Gini: Gini = A/(A + B), where A is the area between the line of perfect equality and the Lorenz curve, and B is the area below the Lorenz curve. It measures the distribution of income across a population and is commonly used as a gauge of economic inequality in a country: the higher the Gini coefficient, the greater the economic inequality. In this dataset it is reported on a 0-100 scale (the coefficient multiplied by 100). Using the Gini coefficient as a variable for each country lets us analyze the relationship between economic inequality within a country and women's health.
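
To make the A/(A + B) definition concrete, here is a minimal Python sketch that builds a Lorenz curve from a list of individual incomes and integrates it with the trapezoid rule. It is illustrative only: the function name and toy incomes are hypothetical, and the published Gini values in the dataset come from survey-based estimates, not from this function.

```python
from typing import Sequence


def gini_from_incomes(incomes: Sequence[float]) -> float:
    """Gini = A / (A + B), computed from the Lorenz curve of the given incomes.

    A + B is the area under the line of perfect equality, which is 0.5,
    and B is the area under the Lorenz curve, so Gini = (0.5 - B) / 0.5 = 1 - 2B.
    """
    if not incomes or any(x < 0 for x in incomes):
        raise ValueError("incomes must be a non-empty list of non-negative values")
    total = sum(incomes)
    if total == 0:
        raise ValueError("total income must be positive")
    n = len(incomes)
    cumulative = 0.0
    prev_share = 0.0  # Lorenz curve height at the previous population share
    area_b = 0.0
    for income in sorted(incomes):
        cumulative += income
        share = cumulative / total
        # Trapezoid under the Lorenz curve between population shares (k-1)/n and k/n.
        area_b += (prev_share + share) / (2 * n)
        prev_share = share
    return 1 - 2 * area_b


if __name__ == "__main__":
    print(gini_from_incomes([10, 10, 10, 10]))    # everyone has the same income
    print(gini_from_incomes([1, 2, 3, 4]) * 100)  # on the dataset's 0-100 scale
```

Equal incomes give 0. With n people, the largest possible value of this discrete formula is (n - 1)/n (one person holds all income) rather than exactly 1.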

Universal Healthcare System Index

The WHO, through its Global Health Observatory, combined data for 16 tracer indicators to establish an index for rating and understanding the differences among countries' health systems. The resulting index score, on a 0-100 scale, indicates the level of universal health coverage (UHC) in each country and lets us compare the level of UHC to MMR.

CPIA Gender Equality Rating

The gender equality rating assesses the extent to which a country has installed institutions and programs to enforce laws and policies that promote equal access for men and women in education, health, the economy, and protection under the law. The data come from the World Bank's Country Policy and Institutional Assessment (CPIA), whose ratings range from 1 (low) to 6 (high). The gender equality rating lets us evaluate the relationship between gender equality and MMR.

Variable Info

Variable Types

Variable Shapes

Variable Sizes

PART III: EXPLORATORY DATA ANALYSIS

Distribution Histograms of Each Variable

Through the Maternal Mortality Ratio histogram, we can see that the mean MMR is 116.63, which falls in the high MMR range (100-200). The histogram is strongly positively skewed: most countries have relatively low MMRs, while a smaller number of countries with very high MMRs pull the mean well above the median (60.4). Higher MMR indicates more maternal deaths.

Through the Universal Healthcare Coverage histogram, we can see that the mean UHC is 62.6. This histogram is slightly negatively skewed: at least half of countries have UHC above the mean (the median is 67.0), with a tail of countries with low coverage. Higher UHC numbers indicate better healthcare coverage.

Through the Income Equality Index (Gini) histogram, we can see that the mean index is 38.93. This histogram is slightly positively skewed, indicating that most countries have relatively low income inequality, with a tail of more unequal countries. Lower values are better here, since zero represents perfect equality and higher numbers represent greater inequality.

Through the Gender Equality Ratio histogram, we can see that the mean rating is 3.31. This histogram is slightly negatively skewed: at least half of countries have a rating above the mean (the median is 3.5). Higher Gender Equality Ratios indicate better gender equality. Note also that the Gender Equality Ratio is recorded only in half-point increments (1.5, 2, 2.5, ..., 4.5 in this dataset), which is why the histogram's bins are centered at those values.
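
The skew direction in each histogram can also be checked numerically rather than only by eye. This minimal standard-library sketch (function names hypothetical) computes the population skewness, where a positive value means a longer right tail and a negative value a longer left tail. Comparing the mean and median is a rougher check that does not always agree with the skewness sign, so the report returns all three.

```python
from statistics import mean, median, pstdev


def skewness(values):
    """Population (Fisher-Pearson) skewness: the mean of the cubed z-scores."""
    m = mean(values)
    s = pstdev(values)
    if s == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return sum(((x - m) / s) ** 3 for x in values) / len(values)


def skew_report(values):
    """Skewness and its sign alongside the mean and median, for comparison with a histogram."""
    g = skewness(values)
    direction = "positive" if g > 0 else "negative" if g < 0 else "symmetric"
    return {"skewness": g, "direction": direction,
            "mean": mean(values), "median": median(values)}
```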

Descriptive Statistics of Each Variable

Maternal Mortality Ratio

MEAN = 116.6 MEDIAN = 60.4 MAX = 561.0 MIN = 1.3

GINI Income Inequality Index

MEAN = 38.9 MEDIAN = 39.1 MAX = 63.1 MIN = 24.8

Gender Equality Ratio

MEAN = 3.3 MEDIAN = 3.5 MAX = 4.5 MIN = 1.5

Universal Healthcare Coverage

MEAN = 62.6 MEDIAN = 67.0 MAX = 88 MIN = 22
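
Because some countries lack a value for a given variable, missing entries must be dropped before these statistics are computed. A minimal sketch using Python's `statistics` module; the function name and the choice to treat both `None` and NaN as missing are assumptions.

```python
import math
from statistics import mean, median


def summarize(values):
    """Mean, median, max, and min after dropping missing values (None or NaN),
    rounded to one decimal place as in the statistics reported above."""
    clean = [v for v in values
             if v is not None and not (isinstance(v, float) and math.isnan(v))]
    if not clean:
        raise ValueError("no non-missing values to summarize")
    return {
        "n": len(clean),
        "mean": round(mean(clean), 1),
        "median": round(median(clean), 1),
        "max": round(max(clean), 1),
        "min": round(min(clean), 1),
    }
```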

PART IV: STATISTICAL MODELING

Scatter Plots of Variables with Maternal Mortality Ratio

MMR x Gender Equality Ratio Scatterplot

A scatterplot was created with two continuous variables: maternal mortality ratio as the x-variable and gender equality ratio as the y-variable. The x-variable data were drawn from the 2015 Maternal Mortality Ratio dataframe, and the y-variable data from the 2015 Gender Equality Ratio dataframe. Although this variable is treated as continuous, it takes only half-point values, so the points fall into horizontal bands, much like a discrete variable. This visual exploratory analysis showed that the gender equality ratio would not be a good option to explore further with linear regression analysis.

MMR x Universal Healthcare Coverage Scatterplot

A scatterplot was created with two continuous variables: maternal mortality ratio as the x-variable and universal healthcare coverage as the y-variable. The x-variable data were drawn from the 2015 Maternal Mortality Ratio dataframe, and the y-variable data from the 2015 Universal Healthcare Coverage dataframe. There is a cluster of points between 0 < x < 100 MMR values and 60 < y < 90 UHC values, and a general negative linear trend. This visual exploratory analysis showed that universal healthcare coverage would be a good option to explore further with linear regression analysis.

MMR x Income Equality Index Scatterplot

A scatterplot was created with two continuous variables: maternal mortality ratio as the x-variable and the Gini income inequality index as the y-variable. The x-variable data were drawn from the 2015 Maternal Mortality Ratio dataframe, and the y-variable data from the 2015 Income Equality Index dataframe. There is a cluster of points between 0 < x < 100 MMR values and 30 < y < 40 Gini values. This visual exploratory analysis showed that the income inequality index would not be a good option to explore further with linear regression analysis.
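
The scatterplot judgments above are visual. As a supplementary check that was not part of the original analysis, Pearson's correlation coefficient r summarizes the strength of a linear trend on a scale from -1 to 1. A minimal sketch follows; `pearson_r` is a hypothetical helper, and Python 3.10+ also provides `statistics.correlation`, which computes the same quantity.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between paired samples x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired samples with at least two points")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / sqrt(sxx * syy)
```

A value near -1 indicates a strong negative linear trend; a value near 0 indicates little linear association.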

Linear Regression Analysis of Maternal Mortality Ratio and Universal Healthcare Coverage


Linear Regression Line

We used a least-squares polynomial fit function, which accepts the data and a polynomial degree and returns the fitted coefficients, to fit a degree-1 polynomial and obtain its slope and y-intercept. The result was a regression line with a negative slope that minimizes the sum of squared errors.

Intercept: about 74. For a maternal mortality ratio of zero, the model predicts an average UHC of roughly 74.

Slope: about -0.1 (rounded). A country's UHC is expected to decrease by about 0.1 on average for each 1-unit increase in the maternal mortality ratio. Equivalently, along the fitted line, a 100-point increase in MMR corresponds to a drop of roughly 10 points in UHC. This suggests a possible negative linear relationship between UHC and MMR.
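
The closed-form least-squares estimates can be sketched in plain Python: the slope is the covariance of x and y divided by the variance of x, and the intercept makes the line pass through the point of means. For degree 1 this should agree with a least-squares polynomial fit such as NumPy's `np.polyfit(x, y, 1)`, which returns `[slope, intercept]`. The toy data are made up so the answer is easy to check; they are not the project's data.

```python
def least_squares_line(x, y):
    """Slope and intercept of the degree-1 least-squares fit of y on x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired samples with at least two points")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx  # the line passes through (mean x, mean y)
    return slope, intercept


if __name__ == "__main__":
    # Toy points lying exactly on UHC = 74 - 0.1 * MMR.
    mmr = [0, 100, 200, 300]
    uhc = [74, 64, 54, 44]
    slope, intercept = least_squares_line(mmr, uhc)
    print(f"slope={slope:.3f}, intercept={intercept:.1f}")  # slope=-0.100, intercept=74.0
```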


Sum of Square of Residuals

How good is a parameter estimate? How can we determine which slope and intercept best match the empirical data? The residual of a data point is the vertical distance between the data point and the regression line. Least squares is the process of finding the parameters for which the sum of the squares of the residuals is minimal; the smaller this sum, the more closely the regression line fits the empirical data. On the plot of the sum of squared residuals against candidate slopes, the minimum occurs at a slope of -0.099, the same value as the slope from the regression. This minimum sum of squared residuals is small, suggesting that the line fits the empirical data well.
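
The slope scan behind the residual plot can be sketched as follows, assuming (as is common for this kind of plot) that the intercept is held fixed at its least-squares value while candidate slopes vary. Under that assumption, the sum of squared residuals is a parabola in the slope whose minimum falls exactly at the least-squares slope, which is why the plotted minimum matches the regression slope. The grid bounds, step count, and toy data are arbitrary illustrative choices.

```python
def sum_squared_residuals(x, y, slope, intercept):
    """Sum of squared vertical distances between the points and the line."""
    return sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))


def scan_slopes(x, y, intercept, lo=-0.2, hi=0.0, steps=201):
    """Evaluate the RSS on an evenly spaced grid of slopes with a fixed intercept.

    Returns the grid, the RSS values, and the slope with the smallest RSS.
    """
    slopes = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    rss = [sum_squared_residuals(x, y, s, intercept) for s in slopes]
    best = slopes[rss.index(min(rss))]
    return slopes, rss, best


if __name__ == "__main__":
    mmr = [0, 100, 200, 300]
    uhc = [74, 64, 54, 44]  # toy data on UHC = 74 - 0.1 * MMR
    _, _, best_slope = scan_slopes(mmr, uhc, intercept=74)
    print(round(best_slope, 3))  # -0.1
```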


Normality of the UHC Probability Distribution

From the cumulative distribution function (CDF) plot on the left, we can see that the 2015 Universal Healthcare Coverage data (orange) are approximately normally distributed when compared with the theoretical normal distribution (blue). The highest UHC value in the data is 88, which is why the empirical CDF stops at 88 while the theoretical CDF continues.
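
The comparison in the CDF plot can be sketched with the standard library. This version uses the exact normal CDF from `statistics.NormalDist` (Python 3.8+) with the sample's mean and standard deviation rather than simulated normal samples, which is an assumption about how the theoretical curve was built. It also reports the largest vertical gap between the two curves as a single-number summary of the visual comparison. Because the normal parameters are estimated from the same data, that gap should not be converted into a standard Kolmogorov-Smirnov p-value.

```python
from statistics import NormalDist, mean, stdev


def ecdf(values):
    """Empirical CDF: sorted values and the fraction of data at or below each."""
    xs = sorted(values)
    n = len(xs)
    ys = [(i + 1) / n for i in range(n)]
    return xs, ys


def max_gap_to_normal(values):
    """Largest vertical distance between the ECDF and a fitted normal CDF."""
    dist = NormalDist(mean(values), stdev(values))
    xs, ys = ecdf(values)
    gap = 0.0
    for i, (x, y) in enumerate(zip(xs, ys)):
        f = dist.cdf(x)
        below = i / len(xs)  # ECDF height just before the jump at x
        gap = max(gap, abs(y - f), abs(f - below))
    return gap
```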

Permutation Hypothesis Testing

To visualize the data, we created a swarm plot of 2015 Universal Healthcare Coverage for each MMR level. The swarm plot shows that countries in the low MMR level have the highest 2015 UHC values, and UHC appears to decrease as the MMR level increases.

Calculate Summary Statistics

The mean 2015 Universal Healthcare Coverage is about 73.7 for the low MMR level, 69.6 for the medium level, 55.9 for the high level, and 41 for the extreme level. UHC therefore decreases as the MMR level increases. We then calculated the difference of means between the low and extreme MMR levels (about 73.7 - 41 = 32.7) to use in the simulation.
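
The per-level means and the low-minus-extreme difference can be computed with a small grouping helper. This is a minimal sketch with hypothetical (level, UHC) records, not the project's data. With pandas, the usual equivalent is `df.groupby("mmr_level")["uhc"].mean()`, where the column names are assumptions.

```python
from collections import defaultdict


def mean_uhc_by_level(records):
    """Mean UHC for each MMR level, from (mmr_level, uhc) pairs."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for level, uhc in records:
        totals[level] += uhc
        counts[level] += 1
    return {level: totals[level] / counts[level] for level in totals}


if __name__ == "__main__":
    # Hypothetical toy records.
    records = [("low", 80), ("low", 70), ("medium", 68),
               ("high", 56), ("extreme", 40), ("extreme", 42)]
    means = mean_uhc_by_level(records)
    print(means)
    print("low - extreme:", means["low"] - means["extreme"])
```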

H0: The distributions between the different MMR levels are identical.

Simulate data assuming the H0 is true.

Given the observed difference of means, determine how likely it would be to see a difference at least this large by chance if H0 were true.

The permutation hypothesis test yielded a p-value of 1.0, so the null hypothesis cannot be rejected.
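
Here is a minimal pure-Python sketch of a one-sided permutation test for the difference of mean UHC between two MMR levels. The function names, the 10,000 replicates, and the toy UHC values are illustrative assumptions. Because the p-value counts shuffled differences at least as large as the observed one, the order of the groups must match the direction of the alternative hypothesis. In the usage lines, passing the same data with the groups swapped turns a p-value near 0 into exactly 1.0.

```python
import random


def diff_of_means(a, b):
    return sum(a) / len(a) - sum(b) / len(b)


def permutation_test(a, b, n_reps=10_000, seed=42):
    """One-sided permutation test of H0 'a and b come from the same distribution'
    against the alternative 'mean(a) > mean(b)'.

    Returns the observed difference of means and the p-value: the fraction of
    label shuffles whose difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = diff_of_means(a, b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_reps):
        rng.shuffle(pooled)  # reassign group labels at random under H0
        if diff_of_means(pooled[:len(a)], pooled[len(a):]) >= observed:
            hits += 1
    return observed, hits / n_reps


if __name__ == "__main__":
    # Made-up UHC values for illustration only.
    low = [80, 78, 75, 77, 79, 76]
    extreme = [40, 42, 38, 45, 41, 39]
    print(permutation_test(low, extreme))  # small p-value
    print(permutation_test(extreme, low))  # same data, groups swapped: p-value of 1.0
```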

Bootstrap Hypothesis Testing

First, we plotted the CDF of UHC for each MMR level to compare the levels. Visually, no two of the distributions appear identical.

We also computed summary statistics to determine the difference of means, which is used when simulating data under H0.

H0: The mean Universal Healthcare Coverage is identical for all MMR levels.

We then simulated data assuming that H0 is true. In the plot, the simulated data (marked with an 'x') lie closer together than the observed data, as they would if H0 were true.

The bootstrap hypothesis test yielded a p-value of 0.00. The null hypothesis can be rejected.
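
The bootstrap version can be sketched similarly. This sketch assumes the common approach of shifting each group so that both have the combined mean, which makes H0 true while keeping each group's spread, and then resampling each shifted group with replacement. It is not necessarily the exact procedure used above; the names, replicate count, and toy data are illustrative.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def bootstrap_mean_test(a, b, n_reps=10_000, seed=42):
    """One-sided bootstrap test of H0 'mean(a) == mean(b)' against 'mean(a) > mean(b)'.

    Each group is shifted so both share the combined mean (making H0 true),
    then resampled with replacement.
    """
    rng = random.Random(seed)
    mean_a, mean_b = mean(a), mean(b)
    observed = mean_a - mean_b
    combined = mean(list(a) + list(b))
    a_shifted = [x - mean_a + combined for x in a]
    b_shifted = [x - mean_b + combined for x in b]
    hits = 0
    for _ in range(n_reps):
        rep_a = rng.choices(a_shifted, k=len(a_shifted))
        rep_b = rng.choices(b_shifted, k=len(b_shifted))
        if mean(rep_a) - mean(rep_b) >= observed:
            hits += 1
    return observed, hits / n_reps


if __name__ == "__main__":
    # Made-up UHC values for illustration only.
    low = [80, 78, 75, 77, 79, 76]
    extreme = [40, 42, 38, 45, 41, 39]
    print(bootstrap_mean_test(low, extreme))
```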

DISCUSSION

Future Directions:

The current null hypothesis for a permutation hypothesis test is that the continuous variables (Universal Healthcare Index, Gini coefficient, and CPIA gender equality rating) have no statistically significant relationship with the discrete variable, Maternal Mortality Ratio (IHME). Other variables to consider in future exploration of women's health indicators include gender equality in education or literacy rates, total healthcare spending as a percentage of a country's GDP, and the number of women in parliament or government offices.


Implications:

Based on the statistical analysis of global women's health indicators, we encourage the National Women's Health Network to invest money and resources into promoting Universal Healthcare Coverage in countries across the globe in order to increase the quality and standard of women's health.


Limitations:

One limitation of the data and analysis is that some countries present in one curated dataset were missing from others, which led to missing values for some variables. Another limitation is that country borders change over time, and some countries that appear in earlier years' data no longer exist, which can lead to missing values and miscounts.
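
Dropping countries that are missing from any dataset is effectively an inner join on the country key. This minimal sketch assumes each dataset is a dictionary keyed by country name (a hypothetical structure and hypothetical function names). Reporting the dropped countries makes the loss of data visible. The sketch does not reconcile renamed, merged, or split countries, which would need a manual mapping. In pandas, `pd.merge(left, right, on="country", how="inner")` followed by `dropna()` is the usual equivalent.

```python
import math


def is_missing(value):
    return value is None or (isinstance(value, float) and math.isnan(value))


def merge_complete_cases(**datasets):
    """Inner-join per-country dictionaries, keeping only countries that have a
    non-missing value in every dataset. Also returns the dropped countries."""
    if not datasets:
        raise ValueError("at least one dataset is required")
    present = [{c for c, v in d.items() if not is_missing(v)} for d in datasets.values()]
    kept = set.intersection(*present)
    every = set().union(*(d.keys() for d in datasets.values()))
    merged = {
        country: {name: d[country] for name, d in datasets.items()}
        for country in sorted(kept)
    }
    return merged, sorted(every - kept)
```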

Most of the datasets are sourced from the World Health Organization or the World Bank, both of which combine each country's reported data for a variable into one dataset. However, the national data that the WHO and the World Bank use for their global datasets are often self-reported by each country. This means that, depending on the variable, there may be major over- or under-reporting.