Human Development Index (HDI)

Aubrey Dick, Lilly Sale, Meena Sreedhara, & Addie Toon

PSYC 500 final project

Section I: Data Curation and Ethics

The data we obtained come from the United Nations Development Programme (UNDP), which publishes the Human Development Index (HDI). The data set covers life expectancy, years of schooling, and gross national income (GNI) per capita for countries around the world, and countries are ranked by the composite index built from these measures. We chose this data set because it offers a good mix of categorical and continuous variables. We wanted to see how education, life expectancy, and income contribute to overall human development within a country. Some of this data is very personal, so when obtaining it, it is always important to let participants know how and why their information is being used. It is acceptable to explain how collecting the data will help others in the future, but coercion must be avoided. Some of the data used in this file is already publicly accessible (years lived, perhaps years of education). If information is not publicly accessible, permission must be obtained before acquiring it.

Section II: Data Preparation

Read data and assign data to a variable name.
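As a minimal sketch of this step, assuming the data is stored as a CSV file and loaded with pandas: the file name `hdi.csv`, the column names, and the four placeholder rows below are illustrative assumptions, not the actual UNDP values.

```python
import io
import pandas as pd

# Placeholder rows standing in for the UNDP file.
# All values and column names here are invented for illustration.
csv_text = """Country,HDI,Life expectancy at birth,Expected years of schooling,Mean years of schooling,GNI per capita,HDI Level
Country A,0.95,82.0,18.0,12.5,60000,Very High
Country B,0.75,74.0,14.0,9.0,15000,High
Country C,0.60,68.0,11.5,6.5,5000,Medium
Country D,0.40,58.0,8.0,3.0,1200,Low
"""

# With a real file this would be pd.read_csv("hdi.csv"); the file name is hypothetical.
hdi = pd.read_csv(io.StringIO(csv_text))

# Show the first rows to confirm the data loaded as expected.
print(hdi.head())
```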

Describe key aspects of each variable in data.

Determine types of variables within data, along with the shape and size of the data frame.
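The two inspection steps above could look roughly like the following, assuming a pandas data frame. The toy frame uses invented values and assumed column names so the example runs on its own.

```python
import pandas as pd

# Toy data frame with invented values; column names are assumptions.
hdi = pd.DataFrame({
    "Country": ["Country A", "Country B", "Country C", "Country D"],
    "HDI Level": ["Very High", "High", "Medium", "Low"],
    "Mean years of schooling": [12.5, 9.0, 6.5, 3.0],
    "GNI per capita": [60000.0, 15000.0, 5000.0, 1200.0],
})

summary = hdi.describe()     # count, mean, std, min, quartiles, max of numeric columns
kinds = hdi.dtypes           # one dtype per column (text vs. numeric)
n_rows, n_cols = hdi.shape   # number of rows and columns in the data frame

print(summary)
print(kinds)
print(f"{n_rows} rows x {n_cols} columns")
```

Text columns such as the HDI Level are the categorical variables, while the float columns are the continuous ones.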

Section III: Exploratory Data Analysis

Human Development Index (HDI)

The minimum HDI value is for Niger.

The maximum HDI value is for Norway.

Life Expectancy at Birth

The minimum life expectancy is in the Central African Republic.

The maximum life expectancy is in Hong Kong, China (SAR).


Expected Years of Schooling


The minimum expected years of schooling is in Eritrea.

The maximum expected years of schooling is in Australia.
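One way to look up which country holds the minimum or maximum of a column is `idxmin` and `idxmax` in pandas. This is a sketch on invented values with assumed column names, not the lookup actually used for the results above.

```python
import pandas as pd

# Toy data with invented values; the real analysis would use the UNDP data frame.
hdi = pd.DataFrame({
    "Country": ["Country A", "Country B", "Country C", "Country D"],
    "Life expectancy at birth": [82.0, 74.0, 68.0, 58.0],
})

col = "Life expectancy at birth"

# idxmin/idxmax return the row label of the smallest/largest value,
# which is then used to read the matching country name.
lowest = hdi.loc[hdi[col].idxmin(), "Country"]
highest = hdi.loc[hdi[col].idxmax(), "Country"]

print(f"Minimum {col}: {lowest}")
print(f"Maximum {col}: {highest}")
```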

Mean Years of Schooling

Mean "Mean years of schooling" = 8.763313876882975

Standard deviation "Mean years of schooling" = 3.0489565690301768

Minimum "mean years of schooling" = 1.644298557

Maximum "mean years of schooling" = 14.15168

Gross National Income (GNI) Per Capita

Mean GNI per capita = 20320.889962310644

Standard deviation GNI per capita = 21183.44340253105

Minimum GNI per capita = 753.9087475

Maximum GNI per capita = 131031.5898
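Summary statistics like the ones above can be computed directly from a column. The sketch below uses invented values. Note that pandas computes the sample standard deviation by default (`ddof=1`), while NumPy's default is the population version (`ddof=0`); which one produced the figures reported above is not stated, so both are shown.

```python
import pandas as pd

# Invented values standing in for the GNI per capita column.
gni = pd.Series([60000.0, 15000.0, 5000.0, 1200.0], name="GNI per capita")

stats = {
    "mean": gni.mean(),
    "std_sample": gni.std(),            # pandas default, ddof=1
    "std_population": gni.std(ddof=0),  # same convention as numpy.std's default
    "min": gni.min(),
    "max": gni.max(),
}

# Rounding to two decimals is usually enough precision for reporting.
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```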

Section IV: Model Building/Validation

Normality of Distribution

The theoretical and empirical CDFs overlap closely, suggesting that life expectancy at birth is approximately normally distributed.
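The comparison described above can be sketched as follows. The values are synthetic draws from a normal distribution, not the UNDP life expectancy data, and the largest vertical gap between the two curves is used here as a simple numeric stand-in for judging the overlap by eye.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(0)

# Synthetic stand-in for life expectancy values (assumed mean and spread).
values = rng.normal(loc=72.0, scale=7.5, size=190)

# Empirical CDF: fraction of observations at or below each sorted value.
x = np.sort(values)
ecdf = np.arange(1, len(x) + 1) / len(x)

# Theoretical CDF: a normal distribution with the sample mean and standard deviation.
fitted = NormalDist(mu=float(values.mean()), sigma=float(values.std(ddof=1)))
theoretical = np.array([fitted.cdf(v) for v in x])

# A small maximum gap means the two curves overlap closely.
max_gap = float(np.max(np.abs(ecdf - theoretical)))
print(f"Largest vertical gap between the CDFs: {max_gap:.3f}")
```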

Linear Regression

The slope is 9.604 x 10^-5, meaning the model predicts 9.604 x 10^-5 more mean years of schooling for every one-dollar increase in GNI per capita. In other words, for every $10,000 increase in GNI per capita, mean years of schooling is predicted to increase by 0.9604 years.


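The intercept is 6.811, meaning the model predicts 6.811 mean years of schooling for a country whose GNI per capita is 0. No country in the data has a GNI per capita near 0 (the minimum is about 754), so the intercept is an extrapolation rather than an observed value.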
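To make the slope and intercept concrete, the fitted line can be written as a small function. The coefficients are the ones reported in this section; the function name and the example income of $20,000 are illustrative choices.

```python
SLOPE = 9.604e-5    # reported slope: years of schooling per dollar of GNI per capita
INTERCEPT = 6.811   # reported intercept: predicted years of schooling at GNI = 0


def predicted_schooling(gni_per_capita: float) -> float:
    """Predicted mean years of schooling from the fitted line."""
    return INTERCEPT + SLOPE * gni_per_capita


# Predicted change in schooling for a $10,000 increase in GNI per capita.
print(round(SLOPE * 10_000, 4))

# Predicted schooling for a country with a GNI per capita of $20,000.
print(round(predicted_schooling(20_000), 3))
```

At the maximum GNI per capita in the data (about 131,032), the same line predicts roughly 19.4 mean years of schooling, well above the observed maximum of about 14.2. This suggests the straight-line fit may overstate schooling at the highest incomes.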

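The p-value is reported as 0.0, meaning it is smaller than the displayed precision and well below 0.05. The linear relationship between mean years of schooling and GNI per capita is statistically significant: a relationship this strong would be very unlikely to arise by chance alone if there were no true association. The null hypothesis of zero slope is rejected.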
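A slope, intercept, and p-value of this kind can be obtained with `scipy.stats.linregress`. Which library the original analysis used is not stated, so this is one possible approach, run on synthetic data generated to loosely resemble the reported fit.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Synthetic data: incomes drawn uniformly, schooling generated from a line plus noise.
# The generating slope and intercept loosely mimic the reported values.
gni = rng.uniform(1_000, 60_000, size=150)
schooling = 6.8 + 9.6e-5 * gni + rng.normal(0, 2.0, size=150)

# Ordinary least-squares fit; pvalue tests the null hypothesis that the slope is zero.
fit = stats.linregress(gni, schooling)

# Scientific notation keeps very small p-values from being displayed as 0.0.
print(f"slope = {fit.slope:.3e}")
print(f"intercept = {fit.intercept:.3f}")
print(f"p-value = {fit.pvalue:.3e}")
```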

Permutation/Bootstrap Hypothesis Testing

HDI Level was the discrete variable we compared against the continuous variable Mean Years of Schooling. We performed three separate permutation hypothesis tests to see whether mean years of schooling differs between adjacent HDI Levels: Very High versus High, High versus Medium, and Medium versus Low. Below are the code, charts, and graphs used for each of these tests, followed by the analysis of the results.
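A permutation test on the difference in group means can be sketched as below. The function name, the number of permutations, and the two small groups of schooling values are illustrative assumptions; the same function could be applied to each of the three pairs of HDI Levels.

```python
import numpy as np


def permutation_test(a, b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means between two groups."""
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    observed = a.mean() - b.mean()
    pooled = np.concatenate([a, b])

    count = 0
    for _ in range(n_permutations):
        # Shuffle group labels by permuting the pooled values, then re-split.
        shuffled = rng.permutation(pooled)
        diff = shuffled[:len(a)].mean() - shuffled[len(a):].mean()
        # Small tolerance guards against floating-point ties.
        if abs(diff) >= abs(observed) - 1e-12:
            count += 1

    # Add-one correction keeps the estimated p-value strictly above zero.
    p_value = (count + 1) / (n_permutations + 1)
    return observed, p_value


# Invented mean-years-of-schooling values for two HDI Levels.
very_high = [12.9, 12.1, 11.5, 13.0, 12.4, 11.8]
high = [9.8, 10.4, 9.1, 10.9, 9.5, 10.1]

observed, p_value = permutation_test(very_high, high)
print(f"Observed difference in means: {observed:.3f}")
print(f"Permutation p-value: {p_value:.4f}")
```

A small p-value indicates that a difference as large as the observed one rarely appears when group labels are assigned at random.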

Permutation Hypothesis Test Comparing Very High and High HDI Level

Permutation Hypothesis Test Comparing High and Medium HDI Level

Permutation Hypothesis Test Comparing Medium and Low HDI Level

Analysis of Results:

Section V: Discussion

Objective:

The data we obtained come from the United Nations Development Programme (UNDP), which publishes the Human Development Index (HDI). The data set covers life expectancy, years of schooling, and gross national income (GNI) per capita for countries around the world, and countries are ranked by the composite index built from these measures. We chose this data set because it offers a good mix of categorical and continuous variables. We wanted to see how education, life expectancy, and income contribute to overall human development within a country. Linear regression and permutation/bootstrap hypothesis testing let us quantify and visualize these relationships, which helps in interpreting the data. Linear regression assumes that the residuals have constant variance, are normally distributed, and are independent of one another.


Limitations/Future Direction:

One limitation of this study is that outside factors not captured by our variables could be affecting overall human development within a country. The data include no measure of illness (mental or physical) or tragedy. Access to health care, technological development, major industries that could influence health, and cultural beliefs were also not taken into account in this data curation. These factors can easily skew the data for certain countries. In the future, gathering this type of data could help explain outliers within the distributions, making the data more consistent and easier to interpret. This would strengthen the analysis by accounting for more of the factors that could affect overall human development positively or negatively.


Implications:

With this data, countries can look at their human development index and learn how they might increase it through the variables that were explored. If the index increases when citizens have more years of education, countries can find ways to improve access to education or encourage people to complete more years of schooling. If life expectancy and income are correlated, countries can use this data to create new policies and ideas that could in turn raise life expectancy. Once more factors are included when gathering this information, countries could reach conclusions such as that health care is less accessible in some countries and that this greatly affects overall human development. This could lead to movements for more accessible health care or better technological development.