Education and Human Development Index (HDI)
Hannah Flickner, Riley Ricket, Akshat Gupta, Abeer Iqbal, Antonio Soares De Almeida
For our final project, our data analysis team analyzed data from the United Nations Development Programme (UNDP). We curated datasets on education (mean years of schooling), government spending on education, literacy rates, and the Human Development Index (HDI) for 200 countries over the course of 18 years (2000-2018). We chose these datasets because we wanted to examine how government expenditure on education correlates with mean years of schooling and literacy rates, and how the HDI relates to education. We are interested in education trends because understanding how education differs across the globe helps us better understand the dimensions of human development.
According to the UN, the Human Development Index is a summary measure of average achievement in key dimensions of human development. In simpler terms, the HDI is an indicator of a country’s overall progress. Our team aims to understand whether the HDI Rank is a good measure with respect to education, specifically literacy rates across the globe. Literacy rates matter because literacy is a critical indicator of a country’s progress and is essential to economic development as well as community and individual well-being. Our primary research question was whether the HDI Rank is a good measure. To investigate this, we performed a bootstrap hypothesis test to determine whether medium and low HDI ranked countries have similar or different literacy rates. Our hypothesis was that low and medium HDI ranked countries have significantly different literacy rates. In addition, we investigated whether higher government expenditure on education correlates with a higher literacy rate.
If you're interested in looking at all of our data work, you can follow this link to our Google Colab:
Mean years of schooling (years).csv
Contains the mean years of schooling for each country over the span of 18 years.
Government expenditure on education (% of GDP).csv
Displays the percentage of GDP that each country spends on education.
Literacy rate, adult (% ages 15 and older).csv
Gives the literacy rate (%) of the population aged 15 and older in each country.
Human Development Index (HDI).csv
Displays the HDI Rank for each country over the course of 18 years.
The data used in our project was obtained from the United Nations Development Programme (UNDP). The United Nations compiles data from multiple international agencies that have the mandate, resources, and expertise to collect national data on specific indicators.
The use of the data is ethical because individual data is kept confidential. The United Nations also works to make sure that all data obtained by international agencies is accurate. If any discrepancies are found in the data, they are brought to the attention of national and international data authorities.
For more information on HDI you can follow this link to our data source:
Mean years of schooling shows the average number of years a person in a given country has spent in school as of a given year. For example, in Afghanistan in 2003, the mean was 2.4 years.
Government expenditure on education displays the amount of money each country spends on education as a percentage of its GDP. For example, Afghanistan, Albania, and Andorra spent 4.2%, 4.0%, and 3.3% of GDP on education in 2016, respectively.
Literacy rate displays the percentage of a country's population aged 15 and older that is considered "literate". This statistic is important because it can be compared with government expenditure on education to estimate how strongly the two are correlated. Although the literacy data is sparse, values are available for certain countries and years. For example, in Antigua and Barbuda in 2015, 99.0% of the population aged 15 or older was literate.
Human Development Index (HDI) summarizes average achievement in key dimensions of human development: a long and healthy life, being knowledgeable, and having a decent standard of living. The index is often used to rank countries, since a higher score indicates a longer lifespan, a higher education level, and a higher gross national income per capita. HDI values are grouped into categories: 0.800-1.000 is very high, 0.700-0.799 is high, 0.550-0.699 is medium, and 0.350-0.549 is low. Because the education dimension feeds into the index, higher mean years of schooling, government expenditure on education, and literacy rates are all expected to be associated with a higher HDI.
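The category ranges above can be sketched as a small lookup function. This is only an illustration: the function name `hdi_category` is hypothetical, and treating every value below 0.550 as "low" (including values under 0.350, which the ranges above do not cover) is an assumption.

```python
def hdi_category(hdi: float) -> str:
    """Map an HDI value in [0, 1] to the category names used in this report."""
    if not 0.0 <= hdi <= 1.0:
        raise ValueError(f"HDI must lie between 0 and 1, got {hdi}")
    if hdi >= 0.800:
        return "very high"
    if hdi >= 0.700:
        return "high"
    if hdi >= 0.550:
        return "medium"
    # Assumption: anything below 0.550 counts as "low", even under 0.350.
    return "low"


# Illustrative values only, not taken from the dataset.
for value in (0.92, 0.75, 0.60, 0.40):
    print(value, hdi_category(value))
```

A helper like this would let the medium and low groups be selected before comparing their literacy rates.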

Mean Years of Schooling: [plots not shown]
Government Spending: [plots not shown]
Literacy Rate: [plots not shown]
Human Development Index (HDI):
The linear regression of literacy rate on government expenditure on education gave a slope of 4.12 and an intercept of 65.05. This means that spending one additional percentage point on education coincides with an increase of about 4 percentage points in the literacy rate. The pairs bootstrap for linear regression yielded a 95% confidence interval of (2.52, 5.74) for the slope. However, when the pairs bootstrap regression lines are overlaid on the data, the linear model does not appear to explain the data well because of the large spread.
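For readers unfamiliar with the pairs bootstrap, the following is a minimal sketch of the idea using only the Python standard library. It is not the code from our notebook: the function names are hypothetical, and the spending and literacy values are randomly generated placeholders rather than UN data. Each replicate resamples (spending, literacy) pairs with replacement and refits the line, and the 2.5th and 97.5th percentiles of the resulting slopes form the confidence interval.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def pairs_bootstrap_slopes(xs, ys, n_reps=2000, seed=0):
    """Refit the line on n_reps resamples of (x, y) pairs drawn with replacement."""
    rng = random.Random(seed)
    n = len(xs)
    slopes = []
    while len(slopes) < n_reps:
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        if max(bx) == min(bx):
            continue  # degenerate resample: the slope is undefined
        slopes.append(fit_line(bx, by)[0])
    return slopes


def percentile_interval(values, lower=2.5, upper=97.5):
    """Simple percentile interval taken from the sorted replicates."""
    ordered = sorted(values)
    last = len(ordered) - 1
    return ordered[round(last * lower / 100)], ordered[round(last * upper / 100)]


# Placeholder data for illustration only (not the UN values).
rng = random.Random(42)
spending = [rng.uniform(1.0, 8.0) for _ in range(60)]
literacy = [min(100.0, 65.0 + 4.0 * x + rng.gauss(0.0, 10.0)) for x in spending]

slope, intercept = fit_line(spending, literacy)
low, high = percentile_interval(pairs_bootstrap_slopes(spending, literacy))
print(f"slope={slope:.2f}, intercept={intercept:.2f}, 95% CI=({low:.2f}, {high:.2f})")
```

Resampling pairs rather than residuals keeps each country's spending and literacy values together, which makes no assumption about how the noise is distributed.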
We performed a bootstrap hypothesis test to investigate whether medium and low HDI ranked countries have similar literacy rates. The null hypothesis was that the two groups have the same mean literacy rate. Our test yielded a p-value of 0.0, meaning that none of the bootstrap replicates generated under the null hypothesis produced a difference of means at least as extreme as the empirical one. Therefore, we reject the null hypothesis and conclude that the literacy rates of medium and low HDI ranked countries are significantly different.
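One common way to run such a test is sketched below, again with the standard library only. The exact procedure in our notebook may differ: shifting both groups to a shared mean before resampling is an assumption, the function name is hypothetical, and the literacy values are made-up placeholders.

```python
import random


def bootstrap_diff_of_means_test(a, b, n_reps=5000, seed=0):
    """Two-sided bootstrap test of the null hypothesis that a and b share a mean.

    Returns the observed difference of means and the estimated p-value.
    """
    rng = random.Random(seed)
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    observed = mean_a - mean_b
    combined = (sum(a) + sum(b)) / (len(a) + len(b))

    # Shift each group so both have the combined mean (the null hypothesis
    # holds) while each group keeps its own spread.
    a_shifted = [x - mean_a + combined for x in a]
    b_shifted = [x - mean_b + combined for x in b]

    extreme = 0
    for _ in range(n_reps):
        rep_a = rng.choices(a_shifted, k=len(a))
        rep_b = rng.choices(b_shifted, k=len(b))
        diff = sum(rep_a) / len(rep_a) - sum(rep_b) / len(rep_b)
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, extreme / n_reps


# Placeholder literacy rates (%) for illustration only, not UN data.
medium_hdi = [72.1, 80.5, 66.3, 91.0, 77.8, 84.2, 69.9, 88.4]
low_hdi = [35.0, 48.7, 59.3, 42.1, 61.8, 30.5, 52.4, 45.9]

observed, p_value = bootstrap_diff_of_means_test(medium_hdi, low_hdi)
print(f"observed difference = {observed:.1f} points, p-value = {p_value}")
```

With a finite number of replicates, a reported p-value of 0.0 is best read as "smaller than 1 / n_reps" rather than as a literal zero probability.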
The objective of this project was to analyze data from the United Nations Development Programme (UNDP) in order to understand how education differs across the globe. To do so, we prepared and explored datasets on mean years of schooling, government spending on education, literacy rates, and the Human Development Index for 200 countries. These four datasets were chosen because we hypothesized that they are the most closely related to literacy rates and overall development.
The data preparation was fairly straightforward and essentially identical for each of the four datasets. In essence, mean years of schooling displays the average number of years a person spent in school in each country, government expenditure on education displays how much the government spent on education, literacy rate displays the percentage of citizens who are considered literate, and the Human Development Index displays the HDI of each country.
Exploratory data analysis was carried out to visualize how each dataset changed over time.
Model building was completed with HDI and literacy rate via bootstrap hypothesis testing. Through this process, we found that countries with low and medium HDI rankings had significantly different literacy rates, which supports the hypothesis that HDI is a valid measure of human development.
Our findings can also help set directions for global and humanitarian efforts to increase literacy rates in low-HDI and developing countries.
One limitation we found through the exploratory data analysis is that many of the education datasets did not contain consistent data. Many countries did not release the data required to calculate the HDI on a yearly basis. As a result, many of the datasets were missing data: some countries had data for only one or a few years, while others had no data at all. The inconsistent and uneven coverage made it difficult to accurately compare education trends across countries.
Another limitation of the research is that the HDI fails to take into account factors such as inequality, poverty, and gender disparity. This is significant because a high GNI per capita can mask widespread inequality within a country.
For further research, it would be valuable to look into how gender disparity in education and poverty affect literacy rates and the overall HDI Rank of a country.
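The coverage problem can be quantified before comparing trends. The sketch below counts how many years of data each country has and separates countries with enough data from those without. The function names, the two-year threshold, and the values are hypothetical placeholders, not our actual data or cutoffs.

```python
def years_with_data(series):
    """Count the years in a {year: value} mapping whose value is not missing."""
    return sum(1 for value in series.values() if value is not None)


def coverage_report(dataset, min_years):
    """Split countries into those with at least min_years of data and the rest."""
    usable, sparse = [], []
    for country, series in dataset.items():
        target = usable if years_with_data(series) >= min_years else sparse
        target.append(country)
    return sorted(usable), sorted(sparse)


# Placeholder values for illustration only; None marks a year with no reported figure.
literacy = {
    "Country A": {2000: 61.0, 2005: 64.5, 2010: 70.2, 2015: 74.8},
    "Country B": {2000: None, 2005: None, 2010: 88.1, 2015: None},
    "Country C": {2000: None, 2005: None, 2010: None, 2015: None},
}

usable, sparse = coverage_report(literacy, min_years=2)
print("Usable for trend comparison:", usable)
print("Too sparse:", sparse)
```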