Five Institutions, One Group
The World Bank is an international organization that provides financial support to underprivileged countries. It offers loans and grants to these nations, helping them carry out development projects and improve their economies. The World Bank's main goal is to reduce poverty and promote economic growth by giving essential financial and technical aid to these aspiring countries.
The World Bank collaborates with me, a data scientist, for my skill in analyzing complex data and drawing valuable insights. My role involves studying education and economic outcomes to offer essential insights for informed decisions and poverty reduction strategies.
Time Frame: 2020 - present
Steps:
Tested the power of AI: ChatGPT gave me a few data banks to choose from and I chose the first one, the World Bank. Found out the UN has 17 global goals in place and decided on goal 4 of 17: quality education.
Download process: got the database downloaded, uploaded, and downloaded a few times (trial and error), then finally got the data into Google Colab :)
Discovered the factors that have a measurable correlation with learning poverty (LP): GNI and internet users. Computed the correlations in Google Colab and graphed them.
Uploaded to my e-portfolio to showcase my analysis and conclusion.
Data Set Description:
7 variables total: Countries, Gross National Income per capita (GNI), Learning Poverty (LP), School Closures, Internet Users, Access to Electricity, and Population.
Main:
Learning Poverty is a rate (%) that measures the share of children who cannot read a simple text with comprehension by age 10.
Internet users is also a percentage: the share of the population that uses the internet, whether from an everyday phone or any other type of device.
GNI per capita is essentially a country's Gross National Income (GNI) divided by its population.
(Each variable was calculated by the WB)
Variables:
Iso3c - Countries
From : Country Codes
The display of countries helps convey a broader picture of learning poverty.
Learning Poverty
From : World Bank. 2022. "The State of Global Learning Poverty."
Calculations: measures the share of children who cannot read a simple text with comprehension by age 10.
How: Via WB
Internet users ( % of population)
Development relevance: Technology helps advance the world, and without access, things aren't so great.
From: Individuals using the Internet (% of population) | Data
Cc: ICT/ITU
Calculations : Statistics - % of population via WorldBank Database
The % of the population per country that is using the internet conveys the importance that technology had during COVID-19.
Schools Closed
From: Global Monitoring of School Closures caused by COVID-19 Pandemic – Dashboards
Calculation: The number of closures during the pandemic in each country.
The number of schools closed in each country helps convey the World Bank's main message about learning poverty. Furthermore, it provides insight into how the learning poverty percentage relates to the other variables.
EG.ELC.ACCS.ZS - Access to electricity
From : Access to electricity (% of population) | Data
Importance :Maintaining reliable and secure electricity services while seeking to rapidly decarbonize power systems is a key challenge for countries throughout the world. More and more countries are becoming increasingly dependent on reliable and secure electricity supplies to underpin economic growth and community prosperity.
Calculations: Access to electricity is the percentage of the population with access to electricity. Electrification data are collected from industry, national surveys and international sources.
The percentage of the population within each country that has access to a power source indicates how much people could rely on the internet. It also conveys the electrical resources each country had during COVID-19, and thus how a lack of electricity could have a detrimental effect on learning.
Population
Importance: The population of each country conveys the scale behind the data and has a large impact on the outcome.
From : https://datatopics.worldbank.org/sdgatlas/goal-4-quality-education
GNI per capita - Gross National Income per capita
From : GNI per capita, Atlas method (current US$) | Data ( Atlas Method)
GNI per capita (formerly GNP per capita) is the gross national income, converted to U.S. dollars using the World Bank Atlas method, divided by the midyear population. On the chart, GNI per capita conveys the average income level of each country.
Question:
What factors affect a country's education?
Visualizations
The following graph charts the relationship between learning poverty and GNI, aka Gross National Income (per capita).
What we know:
Each dot represents a country
As mentioned, learning poverty is measured as the percentage of children who can't read and comprehend a simple text by age 10.
GNI is represented as each country's income in US dollars per capita.
Individual analysis:
Correlation: -0.6617710095674405
What is correlation?
Simplified: The measure of the strength and direction of the linear relationship between two quantitative variables.
The negative correlation means that as one variable (GNI) goes up, the other (LP) tends to go down. A positive correlation is the opposite: both variables tend to go up together.
The lower the learning poverty rate, the higher the GNI tends to be; the higher the learning poverty rate, the lower the GNI tends to be.
From this I can conclude that, as the chart indicates, learning poverty is negatively correlated with GNI (an economy's financial well-being) across countries.
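For readers who want to see where a number like -0.66 comes from, here is a minimal sketch of the Pearson correlation in plain Python. The function name pearson_r and the five sample points are made up for illustration; they are not World Bank values, and the original analysis in Google Colab may well have used a library call instead.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length (at least 2 points)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Numerator: sum of products of deviations from each mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: square roots of the summed squared deviations.
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / (spread_x * spread_y)

# Invented placeholder values, NOT real World Bank data.
gni_per_capita = [1000, 3000, 8000, 20000, 45000]  # US$
learning_poverty = [85, 60, 45, 15, 5]             # % of children

r = pearson_r(gni_per_capita, learning_poverty)
print(round(r, 3))  # negative in this toy sample: LP falls as GNI rises
```

The sign gives the direction of the relationship and the size (between 0 and 1 in absolute value) gives its strength; neither says anything about which variable causes the other.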
The graph above conveys the relationship between learning poverty and internet users.
What we know:
Each dot represents a country
As mentioned, learning poverty is measured as the percentage of children who can't read and comprehend a simple text by age 10.
Internet users is calculated by the percentage of a country's population that uses the internet, typically measured through surveys and statistical data.
People who have access to the internet may read a lot more than those who don't.
Individual analysis:
Correlation: -0.799240152085133
Again, this is a negative correlation: a higher share of internet users goes with a lower learning poverty rate.
Overall:
Internet users have a better chance at learning due to the broad topics online. Thus, learning poverty seems to decrease when more citizens have access to the internet. Further research is needed to fully understand the impact.
Explained:
What we know:
Schools were closed during the pandemic year, which affected learning poverty.
The correlation wasn't strong, so I didn't look too much into it, but we all felt the strain that closed schools put on learning.
Conclusion
Across GNI per capita, internet users, and school closures from the pandemic year to now, access to more resources may be a significant contributing factor in battling learning poverty.
Given that money, internet, and schools are all considered vital student resources, each factor plausibly makes up a piece of a quality education. Financially, a higher GNI relates to lower learning poverty, as indicated by the negative correlation shown in the chart. Academically, the internet allows students to look up outside resources, so countries with less internet access may have more of a reading deficiency, or a higher learning poverty rate. As for school closures, the environment a student learns in appears to play a vital role in a quality education, although the correlation here wasn't strong. With everything said and done, a student's quality education may depend on the amount of resources provided to them.
Limitations
Time Frame:
- I didn't get enough time to dive deeper into COVID as a general factor. I want to identify COVID as a factor here to make sure there are no biased views: I realize learning poverty was around before COVID, but this research has extensively identified the extent to which it spiked from the first year of COVID in 2020 to the present.
- Understanding a large amount of material in a short period of time was a bit tricky, which is kind of why I wish this program was a week longer. The biggest things for me were the formulas and the difference between correlation and causation.
- My data is naturally all connected, and with more time I feel like I could search down the chain of factors that affect a good quality education during the data time frame.
Other Data Sets
Official US Gov. Data - data.gov
This is the homepage of the United States Government's open data
To the left, you can filter based upon popular categories, file types, level of government, or bureau
At the top, a search bar can be used for keywords, and to its right, there is an order-by drop down bar to sort datasets
Kaggle Datasets - click on 'filters' within the search bar, then select 'CSV' for file types & '10.00' for usability rating
If you know your analytical method, you can search with its name as a keyword
E.g., linear regression
Finding the most correlating variables
Above, you'll find a link to a Kaggle notebook in which someone teaches you how to determine which variables are most correlated so that you can run better linear regression analyses by leveraging your knowledge of features' relationships
To find notebooks like this, you can go to Kaggle's code page, where you can search for a topic of your interest or select filter tags like 'beginner' or 'linear regression'
Note: Kaggle is a great resource for learning more about analytical techniques and coding concepts
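As a rough illustration of the "most correlated variables" idea, here is a small sketch that ranks every column by how strongly it correlates with a chosen target. It is not the Kaggle notebook's code; the function names, column names, and numbers are all hypothetical placeholders.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return num / den

def rank_by_correlation(columns, target):
    """Return (name, r) pairs for every non-target column, strongest |r| first."""
    scores = [
        (name, pearson_r(values, columns[target]))
        for name, values in columns.items()
        if name != target
    ]
    return sorted(scores, key=lambda pair: abs(pair[1]), reverse=True)

# Invented placeholder values, NOT real data; column names are hypothetical.
columns = {
    "learning_poverty": [85, 60, 45, 15, 5],
    "internet_users": [10, 35, 55, 80, 95],
    "gni_per_capita": [1000, 3000, 8000, 20000, 45000],
    "school_closures": [30, 45, 20, 40, 25],
}

ranking = rank_by_correlation(columns, "learning_poverty")
for name, r in ranking:
    print(f"{name}: {r:+.2f}")
```

Sorting by the absolute value keeps strong negative relationships at the top of the list alongside strong positive ones.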
Tips:
Take note of the number of rows, or observations: do you think this number is too large or too small? What about the number of columns, or variables? Are there any other data sets on the same topic that we could merge our data with?
Do you foresee any issues with the data: is there something you do not understand; does it lack a description; is it formatted differently from what you have encountered previously? When in doubt, voice your skepticism to us; we can talk through whether or not a dataset would be a good choice!
Some questions to consider when exploring: How does this compare to the navigation of other sources you've found? - WYM? When uploading a file (of reasonable size), is the process any different, and how does the dataset compare when opened?
Question Box:
Can I analyze a regression model to track progress? Do I have to build one if there is already a regression model within the data set I choose?
Can I put "income" in general as a variable of its own, or do I have to put "higher-lower income" as variables of their own?
On another hand, is the pandemic considered a variable? The data set is essentially set on the effect that the pandemic had on learning, so I'm assuming so. But then again, with that said, does that mean "impact of pandemic" is preferable in terms of identifying the pandemic as a variable? Or would the "impact" be a variable in itself?
Should money be considered a variable, or does it depend on what I'm wanting to prove/research? I have an idea of what I would like to prove, which is the quality of life depending on amount of education, or just education in general.
Are the recovery techniques or the recovery plans the World Bank has in place considered a variable, or something to delve into after identifying the variables?
Notes
Topic:
Correlation - Essentially, correlation analysis is used to see if two things are related. It's a statistical measure that quantifies the strength and direction of the relationship between two variables. It indicates the extent to which changes in one variable are associated with changes in another variable.
Note:
The correlation can be positive, which means both things increase together, or negative, which means one thing increases while the other decreases. The strength of the relationship is measured on a scale from -1 to +1, where a value close to -1 or +1 means a strong relationship, and a value close to 0 means a weak relationship or no relationship at all.
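To make the -1 to +1 scale concrete, here is a small helper that turns a correlation value into a plain-language label. The name describe_correlation and the 0.7 / 0.3 cutoffs are rule-of-thumb assumptions for illustration only; there is no single agreed standard for where "strong" begins.

```python
def describe_correlation(r, strong=0.7, weak=0.3):
    """Label a correlation coefficient by rough strength and direction.

    The default cutoffs are illustrative assumptions, not a standard.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and +1")
    if r == 0:
        return "no linear relationship"
    direction = "positive" if r > 0 else "negative"
    size = abs(r)
    if size >= strong:
        strength = "strong"
    elif size >= weak:
        strength = "moderate"
    else:
        strength = "weak"
    return f"{strength} {direction}"

# The two correlations reported in the Visualizations section.
print(describe_correlation(-0.6617710095674405))  # moderate negative
print(describe_correlation(-0.799240152085133))   # strong negative
```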
Linear regression - a statistical method used to model the relationship between a dependent variable and one or more independent variables by finding the best-fit line that minimizes the difference between predicted and actual values. It helps us understand how changes in the independent variables relate to changes in the dependent variable. By estimating coefficients, we can measure the magnitude and direction of the relationship.
Multiple regression - Predicting y based on multiple variables.
Simple Regression Formulas:
Y hat - the predicted y value!
B0 - the estimated intercept or constant
B1 - the estimated slope or coefficient
Calculate the intercept and slope!
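The standard least-squares formulas behind these terms are: Y hat = B0 + B1 * x, with B1 = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean)^2) and B0 = y_mean - B1 * x_mean. Here is a minimal sketch of that calculation in plain Python; the function names and the sample numbers are hypothetical placeholders, not values from the World Bank data.

```python
def fit_line(xs, ys):
    """Least-squares estimates (b0, b1) for y_hat = b0 + b1 * x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length (at least 2 points)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("cannot fit a slope when every x value is the same")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx               # estimated slope
    b0 = mean_y - b1 * mean_x    # estimated intercept
    return b0, b1

def predict(b0, b1, x):
    """Predicted y value (y hat) for a given x."""
    return b0 + b1 * x

# Invented placeholder values, NOT real World Bank data.
internet_users = [10, 35, 55, 80, 95]   # % of population
learning_poverty = [85, 60, 45, 15, 5]  # % of children

b0, b1 = fit_line(internet_users, learning_poverty)
print(f"y_hat = {b0:.2f} + ({b1:.2f}) * x")
print(round(predict(b0, b1, 50), 1))
```

A negative slope here matches a negative correlation: each extra percentage point of x goes with a lower predicted y.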
Vocabulary
Interpolation: using the least-squares regression line to predict y for any x that is in the range of observations.
Extrapolation: using the regression line to predict values outside of the range of observations.
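One quick way to tell the two apart before trusting a prediction is to check whether the new x value falls inside the range of x values that were actually observed. This is only a sketch; the function name prediction_kind and the sample values are hypothetical.

```python
def prediction_kind(x_new, observed_xs):
    """Return 'interpolation' if x_new is within the observed x range, else 'extrapolation'."""
    if not observed_xs:
        raise ValueError("need at least one observed x value")
    lowest, highest = min(observed_xs), max(observed_xs)
    return "interpolation" if lowest <= x_new <= highest else "extrapolation"

# Invented placeholder values for % of population using the internet.
observed_internet_users = [10, 35, 55, 80, 95]

print(prediction_kind(50, observed_internet_users))  # interpolation
print(prediction_kind(99, observed_internet_users))  # extrapolation
```

Extrapolated predictions deserve extra caution, because nothing in the data shows that the same line still holds outside the observed range.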
Data Base:
Where did I get my data?
1/ Chatgpt ("what websites do u recommed for data researches?")
2/ World Bank
What type of data base do I want to study/create?
I want to create a scatter plot, a structured relational database, possibly a linear regression, and a bar graph. I'm not sure if I can use all of them but I will try.
General Variables (from a glance)
Countries
Education
Learning Poverty
Poverty
Impact of Pandemic
Scores/ Comprehension -
Dropout rate(during COVID19)
Recovery/ Progression techniques?
Money used?
From DATA BASE..you live and u learn!
Countries
LP rate
School Closures
Internet users
Population
Access to electricity
GNI per capita
Let's Get (Data) Set!
Who did the collection?
The data is originally by The World Bank.
Was it the organization you are working for or another?
The research was done by a person named Brian Stacy, a data scientist working in the Development Data Group (Indicators and Data Services team) at the World Bank.
What did they collect?
Each data collection mostly focused on how education was affected via COVID and that impact on overall income, country, etc.
What are the attributes or variables?
From Database:
GNI per capita - Gross National Income per capita
Iso3c - Countries
Internet users ( % of population)
Learning Poverty
Schools Closed
EG.ELC.ACCS.ZS - Access to electricity
Population in each country
GNI (Gross National Income):
Data Type: Numeric (currency)
Calculation: GNI is the sum of value added by all resident producers plus any product taxes (less subsidies) not included in the valuation of output plus net receipts of primary income (compensation of employees and property income) from abroad.
Units of Measure: Usually expressed in U.S. dollars per capita.
Relationship: GNI is an economic indicator that reflects the income level of a country's residents and can be used to compare economic prosperity between countries.
Countries:
Data Type: Text (names of countries)
Relationship: Countries represent the entities for which the data is being recorded. Each data point in the dataset corresponds to a specific country, allowing for country-level analysis and comparisons.
Internet Users (% of Population):
Data Type: Numeric (percentage)
Calculation: The percentage of a country's population that uses the internet, typically measured through surveys and statistical data.
Units of Measure: Percentage of the total population.
Relationship: Internet usage data can provide insights into a country's technological advancement and connectivity, which may have implications for learning and educational opportunities, especially during the COVID-19 pandemic.
Learning Poverty:
Data Type: Numeric (percentage)
Calculation: The percentage of children who cannot read a simple text with comprehension by age 10.
Units of Measure: Percentage of children (by age 10), not of the total population.
Relationship: Learning poverty is an indicator of educational challenges within a country, particularly related to reading proficiency among children.
Schools Closed:
Data Type: Numeric (count or percentage)
Calculation: The number or percentage of schools closed in each country during the COVID-19 pandemic.
Units of Measure: Count or percentage of total schools.
Relationship: School closures data provides insights into the educational disruptions caused by the pandemic, affecting learning opportunities and potentially contributing to learning poverty.
Access to Electricity (% of Population):
Data Type: Numeric (percentage)
Calculation: The percentage of a country's population with access to electricity.
Units of Measure: Percentage of the total population.
Relationship: Access to electricity data is relevant for understanding a country's infrastructure and technological resources, which may have implications for internet usage and remote learning during the pandemic.
Population:
Data Type: Numeric (count)
Relationship: Population represents the number of people in each country and is essential for calculating per capita indicators like GNI per capita. It is also relevant for understanding the scale and impact of various indicators within a country.
The data is formatted in multiple different charts as seen above. The format is a structured and tabular format. The data is organized into tables with rows and columns, where each row represents a specific country or region, and each column represents a specific indicator or variable. Each indicator/variable is different. There are a total of 7 variables, with each one being unique in its own way.
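The row-per-country, column-per-variable layout described above could be represented in code roughly as follows. This is a sketch under assumptions: only iso3c and EG.ELC.ACCS.ZS are column names taken from the dataset description; the other headers, the CountryRecord and load_records names, and the two sample rows (with made-up country codes) are hypothetical.

```python
import csv
import io
from dataclasses import dataclass

@dataclass
class CountryRecord:
    """One row of the table: a country and its seven variables."""
    iso3c: str
    gni_per_capita: float         # US$
    learning_poverty: float       # % of children
    schools_closed: float
    internet_users: float         # % of population
    access_to_electricity: float  # % of population
    population: int

def load_records(csv_file):
    """Read rows from an open CSV file into CountryRecord objects."""
    records = []
    for row in csv.DictReader(csv_file):
        records.append(CountryRecord(
            iso3c=row["iso3c"],
            gni_per_capita=float(row["gni_per_capita"]),
            learning_poverty=float(row["learning_poverty"]),
            schools_closed=float(row["schools_closed"]),
            internet_users=float(row["internet_users"]),
            access_to_electricity=float(row["EG.ELC.ACCS.ZS"]),
            population=int(row["population"]),
        ))
    return records

# Two invented rows, NOT real World Bank data.
sample_csv = io.StringIO(
    "iso3c,gni_per_capita,learning_poverty,schools_closed,"
    "internet_users,EG.ELC.ACCS.ZS,population\n"
    "AAA,1200,80.5,30,15.0,45.0,1000000\n"
    "BBB,42000,6.0,12,92.0,100.0,5000000\n"
)

records = load_records(sample_csv)
print(len(records), records[0].iso3c)  # 2 AAA
```

Real exports often contain blank cells for countries with missing indicators, so a fuller version would need to decide how to handle those before converting to numbers.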
Where was the data collected?
Source : Quality Education - World Bank
When was the data collected?
Collected in 2023, covering the years 2020 to the present.
Is there a reason that this particular data was collected at this time?
The data collected over the three-year span measures the sustained learning losses during and after the prime year of COVID. The dataset also draws on data from various other sites to further show the impact of a good or bad quality education.
What event(s) or actor(s) within, or outside of, the organization might have decided upon or influenced the collection, and how might this decision to collect have been made? - This is the question that had me wanting to define COVID as an influence, because it further worsened the learning poverty rate.
COVID was an actor during the 2020 year as we all know but according to
Were there any other organizations involved in the collection?
In general:
ChatGPT
Google Colab
Google Sheets
Citations:
Hevia, Felipe J., Samana Vergara-Lope, Anabel Velásquez-Durán, and David Calderón. 2022. "Estimation of the fundamental learning loss and learning poverty related to COVID-19 pandemic in Mexico." International Journal of Educational Development 88: 102515.
Laura Moscoviz and David K. Evans. 2022. “Learning Loss and Student Dropouts during the COVID-19 Pandemic: A Review of the Evidence Two Years after Schools Shut Down.” CGD Working Paper 609. Washington, DC: Center for Global Development.
The World Bank, UNESCO and UNICEF (2021). The State of the Global Education Crisis: A Path to Recovery. Washington D.C., Paris, New York: The World Bank, UNESCO, and UNICEF.
World Bank. 2022. "The State of Global Learning Poverty."
Lichand, G., Doria, C.A., Leal-Neto, O. et al. 2022. The impacts of remote learning in secondary education during the pandemic in Brazil. Nat Hum Behav 6, 1079–1086.
Singh, A., Romero, M. and Muralidharan, K. 2022. COVID-19 Learning Loss and Recovery: Panel Data Evidence from India. RISE Working Paper Series. 22/112.