Updating Population, GDP, Literacy

2015-08-11: These instructions are out of date and need updating, particularly the section on Factbook literacy, which is now downloaded as a .txt file instead of HTML.

Updated 2013-03-25 by J. C. Emmons

These instructions are based on the Firefox browser.

Load the World DataBank

The World DataBank is at (http://databank.worldbank.org/data/views/variableselection/selectvariables.aspx?source=world-development-indicators). If the page has moved, navigate to it as follows:

  1. Go to http://worldbank.org
  2. Click "Data" at the top
  3. Click "Data Catalog"
  4. Click on the blue "DATABANK" link in the section called "World Development Indicators".  It should be at or near the top.
Once you are there, generate a file using the following steps. The page has three collapsible sections: "COUNTRIES", "SERIES", and "TIME".
  • Countries
    • Select all countries. You do NOT need to select any of the hierarchy levels in the left sidebar. The right sidebar shows your selection status. There were 214 countries on the list when these instructions were written.
  • Series
    • Select "Population, total"
    • Select "GNI, PPP (current international $)"
  • Time
    • Select all years from 2000 through the latest available year. As of this writing, the latest was 2012.
      • If the latest year is later than the latest year in the current tools/java/org/unicode/cldr/util/data/external/world_bank_data.csv, add the new year(s) to the WBLine enum in org/unicode/cldr/tool/AddPopulationData.java.

  • Click the "DOWNLOAD" link in the upper right. A small "Download options" box appears. Select "CSV" and click "Download", then save the file.

  • You will get a file of the form:
Afghanistan    AFG    GNI, PPP (current international $)    NY.GNP.MKTP.PP.CD    22092804549    23943938506    ..
Afghanistan    AFG    Population, total    SP.POP.TOTL    ..    ..    ..
The file will have an autogenerated name such as 157c367b-420b-4449-ac1e-37d355b5b1d8.csv.
  1. Rename it to world_bank_data.csv and save it in org/unicode/cldr/util/data/external/
  2. Diff the old version against the new one.
  3. If the format has changed, modify AddPopulationData.WBLine to match the new column order and contents. Often this just means adding an extra year field.
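The World Bank steps above can be sketched in Java as follows. This is a hypothetical helper, not the actual AddPopulationData.WBLine code. It assumes the CSV export uses the column order shown in the sample (country name, country code, series name, series code, then one column per year), that fields containing commas, such as "GNI, PPP (current international $)", are wrapped in double quotes, and that ".." marks a missing value. latestYear() assumes year header columns begin with a four-digit year; it can help decide whether WBLine needs new year fields.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for reading the World DataBank CSV export (not the real WBLine code). */
class WorldBankCsv {
    /** Splits one CSV line into fields, honoring double quotes and "" escapes. */
    static List<String> splitCsvLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"' && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    current.append('"'); // "" inside quotes is a literal quote
                    i++;
                } else if (c == '"') {
                    inQuotes = false; // closing quote
                } else {
                    current.append(c);
                }
            } else if (c == '"') {
                inQuotes = true; // opening quote
            } else if (c == ',') {
                fields.add(current.toString()); // field separator
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString()); // last field
        return fields;
    }

    /** Parses a yearly value; returns null for "..", which the export uses for missing data. */
    static Double parseValue(String field) {
        String s = field.trim();
        if (s.isEmpty() || s.equals("..")) {
            return null;
        }
        return Double.valueOf(s);
    }

    /** Returns the largest year among header columns that start with a 4-digit year, or -1 if none. */
    static int latestYear(List<String> header) {
        int latest = -1;
        for (String column : header) {
            String s = column.trim();
            if (s.length() >= 4 && s.substring(0, 4).chars().allMatch(Character::isDigit)) {
                latest = Math.max(latest, Integer.parseInt(s.substring(0, 4)));
            }
        }
        return latest;
    }

    public static void main(String[] args) {
        String row = "Afghanistan,AFG,\"GNI, PPP (current international $)\",NY.GNP.MKTP.PP.CD,22092804549,23943938506,..";
        List<String> fields = splitCsvLine(row);
        System.out.println(fields.get(2));             // GNI, PPP (current international $)
        System.out.println(parseValue(fields.get(4))); // 2.2092804549E10
        System.out.println(parseValue(fields.get(6))); // null, because ".." means no data
    }
}
```

A quote-aware splitter is used rather than String.split(",") because the GNI series name itself contains a comma.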

Load UN Literacy Data

  1. Go to http://unstats.un.org/unsd/demographic/products/socind/default.htm
  2. Click on "Education"
  3. Click on "Table 4a - Literacy".
  4. Download the data and save it as a temporary file.
  5. Open it in Excel or OpenOffice and save it as data/external/un_literacy.csv (Windows Comma Separated format).
  6. Diff the old version against the new one.
  7. If the format has changed, modify the loadUnLiteracy() method in org/unicode/cldr/tool/AddPopulationData.java.
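The "diff" steps can be done with any diff tool. As an alternative, the hypothetical Java helper below (not part of CLDR) gives a rough, order-insensitive comparison: it lists lines removed from and added to a data file, and flags a changed first (header) line, which usually signals the kind of format change that requires updating the loader. It reads files as ISO-8859-1 so that decoding never fails, for example on a file Excel saved with a Windows encoding.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper: rough, order-insensitive comparison of two versions of a data file. */
class DataFileDiff {
    /** Returns the lines of source that do not occur anywhere in other, keeping source order. */
    static List<String> missingFrom(List<String> source, List<String> other) {
        Set<String> otherLines = new HashSet<>(other);
        List<String> result = new ArrayList<>();
        for (String line : source) {
            if (!otherLines.contains(line)) {
                result.add(line);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java DataFileDiff <old file> <new file>");
            return;
        }
        // ISO-8859-1 maps every byte to a char, so decoding never fails, whatever the real encoding.
        List<String> oldLines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.ISO_8859_1);
        List<String> newLines = Files.readAllLines(Paths.get(args[1]), StandardCharsets.ISO_8859_1);
        for (String line : missingFrom(oldLines, newLines)) {
            System.out.println("- " + line);
        }
        for (String line : missingFrom(newLines, oldLines)) {
            System.out.println("+ " + line);
        }
        // A different first line usually means the column layout changed.
        if (!oldLines.isEmpty() && !newLines.isEmpty() && !oldLines.get(0).equals(newLines.get(0))) {
            System.out.println("First (header) line differs: check whether the loader needs updating.");
        }
    }
}
```

Because it compares sets of lines, reordered rows are not reported and duplicate lines are ignored; use a real diff when order matters.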

Load CIA Factbook

  1. Go to https://www.cia.gov/library/publications/the-world-factbook/index.html
  2. Go to the "References" tab and click on "Guide to Country Comparisons".
  3. Expand "People and Society" and click on "Population".
    1. Right-click "DownloadData", choose "Save Link As...", and save it as org/unicode/cldr/util/data/external/factbook_population.txt
  4. Go back a page, then expand "Economy" and click on "GDP (purchasing power parity)".
    1. Right-click "DownloadData", choose "Save Link As...", and save it as org/unicode/cldr/util/data/external/factbook_gdp_ppp.txt
  5. Click on the "References" tab at the top and click on "Guide to Country Profiles".
    1. Expand "People and Society" and click on "Literacy"
    2. This takes you to a glossary section. Click the "Field Listing" button on the right side, next to "Literacy" in the header.
    3. Right-click "Print", choose "Save Link As...", and save it as org/unicode/cldr/util/data/external/factbook_literacy.html
  6. Diff the old version against the new one.
  7. If the format has changed, modify the loadFactbookLiteracy() method in org/unicode/cldr/tool/AddPopulationData.java.
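Before converting, it can help to confirm that every download from the sections above landed in the right place. The sketch below is a hypothetical helper (not part of CLDR). It assumes all files live in one directory, org/unicode/cldr/util/data/external/ (the UN file is given above as data/external/un_literacy.csv), and it only checks that each expected file exists and is non-empty, not that its contents are valid.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: checks that the downloaded data files exist and are non-empty. */
class CheckExternalFiles {
    // File names taken from the instructions above; assumes they all share one directory.
    static final String[] EXPECTED = {
        "world_bank_data.csv",
        "un_literacy.csv",
        "factbook_population.txt",
        "factbook_gdp_ppp.txt",
        "factbook_literacy.html", // may now be a .txt download (see the note at the top)
    };

    /** Returns the names that are missing from dir or are empty files. */
    static List<String> problems(File dir, String... names) {
        List<String> bad = new ArrayList<>();
        for (String name : names) {
            File f = new File(dir, name);
            if (!f.isFile() || f.length() == 0) {
                bad.add(name);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        // Assumed default location; pass the real external data directory as the first argument.
        File dir = new File(args.length > 0 ? args[0] : "org/unicode/cldr/util/data/external");
        List<String> bad = problems(dir, EXPECTED);
        if (bad.isEmpty()) {
            System.out.println("All expected files are present.");
        } else {
            System.out.println("Missing or empty: " + bad);
        }
    }
}
```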

Convert the data

  1. If you saw any new or changed country names above, add them to external/alternate_country_names.txt.
  2. Run AddPopulationData with the JVM option -DADD_POP=true and look for errors.
  3. Once that is done, run the ConvertLanguageData tool as described on the Update Language Script Info page.
  4. Once everything looks OK, check everything into SVN.
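The -DADD_POP=true option has the form of a Java system property. Assuming AddPopulationData reads ADD_POP that way (an assumption; its code is not shown here), the option must come before the class name on the java command line, because anything after the class name is passed to main() as an ordinary argument. A minimal sketch of reading such a flag, using a hypothetical class name:

```java
/**
 * Hypothetical sketch: reading a -DADD_POP=true flag as a Java system property.
 * Assumes AddPopulationData treats ADD_POP this way; its real code is not shown here.
 *
 * Works:        java -DADD_POP=true -cp <classpath> AddPopFlag
 * Does not work: java -cp <classpath> AddPopFlag -DADD_POP=true  (becomes args[0], not a property)
 */
class AddPopFlag {
    /** True only if the ADD_POP system property is set to "true" (ignoring case). */
    static boolean isEnabled() {
        return Boolean.getBoolean("ADD_POP");
    }

    public static void main(String[] args) {
        System.out.println("ADD_POP enabled: " + isEnabled());
        for (String arg : args) {
            // Options placed after the class name end up here, not in the system properties.
            System.out.println("program argument: " + arg);
        }
    }
}
```

In an IDE such as Eclipse, the same option goes in the run configuration's "VM arguments" field rather than "Program arguments".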