Update Language/Script/Region Subtags

Updated 2015-08-07 by J. C. Emmons

This updates language codes, script codes, and territory codes.

  • First get the latest ISO 639-3 from http://www-01.sil.org/iso639-3/download.asp
    • Download the zip file containing the UTF-8 tables, it will have a name like
    • Unpack the zip file and rename the 4 files to match the following:
      • {cldrdata}\tools\java\org\unicode\cldr\util\data\iso-639-3.tab
      • {cldrdata}\tools\java\org\unicode\cldr\util\data\iso-639-3_Name_Index.tab
      • {cldrdata}\tools\java\org\unicode\cldr\util\data\iso-639-3-macrolanguages.tab
      • {cldrdata}\tools\java\org\unicode\cldr\util\data\iso-639-3_Retirements.tab
    • Take the latest version number of any of the files (like iso-639-3_20070615.tab), and paste into
      • {cldrdata}\tools\java\org\unicode\cldr\util\data\iso-639-3-version.tab
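The renaming step for the ISO 639-3 files can be scripted. The following is a minimal sketch, not part of the CLDR tools. It assumes every unpacked file carries a _YYYYMMDD stamp right before the .tab extension, as in the iso-639-3_20070615.tab example; the function names (split_dated_name, plan_renames, copy_renamed) are hypothetical. Because the exact contents expected in iso-639-3-version.tab are not spelled out here, the sketch only returns the newest date stamp and leaves updating that file to you.

```python
import re
import shutil
from pathlib import Path

# Assumption: the date stamp is "_YYYYMMDD" directly before ".tab".
DATE_SUFFIX = re.compile(r"_(\d{8})(?=\.tab$)")


def split_dated_name(filename):
    """Return (name_without_date, date), or (filename, None) if there is no stamp."""
    match = DATE_SUFFIX.search(filename)
    if match is None:
        return filename, None
    return DATE_SUFFIX.sub("", filename), match.group(1)


def plan_renames(filenames):
    """Map each dated file name to its undated target name.

    Also returns the newest date stamp seen (YYYYMMDD strings sort
    chronologically), or None if no file carried a stamp.
    """
    renames = {}
    dates = []
    for name in filenames:
        target, date = split_dated_name(name)
        if date is not None:
            renames[name] = target
            dates.append(date)
    latest = max(dates) if dates else None
    return renames, latest


def copy_renamed(src_dir, dst_dir, renames):
    """Copy each dated file from src_dir to its undated name in dst_dir."""
    for source_name, target_name in renames.items():
        shutil.copyfile(Path(src_dir) / source_name, Path(dst_dir) / target_name)
```

A typical use would be to call plan_renames on the names in the unpacked directory, review the mapping, and then call copy_renamed with the util\data directory as the destination.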
  • Go to http://www.iana.org/assignments/language-subtag-registry
    • (you can set up a watch for changes in this page with http://www.watchthatpage.com )
    • Save as {cldrdata}\tools\java\org\unicode\cldr\util\data\language-subtag-registry
  • Go to http://data.iana.org/TLD/
  • If using Eclipse, refresh the files
  • Diff each with the old copy (via "svn diff" or Eclipse "compare with base from working copy") to check for consistency 
    • Some of the steps below depend on specific differences, so note them as you go.
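In addition to a textual diff, the old and new copies of language-subtag-registry can be compared record by record to list the subtags that were added or newly deprecated. The following is a rough sketch, not part of the CLDR tools. It relies on the registry's record-jar layout (records separated by "%%" lines, "Field: value" pairs, continuation lines starting with whitespace); the function names are hypothetical.

```python
def parse_registry(text):
    """Parse registry text into a list of records (field name -> list of values)."""
    records = []
    current = {}
    last_key = None
    for line in text.splitlines():
        if line.strip() == "%%":
            # Record separator: close the current record, if any.
            if current:
                records.append(current)
            current, last_key = {}, None
        elif line[:1] in (" ", "\t") and last_key is not None:
            # Continuation line: fold it into the previous field value.
            current[last_key][-1] += " " + line.strip()
        elif ":" in line:
            key, _, value = line.partition(":")
            last_key = key.strip()
            current.setdefault(last_key, []).append(value.strip())
    if current:
        records.append(current)
    return records


def index_subtags(records):
    """Index records by (Type, Subtag or Tag); the File-Date header is skipped."""
    index = {}
    for record in records:
        code = record.get("Subtag") or record.get("Tag")
        if "Type" not in record or not code:
            continue
        index[(record["Type"][0], code[0])] = record
    return index


def diff_registries(old_text, new_text):
    """Return (added, deprecated) lists of (type, code) keys.

    "deprecated" holds entries that carry a Deprecated field in the new
    registry but not in the old one; this includes entries that were
    added already deprecated.
    """
    old = index_subtags(parse_registry(old_text))
    new = index_subtags(parse_registry(new_text))
    added = sorted(key for key in new if key not in old)
    deprecated = sorted(
        key for key, record in new.items()
        if "Deprecated" in record and "Deprecated" not in old.get(key, {})
    )
    return added, deprecated
```

The added list points at codes that may need names in en.xml, and the deprecated list points at codes that may need an alias entry; Preferred-Value, where present in a record, gives the registry's replacement.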
  • Update programmatically generated sections of supplementalData.xml and supplementalMetadata.xml
    • Run CountItems -Dfile.encoding=UTF-8 -DSHOW_FILES -Xmx512M -Dmethod=getSubtagVariables
    • You'll get warnings or errors on missing codes (see below).
    • Replace the appropriate sections of supplementalData.xml and supplementalMetadata.xml.
    • If you need to add any new CLDR-specific codes (rare), do so in StandardCodes.java in the String[][] extras list.
    • Diff for sanity check
  • Edit common/main/en.xml to add any new names, based on the Descriptions in the registry file.
    • You only need to add new languages and scripts that we add to supplementalMetadata.xml.
    • But you need all territories.
    • Any new macrolanguages need a language alias.
    • Diff for sanity check
  • If a code becomes deprecated, add it to supplementalMetadata.xml under <alias>
    • If there is a single replacement, add it.
    • Territories can have multiple replacements. Put them in population order.
  • There are a few territories that don't yet have a top-level domain (TLD) assigned, such as "BQ" or "SS".
    • If new ones are added in tlds-alpha-by-domain.txt for a territory already in CLDR, update {cldrdata}\tools\java\org\unicode\cldr\util\data\territory_codes.txt with the new TLD (usually the same as the country code).
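Checking which territories still lack a TLD can be done mechanically. The following is a small sketch, not part of the CLDR tools. It assumes tlds-alpha-by-domain.txt has one TLD per line with "#" comment lines, and that the list of territory codes is supplied by the caller; the function names are hypothetical.

```python
def load_country_tlds(text):
    """Return the set of two-letter TLDs (upper case) found in the file content."""
    tlds = set()
    for line in text.splitlines():
        entry = line.strip().upper()
        if not entry or entry.startswith("#"):
            continue  # skip blank lines and the version comment
        if len(entry) == 2 and entry.isalpha():
            tlds.add(entry)  # longer entries are generic or IDN TLDs
    return tlds


def territories_without_tld(territories, tld_text):
    """Return the sorted territory codes that have no matching two-letter TLD."""
    tlds = load_country_tlds(tld_text)
    return sorted(code for code in territories if code.upper() not in tlds)
```

Running territories_without_tld against the previous and the current copy of the file and comparing the two results shows which territories gained a TLD since the last update. Note that this only covers the usual case where the TLD matches the country code.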
  • For new territories (regions) // TODO: automate this more
    • Add to the territoryContainment in supplementalData.xml
    • Add to territory_codes.txt
      • Use the UN mapping above for the 3-letter and 3-digit codes.
      • FIPS is a withdrawn standard as of 2008, so any new territories won't have a FIPS10 code.
      • Look at tlds-alpha-by-domain.txt to see if the new territory has a TLD assigned yet.
      • Rerun CountItems as above.
    • Add metazone mappings as needed. (Usually John - requires research)
    • Add the country/lang/population data (Usually Rick - requires research)
    • Add the currency data (Usually John - requires research)
    • Update util/data/territory_codes.txt
      • This step will be different once the data is moved into SupplementalData.xml
      • TODO: fix GenerateEnums around Utility.getUTF8Data("territory_codes.txt");
  • Then run GenerateEnums.java, and make sure it completes with no exceptions. (Don't worry about the results.)
  • Run ConsoleCheckCLDR -f en -z FINAL_TESTING -e
    • If you missed any codes, you will get the error message: "Unexpected Attribute Value"
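If the ConsoleCheckCLDR output is long, the relevant lines can be filtered out of a saved log. The following is a minimal sketch. It assumes only that the message text "Unexpected Attribute Value" appears literally on each offending line; the rest of the output format is not assumed, and the function name is hypothetical.

```python
MARKER = "Unexpected Attribute Value"


def find_unexpected_values(log_lines):
    """Return the log lines that contain the missing-code error message."""
    return [line.rstrip("\n") for line in log_lines if MARKER in line]
```

For example, pass it an open file object for the saved console output and print each returned line; an empty result suggests that no codes were missed.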