Updating Script Metadata

New Unicode scripts

We should work on script metadata early for each new Unicode version, so that it is available to tools (such as Mark's "UCA" tools).

If the new Unicode version's PropertyValueAliases.txt does not yet have lines for the Block and Script properties, create a preliminary version. Diff Blocks.txt and UnicodeData.txt against the previous version's files to find new scripts. Get the script codes from http://www.unicode.org/iso15924/codelists.html. Follow existing patterns for block and script names, especially for abbreviations. Do not add abbreviations (short names that differ from the long forms) unless there is a well-established pattern in the existing data.
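As a rough illustration of the Blocks.txt diff, here is a minimal Python sketch. It assumes the usual Blocks.txt data line format ("Start..End; Block Name", with "#" comments); the function names parse_blocks and new_blocks are ours, not part of any CLDR or Unicode tool. New blocks are only a hint for new scripts, so the result still needs to be checked by hand against UnicodeData.txt.

```python
import re

# Blocks.txt data lines look like "0000..007F; Basic Latin"; '#' starts a comment.
BLOCK_LINE = re.compile(r"^([0-9A-Fa-f]{4,6})\.\.([0-9A-Fa-f]{4,6});\s*(.+?)\s*$")


def parse_blocks(text):
    """Return {block name: (start, end)} from the contents of a Blocks.txt file."""
    blocks = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        m = BLOCK_LINE.match(line)
        if m:
            blocks[m.group(3)] = (int(m.group(1), 16), int(m.group(2), 16))
    return blocks


def new_blocks(old_text, new_text):
    """Return (name, start, end) for blocks present only in the newer file, by code point."""
    old = parse_blocks(old_text)
    new = parse_blocks(new_text)
    added = [(name, rng[0], rng[1]) for name, rng in new.items() if name not in old]
    return sorted(added, key=lambda item: item[1])


if __name__ == "__main__":
    # Tiny inline excerpts stand in for the two real files.
    old_sample = "# Blocks (older)\n0000..007F; Basic Latin\n0370..03FF; Greek and Coptic\n"
    new_sample = old_sample + "104B0..104FF; Osage\n1E900..1E95F; Adlam\n"
    for name, start, end in new_blocks(old_sample, new_sample):
        print("%04X..%04X  %s" % (start, end, name))
```

In practice the two strings would be read from the previous and the new version's Blocks.txt.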

Sample characters

We need sample characters for the "UCA" tools for generating FractionalUCA.txt.

Look for patterns in the kinds of characters we have picked for other scripts, for example the script's letter "KA". We basically want a character where people say "that looks Greek", and the same shape should not be used in multiple scripts. So for Latin we use "L", not "A". We usually prefer consonants, if applicable, but it is more important that a character look unique across scripts. It should be a letter, and if possible should not be a combining mark. If the script is used for multiple languages, it would be nice if the letter were commonly used in the majority language. Compare with the charts for existing scripts, especially related ones.
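The two mechanical criteria (a letter, preferably not a combining mark) can be checked with a small script. The following is only a sketch using Python's standard unicodedata module; check_sample is a hypothetical helper name. It cannot judge whether a shape is unique across scripts, and it only knows the Unicode version bundled with the Python build, so characters from brand-new scripts may show up as unassigned (Cn).

```python
import unicodedata


def check_sample(ch):
    """Return a list of concerns about a candidate sample character (empty if none)."""
    concerns = []
    category = unicodedata.category(ch)  # General_Category, e.g. "Lu", "Lo", "Mn"
    if not category.startswith("L"):
        concerns.append("not a letter (General_Category %s)" % category)
    if category.startswith("M") or unicodedata.combining(ch) != 0:
        concerns.append("combining mark")
    return concerns


if __name__ == "__main__":
    for candidate in ["L", "\u0915", "\u0301"]:
        print("U+%04X" % ord(candidate), check_sample(candidate) or "ok")
```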

Editing the spreadsheet

Use and copy cell formulas rather than duplicating contents, if possible. Look for which cells have formulas in existing data, especially for Unicode 1.1 and 7.0 scripts.

For example,
  • Script names should only be entered on the LikeLanguage sheet. Other sheets should use a formula to map from the script code.
  • On the Samples sheet, use a formula to map from the code point to the actual character. This is especially important for avoiding mistakes since almost no one will have font support for the new scripts, which means that most people will see "Tofu" glyphs for the sample characters.
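To illustrate why deriving the character from the code point is safer than pasting a glyph, here is a minimal Python sketch of the same mapping the spreadsheet formula performs. The helper name sample_from_code_point is ours; the character name lookup uses the standard unicodedata module and falls back to a placeholder when the Python build does not know the character yet.

```python
import unicodedata


def sample_from_code_point(hex_cp):
    """Map a hex code point such as "004C" or "U+0915" to (character, character name)."""
    s = hex_cp.strip().upper()
    if s.startswith("U+"):
        s = s[2:]
    ch = chr(int(s, 16))
    # The name makes the choice reviewable even when the glyph renders as tofu.
    name = unicodedata.name(ch, "<no name in this Python's Unicode data>")
    return ch, name


if __name__ == "__main__":
    for cp in ["004C", "U+0915"]:
        ch, name = sample_from_code_point(cp)
        print(cp, ch, name)
```

Reviewing the code point and name rather than the rendered glyph avoids depending on font support.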

Script Metadata properties file

  1. Go to the spreadsheet Script Metadata
  2. File>Download as>Comma Separated Values
  3. Location/Name = {CLDR}/tools/java/org/unicode/cldr/util/data/Script_Metadata.csv
  4. Refresh files (Eclipse), then compare with the previous version as a sanity check.
  5. Run {CLDR}/tools/java/org/unicode/cldr/unittest/TestScriptMetadata.java
    1. A common cause of failures is spreadsheet data that is missing or has incorrect values.
  6. Run GenerateScriptMetadata, which will produce a modified common/properties/scriptMetadata.txt file.
  7. Compare with the previous version as a sanity check.
  8. Check in the two new files.
  9. Make sure you also do LikelySubtags and Default Content.
Problems are typically caused by a non-standard name being used for a territory. Fix the name and rerun the process.
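For the "compare with the previous version" sanity checks, a row-level comparison can be easier to read than a raw text diff. The sketch below is one possible approach, not part of the CLDR tools: diff_csv and read_rows are hypothetical helpers, the key column index is an assumption (the actual column layout of Script_Metadata.csv is not described here), and the rows in the demo are placeholders, not real metadata.

```python
import csv
import io


def read_rows(text, key_col=0):
    """Index CSV rows by the value in key_col (assumed to hold the script code)."""
    rows = {}
    for row in csv.reader(io.StringIO(text)):
        if len(row) > key_col and row[key_col].strip():
            rows[row[key_col].strip()] = row
    return rows


def diff_csv(old_text, new_text, key_col=0):
    """Return (added, removed, changed) lists of keys between two CSV contents."""
    old = read_rows(old_text, key_col)
    new = read_rows(new_text, key_col)
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    changed = sorted(k for k in set(old) & set(new) if old[k] != new[k])
    return added, removed, changed


if __name__ == "__main__":
    # Placeholder rows only; real input would be the old and new CSV file contents.
    old_csv = "Latn,a\nGrek,b\n"
    new_csv = "Latn,a\nGrek,c\nAdlm,d\n"
    added, removed, changed = diff_csv(old_csv, new_csv)
    print("added:", added, "removed:", removed, "changed:", changed)
```

Unexpected removals or a large number of changed rows usually point to a problem in the download rather than in the data itself.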