CLDR 33 Release Note (DRAFT)

This version is currently at Beta. Also see the latest release.

         Date        Rel. Note  Data    Charts    Spec    Delta  SVN Tag          DTD Diffs  DTD Δs
33-beta  2018-03-14  v33        CLDR33  Charts33  LDML33  Δ33    release-33-beta  ΔDTD33     33


Unicode CLDR 33 provides an update to the key building blocks for software supporting the world's languages. This data is used by all major software systems for software internationalization and localization, adapting software to the conventions of different languages for common software tasks.

This release had a limited submission phase.

Improvements in this release include: 

  • Additional Translations/Data
    • Annotations (emoji keywords) were completed for a limited set of locales (ar, en_GB, de, es, ja, ru).
    • Two additional locales (Odia, Assamese) were brought up to Modern coverage level.
    • Some missing items in other locales were added.
    • New typographicNames added, with translations in 33 locales.
    • Added 4 new transforms: fa-fa_FONIPA, ha-ha_NE, nv-nv_FONIPA, vec-vec_FONIPA.
  • Property files
    • ExtendedPictographic.txt was removed (now in emoji data files for UTS #51)
    • labels.txt was added for emoji categories and subcategories. 
  • Structure
    • The only new elements/attributes were for typographicNames (such as terms for Bold, Italic, ...).
    • The structure for specifying keyboard layouts was significantly enhanced. See spec for details: Keyboards. [TBD: Important keyboard changes]
  • Code Updates
    • Addition of new currency code MRU for Mauritania.
    • Updating of currency display names for São Tomé & Príncipe Dobra (STN).
    • New GDP info
    • Subdivisions (including all new codes for China)
  • Bug fixes
For information on structural changes, see Spec Modifications. (TBD: Spec changes are not yet completed.)
For changes that may affect migration to this version, see Migration.


The charts have been updated for the v33 data. There are also new tab-separated-value (TSV) files for loading the information into spreadsheets rather than scraping the charts. Only a subset of the charts is available so far, in [TBD]. Use "Save Link As..." rather than opening the files in a browser, since they can be sizable.
  1. by_type.tsv
  2. delta.tsv: locales with inheritance
  3. delta_supp.tsv: supplemental data (e.g., non-locale data)
  4. delta_summary.tsv: statistics on files 2 and 3
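For readers who would rather load these files in a script than in a spreadsheet, the following is a minimal sketch using Python's standard csv module. The column layout of the files is not described in this note, so the sketch makes no assumption about column names and returns each row as a list of strings. The function names read_tsv_rows and read_tsv_file are hypothetical helpers, not part of any CLDR tooling.

```python
import csv
import io


def read_tsv_rows(text):
    """Parse tab-separated text into a list of rows (each a list of strings)."""
    # QUOTE_NONE treats quote characters as ordinary data, which suits plain TSV.
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    # csv.reader yields an empty list for blank lines; drop those.
    return [row for row in reader if row]


def read_tsv_file(path):
    """Read a downloaded TSV file (assumed here to be UTF-8) into a list of rows."""
    with open(path, encoding="utf-8", newline="") as handle:
        return read_tsv_rows(handle.read())
```

After saving a file locally, a call such as read_tsv_file("delta_summary.tsv") would return its rows; which row, if any, is a header has to be checked against the actual file.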

Survey Tool


Other data additions and changes

The following summarizes some of the other changes in non-locale data.


For more information, see detailed delta charts.


Because v33 was not a data submission release, the chart for growth differs little from that of the CLDR 32 Release Note. Here are the overall statistics:


The following files showed the largest number of raw changes: 

  • annotations/as.xml, main/as.xml, annotations/ru.xml, main/br.xml, annotations/or.xml, annotations/br.xml, annotations/ga.xml
Two changes affected the statistics:
  • The keywords (in annotations) are being treated as sets for counting purposes.
    • So old:{a | b | c} → new:{a | c | d | e} counts as one deletion and two additions.
  • The keywords have also had some redundancies removed: if a keyword consisted entirely of other keywords, it was removed.
    • So old:{a, a b, b} → new:{a, b}.
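The two counting changes above can be sketched as follows. This is an illustration only, not the code used to produce the statistics: the function names are hypothetical, and the sketch assumes that "consisted entirely of other keywords" means every whitespace-separated word of a multi-word keyword is itself another keyword in the same set.

```python
def count_keyword_changes(old, new):
    """Treat keyword lists as sets and return (deletions, additions)."""
    old_set, new_set = set(old), set(new)
    deletions = old_set - new_set   # present before, gone now
    additions = new_set - old_set   # new in this version
    return len(deletions), len(additions)


def remove_redundant_keywords(keywords):
    """Drop multi-word keywords whose words are all other keywords (assumed rule)."""
    keyword_set = set(keywords)
    kept = []
    for keyword in keywords:
        words = keyword.split()
        others = keyword_set - {keyword}
        # A keyword is redundant if it has several words and each one
        # already appears as a separate keyword.
        if len(words) > 1 and all(word in others for word in words):
            continue
        kept.append(keyword)
    return kept
```

With the examples from the list, count_keyword_changes(["a", "b", "c"], ["a", "c", "d", "e"]) gives one deletion and two additions, and remove_redundant_keywords(["a", "a b", "b"]) keeps only "a" and "b".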


  • Plurals: ordinal and cardinal rules have been added for scn. The cardinal (plural) rules for Macedonian (mk) have been changed so that {11} moves from the category one to the category other (one➞other). This should not cause migration issues.
  • TBD
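To make the Macedonian change concrete, here is a rough sketch of the effect on non-negative integers. The exact rule text is not quoted in this note, so both functions are assumptions: the "before" function assumes the category one applied to integers ending in 1, and the "after" function assumes the change excludes integers ending in 11. Fractions and the real rule syntax are ignored; the plural rules chart and the LDML spec are the authoritative source.

```python
def mk_cardinal_before(n):
    """Assumed pre-change category for a non-negative integer n (simplified)."""
    return "one" if n % 10 == 1 else "other"


def mk_cardinal_after(n):
    """Assumed post-change category: integers ending in 11 no longer map to 'one'."""
    return "one" if n % 10 == 1 and n % 100 != 11 else "other"


# List the integers below 30 whose category differs between the two sketches.
changed = [n for n in range(30) if mk_cardinal_before(n) != mk_cardinal_after(n)]
```

Under these assumptions, 11 is the only value below 30 that changes category, which matches the one➞other change described for {11}.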

Known Issues

  1. New macroregions
    • UN M.49 now includes Sark (680), but ISO rejected the proposed ISO 3166-1 code, so it is not included.
  2. “Week of” structure
    • The structure and intended usage for the “week x of y” patterns is still being refined and may change. This applies especially to dateFormatItems such as the following:
      <dateFormatItem id="MMMMW" count=...>'week' W 'of' MMM</dateFormatItem>
      <dateFormatItem id="yw" count=...>'week' w 'of' y</dateFormatItem>
      Areas of discussion include the use of the count attribute and the use of ordinal vs. cardinal numbers. For more information see [#9801].
  3. Subdivision Names
    • The draft subdivision names were imported from Wikidata. Names that had characters outside of the language's exemplars were excluded for now. Names that would cause collisions were allowed, but marked with superscripted numbers. The goal is to clean up these names over time.
  4. Chinese stroke collation
    • In CLDR 30 and 31, Chinese stroke collation was missing entries for several basic characters. CLDR 32 reverted the stroke collation data to the CLDR 29 version; a complete fix for the underlying problem is targeted for CLDR 34. See #10497, #10642.


Many people have made significant contributions to CLDR and LDML; see the Acknowledgments page for a full listing.

Key to Header Links

Rel. Note: a general description of the contents of the release, and any relevant notes about the release.
Data: a set of zip files containing the contents of the release. The files are complete in themselves and do not require files from earlier releases; for the structure of the zip file, see Repository Organization.
Charts: a set of charts showing some of the data in the release.
Spec: the version of UTS #35: LDML that corresponds to the release.
Delta: a list of all the bug fixes and features in the release, which can be used to get the precise corresponding file changes using BugDiffs.
SVN Tag: the files in the release, accessible via Repository Access.
DTD Diffs: a diff of the DTD source files.
DTD Δs: a link to charts of changes in the DTDs over time.

For more details, see CLDR Releases (Downloads).

The Unicode Terms of Use apply to CLDR data; in particular, see Exhibit 1.
For web pages with different views of CLDR data, see