CLDR 34 Release Note

No. Date Rel. Note Data Charts Spec Delta SVN Tag DTD Diffs DTD Δs
34 2018-09-12 v34 CLDR34 Charts34 LDML34 Δ34 release-34-alpha ΔDTD34  34

Overview

Unicode CLDR 34 provides an update to the key building blocks for software supporting the world's languages. CLDR data is used by all major software systems for internationalization and localization, adapting software to the conventions of different languages for common software tasks.

CLDR 34 included a full Survey Tool data collection phase:

Level Count Support Target
modern 85 full UI and document content
moderate 6 general-purpose document content
🆕: Somali and Javanese
basic 15 basic application requirements
🆕: Tongan, Konkani, Maori, Dzongkha, Tatar

The above counts are just for the languages (with multiple entries for multi-script languages such as Serbian or Chinese); there are many additional regional locales. 🆕 marks languages reaching the level in this release.

About 200 other languages have over 1,000 data fields, but don't satisfy the level definitions.

Other enhancements include:

  • Japanese calendar era
    • Changes to prepare for the new era starting 2019-05-01
  • Emoji
    • Full review and update of names and annotations
    • Updates for Emoji Subcommittee for collation and grouping

For details, see Detailed Structure Changes, Detailed Data Changes, and Growth.

This version is currently at alpha; see the latest release if you want released data.

This release page will be fleshed out over the coming weeks with more details, migration issues, known problems, and so on. The LDML specification has also not yet been updated. Particularly useful for review are:

Please report any problems that you find using a CLDR ticket. We'd also appreciate it if programmatic users of CLDR data download the XML files and do a trial integration to see if any problems arise.

Detailed Structure Changes

  • Calendar
    • For supplemental calendar <era> elements, a new attribute “named” indicates whether the era has been assigned a name (part of preparation for Japan era transition). If false, the era will not be used in formatting. [#10750]
  • Units
    • Added <displayName> as a subelement for <coordinateUnit>, to provide a name such as “cardinal direction” for any of the <coordinateUnitPattern> elements. [#9986]
  • Patterns
    • Added short patterns for "at most" and "approximately" (the latter for use in smart number ranges) [#11046] (and data for main languages)
  • Misc
    • Deprecated territoryCodes@internet [#11072]
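
As an illustration of the new “named” attribute described under Calendar above, here is a minimal Python sketch of how a consumer might skip eras that are not yet named. The XML fragment is a hypothetical sample shaped like the description in this note (element nesting, era numbers, and the start date of era 235 are assumptions for illustration), and the helper name formattable_eras is invented here; it is not part of CLDR or any library.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment modeled on the description above; the nesting,
# era 235, and its start date are illustrative assumptions.
SAMPLE = """
<calendarData>
  <calendar type="japanese">
    <eras>
      <era type="235" start="1989-1-8"/>
      <era type="236" start="2019-5-1" named="false"/>
    </eras>
  </calendar>
</calendarData>
"""

def formattable_eras(xml_text, calendar_type):
    """Return the era type codes that may be used in formatting."""
    root = ET.fromstring(xml_text)
    result = []
    for cal in root.iter("calendar"):
        if cal.get("type") != calendar_type:
            continue
        for era in cal.iter("era"):
            # Assumption: a missing "named" attribute means the era is named.
            if era.get("named", "true") != "false":
                result.append(era.get("type"))
    return result

print(formattable_eras(SAMPLE, "japanese"))
```

With the sample above, era 236 is filtered out because it carries named="false", which matches the rule that such an era is not used in formatting.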

Detailed Data Changes

  • Data additions and updates from Survey Tool data submission and vetting. In addition:
  • Calendar
    • In supplemental calendar data for the Japanese calendar, added era 236 starting 2019-05-01 with attribute named="false"; in “root” and “ja” locales, added placeholder era names for era 236 (part of preparation for Japan era transition). [#10750]
    • In supplemental calendar data for Japanese calendar, fixed invalid date values for some historic Japanese-calendar eras. [#11399]
    • Changed the firstDay (first day of week for calendar display)...
      • from Monday to Sunday for PT Portugal. [#10716]
      • from Sunday to Monday for IE Ireland. [#11192]
      • from Saturday to Monday for MA Morocco. [#11052]
      • from Sunday to Monday for TN Tunisia. [#11052]
    • Changed weekend from fri-sat to sat-sun for MA Morocco, TN Tunisia. [#11052]
  • Date formatting
    • Added intervalFormats for skeletons with era to gregorian and generic calendars for “root”, “en”, and “ja” locales (part of preparation for Japan era transition). [#11327]
    • In Finnish, changed date formats to use either full or numeric month names, avoiding the abbreviated names (which are still available as symbols). [#10870]
    • In Korean, fixed a problem in which many formatted dates in certain calendars (buddhist, japanese, minguo) displayed a doubled character for month “월월”. [#11347]
  • Currency codes and symbols
    • Added “¤” as the symbol for unknown currency XXX. This used to be done by ICU in code, but it makes more sense to have the data in CLDR. [#11074]
    • In Thai locale, changed the symbol for THB from “THB” back to “฿”. [#10316]
    • Support new Venezuelan currency VES as the default starting from 2018-08-20.
    • Changed currency patterns for “az” (Latn, Cyrl) to put symbol at the end
    • Added support for MVP as historic currency of Maldives.
  • Units
    • Added units for concentr-percent (e.g. “25%”) and concentr-permille (e.g. “37‰”). The former may be able to replace some usages of the <percentFormats> in <numbers>; it provides different display widths and plural forms, though it does not include a number format and cannot vary by number system. [#10632]
    • Added units for pressure-atmosphere (e.g. “1 atm”) and digital-petabyte (e.g. “10 PB”). [#10600, #14075]
  • Plural rules
    • Rules added for (cardinals) ia, sc; (ordinals) gd, ia, sc
  • Emoji
    • Reviewed and revised emoji names and keywords for most languages
      • The survey tool voting process was adapted to support sets more naturally
    • Updated the emoji ordering to group characters more naturally [#11227], ...
    • Revised the derived name generation (for complex emoji) for more consistency with the new hair styles
  • Data Cleanup
    • Modified the input processor for Kyrgyz [#10738] and Urdu [#10543]
    • Improved the Zawgyi detection/conversion for input to Survey Tool
    • Fixed Dutch dayPeriod names to use correct apostrophes.
  • Misc
    • Added fallbacks for "or" lists [#11254]
    • Added English names for Pseudolocales [#10880]
    • Cleaned up root parseLenient data [#11055]
    • Deprecated the telephone number data [#10383]
    • Changed default region for “ia” Interlingua from FR to 001 (World).
    • Several corrections to number spellout rules for Hungarian.
    • Added Uighur to IPA transliterator [#11318]
    • Added data for England, Scotland, Wales (now done with Survey Tool [#10252]) 
    • The French locale now uses narrow no-break space U+202F in several places: as the numeric grouping separator, in many short unit patterns, and in the locale display name patterns. It also changed normal space to no-break space U+00A0 in the wide unit patterns.
For more information on these and other bug fixes, see detailed delta charts and the list of bug fixes.
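
For integrators who cache week data, the firstDay and weekend changes listed under Calendar above can be captured in a small lookup table. The following Python sketch only transcribes the items in this note; the table names, the short day codes, and the helper week_changes are hypothetical and are not CLDR identifiers or data.

```python
# Transcribed from the firstDay and weekend items above: region -> (old, new).
FIRST_DAY_CHANGES = {
    "PT": ("mon", "sun"),
    "IE": ("sun", "mon"),
    "MA": ("sat", "mon"),
    "TN": ("sun", "mon"),
}
WEEKEND_CHANGES = {
    "MA": ("fri-sat", "sat-sun"),
    "TN": ("fri-sat", "sat-sun"),
}

def week_changes(region):
    """Describe the week-data changes listed in this note for a region code."""
    notes = []
    if region in FIRST_DAY_CHANGES:
        old, new = FIRST_DAY_CHANGES[region]
        notes.append(f"firstDay {old} -> {new}")
    if region in WEEKEND_CHANGES:
        old, new = WEEKEND_CHANGES[region]
        notes.append(f"weekend {old} -> {new}")
    return notes

for region in ("MA", "PT", "US"):
    print(region, week_changes(region))
```

A region that is not mentioned in the list, such as US in the loop above, yields an empty list.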

Growth

The following summarizes the number of changes (additions + corrections) for languages in the release.

Lang. Changes Examples
12 2,000–5,000 Pashto, French, Chinese, …
27 1,000–1,999
52 500–999
48 50–499

The following shows languages with a larger relative number of changes. For the first line, there are over 20% additions alone, not counting corrections.

Lang. Changes Examples
4 ≥ 20% Pashto, Maori, Uzbek (Arabic), and Punjabi (Arabic)
5 ≥ 50% Interlingua, Fulah (Adlam), Somali, Javanese, and Maori

TBD: add chart

Migration

  • French grouping separator changed from no-break space U+00A0 to narrow no-break space U+202F.
  • TBD
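
Code that parses French-formatted numbers may need to accept both the old and the new grouping separator during migration. The following is a minimal Python sketch under that assumption; the function names are hypothetical, the formatter handles plain integers only, and real applications would normally rely on a CLDR-backed library rather than string replacement.

```python
NBSP = "\u00A0"   # no-break space, the former French grouping separator
NNBSP = "\u202F"  # narrow no-break space, the grouping separator in this release

def parse_french_integer(text):
    """Parse a grouped integer, accepting old and new separators."""
    cleaned = text.strip()
    # Also tolerate an ordinary space, which users often type by hand.
    for sep in (NBSP, NNBSP, " "):
        cleaned = cleaned.replace(sep, "")
    return int(cleaned)

def format_french_integer(value, sep=NNBSP):
    """Group digits in threes using the given separator."""
    return f"{value:,}".replace(",", sep)

formatted = format_french_integer(1234567)
print(repr(formatted))
print(parse_french_integer("1\u00A0234\u00A0567"))
```

Accepting both separators on input keeps data stored or exchanged under earlier CLDR versions readable, while output follows the new convention.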

Known Issues

  1. None yet.

Acknowledgments

Many people have made significant contributions to CLDR and LDML; see the Acknowledgments page for a full listing.

Key to Header Links

Rel. Note a general description of the contents of the release, and any relevant notes about the release
Data a set of zip files containing the contents of the release (the files are complete in themselves, and do not require files from earlier releases -- for the structure of the zip file, see Repository Organization)
Charts a set of charts showing some of the data in the release.
Spec the version of UTS #35: LDML that corresponds to the release
Delta a list of all the bug fixes and features in the release, which can be used to get the precise corresponding file changes using BugDiffs
SVN Tag the files in the release, accessible via Repository Access. For more details, see CLDR Releases (Downloads).
DTD Diffs a diff of the DTD source files
DTD Δs a link pointing to a chart of changes in the DTDs over time.


The Unicode Terms of Use apply to CLDR data; in particular, see Exhibit 1.
For web pages with different views of CLDR data, see http://cldr.unicode.org/index/charts.