The following are the files for this release. For a description of their purpose and format, see the Key; for more details see CLDR Releases (Downloads).
Unicode CLDR 21.0 contains data for 193 languages and 170 territories: 528 locales in all. This release did not include a public data submission phase; it focused on improvements to the LDML structure and tools, and on the consistency of the data. Approximately 10,000 data items were removed and 15,000 added (a change in value counts as one removal plus one addition).
The main features include the following (see the Delta link above for a full list of changes):
Ordinal categories (1st, 2nd,…)
Context Transforms for context-dependent capitalization behavior
Structure for support of Chinese lunar calendars
Gender of lists of people
Territory Containment of deprecated territories
ISO 8601 time zone formatting
Collation reordering (eg, putting Cyrillic before Latin)
Support of parent-locale data in supplemental data
Multiple number systems for locales
Updates for Unicode 6.1 (segmentation, collation)
Major cleanup of timezone names and date format data; new timezone IDs
Abbreviated numbers (eg, “1.2 B” for 1,200,000,000)
Added South Sudan (new country)
Updated collations
Cleaned up delimiter data (“…” vs ”…” vs „…“ vs „…” vs «…» vs「…」…)
New default content locale for Arabic: ar_001 (Modern Standard Arabic)
Deprecation and removal of whole-locale aliasing and of commonlyUsed elements
Data for Chinese lunar calendar support.
The -t- BCP47 extension for transformed content
Enhanced number system support
Major performance enhancements
New <metadata> element in locales for use by Survey Tool (eg, checking capitalization consistency)
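To make the abbreviated-number item in the list above concrete, here is a minimal Python sketch that reproduces the “1.2 B” example. The function name `abbreviate` and the suffix table are assumptions made for illustration only; they are not CLDR data. In CLDR the abbreviated forms are locale-specific patterns, which this sketch does not attempt to model.

```python
# Hypothetical suffix table, for illustration only. Real CLDR data stores
# locale-specific patterns rather than a fixed list of English-style suffixes.
_SUFFIXES = [
    (10**12, "T"),
    (10**9, "B"),
    (10**6, "M"),
    (10**3, "K"),
]


def abbreviate(n: int) -> str:
    """Return a short form of n, e.g. 1_200_000_000 -> '1.2 B'."""
    for threshold, suffix in _SUFFIXES:
        if abs(n) >= threshold:
            # One fractional digit is an arbitrary choice for this sketch.
            return f"{n / threshold:.1f} {suffix}"
    # Values below 1,000 are returned unabbreviated.
    return str(n)


print(abbreviate(1_200_000_000))  # 1.2 B
```

The sketch ignores rounding at the boundaries between suffixes, which a real formatter would need to handle.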
The release numbering changed in this release, from x.y.z to xy.z. Thus the new release number is 21.0 (rather than 2.1.0). This change was made to allow implementations to use 2 numeric subfields for internal numbering, such as 21.0.3.4.
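The renumbering described above can be sketched as a small conversion. The helper name `to_new_numbering` is hypothetical, and the sketch assumes the old middle field is a single digit, which holds for the 2.1.0 example given here.

```python
def to_new_numbering(old: str) -> str:
    """Convert an old-style 'x.y.z' release number to the new 'xy.z' form."""
    major, minor, patch = old.split(".")  # raises ValueError unless 3 fields
    if not (major.isdigit() and minor.isdigit() and patch.isdigit()):
        raise ValueError(f"not a numeric x.y.z release number: {old!r}")
    if len(minor) != 1:
        # Assumption of this sketch: fusing x and y is only unambiguous
        # when y is a single digit.
        raise ValueError(f"middle field must be one digit: {old!r}")
    return f"{major}{minor}.{patch}"


print(to_new_numbering("2.1.0"))  # 21.0
```

An implementation that appends its own subfields to the result, as in the 21.0.3.4 example, would then carry four numeric fields rather than five.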
Locales with insufficient coverage were moved into the "seed" directory, and are not part of the release. The data is available via SVN. There are now 16 such languages and associated regions. When coverage for a seed language improves sufficiently, it will be moved into the release.
The changes to the specification are found at LDML Modifications. Main features include:
Descriptions of the above features.
Many clarifications of existing features, such as the use of YY, h, H, K, k in date patterns and skeletons, restrictions on parentLocale, Windows TZID mapping, use of the region code 'UK', and the status of non-decimal numbering systems.
Deprecation of some structure, such as the 'l' (SMALL LETTER L) pattern character for leap month marker and the commonlyUsed element in formatting short time zone names.
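As an illustration of the hour pattern characters mentioned in the clarifications above, the following Python sketch maps an hour of the day to the number each symbol displays. The ranges used (h: 1-12, H: 0-23, K: 0-11, k: 1-24) reflect the LDML date field symbol table as commonly documented; the function name `hour_field` is hypothetical and is not part of any CLDR tool.

```python
def hour_field(symbol: str, hour24: int) -> int:
    """Map an hour of day (0-23) to the number shown for a pattern symbol."""
    if not 0 <= hour24 <= 23:
        raise ValueError("hour24 must be in 0..23")
    if symbol == "H":  # 0-23
        return hour24
    if symbol == "k":  # 1-24: midnight is shown as 24
        return hour24 or 24
    if symbol == "K":  # 0-11
        return hour24 % 12
    if symbol == "h":  # 1-12: midnight and noon are shown as 12
        return hour24 % 12 or 12
    raise ValueError(f"unsupported hour symbol: {symbol!r}")


# Compare the four symbols at midnight, noon, and 1 pm.
for sym in "hHKk":
    print(sym, [hour_field(sym, h) for h in (0, 12, 13)])
```

The four symbols differ only at midnight and noon, which is where patterns that mix them up tend to produce incorrect output.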
The Unicode Terms of Use apply to CLDR data; in particular, see Exhibit 1.
For web pages with different views of CLDR data, see http://cldr.unicode.org/index/charts.