Unicode CLDR 27 has been released, providing an update to the key building blocks for software supporting the world's languages. All major software systems use this data for internationalization and localization, adapting software to the conventions of different languages for common tasks.
There was no Survey Tool data collection phase for CLDR 27. Instead, the release focused primarily on stability—cleaning up data inheritance and making specific fixes—as well as improvements to the JSON format of the data. Changes include the following:
- Cleanup of region locales: A major cleanup effort resolved gratuitous differences between region-specific locales and the parents from which they inherit. Where the parent value was an acceptable replacement for a child-specific value, the child value was removed, giving more consistent behavior across the region locales. A special effort was made to clean up country names in certain locales.
- Changes to English inheritance: As an outcome of the cleanup effort above, the inheritance model for English locales is now simplified: all en_XX locales inherit either from “en” directly (for current or former U.S. territories) or from the British-influenced “en_001” (World English). This is also reflected in some changes for measurement systems.
- Emoji: Data for emoji annotations and an emoji collation were added, to accompany Unicode Technical Report #51, Unicode Emoji.
- Collation: There are new sort orders for emoji (as noted above) and an Austrian phonebook sort order. Scripts can be reordered individually, rather than only in specific groups. Fractional tertiary weights lower than the common weight are now used, allowing shorter sort keys for normal Hiragana letters.
- Specification: The LDML specification has descriptions of new or modified structure, plus a number of fixes and clarifications. See Modifications for a list of changes.
  - Improved documentation of locale inheritance and matching, bundle versus item lookup, and parent locale information.
  - Extensive clarifications to the intended use of the language matching data.
  - Explicit new definitions of Unicode identifiers, such as Unicode Calendar Identifier, for use in citations.
- Charts: The navigation within charts has been improved, and new charts have been added.
- JSON on GitHub: The JSON form of the data is now available on GitHub, rather than being found through the above Data link.
Details are provided in http://cldr.unicode.org/index/downloads/cldr-27, along with a detailed Migration section.
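The English inheritance change and the region-locale cleanup described above can be pictured with a small sketch. This is a minimal illustration under stated assumptions, not CLDR's actual tooling: the `EXPLICIT_PARENTS` table is a tiny hand-written stand-in for CLDR's parent locale data, and the function names (`parent_of`, `inheritance_chain`, `prune_redundant`) are hypothetical.

```python
# Hypothetical, hand-written stand-in for CLDR's parent locale data.
# The real data is much larger; these entries are for illustration only.
EXPLICIT_PARENTS = {
    "en_GB": "en_001",
    "en_AU": "en_001",
}


def parent_of(locale):
    """Return the parent of a locale, or None once root is reached."""
    if locale == "root":
        return None
    if locale in EXPLICIT_PARENTS:
        return EXPLICIT_PARENTS[locale]
    # Default rule assumed here: drop the last subtag; a bare language
    # code falls back to root.
    head, sep, _ = locale.rpartition("_")
    return head if sep else "root"


def inheritance_chain(locale):
    """List the locale followed by each ancestor, ending at root."""
    chain = []
    while locale is not None:
        chain.append(locale)
        locale = parent_of(locale)
    return chain


def prune_redundant(child_data, inherited_data):
    """Keep only child entries that differ from the inherited value."""
    return {
        key: value
        for key, value in child_data.items()
        if inherited_data.get(key) != value
    }


if __name__ == "__main__":
    print(inheritance_chain("en_GB"))
    print(inheritance_chain("en_PR"))
    print(prune_redundant({"greeting": "hello", "colour": "colour"},
                          {"greeting": "hello", "colour": "color"}))
```

In this sketch a British-influenced locale such as en_GB reaches "en" through "en_001", while a locale with no explicit parent, such as en_PR, falls back to "en" directly. `prune_redundant` mirrors the cleanup idea: a child value identical to what it would inherit carries no information and can be dropped.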
About the Unicode Consortium
The Unicode Consortium is a non-profit organization founded to develop, extend, and promote use of the Unicode Standard and related globalization standards. The membership of the consortium represents a broad spectrum of corporations and organizations in the computer and information processing industry. Members are: Adobe Systems, Apple, Google, Government of Bangladesh, Government of India, IBM, Microsoft, Monotype Imaging, Sultanate of Oman MARA, Oracle, SAP, Tamil Virtual University, The University of California (Berkeley), Yahoo!, plus well over a hundred Associate, Liaison, and Individual members. For more information, please contact the Unicode Consortium at http://www.unicode.org/contacts.html.