Emoji Symbols

Background

The proposal for encoding of Emoji symbols as Unicode characters covers the Emoji symbols that are in widespread use by DoCoMo, KDDI and Softbank for their mobile phone networks. These symbols are encoded in carrier-specific versions of Shift-JIS (as User-Defined Characters), and, in the case of KDDI, in a carrier-specific version of ISO-2022-JP. There are mapping tables in use in the industry between these character sets, with both roundtrip and fallback mappings. These symbols are also supported in web mail services by Yahoo! Mail and Google Mail. (Yahoo! Mail currently supports a subset.) (The original proposal also included nine symbols defined by Google, but they were withdrawn from later versions.)

We are taking into consideration the following factors in developing the proposal:
  1. Source separation rule: If a single carrier separates two characters (anywhere in its character set, including standard JIS codes), then we mapped them to two separate Unicode characters. (This is a hard and fast rule.)
  2. Reuse: We mapped to existing Unicode symbols where appropriate.
  3. Separating generic symbols: If Unicode had a set of related symbols, but no one character in the set was as generic as in the Emoji symbol sets, then we encoded a new character. For example, the Emoji sets do not distinguish between waxing and waning crescent moons.
  4. Colors and Animation: We encoded symbols as characters, abstracting away from colors and animation. We distinguished by nominal color or animation only where the source separation rule required it. (See Character Names below.)
  5. Existing cross-mapping tables: We followed the tables mentioned above as much as possible, but we tentatively disunified in some cases where the visual images were very different and not semantically associated. For example:
    1. We disunified the 'M' symbol for Metro from the Metro train image. The 'M' symbol would have translation problems. (This is similar to the problems with the international currency symbol and the proposal for a "generic decimal separator".)
    2. On the other hand, we unified the sets of Zodiac symbols, even though the images shown by carriers vary widely. This is because they clearly belong to a cohesive set which corresponds across carriers.
  6. Least-marked common symbol: For a set of symbols which each could map to an existing Unicode code point, we chose the symbol that was shared among the most carriers (according to the cross-mapping tables) and had the least-marked form.
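The source separation rule (factor 1) lends itself to a mechanical check. The following is a minimal sketch, not the project's actual tooling: the table layout, the carrier codes (k1, d1, and so on), and the character names are all hypothetical placeholders. It flags every case where one carrier's two distinct codes would land on the same proposed Unicode character.

```python
from collections import defaultdict

# Hypothetical table: (carrier, carrier_code) -> proposed Unicode character name.
# Codes and names are placeholders, not data from the proposal.
PROPOSED = {
    ("kddi", "k1"): "SYMBOL A",
    ("kddi", "k2"): "SYMBOL B",
    ("softbank", "s1"): "SYMBOL A",
    ("docomo", "d1"): "SYMBOL A",
    ("docomo", "d2"): "SYMBOL A",  # DoCoMo separates d1 and d2, so this is a violation
}


def source_separation_violations(table):
    """Return {(carrier, name): [codes]} for every character that receives
    two or more distinct codes from the same carrier."""
    seen = defaultdict(list)
    for (carrier, code), name in table.items():
        seen[(carrier, name)].append(code)
    return {key: sorted(codes) for key, codes in seen.items() if len(codes) > 1}


violations = source_separation_violations(PROPOSED)
for (carrier, name), codes in violations.items():
    print(f"{carrier}: {codes} all map to {name}")
```

Mappings from different carriers to the same character (k1, s1, and d1 above) are not violations; the rule only concerns distinctions made within a single carrier's set.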
Note: We tried to avoid disunification in Unicode where there are roundtrip mappings between carriers. However, disunification is possible where necessary. As the following diagram illustrates, roundtrip mappings between carrier Shift-JIS character sets can be maintained by having the mapping tables between Unicode and each carrier's Shift-JIS version use appropriate fallback mappings.

[Diagram: mapping arrows between the columns are not reproduced.]

KDDI    Unicode    Softbank
x       X          y
x       Y          y
x                  y
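The fallback arrangement described in the note above can be sketched in Python. This is a minimal illustration under assumptions: the table names are hypothetical, the letters x, y, X, and Y stand in for real code points, and the choice of which directions are fallbacks is one plausible reading of the diagram rather than data from the carriers' tables.

```python
# Assumed setup: KDDI symbol "x" and Softbank symbol "y" roundtrip with each
# other in the carriers' own tables, but Unicode disunifies them as "X" and "Y".

# Carrier -> Unicode (each carrier symbol has one roundtrip target).
KDDI_TO_UNICODE = {"x": "X"}
SOFTBANK_TO_UNICODE = {"y": "Y"}

# Unicode -> carrier. Each table has one roundtrip entry and one fallback
# entry for the character that belongs to the other carrier.
UNICODE_TO_KDDI = {"X": "x", "Y": "x"}      # Y -> x is the fallback
UNICODE_TO_SOFTBANK = {"Y": "y", "X": "y"}  # X -> y is the fallback


def kddi_to_softbank(symbol):
    """Convert a KDDI symbol to Softbank by going through Unicode."""
    return UNICODE_TO_SOFTBANK[KDDI_TO_UNICODE[symbol]]


def softbank_to_kddi(symbol):
    """Convert a Softbank symbol to KDDI by going through Unicode."""
    return UNICODE_TO_KDDI[SOFTBANK_TO_UNICODE[symbol]]


# The carrier-to-carrier roundtrip survives the disunification.
print(softbank_to_kddi(kddi_to_softbank("x")))
print(kddi_to_softbank(softbank_to_kddi("y")))
```

Without the two fallback entries, converting x to Unicode would yield X, which the Softbank table could not map, and the carrier-to-carrier roundtrip would be lost.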

Character Names

Proposed character names are typically based on the glosses of the carrier symbols or the visual appearance. Based on the consensus from discussions in the UTC, we used the following guidelines:
  • Follow the analogies of existing Unicode character names where possible.
  • In particular, use "BLACK" for "filled" and "WHITE" for "hollow".
  • Exclude color and animation details from proposed character names except where necessary for distinction.
  • For cases where color is the only source distinction, the convention is to map to BLACK and WHITE where there are two choices; to BLACK, WHITE, and CHECKERED where there are three; and to BLACK, WHITE, CHECKERED, and STRIPED where there are four.
  • Chart annotations will be added to indicate the preferred representations on color devices.
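The color convention in the guidelines above amounts to taking the first N terms of a fixed sequence. A small sketch, with a hypothetical function name and a hypothetical base name used only for illustration:

```python
# Order taken from the naming guideline: two variants use the first two
# terms, three variants the first three, four variants all four.
COLOR_NAME_TERMS = ("BLACK", "WHITE", "CHECKERED", "STRIPED")


def color_variant_names(base_name, variant_count):
    """Build names for symbols that differ only by color in the sources.

    base_name is a hypothetical placeholder, not a name from the proposal.
    """
    if not 2 <= variant_count <= len(COLOR_NAME_TERMS):
        raise ValueError("the convention covers two to four color variants")
    return [f"{term} {base_name}" for term in COLOR_NAME_TERMS[:variant_count]]


print(color_variant_names("EXAMPLE SYMBOL", 3))
```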

Documents

See the latest charts:
L2 document links require access privileges granted to Unicode Consortium members.

2010-04-27: L2/10-132 Emoji Symbols: Background Data
2010-04-24: L2/10-150 Emoji Sources (=N3835)
2010-04-23: Resolutions of WG 2 meeting 56 (=N3804)
2010-04-22: Summary of repertoire for FDAM8 content of ISO/IEC 10646:2003 (=N3838)
2010-04-22: Disposition of comments on SC2 N 4123 (FPDAM8) (=N3828)
2010-04-22: Emoticons for FDAM8 (=N3826) [consensus of the Emoji Ad-Hoc Committee at WG2 56 San José]
2010-04-21: Emoji Ad-Hoc Meeting Report (=N3829)
2010-04-06: L2/10-115 Rationale for Proposal of N3778 (=N3806)
2010-04-02: Draft disposition of comments on SC2 N 4123 (FPDAM8) (=N3792)
2010-03-28: [Untitled Japanese ballot comments on FPDAM8] (=N3790-JISC)
2010-03-27: L2/10-102 Summary of Voting on SC 2 N 4123, ISO/IEC 10646: 2003/FPDAM8 (=N3790)
2010-03-23: Irish comments on FPDAM8 for ISO/IEC 10646:2003 (=N3790-NSAI)
2010-03-22: [Untitled German ballot comments on FPDAM8] (=N3790-DIN)
2010-03-08: L2/10-090 Willcom Input on Emoji (=N3783)
2010-03-08: L2/10-089 KDDI Input on Emoji (=N3777)
2010-03-08: L2/10-088 DoCoMo Input on Emoji (=N3776)
2010-03-03: Proposal on use of ZERO WIDTH JOINER (ZWJ) between two Regional Indicator Symbols (=N3779)
2010-03-03: Updated Proposal to Change Some Glyphs and Names of Emoticons (=N3778) [update of N3711]
2010-02-05: L2/10-066 Comments accompanying the U.S. negative vote on FPDAM 8 to ISO/IEC 10646:2003 (=N3790-ANSI)
2010-01-26: L2/10-036 Proposal to encode an emoticon "Neutral Face" in the UCS
2010-01-13: L2/10-009 Emoji symbols: background data
2009-11-24: AMENDMENT 8: Additional symbols, Bamum supplement, CJK Unified Ideographs Extension D, and other characters [FPDAM8] (=N3738)
2009-11-05: Emoji Sources (=N3728R) [updated version of N3712 and L2/09-078=N3585] [original N3728 from before Tokyo meeting, 2009-10-28]
2009-10-28: Proposal to encode Regional Indicator Symbols in the UCS (=N3727)
2009-10-27: Emoji Ad-Hoc Meeting Report [ad-hoc during Tokyo WG2 meeting] (=N3726)
2009-10-22: Comments on proposal to revise a part of emoticons in PDAM8-N3711 [Karl Pentzlin] (=N3713)
2009-10-22: A proposal to Revise a Part of Emoticons in PDAM 8 [Katsuhiro Ogata et al.] (=N3711)
2009-10-21: Emoji Sources (=N3712) [updated version of L2/09-078=N3585]
2009-10-01: Draft disposition of comments on SC2 N 4078 (PDAM text for Amendment 8 to ISO/IEC 10646:2003) (=N3690)
2009-09-25: L2/09-323 Proposal to encode two additional Mailbox Symbols complementing the Emoji set (=N3687)
2009-09-22: Summary of Voting on SC 2 N 4078, ISO/IEC 10646: 2003/PDAM 8 (=N3684) (also German, Greek, Irish, Japanese, UK and US comments)
2009-09-22: L2/09-304 US Position on PDAM 8
2009-09-18: L2/09-318 Proposal to encode Symbols for ISO 3166 Two-letter Codes in the UCS (=N3680)
2009-09-17: Emoji Symbols: Background Data (=N3681) [updated version of L2/09-027]
2009-08-06: L2/09-272 Emoji: Review of PDAM 8
2009-05-21: L2/09-287 AMENDMENT 8: Additional symbols, Bamum supplement, CJK Unified Ideographs Extension D, and other characters [PDAM8] (=N3658)
2009-04-27: L2/09-153 Emoji Ad-Hoc Meeting Report (=N3636)
2009-04-10: L2/09-139 Response to Concerns Raised in N3607 About Encoding Emoji Characters (=N3614)
2009-04-06: L2/09-114 Towards an encoding of symbol characters used as emoji (=N3607) (Everson & Stötzner, Irish and German NB response to the Emoji proposal)
2009-03-05: L2/09-025R2 Proposal for Encoding Emoji Symbols (=N3582)
2009-02-06: L2/09-026R Emoji Symbols Proposed for New Encoding (=N3583)
2009-02-06: L2/09-078 Emoji Sources (=N3585)
2009-01-30: L2/09-025 Proposal for Encoding Emoji Symbols
2009-01-30: L2/09-026 Emoji Symbols Proposed for New Encoding
2009-01-30: L2/09-027 Emoji Symbols: Background Data
2008-08-13: L2/08-323 Scripts Subcommittee Draft Notes and Recommendations to UTC #116
2008-08-12: L2/08-314 Emoticon Core Set - working proposal
2008-08-12: (HTML) Table for Working Draft Proposal for Encoding Emoji Symbols
2008-08-12: L2/08-315 Emoji Symbols: Open Issues
2008-08-12: L2/08-309 Emoji Encoding Proposal: Progress Report
2008-08-11: L2/08-305 Some suggestions about the encoding of national flags as requested by the Emoji proposal (L2/08-081)
2008-07-17: (Doc) Feedback on the Updated Emoji Encoding Proposal (=L2/08-081) [L2/08-106 plus additional feedback from UTC #114]
2008-02-05: L2/08-106 Feedback on the Updated Emoji Encoding Proposal (=L2/08-081) [email feedback before UTC #114]
2008-01-30: L2/08-081 Working Draft Proposal (2) for Encoding Emoji Symbols
2008-01-30: L2/08-080 Emoji Proposal Data (PDF snapshot); Zip file of HTML + images HERE; Temporarily hosted live HERE
2007-08-03: L2/07-257 Working Draft Proposal for Encoding Emoji Symbols (Associated tables in ZIP file)

Resources

Google Data and Tools

Google uses Private Use mappings to represent Emoji ("picture character") symbols in Unicode text. The emoji4unicode project makes these mappings available. This project also provides data and tools that can be used in the development of the encoding proposal. The tools are Python scripts that provide for consistency checks, reports on the data, and chart generation.
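Text that carries Emoji as Private Use code points can be inspected with Python's standard `unicodedata` module, since Private Use characters have the general category `Co`. The sketch below is illustrative only: the function names are hypothetical, and the sample code points are arbitrary Private Use values, not entries from the emoji4unicode mapping data.

```python
import unicodedata


def is_private_use(ch):
    """True if the character has general category Co (Private Use)."""
    return unicodedata.category(ch) == "Co"


def private_use_code_points(text):
    """List the Private Use code points found in text, in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text if is_private_use(ch)]


# Arbitrary sample: one BMP Private Use character, one Plane 15 Private Use
# character, and one ordinary encoded symbol (U+2600) that is not reported.
sample = "a\uE000\U000F0000\u2600"
print(private_use_code_points(sample))
```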

This page and its subpages contain the project documentation. Generated HTML charts are available at http://www.unicode.org/~scherer/emoji4unicode/

See the project announcements (blog posts) in English and Japanese.

DoCoMo

KDDI

SoftBank

Conversion Tables

Additional non-carrier references

Related

See also:
  • Japanese TV Symbols
  • WAP Pictogram Specification approved Version 1.1 -- part of OMA Browsing V2.3 Enabler Specification
  • RIS 506 "Music CD Shift-JIS" [need link]
  • WingDings fonts [need link]
Arle Lommel sent the following to the emoji4unicode group on 2008-12-27: (Highlighting by Markus Scherer; link to the complete email)

You may have seen this on Unicore, but if not, I have done a comparison of the emoji repertoire with the emoticons used in chat or bulletin board systems from seven major vendors in this area (Skype, Microsoft, Yahoo, America Online, Google, vBulletin, and phpBB). You can download the results from here:

http://dl.getdropbox.com/u/223919/emoticons.pdf
[...]