
IDN in Google Chrome


Back in the day, hostnames could only consist of the letters A to Z, digits, and a few other characters. Internationalized Domain Names (IDNs) were devised to support arbitrary Unicode characters in hostnames in a backward-compatible way. This works by having user agents transform a hostname containing non-ASCII Unicode characters into one fitting the traditional mold, which can then be sent on to DNS servers. For example, http://ö is transformed into an ASCII-only hostname beginning with xn--. The transformed form is called ASCII Compatible Encoding (ACE); it consists of the four-character prefix xn-- followed by the Punycode representation of the label's Unicode characters.
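A minimal sketch of the label-by-label ACE transformation, using Python's standard "punycode" codec. It assumes the hostname is already normalized: real IDNA processing (IDNA 2008 / UTS 46) also maps and validates characters before encoding, which this sketch skips. The function names are illustrative only.

```python
ACE_PREFIX = "xn--"

def label_to_ace(label: str) -> str:
    """Encode one hostname label; pure-ASCII labels pass through unchanged."""
    if label.isascii():
        return label
    # Punycode-encode the label and prepend the four-character ACE prefix.
    return ACE_PREFIX + label.encode("punycode").decode("ascii")

def hostname_to_ace(hostname: str) -> str:
    """Apply the ACE transformation to each dot-separated label."""
    return ".".join(label_to_ace(label) for label in hostname.split("."))

def label_to_unicode(label: str) -> str:
    """Reverse the transformation for a single label."""
    if label.lower().startswith(ACE_PREFIX):
        return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    return label

print(hostname_to_ace("bücher.de"))  # xn--bcher-kva.de
```

Only the non-ASCII label is rewritten; the ASCII TLD ".de" is left alone, which is what keeps the result compatible with existing DNS infrastructure.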

Ideally, user agents could always display the Unicode version of a hostname. However, characters from different scripts can look very similar, and this makes phishing attacks possible. For example, the Latin "a" looks a lot like the Cyrillic "а", so someone could register http://ebа with a Cyrillic "а", which could easily be mistaken for the all-Latin name. This is called a homograph attack.
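The two strings below render almost identically but differ in one code point, so they are different hostnames with different ACE forms. A small sketch using Python's standard unicodedata module and the built-in "idna" codec (which implements the older IDNA 2003 rules, not what browsers use today):

```python
import unicodedata

latin = "ebay"
spoof = "eb\u0430y"  # U+0430 CYRILLIC SMALL LETTER A in place of the Latin "a"

def code_points(label: str) -> list[str]:
    """List each character's code point and Unicode name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in label]

for info in code_points(spoof):
    print(info)

print(latin == spoof)                        # False: the strings only look alike
print(spoof.encode("idna").decode("ascii"))  # xn--eby-7cd
```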

In a perfect world, domain registries would not allow such nefarious domain names to be registered. Some TLD registries do exactly that, mostly by restricting the characters allowed, but many do not. For TLDs that are meant to be international, such as .com, this would be nontrivial to do.

As a result, all browsers try to protect against homograph attacks by displaying punycode instead of the original IDN if the hostname does not fulfill certain properties. They try to do this in a way that allows IDN to be shown for valid hostnames, but protects against phishing.

Google Chrome's IDN policy

Starting with Google Chrome 51, whether to show hostnames in Unicode is determined independently of the language settings (the Accept-Language list). The algorithm is similar to the one Firefox uses. (See the description of the changelist that implemented the new policy.)

Google Chrome decides if it should show Unicode or punycode for each domain label (component) of a hostname separately. To decide if a component should be shown in Unicode, Google Chrome uses the following algorithm:
(This is implemented by IDNToUnicodeOneComponent() and IsIDNComponentSafe() in components/url_formatter/)
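The per-component structure can be sketched as follows. This is not Chrome's algorithm: the safety test is a deliberately simple, hypothetical placeholder passed in by the caller, standing in for the checks in IsIDNComponentSafe(). What the sketch shows is that each label is decided on its own, so a single hostname can be displayed partly in Unicode and partly in punycode.

```python
from typing import Callable

ACE_PREFIX = "xn--"

def display_hostname(hostname: str, is_component_safe: Callable[[str], bool]) -> str:
    """Decide Unicode vs. punycode separately for each label."""
    shown = []
    for label in hostname.split("."):
        if not label.lower().startswith(ACE_PREFIX):
            shown.append(label)  # plain ASCII label: nothing to decide
            continue
        try:
            unicode_label = label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
        except UnicodeError:
            shown.append(label)  # undecodable: keep the punycode form
            continue
        shown.append(unicode_label if is_component_safe(unicode_label) else label)
    return ".".join(shown)

# Hypothetical placeholder predicate, NOT Chrome's IsIDNComponentSafe():
# accept a label only if every character is a basic lowercase Cyrillic letter.
def only_cyrillic(label: str) -> bool:
    return all("\u0430" <= ch <= "\u044f" or ch == "\u0451" for ch in label)

ace_label = "xn--" + "россия".encode("punycode").decode("ascii")
print(display_hostname(ace_label + ".xn--bcher-kva.net", only_cyrillic))
# россия.xn--bcher-kva.net
```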

Consequences / Examples (outdated, to be updated)

Google Chrome will display the Unicode form of a hostname component consisting solely of characters that belong to one of the languages selected in the language settings, even on .com and .net domains and not only in domains native to that language. For example, http://россия.net will be displayed in IDN form if you claim to speak Russian or another language written in Cyrillic, and as punycode otherwise. Likewise, http://私の団体も.jp/ will be shown in IDN form only if you claim to speak Japanese in Google Chrome's options.

Google Chrome will always display punycode for a hostname component that contains characters not in the main exemplar character set of any language. For example, http://☃.net/ will always be displayed as punycode in Google Chrome.

Google Chrome will always display punycode for components that mix letters from multiple languages. For example, there is not a single language that contains all characters found in http://şøñđëřżēıċħęŋđőmæîņĭśŧşũþėŗ.de, so this will be shown as punycode. Likewise, http://ebа (with a Cyrillic "а") will always be shown as punycode, even if both English and Russian are in the accepted languages. This is true even if the domain is below a TLD whose registry takes care to protect against homograph attacks.
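The language-based rules in these (outdated) examples amount to a set-containment test: a component is shown in Unicode only if all its characters fall within the exemplar set of a single selected language. A minimal sketch, assuming tiny hand-written exemplar sets (hypothetical stand-ins; real exemplar sets come from CLDR and are richer) and treating ASCII digits and hyphens as allowed everywhere:

```python
import string

# Hypothetical, abbreviated exemplar sets keyed by language code.
EXEMPLARS = {
    "en": set(string.ascii_lowercase),
    "ru": set("абвгдеёжзийклмнопрстуфхцчшщъыьэюя"),
    "de": set(string.ascii_lowercase) | set("äöüß"),
}
NEUTRAL = set(string.digits) | {"-"}  # assumed allowed in every language

def show_unicode(component: str, accepted_languages: list[str]) -> bool:
    """True if one accepted language covers every character of the component."""
    chars = set(component) - NEUTRAL
    return any(chars <= EXEMPLARS.get(lang, set()) for lang in accepted_languages)

print(show_unicode("россия", ["ru"]))              # True
print(show_unicode("россия", ["en"]))              # False
print(show_unicode("eb\u0430y", ["en", "ru"]))     # False: mixes Latin and Cyrillic
print(show_unicode("\u2603", ["en", "ru", "de"]))  # False: snowman is in no set
```

Testing each language separately, rather than the union of all selected languages, is what rejects the mixed Latin/Cyrillic label even when both English and Russian are selected.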

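There are cases where confusion is still possible with this algorithm. For example, the first component of http://сахар.ru is entirely Cyrillic, while the first component of its look-alike spelled with Latin letters is entirely Latin. Because each component uses a single script, neither is caught by the rule against mixing languages.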
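This kind of whole-script confusable can be detected with a "skeleton" comparison in the spirit of Unicode Technical Standard 39: map each character to a representative look-alike and compare the results. The mapping below is a tiny hypothetical excerpt chosen for this example, not the real confusables data. Each label passes a single-script test on its own, yet both reduce to the same skeleton:

```python
# Hypothetical excerpt of a confusables map: Cyrillic letter -> Latin look-alike.
CONFUSABLES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0445": "x",  # CYRILLIC SMALL LETTER HA
}

def skeleton(label: str) -> str:
    """Replace each character with its look-alike, if the map has one."""
    return "".join(CONFUSABLES.get(ch, ch) for ch in label)

cyrillic = "\u0441\u0430\u0445\u0430\u0440"  # "сахар", all Cyrillic
latin = "caxap"                              # all Latin

print(cyrillic == latin)                      # False
print(skeleton(cyrillic) == skeleton(latin))  # True: a whole-script confusable
```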

Behavior of other browsers


IE displays URLs in IDN form if every component contains only characters of one of the languages configured in "Languages" on the "General" tab of "Internet Options", similar to what Google Chrome does.


Firefox uses a script mixing detection algorithm based on the "Moderately Restrictive" profile of Unicode Technical Report 39. Domains of any single script, any single script + Latin, or a small whitelist of other combinations are displayed as Unicode; everything else is Punycode.
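A simplified sketch of that kind of script-mixing check. Two assumptions to flag: Python's standard library has no Unicode Script property, so scripts are guessed crudely from the first word of each character's Unicode name; and the rule set is a reduced reading of the profile (one script; Latin plus one other script except Cyrillic or Greek; or one of the Han-based combinations UTS 39 lists as "Highly Restrictive"). It is not Firefox's code.

```python
import unicodedata

# Scripts this heuristic recognizes; Han ideographs show up as "CJK".
KNOWN_SCRIPTS = {"LATIN", "CYRILLIC", "GREEK", "ARABIC", "HEBREW", "THAI",
                 "CJK", "HIRAGANA", "KATAKANA", "HANGUL", "BOPOMOFO"}

# Combinations allowed beyond "one script" and "Latin + one script".
EXTRA_COMBOS = [
    {"LATIN", "CJK", "HIRAGANA", "KATAKANA"},
    {"LATIN", "CJK", "BOPOMOFO"},
    {"LATIN", "CJK", "HANGUL"},
]

def script_of(ch: str) -> str:
    """Crude script guess from the character's Unicode name."""
    if ch.isascii() and (ch.isdigit() or ch == "-"):
        return "Common"
    name = unicodedata.name(ch, "")
    return name.split(" ")[0] if name else "Unknown"

def is_moderately_restrictive(label: str) -> bool:
    scripts = {script_of(ch) for ch in label} - {"Common"}
    if not scripts <= KNOWN_SCRIPTS:
        return False  # symbol or unrecognized character: show punycode
    if len(scripts) <= 1:
        return True
    if any(scripts <= combo for combo in EXTRA_COMBOS):
        return True
    others = scripts - {"LATIN"}
    return "LATIN" in scripts and len(others) == 1 and not others & {"CYRILLIC", "GREEK"}

print(is_moderately_restrictive("bücher"))     # True: Latin only
print(is_moderately_restrictive("eb\u0430y"))  # False: Latin + Cyrillic
print(is_moderately_restrictive("私の団体も"))  # True: Han + Hiragana
```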


Opera has a whitelist of TLDs, much like the one Firefox historically maintained, and shows IDN only for domains under these whitelisted TLDs.


Safari has a whitelist of scripts that do not contain confusable characters, and only shows the IDN form for whitelisted scripts. The whitelist does not include Cyrillic and Greek (they are confusable with Latin characters), so Safari will always show punycode for Russian and Greek URLs.
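A script whitelist of this kind can be sketched in a few lines. The whitelist contents here are hypothetical (Safari's actual list is not reproduced), and scripts are again guessed crudely from the first word of each character's Unicode name, since Python's standard library does not expose the Script property.

```python
import unicodedata

# Hypothetical whitelist; Cyrillic and Greek are deliberately absent.
ALLOWED_SCRIPTS = {"LATIN", "ARABIC", "HEBREW", "THAI",
                   "CJK", "HIRAGANA", "KATAKANA", "HANGUL"}

def scripts_in(label: str) -> set[str]:
    """Crude script guess per character; ASCII digits and hyphens are neutral."""
    return {
        unicodedata.name(ch, "UNKNOWN").split(" ")[0]
        for ch in label
        if not (ch.isascii() and (ch.isdigit() or ch == "-"))
    }

def safari_like_shows_unicode(label: str) -> bool:
    """Show Unicode only if every script used is on the whitelist."""
    return scripts_in(label) <= ALLOWED_SCRIPTS

print(safari_like_shows_unicode("bücher"))  # True
print(safari_like_shows_unicode("россия"))  # False: Cyrillic is not whitelisted
```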