I recently made up two nonsensical domain names, eixpay.com and еіхрау.com. Can you spot the difference between them?
In a modern Unicode-capable browser, they are likely to appear identical, but if you copy and paste each one into a search engine, you will get different results. The domain on the right was created using Cyrillic characters, while the one on the left was created using Western (US-ASCII) characters. While most Cyrillic characters differ vastly from US-ASCII characters, a handful of symbols look at home in either character set (see page 2 of the chart).
In a hex dump, the difference between the two domain names is obvious. Note that .com is encoded in ASCII in both names (see Figure 1).
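The same comparison can be reproduced without a hex editor. What follows is a minimal Python sketch, assuming both names are stored as UTF-8. The variable and function names are my own, and the Cyrillic label is spelled out with escape codes so that it cannot be confused with its ASCII twin.

```python
# ASCII version of the made-up domain name.
ascii_name = "eixpay.com"

# Look-alike label built from Cyrillic letters
# (U+0435, U+0456, U+0445, U+0440, U+0430, U+0443); ".com" stays ASCII.
cyrillic_name = "\u0435\u0456\u0445\u0440\u0430\u0443.com"


def hex_bytes(name: str) -> str:
    """Return the UTF-8 bytes of a name as space-separated hex pairs."""
    return name.encode("utf-8").hex(" ")


print(hex_bytes(ascii_name))     # one byte per character
print(hex_bytes(cyrillic_name))  # two bytes per Cyrillic letter, then ASCII ".com"
```

Each Cyrillic letter takes two bytes in UTF-8, so the look-alike name is 16 bytes long against 10 for the ASCII one, and both end in the same four ASCII bytes for .com.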
I then created a simple HTML file with links to each domain name. Note that the ASCII domain name is italicized in the anchor tag while the Unicode domain name is not (see Figure 2).
When I first opened the file in Firefox 3.5, the character encoding was set to ISO-8859-1 (Western), so the Unicode link clearly differed (see Figure 3).
A quick change of my character encoding to Unicode (UTF-8), however, resulted in an altogether different scenario (see Figure 4).
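The effect of the encoding setting can be approximated outside the browser. The sketch below assumes the HTML file was saved as UTF-8 (the setup described above does not say so explicitly) and decodes the same bytes under both encodings. The variable names are illustrative only.

```python
# Cyrillic look-alike label, written with escapes to avoid ambiguity.
cyrillic_label = "\u0435\u0456\u0445\u0440\u0430\u0443"

# The bytes as they would sit on disk in a UTF-8 encoded file (assumption).
raw = cyrillic_label.encode("utf-8")

# Interpreting those bytes as ISO-8859-1 yields one character per byte,
# which is why the link looks like gibberish under the Western setting.
as_western = raw.decode("iso-8859-1")

# Interpreting them as UTF-8 restores the six Cyrillic letters.
as_unicode = raw.decode("utf-8")

print(len(as_western), repr(as_western))
print(len(as_unicode), as_unicode)
```

Under ISO-8859-1 the six letters become twelve unrelated characters, which gives the spoof away. Under UTF-8 they collapse back into a label that looks like plain ASCII.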
GIZMODO points out that this works with other strings as well. An attacker thus only needs to find commonly used code pages that they can use to piece together the characters they will need to spoof legitimate sites. In my brief testing, I found that using more than one Unicode block in a single URL produces unpredictable results.
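To check whether a given label mixes character sets, a rough heuristic is to look at the first word of each letter's Unicode character name. This is only a sketch: the function name is hypothetical, the first word of a character name is an approximation of its script rather than its Unicode block, and real browsers apply more elaborate rules.

```python
import unicodedata


def scripts_in_label(label: str) -> set:
    """Return the approximate scripts used by the letters in a label.

    Heuristic: the first word of the Unicode character name
    (for example "LATIN" or "CYRILLIC"). Digits and hyphens are ignored.
    """
    scripts = set()
    for ch in label:
        if ch.isalpha():
            # Fall back to "UNKNOWN" for characters without a name.
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


print(scripts_in_label("eixpay"))                                # Latin only
print(scripts_in_label("\u0435\u0456\u0445\u0440\u0430\u0443"))  # Cyrillic only
print(scripts_in_label("p\u0430ypal"))                           # mixed Latin and Cyrillic
```

A label that returns more than one script is a candidate for closer inspection. A label that is entirely Cyrillic, like the example in this post, passes this check, which is exactly what makes it dangerous.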
The approval of internationalized domain names (IDNs) by the Internet Corporation for Assigned Names and Numbers (ICANN) has prompted recent discussion of the additional security risks they can pose. Some believe that allowing the use of IDNs can make antiphishing efforts harder.
Simply put, IDNs work by converting Unicode strings into punycode strings before the browser queries the Domain Name System (DNS). For instance, the punycode version of the Cyrillic еіхрау.com is xn--80aj7anh5h.com. The site www.IDNstuff.com/ has a handy tool for converting Unicode strings into punycode and back.
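The conversion can also be done locally. Here is a small sketch using Python's built-in `idna` codec, which implements the older IDNA 2003 rules. It is not necessarily what any particular browser or the online tool uses, and the variable names are my own.

```python
# Cyrillic look-alike domain, written with escapes to avoid ambiguity.
cyrillic_domain = "\u0435\u0456\u0445\u0440\u0430\u0443.com"

# Unicode -> punycode (ASCII-compatible encoding), label by label.
ace = cyrillic_domain.encode("idna").decode("ascii")
print(ace)  # xn--80aj7anh5h.com

# punycode -> Unicode, to confirm the round trip.
restored = ace.encode("ascii").decode("idna")
print(restored == cyrillic_domain)  # True

# The plain ASCII domain is left untouched by the same conversion.
print("eixpay.com".encode("idna").decode("ascii"))  # eixpay.com
```

Only the label containing non-ASCII characters receives the `xn--` prefix. The ASCII name passes through unchanged, so the two look-alike names resolve to entirely different DNS entries.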
It took some digging but I did find a few registrars that support punycoded domain names on existing top-level domain names (.com, .net, etc.). Should a cybercriminal register xn--80aj7anh5h.com, he/she can create a pass-through of the ASCII eixpay.com page and use email, instant messaging (IM), or social networking to entice users to click a Unicode link that will connect them to a look-alike phishing page. Simply double-checking a site’s name may thus no longer be enough.
If you want to see how real IDNs react in your browsers and other tools, take a look here for some active samples.