IDN characters #242

Open
rquadling opened this issue Aug 18, 2022 · 7 comments

Comments

@rquadling

Would these be a suitable thing to document here?

For example, where do you think this link will take you? http://accounts.googlе.com

Sure. It LOOKS like it'll take you somewhere obvious, but it's not that at all. Hopefully it doesn't ACTUALLY take you anywhere!

All I know ... I'm not clicking it!

@geeknik

geeknik commented Aug 18, 2022

I clicked the link and nothing happened.
[image: tumblr_ml01gvUPCG1r18fjgo1_500-2292880854]

@bbbco

bbbco commented Aug 18, 2022

If you look at the actual link, it looks like this:
[screenshot: Screenshot_20220818-175854_Firefox]

@bbbco

bbbco commented Aug 18, 2022

I think this is actually referring to Punycode

@ross-spencer

@rquadling what does it look like without markdown? I also see http://accounts.xn--googl-3we.com/.

@rquadling
Author

IDN / Punycode ... is related ... one is the representation of the other.
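
To make the "one is the representation of the other" point concrete, here is a minimal sketch using Python's built-in `idna` codec (which implements the older IDNA 2003 rules, so other tools may differ on edge cases). The variable names are just for illustration; the lookalike host is the one from the first comment, written with an explicit escape so the Cyrillic letter is visible in the source.

```python
# The real host uses only ASCII letters.
real = "accounts.google.com"
# The lookalike swaps the final "e" for U+0435 CYRILLIC SMALL LETTER IE.
fake = "accounts.googl\u0435.com"

# Encoding with the stdlib "idna" codec gives the ASCII (Punycode) form
# that actually goes over the wire in DNS lookups.
real_ascii = real.encode("idna")
fake_ascii = fake.encode("idna")

# Decoding the ASCII form recovers the Unicode form again.
round_trip = fake_ascii.decode("idna")

print(real_ascii)          # b'accounts.google.com'
print(fake_ascii)          # b'accounts.xn--googl-3we.com'
print(round_trip == fake)  # True
```

The ASCII form matches the `xn--googl-3we` host that shows up when the link is inspected, which is why showing Punycode is a useful tell for this kind of string.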

So, IDNs allow Unicode characters in domain names. But some of these characters (I think only the ones resembling English vowels ... maybe not though) look like other letters. So someone could set up a server at the fake URL that adds a naughty payload in whatever way it wants and sends you back a mocked-up page (man-in-the-middle sort of thing).

The URL will be shown as Punycode. Well, it is in Chrome. Will it be in all browsers? Or in anything else that displays the URL? It isn't shown in the link text (but it is in mouseovers) ...

So that's why I feel IDNs should be considered for the list of naughty strings.

@ssokolow

ssokolow commented Sep 6, 2022

Maybe pick one string for each class of problems mentioned in Lord.io's Identity Beyond Usernames?

Byte-wise, the real "epic.com" and the false website "еріс.com" are completely different. But visually, they're indistinguishable from each other in the URL bar, allowing phishing problems to run amok. Unicode canonicalization and normalization can help with certain cases of this problem, but does nothing for our epic.com example.

This particular example isn't visible in Chrome, which instead shows https://xn--e1awd7f.com/, the "punycode" representation of the domain name. This is thanks to Chrome's complex, 13 step process for detecting if a domain name is likely to be a Unicode phish or not. "Well, it may be complex," you tell me, "but at least it solves the phishing problem!" Unfortunately it does not.

Specific instances of IDN homograph attacks have been reported to Chrome, and we continually update our IDN policy to prevent against these attacks.

The Unicode spec is apparently too large to solve this problem 100% perfectly, and so their "solution" is to pay $2000 to anybody who finds new edge cases. This also doesn't actually solve the problem for non-Latin alphabets — if for example, I own a Chinese domain name, it will never show punycode, and attackers can phish my site using duplicate encodings for those Chinese characters. Chrome just attempts to solve the much smaller problem of the numerous Unicode characters that visually look like the Latin alphabet.
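
As a rough illustration of the quoted claim that normalization does nothing for the epic.com example, the following Python sketch compares the two labels byte-wise and runs the lookalike through all four standard normalization forms. It only uses the stdlib `unicodedata` module; the variable names are made up for this example.

```python
import unicodedata

real = "epic"
# Four Cyrillic letters that render like "epic": U+0435, U+0440, U+0456, U+0441.
fake = "\u0435\u0440\u0456\u0441"

# Byte-wise the two labels are completely different.
real_bytes = real.encode("utf-8")
fake_bytes = fake.encode("utf-8")

# None of the standard normalization forms changes the lookalike,
# so normalizing before comparison cannot map it onto the real label.
forms = ("NFC", "NFD", "NFKC", "NFKD")
unchanged = all(unicodedata.normalize(form, fake) == fake for form in forms)

# Every character in the lookalike is a Cyrillic letter.
all_cyrillic = all(unicodedata.name(ch).startswith("CYRILLIC") for ch in fake)

print(real_bytes, fake_bytes)
print(unchanged, all_cyrillic)
print(fake.encode("idna"))  # b'xn--e1awd7f'
```

The last line reproduces the `xn--e1awd7f` label that the quote says Chrome displays for this domain.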

That is:

  1. One bad Punycode domain name that'd rely purely on canonicalization and normalization to be caught.
  2. At least one bad Punycode domain name that is disallowed by Chrome's process, but likely to be allowed through by other tools. (Possibly one for each step in Chrome's 13-step process which is of the form "If X, then bail out". For example, "If two or more numbering systems (e.g. European digits + Bengali digits) are mixed, show punycode.")
  3. One bad Punycode domain name that is allowed through by everything for testing protections that are geared toward making uncaught stuff more likely to be recognized as suspicious by the human in the chair.
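
A sketch of why classes 2 and 3 above might need different test strings: a naive mixed-script check (the kind a simpler tool might implement) catches the `googlе` label from the first comment but lets the all-Cyrillic `еріс` label through. This is NOT Chrome's process. The helper names are hypothetical, and taking the first word of each character's Unicode name is only a crude stand-in for a proper Unicode Script property lookup.

```python
import unicodedata

def scripts_in_label(label):
    """Guess the scripts used in a label from the first word of each
    letter's Unicode name (crude assumption, e.g. 'LATIN', 'CYRILLIC')."""
    return {unicodedata.name(ch).split()[0] for ch in label if ch.isalpha()}

def looks_mixed_script(label):
    """Flag a label whose letters appear to come from more than one script."""
    return len(scripts_in_label(label)) > 1

mixed = "googl\u0435"               # Latin letters plus one Cyrillic letter
whole = "\u0435\u0440\u0456\u0441"  # entirely Cyrillic lookalike of "epic"

for label in (mixed, whole, "epic"):
    print(label.encode("punycode"), sorted(scripts_in_label(label)),
          looks_mixed_script(label))
```

With this heuristic the mixed label is flagged, while the whole-script lookalike is not, so a naughty-strings list would probably want at least one string of each kind.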

@rquadling
Author

I don't really know what to do here. But in terms of "naughty strings" ... I'm hoping the conversation is interesting enough to add something to the list of "naughty strings" in some way.


5 participants