bug-gnu-emacs

bug#51733: 27.1; Detect impossible email addresses better


From: Eli Zaretskii
Subject: bug#51733: 27.1; Detect impossible email addresses better
Date: Wed, 19 Jan 2022 18:58:54 +0200

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: 51733@debbugs.gnu.org,  jidanni@jidanni.org
> Date: Wed, 19 Jan 2022 16:45:29 +0100
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > OK, but why do you think "Сгсе.ru" is confusable?  The SLD part is
> > entirely made of single-script characters, and UTS#39 explicitly
> > allows that:
> >
> >   [...] it can be perfectly legitimate to have scripts in a SLD
> >   (second level domain) not be the same as scripts in a TLD (top-level
> >   domain), such as:
> >
> >     Cyrillic labels in a domain name with a TLD of .ru or .рф 
> >
> > That's your case, isn't it?
> 
> Yes, indeed.  But:
> 
> ---
> For some applications, it is useful to determine if a given input string has 
> any whole-script confusable. For example, the identifier "ѕсоре" using 
> Cyrillic characters would pass the single-script test described in Section 
> 5.2, Restriction-Level Detection, even though it is likely to be a spoof 
> attempt. 
> ---
> 
> So "Сгсе.ru" is suspicious in most contexts.

Right, but the functions we had back then didn't yet support that
part.

> > Regardless of what they are saying, I don't think the above is
> > suitable for production.  I think it should be enough to see whether
> > there could be confusion with the corresponding ASCII characters from
> > confusables.txt.
> 
> Yes, so that's what I've done now, but...  I'd feel slightly better if I
> knew what they were actually getting at.  I think they're saying that if
> "foo" is confusable with anything in any other scripts, then it's
> suspicious?

Yes, that's what they meant.

> But that sounds unworkable.  For instance, "circle.ru" is
> confusable with "СігсӀе.ru", and perhaps it's suspicious to a Russian,
> but I don't see how to make a workable function from that.

They've left that to the implementation...

Anyway, I think confusable to ASCII is good enough for Emacs for now.
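
To make "confusable to ASCII" concrete, here is a rough sketch in Python
rather than Emacs Lisp, and not the code that was actually installed.
The mapping table is a small hand-picked assumption in the spirit of
confusables.txt, and the names ascii_lookalike and suspicious_domain are
hypothetical.  A real check would load the full Unicode data file.

```python
# Hypothetical excerpt in the spirit of confusables.txt:
# Cyrillic character -> ASCII look-alike.  Hand-picked for illustration only.
CYRILLIC_TO_ASCII = {
    "\u0421": "C",  # CYRILLIC CAPITAL LETTER ES
    "\u0433": "r",  # CYRILLIC SMALL LETTER GHE
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u0456": "i",  # CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
    "\u04c0": "l",  # CYRILLIC LETTER PALOCHKA
    "\u0455": "s",  # CYRILLIC SMALL LETTER DZE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
}


def ascii_lookalike(label):
    """Return the ASCII string LABEL could be mistaken for, or None.

    A label that is already pure ASCII returns None: it is the genuine
    article, not an imitation of one.
    """
    if label.isascii():
        return None
    out = []
    for ch in label:
        if ch.isascii():
            out.append(ch)
        elif ch in CYRILLIC_TO_ASCII:
            out.append(CYRILLIC_TO_ASCII[ch])
        else:
            # At least one character has no ASCII look-alike in the table,
            # so the label as a whole cannot pass for an ASCII one.
            return None
    return "".join(out)


def suspicious_domain(domain):
    """True if any dot-separated label could be read as an ASCII label."""
    return any(ascii_lookalike(label) is not None
               for label in domain.split("."))
```

With this table, the Cyrillic SLD from the example above maps to "Crce"
and is flagged, while plain "circle.ru" is not.  A Cyrillic label that
contains even one character without an ASCII look-alike passes, which is
the limitation being accepted here.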

> So perhaps what I've implemented now is sufficient for domains.

I think it is, yes.  It definitely covers a very large chunk of the
problem.




