bug-gnu-emacs

bug#51733: 27.1; Detect impossible email addresses better


From: Lars Ingebrigtsen
Subject: bug#51733: 27.1; Detect impossible email addresses better
Date: Mon, 17 Jan 2022 21:22:58 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/29.0.50 (gnu/linux)

I'm not quite sure I understand this bit here:
https://www.unicode.org/reports/tr39/#Confusable_Detection

---
For an input string X, define skeleton(X) to be the following transformation on 
the string:

    Convert X to NFD format, as described in [UAX15].
    Concatenate the prototypes for each character in X according to the
    specified data, producing a string of exemplar characters.
    Reapply NFD.
---
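For concreteness, here is a rough sketch of those three steps as I read them, in Python for brevity. The PROTOTYPES table is a tiny illustrative stand-in (an assumption on my part, not the real data); a real implementation would presumably load the full confusables table that TR39 refers to. The names skeleton and confusable are just mine.

```python
import unicodedata

# Illustrative stand-in for the TR39 confusables data. Assumption: a real
# implementation would load the complete table rather than these three
# hand-picked Cyrillic-to-Latin entries.
PROTOTYPES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
}


def skeleton(text, prototypes=PROTOTYPES):
    """Sketch of skeleton(X): NFD, map each character to its prototype, NFD."""
    nfd = unicodedata.normalize("NFD", text)
    # Characters without an entry in the table map to themselves.
    mapped = "".join(prototypes.get(ch, ch) for ch in nfd)
    return unicodedata.normalize("NFD", mapped)


def confusable(x, y):
    """Two strings count as confusable here if their skeletons are equal."""
    return skeleton(x) == skeleton(y)
```

With this toy table, confusable("p\u0430ypal", "paypal") is true because the Cyrillic letter maps to the Latin "a", while anything not covered by the table falls through unchanged.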

I mean, that sounds OK in and of itself, but then:

---
X and Y are single-script confusables if and only if they are confusable, and
their resolved script sets have at least one element in common.

    Examples: “ǉeto” and “ljeto” in Latin (the Croatian word for “summer”),
    where the first word uses only four codepoints, the first of which is
    U+01C9 (ǉ) LATIN SMALL LETTER LJ.
---

But:

(ucs-normalize-NFD-string "ǉeto")
=> "ǉeto"

So according to that algo "ǉeto" and "ljeto" are not confusable.

But if we use NFKD instead, they are:

(ucs-normalize-NFKD-string "ǉeto")
=> "ljeto"
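To rule out this being a quirk of ucs-normalize, the same comparison can be made outside Emacs. A small sketch using Python's unicodedata module (the helper name normalization_forms is made up for this example): U+01C9 only has a compatibility decomposition, so NFD should leave it alone and NFKD should split it into "l" and "j".

```python
import unicodedata

# The four-codepoint spelling from the TR39 example: U+01C9 followed by "eto".
WORD = "\u01c9eto"


def normalization_forms(text):
    """Return the NFD and NFKD forms of TEXT as a (nfd, nfkd) pair."""
    return (
        unicodedata.normalize("NFD", text),
        unicodedata.normalize("NFKD", text),
    )


nfd, nfkd = normalization_forms(WORD)

# NFD applies canonical decompositions only, so U+01C9 survives untouched.
print(len(nfd), nfd == WORD)
# NFKD also applies compatibility decompositions, giving plain ASCII "ljeto".
print(len(nfkd), nfkd == "ljeto")
# The decomposition mapping for U+01C9 is tagged as a compatibility one.
print(unicodedata.decomposition("\u01c9"))
```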

It seems unlikely to be a typo in this document, surely?  But NFKD seems
to make a whole lot more sense than NFD for this usage.  I must be
missing or misreading something.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no
