Re: Unicode confusables and reordering characters considered harmful
From: Eli Zaretskii
Subject: Re: Unicode confusables and reordering characters considered harmful
Date: Thu, 04 Nov 2021 10:21:12 +0200
> From: Reini Urban <reini.urban@gmail.com>
> Date: Thu, 4 Nov 2021 08:50:14 +0100
> Cc: emacs-devel@gnu.org
>
> int hi = 5;
> int שָׁלוֹם = hi;
> int hello = 10;
> int السّلامعليك = hello;
> myfun(שָׁלוֹם ,السّلامعليكم)
>
> IMO this code is fundamentally valid: we should allow
> programmers to write identifiers in their native tongue.
>
> Sure, nobody wants to forbid Unicode identifiers. The rules only ensure that
> identifiers remain identifiable.
> I converted it to perl (because I dislike java or rust), and ran it through
> cperl.
> The problem is that from an innocent look or code review you won't see any
> problem, hence the security
> risk.
> You need to adjust your tools.
>
> But the very first RTL identifier שָׁלוֹם already contains non-identifier
> characters.
Which of its characters are non-identifier, and why? That identifier
uses characters of a single script, AFAICT.
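
One way to check this is to list each character of the identifier with
its general category and name. The following is only a rough Python
sketch: the spelling of the identifier as escapes is an assumption (the
order of the combining marks in the original message may differ), the
helper names are made up for illustration, and the first word of the
character name is a crude stand-in for the real Unicode Script
property, which Python's standard library does not expose.

```python
import unicodedata

# First RTL identifier from the quoted snippet, written as escapes so the
# code points are unambiguous (assumed spelling, see note above).
SHALOM = "\u05e9\u05b8\u05c1\u05dc\u05d5\u05b9\u05dd"


def describe(identifier):
    """Return (code point, general category, name) for each character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch))
        for ch in identifier
    ]


def name_prefixes(identifier):
    """Approximate the set of scripts by the first word of each name."""
    return {unicodedata.name(ch).split()[0] for ch in identifier}


for row in describe(SHALOM):
    print(*row)

# Approximate script set, and whether Python itself accepts the string
# as an identifier (letters plus nonspacing marks are allowed).
print(name_prefixes(SHALOM))
print(SHALOM.isidentifier())
```

A proper mixed-script check per UTS #39 would need the Script and
Script_Extensions data, which this sketch does not attempt.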
> So I cannot tell you if this code doesn't violate any of the 4 unicode mixed
> script profiles
> (http://www.unicode.org/reports/tr39/#Mixed_Script_Detection 2-5)
> Or if any of the unreadable characters are of the recommended scripts:
Which characters in that fragment are "unreadable" for this purpose?