emacs-devel


From: Gregory Heytings
Subject: Re: Unicode confusables and reordering characters considered harmful, a simple solution
Date: Thu, 04 Nov 2021 10:41:41 +0000


If you could find an actual source code file in an actual project in which these characters are used with their intended purpose, it would be a pertinent example.

Why do you need me to find actual source code that uses those controls? Isn't it clear that any human-readable text in comments and strings in a program's source code can and will use these controls? How does the tutorial text that explains technical stuff related to a computer program differ from what a programmer could wish to write in a comment or a string in his/her program?


From a theoretical point of view, that's correct. From a practical point of view, if these control characters are found in only 0.01% of the files hosted on, say, GitLab, and given that they can have a dangerous effect, it is reasonable for an editor to make them stand out, just as Emacs makes no-break spaces stand out with a thin brown line (although AFAIK they are not dangerous in any way).

Otherwise it is safe and reasonable to assume (as the Rust developers did) that the mere presence of these characters in source code files is a potential problem and must be flagged as such.

It's easy, that's for sure. Reasonable it isn't. Neither is it safe, because any user who wants to use these characters legitimately will quickly turn off that warning for good.

So it works for the Rust developers to tick a checkbox, but it isn't a solution for the problem.
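To make the "flag the mere presence" approach concrete, here is a minimal sketch in Python. It is not the Rust lint nor anything in Emacs. The function name and the exact set of characters (the explicit embedding, override, and isolate controls) are assumptions chosen for illustration.

```python
import unicodedata

# Assumed set: explicit bidi embedding, override, and isolate controls
# (U+202A..U+202E and U+2066..U+2069). ZWNJ (U+200C) is deliberately absent.
BIDI_CONTROLS = frozenset(
    "\u202a\u202b\u202c\u202d\u202e\u2066\u2067\u2068\u2069"
)


def find_bidi_controls(text):
    """Return (line, column, character name) for each bidi control in text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in BIDI_CONTROLS:
                hits.append((lineno, col, unicodedata.name(ch)))
    return hits
```

Such a check reports every occurrence, legitimate or not, which is exactly the property being argued about here.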


AFAIU the solutions you propose are:

1. Customize glyphless-char-display-control to display all control characters in a different way. This is a much cruder solution: it would also affect, for example, ZWNJ, which might be undesirable, and it is not buffer-local. Users who want to use these characters legitimately are unlikely to use that solution.

2. Improve bidi-find-overridden-directionality to detect such non-legitimate cases. This has yet to be done.

In comparison, the minor-mode exists, it's a small patch, and it's orthogonal to the two solutions you propose.
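For comparison, one possible way to single out suspicious uses rather than every use might look like the sketch below. This is a hypothetical heuristic in Python, not the logic of bidi-find-overridden-directionality. It assumes that a line which opens an embedding, override, or isolate without closing it is worth a second look, and it uses simple counters instead of the full Unicode bidirectional algorithm.

```python
# Hypothetical heuristic: flag lines that leave a bidi control unterminated.
# Simplified: plain counters, not the full nesting rules of UAX #9.
EMBED_OPENERS = frozenset("\u202a\u202b\u202d\u202e")  # LRE, RLE, LRO, RLO
ISOLATE_OPENERS = frozenset("\u2066\u2067\u2068")      # LRI, RLI, FSI
PDF = "\u202c"  # closes an embedding or override
PDI = "\u2069"  # closes an isolate


def unterminated_lines(text):
    """Return the 1-based numbers of lines that end with an open bidi control."""
    flagged = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        embeds = isolates = 0
        for ch in line:
            if ch in EMBED_OPENERS:
                embeds += 1
            elif ch in ISOLATE_OPENERS:
                isolates += 1
            elif ch == PDF and embeds:
                embeds -= 1
            elif ch == PDI and isolates:
                isolates -= 1
        if embeds or isolates:
            flagged.append(lineno)
    return flagged
```

A balanced use inside a string or comment passes silently, so legitimate users would have less reason to switch the check off. Whether such a heuristic catches the non-legitimate cases well enough is an open question.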

Anyway, I think it is time to abandon all hope.


