emacs-devel

From: Eli Zaretskii
Subject: Re: Unicode confusables and reordering characters considered harmful, a simple solution
Date: Sat, 06 Nov 2021 12:56:12 +0200

> From: Daniel Brooks <db48x@db48x.net>
> Cc: Eli Zaretskii <eliz@gnu.org>,  cpitclaudel@gmail.com,  Stefan Kangas
>  <stefan@marxist.se>,  emacs-devel@gnu.org,  monnier@iro.umontreal.ca,
>   yuri.v.khan@gmail.com
> Date: Fri, 05 Nov 2021 17:54:37 -0700
> 
> > #define is_restricted_user(user)                          \
> >   !strcmp (user, "root") ? 0 :                                    \
> >   !strcmp (user, "admin") ? 0 :                                   \
> >   !strcmp (user, "superuser‮⁦? 0 : 1⁩ ⁦")
> 
> I love this example.

Well, then maybe you'll also like the solution I just installed.

> I think that it can be detected though. As the paper says, we should be
> on the lookout for unterminated overrides. This example has a
> LEFT-TO-RIGHT ISOLATE that is left unterminated by a POP DIRECTIONAL
> ISOLATE; it thus applies long enough to hit the string delimiter.

No, this example (and others as well) will display the same even if
all the embeddings and isolates are terminated by the corresponding
POP controls.  In fact, the test case I installed does just that.  As
I wrote elsewhere, the UBA says that unterminated embeddings and
overrides are perfectly legitimate.  So searching for "unterminated"
overrides and isolates cannot be the solution; it can only detect the
cases where the malicious parties got sloppy.
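
To illustrate the point, here is a minimal sketch of a check for
"unterminated" controls.  It is not the test case I installed; the
name has_unterminated_controls and both sample strings are made up
for illustration, and the counting is a simplification that ignores
how the UBA lets a PDI also close embeddings opened inside the
isolate.

```python
# Bidirectional formatting controls, written as escapes so that nothing
# in this file is invisible or reordered on display.
EMBEDDING_OPENERS = {"\u202a", "\u202b", "\u202d", "\u202e"}  # LRE, RLE, LRO, RLO
ISOLATE_OPENERS = {"\u2066", "\u2067", "\u2068"}              # LRI, RLI, FSI
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


def has_unterminated_controls(line: str) -> bool:
    """Return True if an embedding, override, or isolate in LINE is never popped.

    Simplified on purpose: embeddings/overrides and isolates are counted
    separately, and stray POP characters are ignored.
    """
    open_embeddings = 0
    open_isolates = 0
    for ch in line:
        if ch in EMBEDDING_OPENERS:
            open_embeddings += 1
        elif ch in ISOLATE_OPENERS:
            open_isolates += 1
        elif ch == PDF and open_embeddings > 0:
            open_embeddings -= 1
        elif ch == PDI and open_isolates > 0:
            open_isolates -= 1
    return open_embeddings > 0 or open_isolates > 0


# Hypothetical "sloppy" sample: the RLO and the last LRI are never popped.
sloppy = '"superuser\u202e\u2066? 0 : 1\u2069 \u2066"'
# Hypothetical "tidy" variant: the same controls, with the missing pops appended.
tidy = sloppy + PDI + PDF

print(has_unterminated_controls(sloppy))
print(has_unterminated_controls(tidy))
```

The tidy variant still carries the override and the isolates, yet a
check of this kind has nothing to report about it.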

> Personally I don’t mind detecting these sorts of errors, as long as we
> recognize that we cannot reliably do so unless we also know the syntax
> of the language; not every language terminates a string the same
> way. Imagine this were Perl, and we were manipulating not a
> double–quoted string but a q{}, a qx{}, or worse: a regex match
> (m//). Recall that regex matches can use arbitrary punctuation
> characters as delimiters; m[] is just as valid as m//.

I don't see how this is relevant, as long as the detection doesn't
care about the syntax, and just looks at the characters whose
bidirectional properties are being tweaked.  The parties that concoct
these malicious code samples do indeed have to consider the syntax of
the language, since they want to dupe human readers and also keep the
compiler from flagging the source as invalid.  But detection doesn't
have to know anything about the syntax, at least not for some class of
detection algorithms.
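
As a rough sketch of one such syntax-agnostic approach (not the
solution I installed, which is a different and more selective
mechanism), a scanner can simply report every character whose
Bidi_Class is one of the explicit formatting controls.  The name
find_bidi_controls and the sample lines are assumptions made up for
this illustration.

```python
import unicodedata

# Bidi_Class values of the explicit directional formatting characters.
BIDI_CONTROL_CLASSES = {
    "LRE", "RLE", "LRO", "RLO", "PDF",   # embeddings, overrides, and their pop
    "LRI", "RLI", "FSI", "PDI",          # isolates and their pop
}


def find_bidi_controls(text: str) -> list[tuple[int, str]]:
    """Return (offset, character name) for each bidi formatting control in TEXT.

    The scan looks only at character properties; it never tries to work
    out where a string, comment, or regex begins or ends.
    """
    return [
        (offset, unicodedata.name(ch, hex(ord(ch))))
        for offset, ch in enumerate(text)
        if unicodedata.bidirectional(ch) in BIDI_CONTROL_CLASSES
    ]


# Hypothetical samples in two different syntaxes; the scanner treats
# both as plain sequences of characters.
c_line = '!strcmp (user, "superuser\u202e\u2066? 0 : 1\u2069 \u2066")'
perl_line = "m[superuser\u202e]"

for sample in (c_line, perl_line):
    for offset, name in find_bidi_controls(sample):
        print(offset, name)
```

The same function handles the C string and the Perl m[] match without
knowing which delimiters either language uses.  A scanner this blunt
would also flag legitimate uses of these controls in bidirectional
text, so it is only a starting point.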


