emacs-devel

From: Eli Zaretskii
Subject: Re: Can watermarking Unicode text using invisible differences sneak through Emacs, or can Emacs detect it?
Date: Mon, 07 Feb 2022 15:16:11 +0200

> From: Richard Stallman <rms@gnu.org>
> Cc: psainty@orcon.net.nz, luangruo@yahoo.com, emacs-devel@gnu.org,
>       kevin.legouguec@gmail.com
> Date: Mon, 07 Feb 2022 00:11:28 -0500
> 
>   > So you mean we should create a database of ASCII characters that
>   > approximate the combining diacriticals?  But if so, how is it better
>   > than having a database of complete characters and their ASCII
>   > equivalents, like we have now in latin1-disp.el?
> 
> I think there are only around 20 diacritics.

You are thinking of some subset, I think.  The real number is more
like 80, and that's even if we only take the diacritics relevant to
Latin characters, and disregard the Cyrillic, Greek, and others.

> There must be hundreds of letters-with-diacritics.  The method I've
> proposed can handle everything automatically, given a table about
> the 20-odd diacritics.  That's a great simplification from a table
> of hundreds of elements, set up by hand.

The table has already been set up by hand, and we have it in
latin1-disp.el, so it isn't as if we need to weigh two jobs against
each other.

>   >  but a database of complete characters makes it easier to
>   > make sure the results are optimal, because you see the original
>   > complete character and the complete equivalent,
> 
> I don't follow you here.  In particular, what does "complete
> equivalent" mean?

For example, "o?'" instead of "o" + "?" + "'" (to emulate ?\ṍ).  With
the former, you see the entire string that will be shown; with the
latter, you need to imagine it (and all the other combinations that
use one or both of these diacritics).
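
To make the contrast concrete, here is a minimal Python sketch (not
Emacs code).  The names WHOLE_TABLE, MARK_TABLE and compose_ascii are
hypothetical, and the only entries are the strings quoted in this
message; nothing here is copied from latin1-disp.el.

```python
import unicodedata

# Hypothetical whole-character table: the full display string is
# visible in the entry itself.
WHOLE_TABLE = {
    "\u1e4d": "o?'",   # LATIN SMALL LETTER O WITH TILDE AND ACUTE
}

# Hypothetical per-diacritic table: one ASCII stand-in per combining mark.
MARK_TABLE = {
    "\u0303": "?",     # COMBINING TILDE
    "\u0301": "'",     # COMBINING ACUTE ACCENT
}

def compose_ascii(char):
    """Build an ASCII string from the canonical decomposition of CHAR.

    Returns None when some piece of the decomposition is neither ASCII
    nor a mark listed in MARK_TABLE.
    """
    pieces = []
    for c in unicodedata.normalize("NFD", char):
        if c in MARK_TABLE:
            pieces.append(MARK_TABLE[c])
        elif ord(c) < 128:
            pieces.append(c)
        else:
            return None
    return "".join(pieces)

# Both routes give the same string for this character, but only the
# whole-character table shows it to whoever maintains the table.
print(compose_ascii("\u1e4d") == WHOLE_TABLE["\u1e4d"])
```

With WHOLE_TABLE the reviewer reads the final string directly; with
MARK_TABLE it exists only after composition, for every character that
uses either mark.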

Also, characters that have two diacritics are just part of the
problem.  What would you do with the likes of ?\ǿ (which we currently
represent as "o/'")?  Its base character, ø, doesn't have a
decomposition in Unicode.

IOW, your proposal solves only some (small) part of the problem at
best, whereas having complete strings in the database is needed anyway
for the rest.
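
The decomposition behaviour described above can be checked with
Python's standard unicodedata module.  This is only an illustration of
the Unicode data, not of how Emacs does it; the helper name describe
is made up for the example.

```python
import unicodedata

def describe(char):
    """Return the code points of CHAR's canonical decomposition (NFD)."""
    return [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", char)]

# o with tilde and acute: decomposes all the way down to ASCII "o"
# followed by two combining marks.
print(describe("\u1e4d"))   # ['U+006F', 'U+0303', 'U+0301']

# o with stroke and acute: only the acute is split off.
print(describe("\u01ff"))   # ['U+00F8', 'U+0301']

# o with stroke itself has no decomposition mapping at all.
print(unicodedata.decomposition("\u00f8") == "")   # True
```

So a scheme driven purely by decomposition stops at U+00F8 and still
needs a hand-written entry for it.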

>   > I think reasonable appearance is more important than memory
>   > consumption in this case,
> 
> What makes an appearance more or less reasonable when we're talking
> about replacing one character with two or three that express
> _symbolically_ which character it is?  I don't get it.

The appearance should (a) make sense, and (b) be consistent: for
example, U+030C COMBINING CARON should always be represented by the
same ASCII equivalent.  I don't see how you could fulfill these two
conditions without reviewing all the relevant combinations and
iteratively fixing whatever needs fixing.
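
Condition (b) is the part that lends itself to a mechanical check.  A
rough Python sketch, assuming a table that maps characters to ASCII
strings: the function name inconsistent_marks and the SAMPLE entries
are hypothetical and are not taken from latin1-disp.el.

```python
import unicodedata
from collections import defaultdict

def inconsistent_marks(table):
    """Return the combining marks that have more than one ASCII suffix.

    Only entries whose NFD form is an ASCII base plus exactly one
    combining mark are examined; the suffix is whatever follows the
    base character in the ASCII string.
    """
    seen = defaultdict(set)
    for char, ascii_form in table.items():
        nfd = unicodedata.normalize("NFD", char)
        if len(nfd) != 2 or ord(nfd[0]) >= 128:
            continue
        if not unicodedata.combining(nfd[1]):
            continue
        if not ascii_form.startswith(nfd[0]):
            continue
        seen[nfd[1]].add(ascii_form[1:])
    return {mark: forms for mark, forms in seen.items() if len(forms) > 1}

# Illustrative entries only: the third one deliberately breaks the rule.
SAMPLE = {"\u010d": "c<", "\u0161": "s<", "\u017e": "z^", "\u00e9": "e'"}

for mark, forms in inconsistent_marks(SAMPLE).items():
    print(unicodedata.name(mark), sorted(forms))
```

A check like this only flags inconsistencies; whether a given string
makes sense (condition (a)) still takes a human looking at each entry.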

>   > (I used 'append' here to make it evident that the result of the
>   > decomposition is 2 characters, not one, since the Emacs display will
>   > by default combine them into the same glyph as the original non-ASCII
>   > character,
> 
> Not on a Linux console, I think.  When I have f and i in the buffer,
> Emacs does not convert them into a ligature.  The only time it has to
> try to deal with a ligature is when there is a Unicode ligature
> code point in the buffer.

Once again, on a TTY frame Emacs does NOT produce ligatures or
combine base characters with diacritics; it expects the terminal to
do that.  I wrote the above remark because you are not the only one
who reads this discussion, and most other people use GUI displays,
where the characters would (potentially confusingly) combine on
display.


