emacs-devel

From: Richard Stallman
Subject: Re: Can watermarking Unicode text using invisible differences sneak through Emacs, or can Emacs detect it?
Date: Sat, 05 Feb 2022 23:13:37 -0500

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > I don't understand the specification of these functions.  How would
  > diacriticize decide/know that ?~ is equivalent to the ?̃ (U+0303
  > COMBINING TILDE) that is part of ?ã ?

You know more about Unicode than I do, so I'm sure it is true _in some
sense_ that "U+0303 (COMBINING TILDE) is part of ?ã".

But I have doubts that that particular sense is the one that is
pertinent to the job `diacriticize' is meant to do.

I think you mean that one can represent the glyph image `ã' in Unicode
as a composition using a sequence of `a' and COMBINING TILDE.  Please
tell me if I am mistaken.

The ã in this sentence is not a composition.  It is a single
Unicode character, which is also in Latin-1.  I don't think that
COMBINING TILDE is "part of it".

COMBINING TILDE can be used to create its glyph image by composition,
but as to what is graphically part of that glyph image, I think
that is ordinary `~'.
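To make the distinction concrete, here is a small sketch comparing the single precomposed character with the two-character composition. It relies only on the `ucs-normalize' library that ships with Emacs; the `my-' variable names are hypothetical, chosen for illustration.

```elisp
(require 'ucs-normalize)

;; U+00E3 is one precomposed character (also present in Latin-1).
(defconst my-a-tilde-precomposed
  (string ?\N{LATIN SMALL LETTER A WITH TILDE}))

;; The same glyph image built by composition: `a' + COMBINING TILDE.
(defconst my-a-tilde-composed
  (string ?a ?\N{COMBINING TILDE}))

;; As strings they differ: one character versus two.
(length my-a-tilde-precomposed)                          ; => 1
(length my-a-tilde-composed)                             ; => 2
(string= my-a-tilde-precomposed my-a-tilde-composed)     ; => nil

;; Normalization converts between the two representations.
(string= (ucs-normalize-NFC-string my-a-tilde-composed)
         my-a-tilde-precomposed)                         ; => t
(string= (ucs-normalize-NFD-string my-a-tilde-precomposed)
         my-a-tilde-composed)                            ; => t
```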

    the call (ucs-normalize-NFD-string "ã")
    returns a string of 2 characters, ?a and ?̃..

Interesting.  I think it would be easy to implement `diacriticize' with that.

                                                 But how do you propose
    to make the leap from ?̃ to ?~ ?



(defconst unicode-combining-chars-alist '(... (?~ . ?̃ ) ...))

... (car (rassq combining-char unicode-combining-chars-alist)) ...


Indeed, I think this does the job for `criticanalyze'.

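(defun criticanalyze (char)
  "Return CHAR's NFD decomposition as a list of characters.
Combining marks listed in `unicode-combining-chars-alist' are
replaced by their spacing equivalents."
  (let ((composition (ucs-normalize-NFD-string (char-to-string char))))
    ;; Leave the base letter (and any unlisted mark) unchanged.
    (mapcar (lambda (c) (or (car (rassq c unicode-combining-chars-alist)) c))
            composition)))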

There is probably an equally simple way to handle `diacriticize'.
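One possible shape for `diacriticize', sketched under assumptions: it is taken to accept a base character and a spacing diacritic and to return the precomposed character, or nil when there is none. That signature is a guess, not something settled in this thread. `my-combining-chars-alist' is a hypothetical stand-in for the `unicode-combining-chars-alist' proposed above, holding only a few illustrative entries.

```elisp
(require 'ucs-normalize)

;; Hypothetical (SPACING . COMBINING) table; illustrative entries only.
(defconst my-combining-chars-alist
  '((?` . ?\N{COMBINING GRAVE ACCENT})
    (?^ . ?\N{COMBINING CIRCUMFLEX ACCENT})
    (?~ . ?\N{COMBINING TILDE})))

(defun diacriticize (base diacritic)
  "Return the precomposed character for BASE with spacing DIACRITIC.
Return nil if DIACRITIC has no known combining form, or if Unicode
has no single precomposed character for the combination."
  (let ((combining (cdr (assq diacritic my-combining-chars-alist))))
    (when combining
      ;; NFC composes BASE + COMBINING into one character when Unicode
      ;; defines such a character; otherwise two characters remain.
      (let ((composed (ucs-normalize-NFC-string (string base combining))))
        (when (= (length composed) 1)
          (aref composed 0))))))
```

With this, `(diacriticize ?a ?~)' yields ã, while `(diacriticize ?q ?~)' yields nil because Unicode has no precomposed q with tilde. The function is simply the inverse of `criticanalyze': look the mark up with `assq' instead of `rassq', then compose with NFC instead of decomposing with NFD.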

I proposed those two functions because I thought we had no way
for Lisp programs to get info about this.  Since we already have one,
maybe we don't need those two functions.  Popping back to the question
of `latin1-display.el', it could use the `ucs-...' functions directly
to figure out what substitutions to make.

However, `ucs-normalize-NFD-string' does not know anything about
ligatures.  Given the fi ligature (U+FB01), it returns that same
ligature unchanged.  So it can't be the sole method for
`latin1-display' to find useful substitutions.  We would have to tell
it the list of ligatures.

`latin1-display' already uses `char-displayable-p' to determine at run
time which characters could use display substitutions.
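Here is a rough sketch of how a run-time displayability check could be combined with NFD decomposition to pick a substitution. This is not how `latin1-display.el' actually works. The function and table names, the optional predicate argument (added so the logic can be exercised without a real display), and the base-then-mark output order are all assumptions.

```elisp
(require 'ucs-normalize)
(require 'seq)

;; Hypothetical (SPACING . COMBINING) table; illustrative entries only.
(defconst my-display-combining-alist
  '((?^ . ?\N{COMBINING CIRCUMFLEX ACCENT})
    (?~ . ?\N{COMBINING TILDE})))

(defun my-display-substitution (char &optional displayable-p)
  "Return a string to show instead of CHAR, or nil.
Return nil when CHAR is displayable according to DISPLAYABLE-P
\(default `char-displayable-p'), or when no displayable substitution
can be built from CHAR's NFD decomposition."
  (let ((displayable-p (or displayable-p #'char-displayable-p)))
    (unless (funcall displayable-p char)
      (let* ((parts (ucs-normalize-NFD-string (string char)))
             ;; Replace each listed combining mark with its spacing form.
             (mapped (mapcar (lambda (c)
                               (or (car (rassq c my-display-combining-alist)) c))
                             parts)))
        ;; Useful only if CHAR actually decomposed and every
        ;; resulting character can itself be displayed.
        (when (and (> (length parts) 1)
                   (seq-every-p displayable-p mapped))
          (apply #'string mapped))))))
```

With an ASCII-only predicate such as `(lambda (c) (< c 128))', ã maps to "a~", while ä yields nil because this illustrative table has no entry for COMBINING DIAERESIS.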

-- 
Dr Richard Stallman (https://stallman.org)
Chief GNUisance of the GNU Project (https://gnu.org)
Founder, Free Software Foundation (https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)




