Re: ucs-normalize and diacritics

emacs-devel

From:	Eli Zaretskii
Subject:	Re: ucs-normalize and diacritics
Date:	Thu, 26 Jul 2018 22:47:26 +0300

> From: Robert Pluim <address@hidden>
> Cc: address@hidden
> Date: Wed, 25 Jul 2018 21:59:00 +0200
> 
> Eli Zaretskii <address@hidden> writes:
> 
> >> From: Robert Pluim <address@hidden>
> >> Cc: address@hidden
> >> Date: Wed, 25 Jul 2018 09:40:34 +0200
> >> 
> >> I think Iʼll start by putting pointers to auto-composition-mode in the
> >> manual and lispref.
> >
> > Thanks in advance.
> 
> Hereʼs a first stab at it, intended for emacs-26.

Thanks.

> Probably the unicode characters below will not survive intact.

They didn't, and the chances that they will display correctly in all
the formats in which we want to produce the manual are slim.  My
suggestion is not to use lone diacritics at all, but instead to use
their Unicode names, such as "COMBINING CIRCUMFLEX ACCENT", and maybe
the codepoint, as in "U+0302 COMBINING CIRCUMFLEX ACCENT".  Or maybe
avoid showing them entirely; see below.
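
To illustrate the "codepoint plus Unicode name" form suggested above,
here is a minimal sketch.  It uses Python's standard unicodedata
module rather than anything in Emacs, and the helper name describe is
made up for this example:

```python
import unicodedata

def describe(ch):
    """Return a label such as 'U+0302 COMBINING CIRCUMFLEX ACCENT'.

    'describe' is a hypothetical helper for this illustration only.
    """
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

# Spell out the lone diacritic instead of embedding it in the text.
print(describe("\u0302"))
```

A label of this kind is plain ASCII, so it should survive any output
format, which is the point of the suggestion.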

A few more comments:

> +@cindex diacritic
> +@cindex composition

You are describing a command, auto-composition-mode, so there should
be an @findex entry for it here.

> +  Sometimes Emacs will display a single character even when the buffer
> +contains multiple characters, through a process known as @dfn{composition}.

"Display a single character" is in general inaccurate (although in the
case you give as example that is what happens).  Character composition
is not about characters, it's about "grapheme clusters".  But I think
it's too technical an issue to describe in the user manual, so I
suggest instead to say something simplified, like

  @cindex complex text layout
    Emacs supports @dfn{complex text layout} (abbreviated
  @acronym{CTL}), where several consecutive characters are displayed
  as a single unit, either a single font glyph or several glyphs whose
  shapes and relative positions are determined by the rules of the
  script to which the characters belong.  This happens automatically
  when @code{auto-composition-mode} is turned on (which is the default).

We could give a couple of examples before the last sentence, like
Arabic shaping and Latin characters with diacritics.
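
As a rough illustration of "several consecutive characters displayed
as a single unit", the sketch below groups a base character with the
combining marks that follow it.  It is written in Python, not Emacs
Lisp, and it is not how Emacs composes text; the function name
clusters and the grouping rule are simplifying assumptions (real
grapheme-cluster rules cover many more cases than diacritics):

```python
import unicodedata

def clusters(text):
    """Group each base character with the combining marks after it.

    Simplifying assumption: any character with a nonzero canonical
    combining class is attached to the preceding unit.  This handles
    Latin diacritics only, not shaping or other kinds of composition.
    """
    units = []
    for ch in text:
        if units and unicodedata.combining(ch) != 0:
            units[-1] += ch
        else:
            units.append(ch)
    return units

# 'e' followed by U+0302 COMBINING CIRCUMFLEX ACCENT, then "te".
decomposed = "e\u0302te"
print(len(decomposed))            # number of characters in the string
print(len(clusters(decomposed)))  # number of displayed units under this rule
```

The string holds four characters but only three units under this
rule, which is why "a single character" is the wrong description of
what is displayed.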

> +This is done via @code{auto-composition-mode}, which is enabled by default,
> +and can only be done if the characters to be composed all exist within
> +the same font.

Please try to avoid passive voice.  In this case:

  Emacs can only compose characters that have glyphs in the same font.

> +       The exact rules for which characters to compose are
> +defined by the Unicode standard, but generally they concern
> +diacritical marks such as accents.

Diacritics are a special case; character composition is a much more
general feature.  So I would lose the second part of this sentence, or
rephrase it as an example.

> +  For a successfully composed character, @kbd{C-u C-x =} displays
> +details about the base character and the following character(s) it is
> +composed with.  For example for @samp{e} composed with @samp{COMBINING
> +CIRCUMFLEX ACCENT}, which visually would be very similar to the
> +previous example, the output would look like:

I would suggest avoiding the detailed display (especially as it most
probably won't give good results in PDF and perhaps even HTML), but
instead just mention that the information about the composed
characters is displayed as part of "C-u C-x =".  The details are very
technical and IMO inappropriate for the user manual.  (OTOH, they
should be described in the ELisp manual, or at least we should explain
how to interpret them.)

>  @item canonical-combining-class
>  Corresponds to the @code{Canonical_Combining_Class} Unicode property.
>  The value is an integer.  For unassigned codepoints, the value
> -is zero.
> +is zero.  Emacs can use this to visually compose multiple characters,
> +using @code{auto-composition-mode}, if all the characters concerned
> +exist in the same font.

Again, this is just one special case of character compositions, and
not the most important one.  So I wonder what would be the value of
this text, unless we have a much more detailed and full description
elsewhere, and this text includes a cross-reference to those details.
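
For reference, here is a small sketch of what the
Canonical_Combining_Class property looks like and what the Unicode
standard itself uses it for, namely ordering combining marks during
normalization.  It uses Python's unicodedata module, not Emacs, and
the variable names are only for this example:

```python
import unicodedata

# Canonical_Combining_Class values: 0 for base characters, nonzero for marks.
ccc_base = unicodedata.combining("e")             # LATIN SMALL LETTER E
ccc_circumflex = unicodedata.combining("\u0302")  # COMBINING CIRCUMFLEX ACCENT
ccc_dot_below = unicodedata.combining("\u0323")   # COMBINING DOT BELOW
print(ccc_base, ccc_circumflex, ccc_dot_below)

# Canonical ordering: NFD sorts adjacent marks by combining class, so
# the dot below (lower class) moves ahead of the circumflex.
reordered = unicodedata.normalize("NFD", "a\u0302\u0323")
print([f"U+{ord(c):04X}" for c in reordered])
```

Nothing in this says how a display engine should draw the result,
which is why the property on its own explains little about character
composition.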
