bug-gnu-emacs

bug#39799: 28.0.50; Most emoji sequences don’t render correctly


From: Robert Pluim
Subject: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Tue, 21 Sep 2021 12:34:45 +0200

>>>>> On Tue, 21 Sep 2021 12:16:38 +0300, Eli Zaretskii <eliz@gnu.org> said:

    >> From: Robert Pluim <rpluim@gmail.com>
    >> Cc: rgm@gnu.org,  39799@debbugs.gnu.org,  mfabian@redhat.com
    >> Date: Mon, 20 Sep 2021 22:38:28 +0200
    >> 
    >> Iʼve just pushed a change to master that should fix (almost) all the
    >> issues with displaying emoji sequences (except for keycaps). Feedback
    >> welcome.

    Eli> Thanks, this is mostly okay, IMO.  The only issue I have with this is
    Eli> here:

    Eli> Specifically, the U+2xxx codepoints are now in the 'emoji' script,
    Eli> which I think is undesirable, even if the price is that we won't
    Eli> support the sequences in which those codepoints are followed by
    Eli> VS-16.  So I think we should remove those codepoints from the above,
    Eli> leaving only the U+1Fxxx ones.

OK, Iʼll adjust it.

    Eli> Btw, currently U+261D followed by VS-16 doesn't compose for me,
    Eli> probably because compose-gstring-for-variation-glyph is hardcoded to
    Eli> work only for Han characters, and U+261D isn't, or because that
    Eli> function is not suited to VS-16 (it looks for glyph variations in the
    Eli> font)?  Or am I missing something?

You mean it doesnʼt get treated as a composition, or the result looks
bad (despite the comments in compose-gstring-for-variation-glyph I
donʼt see it limiting things to Han anywhere)? I have the latter:

☝️

             position: 146 of 147 (99%), column: 0
            character: ☝ (displayed as ☝) (codepoint 9757, #o23035, #x261d)
              charset: unicode (Unicode (ISO10646))
code point in charset: 0x261D
               script: emoji
               syntax: w        which means: word
             category: .:Base
             to input: type "C-x 8 RET 261d" or "C-x 8 RET WHITE UP POINTING INDEX"
          buffer code: #xE2 #x98 #x9D
            file code: #xE2 #x98 #x9D (encoded by coding system utf-8-unix)
              display: composed to form "☝️" (see below)

Composed with the following character(s) "️" using this font:
  ftcrhb:-GOOG-Noto Color Emoji-normal-normal-normal-*-19-*-*-*-m-0-iso10646-1
by these glyphs:
  [0 1 9757 69 24 0 24 18 5 nil]
with these character(s):
  ️ (#xfe0f) VARIATION SELECTOR-16

Character code properties: customize what to show
  name: WHITE UP POINTING INDEX
  general-category: So (Symbol, Other)
  decomposition: (9757) ('☝')

There are text properties here:
  fontified            nil
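
As a cross-check of the numbers in the output above, here is a small
Python sketch (not Emacs code) that rebuilds the two-character sequence
and recomputes the decimal, octal and UTF-8 values from the codepoint.
It only uses Python's standard unicodedata module; the variable names
are mine.

```python
import unicodedata

base = "\u261d"   # WHITE UP POINTING INDEX
vs16 = "\ufe0f"   # VARIATION SELECTOR-16
sequence = base + vs16   # the emoji presentation sequence discussed here

# Recompute the values shown in the character description.
codepoint = ord(base)              # decimal codepoint
octal = oct(codepoint)             # Python spells the #o... form as 0o...
utf8_bytes = base.encode("utf-8")  # the "buffer code" / "file code" bytes

print(codepoint, octal, hex(codepoint))
print(unicodedata.name(base), unicodedata.category(base))
print(unicodedata.name(vs16))
print(utf8_bytes.hex(" "))
```

The sequence is two codepoints long; VS-16 has no visible glyph of its
own and only requests emoji presentation for the preceding character.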

    Eli> Now to my idea of supporting those "U+2xxx VS-16" sequences without
    Eli> assigning them to the 'emoji' script:

    Eli> The function autocmp_chars uses font_range to find whether the
    Eli> sequence of characters that can be composed are supported by the same
    Eli> font.  It currently takes the first character of the sequence, calls
    Eli> font_for_char for it, then checks that all the rest of the characters
    Eli> are supported by that font by calling font_encode_char.  In our case,
    Eli> the first character of the sequence is U+2xxx, which is not in the
    Eli> 'emoji' script, so Emacs is likely to pick up a font that doesn't
    Eli> support Emoji, and the composition will fail.  To avoid that, I
    Eli> propose the following change:

    Eli>   . add a new argument to font_range, the codepoint that triggered the
    Eli>     composition
    Eli>   . inside font_range, if that codepoint belongs to the 'emoji' script
    Eli>     (use char-script-table to find that out), call font_for_char with
    Eli>     a representative character for 'emoji' (from
    Eli>     script-representative-chars) instead of the first character of the
    Eli>     sequence, then check that all the sequence characters, including
    Eli>     the first one, can be supported by that font; if they can, return
    Eli>     that font to the caller, to be used for the composition

    Eli> WDYT?

I think this means you'd have to add the Variation Selectors to the
emoji script, but it should work. Iʼm not sure that *all* the
characters need to be supported by the font: if thereʼs a ZWJ in
there, itʼs purely functional, so thereʼs no need for a glyph for it
(and Iʼm hoping harfbuzz agrees), but thatʼs a moot point for U+2xxx U+FE0F.
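
To make the proposal concrete, here is a toy Python model of the font
selection logic (a sketch, not the C code in Emacs). Everything in it
is an assumption made for illustration: the two fonts and their
coverage, the script table, the choice of U+1F600 as the representative
'emoji' character, and the fallback to the existing logic when the
emoji font does not cover the sequence. The function names merely echo
font_for_char and font_range.

```python
from dataclasses import dataclass

VS16 = 0xFE0F        # VARIATION SELECTOR-16
INDEX_UP = 0x261D    # WHITE UP POINTING INDEX
GRINNING = 0x1F600   # stand-in for an 'emoji' representative char (assumption)

# Toy stand-in for char-script-table.  VS-16 is placed in 'emoji' here,
# which is the adjustment suggested in the reply above.
SCRIPT_OF = {INDEX_UP: "symbol", GRINNING: "emoji", VS16: "emoji"}


@dataclass(frozen=True)
class Font:
    name: str
    coverage: frozenset  # codepoints the font can encode

    def encodes(self, cp):
        return cp in self.coverage


# Hypothetical fonts, in the order the toy font_for_char tries them.
FONTS = [
    Font("SymbolFont", frozenset({INDEX_UP})),
    Font("EmojiFont", frozenset({INDEX_UP, GRINNING, VS16})),
]


def font_for_char(cp, fonts=FONTS):
    """Toy model: first font that can encode cp, else None."""
    return next((f for f in fonts if f.encodes(cp)), None)


def font_range_current(seq, fonts=FONTS):
    """Pick a font for seq[0], then require it to cover the rest."""
    font = font_for_char(seq[0], fonts)
    if font is not None and all(font.encodes(c) for c in seq[1:]):
        return font
    return None


def font_range_proposed(seq, trigger, fonts=FONTS):
    """If the triggering codepoint is in 'emoji', start from an emoji font."""
    if SCRIPT_OF.get(trigger) == "emoji":
        font = font_for_char(GRINNING, fonts)
        if font is not None and all(font.encodes(c) for c in seq):
            return font
    # Falling back to the existing logic is an assumption of this sketch.
    return font_range_current(seq, fonts)


seq = [INDEX_UP, VS16]
print(font_range_current(seq))         # no single font found -> None
print(font_range_proposed(seq, VS16))  # the emoji font
```

In this model the existing logic picks the symbol font for U+261D and
then gives up because that font cannot encode VS-16, whereas the
proposed logic starts from the emoji font and succeeds. A purely
functional character such as ZWJ could be skipped in the coverage check
if it turns out the shaper does not need a glyph for it.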

    Eli> Btw, if you use Firefox or Chrome, or some other application that can
    Eli> show Emoji sequences, or maybe just use HarfBuzz's hb-view, how does
    Eli> the display of the U+2xxx change when they are followed by VS-16?  Is
    Eli> the change prominent enough for us to try to support it?  If not,
    Eli> perhaps the above should be left out for the moment.

At least with chromium, the glyph becomes more colourful for about a
dozen codepoints, but not for U+261D (see attached). The VS-16 itself
is hidden.

Robert
--

Attachment: Screenshot from 2021-09-21 12-26-36.png
Description: PNG image

Attachment: Screenshot from 2021-09-21 12-26-11.png
Description: PNG image

Attachment: Screenshot from 2021-09-21 12-25-39.png
Description: PNG image

