bug-gnu-emacs

bug#56237: 29.0.50; delete-forward-char fails to delete character


From: Visuwesh
Subject: bug#56237: 29.0.50; delete-forward-char fails to delete character
Date: Mon, 27 Jun 2022 11:17:25 +0530
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/29.0.50 (gnu/linux)

[Mon Jun 27, 2022] Visuwesh wrote:

> [Sun Jun 26, 2022] Eli Zaretskii wrote:
>
>>> From: Visuwesh <visuweshm@gmail.com>
>>> Cc: 56237@debbugs.gnu.org
>>> Date: Sun, 26 Jun 2022 22:36:31 +0530
>>> 
>>> > Invoke find-composition, and you will see that it returns a single
>>> > composition there.
>>> 
>>> If find-composition is indeed right, then the return value is very
>>> unintuitive to a native speaker: ப் and போ are two separate characters
>>> and combining them into a single cluster is weird...
>>
>> Maybe you are right, but then Someone(TM) will have to either modify
>> find-composition or explain how to interpret its return value
>> differently from what we do now.  What is now in delete-forward-char
>> expresses my level of knowledge in this area, which admittedly is
>> limited.
>>
>
> Turns out that Someone™ was closer to us than I thought: describe-char.
> With a bit of edebug and reading the code in composition.h (for the
> LGLYPH_* macros) and defsubst's in composite.el, I think I figured out
> the logic:
>
> We need to call find-composition with a non-nil DETAIL-P argument to get
> the gstring.  The gstring contains the glyphs that will be used to
> construct the grapheme cluster [1].  According to composition.h, glyphs
> that have the same FROM and TO indices are part of the same grapheme
> cluster, so to get the number of codepoints in the first cluster, we
> need to count the leading glyphs whose FROM and TO indices equal those
> of the first glyph.
>
> Understanding all this, I came up with the following code:
>
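>     ;; A non-nil DETAIL-P makes find-composition include the gstring.
>     (let* ((composition (find-composition 0 nil "ப்போ" t))
>            (gstring (nth 2 composition))
>            (num-glyphs (lgstring-glyph-len gstring))
>            (i 1)
>            ;; FROM and TO of the first glyph are the reference values.
>            (from (lglyph-from (lgstring-glyph gstring 0)))
>            (to (lglyph-to (lgstring-glyph gstring 0))))
>       ;; Count the leading glyphs that share the first glyph's FROM and TO.
>       (while (and (< i num-glyphs)
>                   (= from (lglyph-from (lgstring-glyph gstring i)))
>                   (= to (lglyph-to (lgstring-glyph gstring i))))
>         (setq i (1+ i)))
>       i)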
>
> Here, i is the number of characters we need to delete using delete-char.
>
> [1] For the gstring format, see composition-get-gstring.
>
> But I think we should test this code in cases where a grapheme cluster
> contains more than two codepoints, since all the composed characters in
> Tamil are made up of two Unicode codepoints.  I can't test it on emojis
> since I don't know of an emoji font that has enough coverage and won't
> potentially crash Xft.
>

I got my hopes too high.  :(

This fails for the simple case of ரு (C-u C-x = also fails!) so I guess
we are back to square one.  Although ரு is composed from 0BB0 0BC1, the
gstring only has one glyph.
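
A minimal sketch of why the glyph-counting approach breaks down here,
assuming only the gstring layout used by the lgstring-* and lglyph-*
accessors in composite.el.  The helper name my/gstring-leading-glyph-count
and the mock gstrings are hypothetical: the FROM/TO values are made up for
illustration and do not come from a real font.  The helper is the same
loop as in the quoted code, taking the gstring as an argument and stopping
at unused (nil) glyph slots:

```elisp
(defun my/gstring-leading-glyph-count (gstring)
  "Count the leading glyphs of GSTRING that share the first glyph's FROM and TO.
Hypothetical helper for illustration; returns 0 if GSTRING has no glyphs."
  (let ((len (lgstring-glyph-len gstring))
        (i 1))
    (if (or (< len 1) (null (lgstring-glyph gstring 0)))
        0
      (let* ((glyph0 (lgstring-glyph gstring 0))
             (from (lglyph-from glyph0))
             (to (lglyph-to glyph0))
             glyph)
        ;; Stop at the first unused (nil) slot or the first glyph whose
        ;; FROM/TO differ from those of the first glyph.
        (while (and (< i len)
                    (setq glyph (lgstring-glyph gstring i))
                    (= from (lglyph-from glyph))
                    (= to (lglyph-to glyph)))
          (setq i (1+ i)))
        i))))

;; Mock gstrings: [HEADER ID GLYPH ...]; only FROM and TO of each glyph
;; are filled in (assumed values, not taken from a real font).
(defvar my/mock-two-glyphs
  (vector nil nil (vector 0 1) (vector 0 1)))
(defvar my/mock-one-glyph
  (vector nil nil (vector 0 1) nil))

(my/gstring-leading-glyph-count my/mock-two-glyphs) ; => 2
(my/gstring-leading-glyph-count my/mock-one-glyph)  ; => 1
```

The second mock corresponds to the ரு situation described above: when the
font produces a single glyph for two codepoints, the count of glyphs is 1
no matter how many characters the cluster covers.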
