bug#55370: [PATCH] Add support for the Syloti Nagri script


From: Eli Zaretskii
Subject: bug#55370: [PATCH] Add support for the Syloti Nagri script
Date: Thu, 12 May 2022 19:29:23 +0300

> From: समीर सिंह Sameer Singh <lumarzeli30@gmail.com>
> Date: Thu, 12 May 2022 20:36:49 +0530
> Cc: 55370@debbugs.gnu.org
> 
> For example in tirhuta, when I do this:
> 
> ;; Tirhuta composition rules
> (let ((consonant            "[\x1148F-\x114AF]")
>       (nukta                "\x114C3")
>       (independent-vowel    "[\x11481-\x1148E]")
>       (vowel                "[\x114B0-\x114BE]")
>       (nasal                "[\x114BF\x114C0]")
>       (virama               "\x114C2"))
>   (set-char-table-range composition-function-table
>                         '(#x114B0 . #x114BE)
>                         (list (vector
>                                ;; Consonant based syllables
>                                (concat consonant nukta "?\\(?:"
>                                        virama consonant nukta "?\\)*\\(?:"
>                                        virama "\\|" vowel "*" nukta "?"
>                                        nasal "?\\)")
>                                1 'font-shape-gstring))))
> 
> Notice here, the nasal sign is not included in the range.
> And then I type: 𑒅𑓀 𑒆𑒿
> It is rendered correctly

It is rendered correctly because your rule isn't used.

The rule

                        '(#x114B0 . #x114BE)
                        (list (vector
                               ;; Consonant based syllables
                               (concat consonant nukta "?\\(?:"
                                       virama consonant nukta "?\\)*\\(?:"
                                       virama "\\|" vowel "*" nukta "?"
                                       nasal "?\\)")
                               1 'font-shape-gstring))))

says this:

  . find a character C between #x114B0 and #x114BE
  . see if the characters starting one character before C match the
    above regexp
  . if they match, compose them

But your text doesn't include any characters in the range
[\x114B0-\x114BE], so the above rule will never match anything, and
will not cause any composition.
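
The two conditions above can be checked by hand. The following is only a sketch: `my-tirhuta-rule-checks` is a hypothetical helper, not something in Emacs, and it approximates the lookup by testing whether any character falls in the trigger range and whether the regexp matches at the start of the string.

```elisp
(require 'seq)

;; Hypothetical helper (not part of Emacs): reports, for TEXT, whether
;; a trigger character exists and whether the rule's regexp matches.
(defun my-tirhuta-rule-checks (text)
  "Return (TRIGGER-FOUND REGEXP-MATCHES) for TEXT under the quoted rule."
  (let* ((consonant "[\x1148F-\x114AF]")
         (nukta     "\x114C3")
         (vowel     "[\x114B0-\x114BE]")
         (nasal     "[\x114BF\x114C0]")
         (virama    "\x114C2")
         (regexp (concat consonant nukta "?\\(?:"
                         virama consonant nukta "?\\)*\\(?:"
                         virama "\\|" vowel "*" nukta "?"
                         nasal "?\\)")))
    (list
     ;; Is any character in the range the rule is keyed on?
     (and (seq-some (lambda (c) (<= #x114B0 c #x114BE)) text) t)
     ;; Would the regexp match from the start of TEXT?
     (and (string-match-p (concat "\\`\\(?:" regexp "\\)") text) t))))

;; The second pair from the message: U+11486 followed by U+114BF.
(my-tirhuta-rule-checks (string #x11486 #x114BF))
```

Both checks fail for that pair, whereas a consonant followed by a dependent vowel, for example `(string #x1148F #x114B0)`, passes both.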

You see the characters composed because the second character in each
pair, #x114C0 and #x114BF, is a combining accent, and for those we have
a catch-all rule in composite.el:

  (when unicode-category-table
    (let ((elt `([,(purecopy "\\c.\\c^+") 1 compose-gstring-for-graphic]
                 [nil 0 compose-gstring-for-graphic])))
      (map-char-table
       #'(lambda (key val)
           (if (memq val '(Mn Mc Me))
               (set-char-table-range composition-function-table key elt)))
       unicode-category-table))
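
To see whether a given character falls under this catch-all rule, one can look up its Unicode general category. This is a small sketch; `my-combining-mark-p` is a hypothetical name, and the check simply mirrors the `(memq val '(Mn Mc Me))` test in the excerpt above.

```elisp
;; Hypothetical helper (not part of Emacs): non-nil if CHAR has one of
;; the general categories the catch-all rule is installed for.
(defun my-combining-mark-p (char)
  "Return t if CHAR's general category is Mn, Mc or Me."
  (and (memq (get-char-code-property char 'general-category)
             '(Mn Mc Me))
       t))

;; The two nasal signs, and an independent vowel for contrast.
(mapcar #'my-combining-mark-p '(#x114BF #x114C0 #x11486))
```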


> But when I do:
> 
> ;; Tirhuta composition rules
> (let ((consonant            "[\x1148F-\x114AF]")
>       (nukta                "\x114C3")
>       (independent-vowel    "[\x11481-\x1148E]")
>       (vowel                "[\x114B0-\x114BE]")
>       (nasal                "[\x114BF\x114C0]")
>       (virama               "\x114C2"))
>   (set-char-table-range composition-function-table
>                         '(#x114B0 . #x114C0)
>                         (list (vector
>                                ;; Consonant based syllables
>                                (concat consonant nukta "?\\(?:"
>                                        virama consonant nukta "?\\)*\\(?:"
>                                        virama "\\|" vowel "*" nukta "?"
>                                        nasal "?\\)")
>                                1 'font-shape-gstring))))
> The range now has the nasal signs.
> And then type the above characters: 𑒅𑓀 𑒆𑒿
> They are not rendered correctly

In this case, the characters that trigger examination of the
composition rules, #x114C0 and #x114BF, _are_ in the range
'(#x114B0 . #x114C0).  However, the preceding characters, #x11484 and
#x11486, are independent vowels, and the regexp has no
independent-vowel part.  So again, the rules will never match.  Except
that now you have also replaced the default rule we have for the
combining accents, so what worked before no longer does.
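
The replacement effect can be reproduced on a scratch char-table, without touching composition-function-table. This is an illustrative sketch only: the table purpose `my-demo-table`, the function name, and the placeholder values `default-rule` and `tirhuta-rule` are all made up for the demonstration.

```elisp
;; Hypothetical demo (not part of Emacs): shows that setting a range in
;; a char-table overwrites whatever was stored for characters inside it.
(defun my-char-table-overwrite-demo ()
  "Return the entry for U+114BF before and after widening the range."
  (let ((table (make-char-table 'my-demo-table))
        before)
    ;; Stand-in for the catch-all rule installed for combining marks.
    (set-char-table-range table #x114BF 'default-rule)
    ;; Narrow range: U+114BF lies outside it, so its entry survives.
    (set-char-table-range table '(#x114B0 . #x114BE) 'tirhuta-rule)
    (setq before (aref table #x114BF))
    ;; Wide range: U+114BF lies inside it, so its entry is replaced.
    (set-char-table-range table '(#x114B0 . #x114C0) 'tirhuta-rule)
    (list before (aref table #x114BF))))

(my-char-table-overwrite-demo)
```

On the real table, `(aref composition-function-table #x114BF)` shows which rules are currently installed for that character.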

> But when I include their composition rules:
> 
> ;; Tirhuta composition rules
> (let ((consonant            "[\x1148F-\x114AF]")
>       (nukta                "\x114C3")
>       (independent-vowel    "[\x11481-\x1148E]")
>       (vowel                "[\x114B0-\x114BE]")
>       (nasal                "[\x114BF\x114C0]")
>       (virama               "\x114C2"))
>   (set-char-table-range composition-function-table
>                         '(#x114B0 . #x114C0)
>                         (list (vector
>                                ;; Consonant based syllables
>                                (concat consonant nukta "?\\(?:"
>                                        virama consonant nukta "?\\)*\\(?:"
>                                        virama "\\|" vowel "*" nukta "?"
>                                        nasal "?\\)")
>                                1 'font-shape-gstring)
>                               (vector
>                                ;; Nasal vowels
>                                (concat independent-vowel nasal "?")
>                                1 'font-shape-gstring))))
> 
> They are now once more rendered correctly.

As expected, see above: now you do have a regexp that can match.  It's
this one:

    (concat independent-vowel nasal "?")
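
A quick way to confirm that this regexp covers the typed text is to match it against one of the pairs. The helper name below is hypothetical, and anchoring at the start of the string is an assumption standing in for "one character before the trigger character".

```elisp
;; Hypothetical helper (not part of Emacs): length of the match of the
;; nasal-vowel regexp at the start of TEXT, or nil if it does not match.
(defun my-tirhuta-nasal-vowel-match-length (text)
  "Return how many characters of TEXT the nasal-vowel regexp covers."
  (let ((independent-vowel "[\x11481-\x1148E]")
        (nasal             "[\x114BF\x114C0]"))
    (when (string-match (concat "\\`" independent-vowel nasal "?") text)
      (- (match-end 0) (match-beginning 0)))))

;; Independent vowel U+11486 followed by the nasal sign U+114BF.
(my-tirhuta-nasal-vowel-match-length (string #x11486 #x114BF))
```

The match covers both characters of the pair, while a string that starts with a consonant, such as `(string #x1148F #x114BF)`, does not match at all.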

I hope you now understand how to fix the rules.  If not, please ask
more questions and show more examples.




