bug-groff



From: G. Branden Robinson
Subject: [bug #60566] gpinyin: puts tone mark on the wrong vowel in syllabic vowel clusters
Date: Sun, 9 May 2021 16:15:50 -0400 (EDT)
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0

URL:
  <https://savannah.gnu.org/bugs/?60566>

                 Summary: gpinyin: puts tone mark on the wrong vowel in syllabic vowel clusters
                 Project: GNU troff
            Submitted by: gbranden
            Submitted on: Sun 09 May 2021 08:15:48 PM UTC
                Category: Preprocessor - others
                Severity: 3 - Normal
              Item Group: Incorrect behaviour
                  Status: None
                 Privacy: Public
             Assigned to: None
             Open/Closed: Open
         Discussion Lock: Any
         Planned Release: None

    _______________________________________________________

Details:

gpinyin appears to always put the tone mark on the last vowel in a syllabic
vowel cluster.

Sadly the rule isn't that simple.

https://web.mit.edu/jinzhang/www/pinyin/spellingrules/index.html

For instance, gpinyin produces "chaōshì" when "chāoshì" is correct.

At first I thought this would be a big pain in the ass to fix, requiring a
refactor of the code to track more state, but then I noticed--Bernd laid a
useful foundation for a different approach.

gpinyin's subs.pl has a big list of "all" of the Mandarin syllables (without
tone marks).  Puzzlingly, this is in fact a hash rather than a list, but the
value of _every_ key is simply the integer 1.  (Maybe Bernd assumed a hash
would be faster for lookups--all he ever does is an existence test for a
key.)

Had I been reviewing his code at the time, I'd have suggested that %syllables
was overdesigned or prematurely optimized.  Today, though, that design means
we can adapt it to a useful purpose: storing an indicator that tells us which
vowel the tone mark should be applied to.

So my proposed solution is a grind through the ~411 hash keys, applying the
rules from the site above, and recording the finding in the hash values
somehow.  Many of the syllables have only one vowel, so they can be skipped or
left with some default value.
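
As a rough illustration of such a grind, here is a minimal sketch in Python
(not gpinyin's Perl).  It assumes a commonly cited summary of the placement
rule, which may not match the wording of the page linked above: "a" or "e"
takes the mark if present; in "ou" the "o" takes it; otherwise the last vowel
does.  The name tone_vowel_offset, the VOWELS string, and the toy syllable
subset are hypothetical and not taken from subs.pl.

```python
# Sketch only: derive, for each toneless syllable, which vowel (counting
# vowels only, 0-based) should carry the tone mark.
VOWELS = "aeiouü"  # assumption: ü is spelled literally in the keys


def tone_vowel_offset(syllable):
    """Return the vowel offset for the tone mark, or None if no vowel."""
    vowels = [ch for ch in syllable if ch in VOWELS]
    if not vowels:
        return None  # vowelless entries, if the list has any
    # "a" and "e" never co-occur in a Mandarin syllable; either one wins.
    for preferred in ("a", "e"):
        if preferred in vowels:
            return vowels.index(preferred)
    # In "ou", the "o" takes the mark.
    cluster = "".join(vowels)
    if "ou" in cluster:
        return cluster.index("ou")
    # Otherwise the last vowel takes it (covers single-vowel syllables too).
    return len(vowels) - 1


# Toy subset standing in for the ~411 keys of %syllables.
syllables = {s: tone_vowel_offset(s)
             for s in ("chao", "shi", "gou", "gui", "liu", "xue")}
```

With this rule, "chao" maps to vowel offset 0 (the "a"), while "gui" and
"liu" map to offset 1, so the computed values could be reviewed by hand
before being stored in the table.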

I'm not decided yet on how to encode the requisite information.  One method
would be simply to record a string offset into the syllable key for where the
tone should go.  This would affect all of the syllables.  Bernd already has
logic for locating vowels within syllables, however.  The
interesting/challenging parts of the problem are the syllables with multiple
vowels.  So, instead of a "string offset", maybe a "vowel offset" should be
recorded.
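
To make the "vowel offset" idea concrete, here is a minimal sketch in Python
(again, not gpinyin's Perl) of how a stored offset might be consumed.  It
assumes tones are numbered 1 to 4, that any other tone number means a neutral
tone with no mark, and that the mark is added as a Unicode combining
character and then composed with NFC normalization.  The names apply_tone,
TONE_MARKS, and VOWELS are hypothetical.

```python
import unicodedata

# Combining marks for tones 1-4: macron, acute, caron, grave.
TONE_MARKS = {1: "\u0304", 2: "\u0301", 3: "\u030c", 4: "\u0300"}
VOWELS = "aeiouü"  # assumption: ü is spelled literally in the input


def apply_tone(syllable, tone, vowel_offset):
    """Mark the vowel_offset-th vowel (0-based) of syllable with tone."""
    if tone not in TONE_MARKS:
        return syllable  # treated as neutral tone: no mark
    seen = -1
    for pos, ch in enumerate(syllable):
        if ch in VOWELS:
            seen += 1
            if seen == vowel_offset:
                # Insert the combining mark right after the chosen vowel,
                # then compose it into a single precomposed character.
                marked = (syllable[:pos + 1] + TONE_MARKS[tone]
                          + syllable[pos + 1:])
                return unicodedata.normalize("NFC", marked)
    raise ValueError("syllable has no vowel at that offset")


good = apply_tone("chao", 1, 0) + apply_tone("shi", 4, 0)
bad = apply_tone("chao", 1, 1) + apply_tone("shi", 4, 0)
```

Here `good` is the correct "chāoshì", and `bad` reproduces the misplaced
"chaōshì" that results from always choosing the last vowel.  A vowel offset
of 0 would be a natural default for the single-vowel syllables.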

More ambitiously, but perhaps excessively so, the syllables could be
categorized according to the rules (some encoding of "first vowel medial", for
example), and then the correct thing done in logic later.  I do suspect this
is overkill.

I'm not working on this yet.  Apparently most readers of Pinyin not only
figure out the correct reading of a syllable when the tone mark is misplaced,
but often do so instinctively through familiarity (much as one overlooks
typos).  And gpinyin has much worse problems that need to be solved first, for
which I have fixes in various stages of progress.

Nevertheless, typography is an exacting art and the thought of our beloved
groff system serving up the equivalent of a child's scrawl in Pinyin is
repugnant to me.

Our output should be exemplary.




    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?60566>

_______________________________________________
  Message sent via Savannah
  https://savannah.gnu.org/



