bug-gettext

[bug #64025] PATCH: Use the right BCP47 language format in XML mode for xml:lang


From: Matthias Klumpp
Subject: [bug #64025] PATCH: Use the right BCP47 language format in XML mode for xml:lang
Date: Tue, 11 Apr 2023 06:53:25 -0400 (EDT)

URL:
  <https://savannah.gnu.org/bugs/?64025>

                 Summary: PATCH: Use the right BCP47 language format in XML mode for xml:lang
                   Group: GNU gettext
               Submitter: matk
               Submitted: Tue 11 Apr 2023 10:53:24 AM UTC
                Category: None
                Severity: 3 - Normal
              Item Group: None
                  Status: None
                 Privacy: Public
             Assigned to: None
             Open/Closed: Open
         Discussion Lock: Any


    _______________________________________________________

Follow-up Comments:


-------------------------------------------------------
Date: Tue 11 Apr 2023 10:53:24 AM UTC By: Matthias Klumpp <matk>
Hello!
This is a spicy issue, which is why I am sending in this preliminary patch (it
will likely need a test case before it can be merged).

I was recently investigating why AppStream components are sometimes properly
translated into the user's native language, and sometimes are not. The issue
came down to the translated XML data itself: some XML files had locales in
their xml:lang attributes in the form of "zh_TW" and some had "zh-TW". I looked
it up, and apparently the IETF BCP47 language format[1] (with a dash) is
mandatory when the xml:lang attribute is used. Some tools, like itstool, do
this correctly, and even msgfmt uses the right language if it is specified
manually (which some build systems seem to do). However, any application that
runs msgfmt in bulk mode on XML during its localization process will end up
with POSIX locale strings in its xml:lang attributes instead.
This is what Meson does, so a huge number of apps are affected now.

At the moment, looking just at AppStream data, we have a confusing mess of
different locale strings, depending on how the build system of the individual
application was configured (pretty much all of KDE uses BCP47, while all of
GNOME uses POSIX, for example).
While I originally wanted to ignore the issue and just continue to use POSIX
locales, the fact that some projects emit one form and others emit the other
makes the issue impossible to work around, and it will never stop haunting us
until it is fixed for good. It is especially frustrating for translators, who
have no means to do anything about it. Furthermore, there actually exists a
specification which mandates what xml:lang has to look like, so the right
solution(tm) would be to follow it.

Fortunately that is relatively easy to do for 99.9% of all cases.
The attached patch makes msgfmt use BCP47 for xml:lang in XML bulk mode, which
makes it follow the IETF recommendation and match other tools such as
itstool. This patch implements the same conversion logic as used by itstool to
convert POSIX locale to BCP47, so we should have wide compatibility.

This is a disruptive change though. Some locales, such as "de", are the same
in POSIX and BCP47, so nobody will notice the change (which is also the reason
why we only noticed this now - thanks to our Chinese-speaking community for
bringing this to our attention!), while others will change: "sr@latin" will
become "sr-Latn" and "ca@valencia" will become "ca-valencia".
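
To make the mappings above concrete, here is a minimal sketch of such a
conversion in Python. It is not the code from the attached patch and not
itstool's implementation; the function name posix_to_bcp47, the small modifier
table, and the choice to pass unknown modifiers through as variant subtags are
all assumptions made for illustration.

```python
# Hypothetical sketch only: not taken from the attached patch or from itstool.

# Assumed table of POSIX modifiers that name a script rather than a variant.
SCRIPT_MODIFIERS = {
    "latin": "Latn",
    "cyrillic": "Cyrl",
}


def posix_to_bcp47(locale: str) -> str:
    """Convert a POSIX locale name (language[_TERRITORY][.codeset][@modifier])
    into a BCP47-style language tag."""
    # Split off the '@modifier' part first, then drop any '.codeset' suffix.
    base, _, modifier = locale.partition("@")
    base = base.split(".", 1)[0]
    language, _, territory = base.partition("_")

    parts = [language]
    script = SCRIPT_MODIFIERS.get(modifier.lower())
    if script:
        # BCP47 places the script subtag before the region subtag.
        parts.append(script)
    if territory:
        parts.append(territory)
    if modifier and not script:
        # Assumption: any other modifier is emitted unchanged as a variant.
        parts.append(modifier)
    return "-".join(parts)


if __name__ == "__main__":
    for name in ("de", "zh_TW", "sr@latin", "ca@valencia"):
        print(name, "->", posix_to_bcp47(name))
```

With this sketch, "de" stays "de", "zh_TW" becomes "zh-TW", "sr@latin" becomes
"sr-Latn" and "ca@valencia" becomes "ca-valencia". Modifiers that have no
BCP47 counterpart would need explicit handling in a real implementation.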

Feedback on this issue as well as the patch is very welcome!

Thank you & kind regards,
    Matthias Klumpp

[1]: https://en.wikipedia.org/wiki/IETF_language_tag






    _______________________________________________________
File Attachments:


-------------------------------------------------------
Date: Tue 11 Apr 2023 10:53:24 AM UTC  Name: gettext-bcp47_v1.patch  Size: 7KiB   By: matk
Preliminary patch
<http://savannah.gnu.org/bugs/download.php?file_id=54600>

    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?64025>

_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/



