bug-gnulib

Re: iswprint() and wcwidth() don't work properly on some platforms with


From: Eric Blake
Subject: Re: iswprint() and wcwidth() don't work properly on some platforms with certain unicodes
Date: Tue, 4 Sep 2018 11:58:52 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1

On 08/31/2018 12:42 PM, Bruno Haible wrote:

Yes. This particular character (U+1F600) was added in Unicode 6.1 [1][2].

The iswprint() function is implemented in the libc, which is why you see
differences across platforms. After a new Unicode release is made, it
takes some time until glibc picks it up.

Then, it takes some time until the distros pick up the new glibc release.

Fedora 28 uses glibc 2.27, released in 2018.
CentOS 7, like RHEL 7, uses glibc 2.17, released in 2012.

gnulib adds basic Unicode support when that is missing from the platforms
(e.g. wcwidth(0x3000)), but we don't make an effort to support the most
recent Unicode standards, because that would be a lot of work for something
that the platforms themselves will be doing.

Indeed, whereas Unicode 11.0.0 is now released,...


Also, when these unicodes are thrown at
wcwidth(), it returns incorrect width for these unicodes, but it might
be because of the fact that these unicodes are considered unprintable

Yes, wcwidth relies on iswprint.

To get the behaviour you want, you may try to force the wcwidth replacement
which is based on (currently) Unicode 9.0.0. To do so, set the environment

...gnulib being at 9.0.0 can actually result in regressions if gnulib replaces a libc function merely for being at a different version of Unicode.

variable
   gl_cv_func_wcwidth_works=no
at configure time.

Yes, that works as a one-time, per-machine override for testing whether the gnulib-provided replacements (which force a particular Unicode version that may be newer or older than the libc's) behave sanely across multiple platforms. But it is not a wise idea to codify that into libvirt's configure.ac (or any other project's).

Rather, if libvirt is hitting test failures due solely to differences in the Unicode version that the underlying libc complies with, it might be better to rewrite the failing tests to use Unicode characters that have been available since the oldest version of Unicode supported on any platform targeted by libvirt, instead of testing the behavior of problematic characters that were only added in newer Unicode versions.
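
As a rough illustration of how candidate test characters could be vetted, here is a minimal sketch in Python. It does not query any libc; it only checks whether a code point was already assigned in the Unicode 3.2 snapshot that the Python standard library ships as unicodedata.ucd_3_2_0. The assumption (mine, not something established in this thread) is that 3.2 predates the Unicode data of every libc being targeted. The helper names assigned_in_old_unicode and portable_test_chars are hypothetical.

```python
import unicodedata

# Snapshot of the Unicode 3.2 character database from the Python stdlib.
# Assumption: 3.2 is older than the Unicode data in any targeted libc.
OLD_UCD = unicodedata.ucd_3_2_0


def assigned_in_old_unicode(ch):
    """Return True if ch was already an assigned code point in Unicode 3.2."""
    # General category "Cn" means "unassigned" in that database version.
    return OLD_UCD.category(ch) != "Cn"


def portable_test_chars(candidates):
    """Keep only the candidates that were assigned as of Unicode 3.2."""
    return [ch for ch in candidates if assigned_in_old_unicode(ch)]


if __name__ == "__main__":
    print("current database:", unicodedata.unidata_version)
    # U+00E9, U+3000, U+4E2D and the emoji U+1F600 from this thread.
    for ch in "\u00e9\u3000\u4e2d\U0001F600":
        status = "ok" if assigned_in_old_unicode(ch) else "too new"
        print(f"U+{ord(ch):04X} {status}")
```

A character passing this filter is only known to be old enough to be assigned; whether a given libc reports the expected wcwidth() for it still has to be confirmed on the platforms themselves.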

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


