From: Eric Blake
Subject: Re: iswprint() and wcwidth() don't work properly on some platforms with certain unicodes
Date: Tue, 4 Sep 2018 11:58:52 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1
On 08/31/2018 12:42 PM, Bruno Haible wrote:
> Yes. This particular character (U+1F600) was added in Unicode 6.1 [1][2].
> The iswprint() function is implemented in the libc, which is why you see
> differences across platforms. After a new Unicode release is made, it
> takes some time until glibc picks it up.
> Then, it takes some time until the distros pick up the new glibc release.
> Fedora 28 uses glibc 2.27, released in 2018.
> CentOS 7, like RHEL 7, uses glibc 2.17, released in 2012.
>
> gnulib adds basic Unicode support when that is missing from the platforms
> (e.g. wcwidth(0x3000)), but we don't make an effort to support the most
> recent Unicode standards, because that would be a lot of work for something
> that the platforms themselves will be doing.

Indeed, whereas Unicode 11.0.0 is now released, gnulib being at 9.0.0 can
actually result in regressions if gnulib replaces a libc function merely
for being at a different version of Unicode.

>> Also when these unicodes are thrown at
>> wcwidth(), it returns incorrect width for these unicodes, but it might
>> be because of the fact that these unicodes are considered unprintable
>
> Yes, wcwidth relies on iswprint.
>
> To get the behaviour you want, you may try to force the wcwidth replacement
> which is based on (currently) Unicode 9.0.0. To do so, set the environment
> variable
>   gl_cv_func_wcwidth_works=no
> at configure time.
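To see which Unicode data a given libc actually ships, a small probe can ask it directly. This is a minimal sketch, not code from gnulib or libvirt: probe_char(), use_utf8_locale() and report() are hypothetical helpers, and the locale names it tries (C.UTF-8, en_US.UTF-8, C.utf8) are assumptions that may not be installed on every machine. On a libc whose tables predate Unicode 6.1, report(0x1F600) would be expected to show iswprint=0 and wcwidth=-1; a libc with newer tables should report it as printable with a nonzero width.

```c
/* wcwidth() is an X/Open function; glibc only declares it when a
   feature-test macro such as _XOPEN_SOURCE is defined. */
#define _XOPEN_SOURCE 700

#include <assert.h>
#include <locale.h>
#include <stddef.h>
#include <stdio.h>
#include <wchar.h>
#include <wctype.h>

/* Ask the C library about one code point.  Both answers come from the
   libc's own Unicode tables for the current LC_CTYPE locale. */
static int probe_char(wchar_t wc, int *printable)
{
    assert(printable != NULL);
    *printable = iswprint((wint_t) wc) != 0;
    return wcwidth(wc);   /* -1 when libc considers wc non-printable */
}

/* Switch LC_CTYPE to the first UTF-8 locale that is installed.
   The candidate names are assumptions; returns NULL if none exist,
   in which case the "C" locale stays in effect. */
static const char *use_utf8_locale(void)
{
    static const char *const names[] = { "C.UTF-8", "en_US.UTF-8", "C.utf8" };
    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++)
        if (setlocale(LC_CTYPE, names[i]) != NULL)
            return names[i];
    return NULL;
}

/* Print the libc's verdict for one code point. */
static void report(wchar_t wc)
{
    int printable;
    int width = probe_char(wc, &printable);
    printf("U+%04lX  iswprint=%d  wcwidth=%d\n",
           (unsigned long) wc, printable, width);
}
```

A caller would run use_utf8_locale() once, then compare report(0x1F600) against a long-established character such as report(0x3000) on each machine in question.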
Yes, that works for a one-time per-machine override, for testing if
using gnulib-provided replacements (that force a particular Unicode
version, which may be newer or older than the libc's version) behave
sanely across multiple platforms. But it is not a wise idea to codify
that into libvirt's configure.ac (or any other project).
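A replacement pins the Unicode version because it answers from its own frozen table rather than from the libc's. The sketch below illustrates only that idea: sketch_wcwidth() and its five-range table are hypothetical, far smaller than a real generated table (there are no zero-width combining marks, for instance), and none of it is gnulib's actual code.

```c
#include <assert.h>
#include <stddef.h>

/* A closed range of code points that all display as double width. */
struct width_range {
    unsigned long first, last;
};

/* Hypothetical, deliberately tiny table, sorted by code point.  A real
   replacement is generated from one specific version of Unicode's
   EastAsianWidth.txt (and also lists zero-width characters), so its
   answers are frozen at that version regardless of the libc. */
static const struct width_range wide_ranges[] = {
    { 0x1100, 0x115F },  /* Hangul Jamo leading consonants */
    { 0x3000, 0x3000 },  /* IDEOGRAPHIC SPACE */
    { 0x4E00, 0x9FFF },  /* CJK Unified Ideographs */
    { 0xAC00, 0xD7A3 },  /* Hangul Syllables */
    { 0xFF01, 0xFF60 },  /* Fullwidth Forms */
};

/* Binary search: nonzero if cp lies inside one of the n sorted ranges. */
static int in_ranges(unsigned long cp, const struct width_range *t, size_t n)
{
    assert(t != NULL || n == 0);
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (cp < t[mid].first)
            hi = mid;
        else if (cp > t[mid].last)
            lo = mid + 1;
        else
            return 1;
    }
    return 0;
}

/* Hypothetical table-driven width function (not gnulib's code). */
static int sketch_wcwidth(unsigned long cp)
{
    if (cp == 0)
        return 0;
    if (cp < 0x20 || (cp >= 0x7F && cp < 0xA0))
        return -1;               /* C0 and C1 control characters */
    if (in_ranges(cp, wide_ranges, sizeof wide_ranges / sizeof wide_ranges[0]))
        return 2;
    return 1;                    /* anything the table does not list,
                                    including characters newer than it */
}
```

With this table, U+1F600 falls outside every range and comes back as width 1 no matter what the installed libc says, which shows how a replacement and the libc can disagree purely because their tables come from different Unicode versions.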
Rather, if libvirt is hitting test failures due solely to the difference
of Unicode version that the underlying libc complies with, it might be
better to rewrite the failing tests to instead use different Unicode
characters that were available since the oldest supported version of
Unicode across any platform being targeted by libvirt, instead of
testing the behavior of problematic characters that were only recently
added in newer Unicode.
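One way to follow that advice is to tag each candidate test character with the Unicode version that introduced it (its Age property, as listed in Unicode's DerivedAge.txt) and keep only characters at or below the oldest version the targeted platforms are assumed to support. This is a minimal sketch; the struct, the helper names, and the threshold in the example are hypothetical, not anything libvirt defines.

```c
#include <assert.h>
#include <stddef.h>

/* A candidate test character and the Unicode version that added it
   (the "Age" property from Unicode's DerivedAge.txt). */
struct test_char {
    unsigned long cp;
    int major, minor;
    const char *name;
};

static const struct test_char candidates[] = {
    { 0x00E9,  1, 1, "LATIN SMALL LETTER E WITH ACUTE" },
    { 0x3000,  1, 1, "IDEOGRAPHIC SPACE" },
    { 0x4E00,  1, 1, "CJK UNIFIED IDEOGRAPH-4E00" },
    { 0x1F600, 6, 1, "GRINNING FACE" },
};

/* Nonzero if c already existed in Unicode major.minor. */
static int available_since(const struct test_char *c, int major, int minor)
{
    assert(c != NULL);
    return c->major < major || (c->major == major && c->minor <= minor);
}

/* Store in out[] (capacity cap) the candidates that any platform
   implementing at least Unicode major.minor should know about;
   returns how many were stored. */
static size_t pick_safe_chars(int major, int minor,
                              const struct test_char **out, size_t cap)
{
    size_t n = 0;
    for (size_t i = 0; i < sizeof candidates / sizeof candidates[0]; i++)
        if (available_since(&candidates[i], major, minor) && n < cap)
            out[n++] = &candidates[i];
    return n;
}
```

For example, pick_safe_chars(5, 0, buf, 8) keeps U+00E9, U+3000 and U+4E00 and drops U+1F600 (added in 6.1). The 5.0 threshold is purely illustrative, not a statement about which Unicode version any particular glibc implements.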
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org