bug-gnulib


From: Bruno Haible
Subject: Re: iswprint() and wcwidth() don't work properly on some platforms with certain unicodes
Date: Fri, 31 Aug 2018 19:42:55 +0200
User-agent: KMail/5.1.3 (Linux/4.4.0-130-generic; KDE/5.18.0; x86_64; ; )

Simon Kobyda wrote:
> It seems that gnulib's functions iswprint() and
> wcwidth() return different results on different platforms.
> 
> Code on Fedora 28:
> 
>     wchar_t c = L'😀' ;
>     if (iswprint(c))
>         printf("Printable\n");
>     else
>         printf("Not printable\n");
> 
> Output: "Printable"
> 
> Code on CentOS 7:
> 
>     wchar_t c = L'😀' ;
>     if (iswprint(c))
>         printf("Printable\n");
>     else
>         printf("Not printable\n");
> 
> Output: "Not printable"
> 
> Similar problems are also encountered on FreeBSD 11.
> 
> It seems that certain Unicode characters, such as zero-width characters
> and emojis, are considered nonprintable by the iswprint() function, while
> characters such as 稱 pass correctly.

Yes. This particular character (U+1F600) was added in Unicode 6.1 [1][2].

The iswprint() function is implemented in the libc, which is why you see
differences across platforms. After a new Unicode release is made, it
takes some time until glibc picks it up.

Then, it takes some time until the distros pick up the new glibc release.

Fedora 28 uses glibc 2.27, released in 2018.
CentOS 7, like RHEL 7, uses glibc 2.17, released in 2012.
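
Because iswprint() and wcwidth() answer according to the LC_CTYPE category
of the current locale, a reproducer should call setlocale(LC_ALL, "") before
testing; the quoted snippet does not show that call. The following is only a
diagnostic sketch: the helper names classify() and report_char() are
hypothetical, the version line uses the glibc-only gnu_get_libc_version()
and is skipped elsewhere, and it assumes a 32-bit wchar_t (true on glibc and
FreeBSD; U+1F600 does not fit in a 16-bit wchar_t).

```c
#define _XOPEN_SOURCE 700          /* wcwidth() is an XSI function */
#include <locale.h>                /* setlocale(), called by the user */
#include <stdio.h>
#include <wchar.h>
#include <wctype.h>
#ifdef __GLIBC__
#include <gnu/libc-version.h>      /* glibc only: gnu_get_libc_version() */
#endif

/* The libc's own verdict on one wide character. */
struct wc_report {
    int printable;                 /* 1 if iswprint() accepts it */
    int width;                     /* wcwidth() result; -1 = not printable */
};

/* Hypothetical helper: query the platform libc, nothing from gnulib. */
struct wc_report classify(wchar_t wc)
{
    struct wc_report r;
    r.printable = iswprint((wint_t) wc) != 0;
    r.width = wcwidth(wc);
    return r;
}

/* Hypothetical helper: print the glibc version (if any) and the verdict. */
void report_char(wchar_t wc)
{
    struct wc_report r = classify(wc);
#ifdef __GLIBC__
    printf("glibc %s: ", gnu_get_libc_version());
#endif
    printf("U+%04lX is %s, wcwidth = %d\n", (unsigned long) wc,
           r.printable ? "printable" : "not printable", r.width);
}
```

Running setlocale(LC_ALL, "") followed by report_char((wchar_t) 0x1F600)
under the same UTF-8 locale on each machine makes it visible which libc
release is giving which answer.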

gnulib adds basic Unicode support when that is missing from the platforms
(e.g. wcwidth(0x3000)), but we don't make an effort to support the most
recent Unicode standards, because that would be a lot of work for something
that the platforms themselves will be doing.

> Also, when these characters are passed to wcwidth(), it returns an
> incorrect width for them, but that might be because they are considered
> unprintable.

Yes, wcwidth relies on iswprint.
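
POSIX specifies that wcwidth() returns -1 when its argument is not a
printable wide character, so a character that iswprint() rejects gets no
width either. A sketch for comparing platforms, assuming a hypothetical
helper count_printable() (not a libc or gnulib API), counts over a range
of code points, such as the Emoticons block U+1F600..U+1F64F, how many
characters each function accepts in the current locale.

```c
#define _XOPEN_SOURCE 700          /* wcwidth() is an XSI function */
#include <wchar.h>
#include <wctype.h>

/* Counts gathered over a range of code points. */
struct range_counts {
    unsigned long printable;       /* iswprint() != 0 */
    unsigned long has_width;       /* wcwidth() >= 0 */
};

/* Hypothetical helper: classify every code point in [first, last] with the
   libc's tables for the current LC_CTYPE locale (assumes last < ULONG_MAX). */
struct range_counts count_printable(unsigned long first, unsigned long last)
{
    struct range_counts c = { 0, 0 };
    for (unsigned long cp = first; cp <= last; cp++) {
        wchar_t wc = (wchar_t) cp;
        if (iswprint((wint_t) wc))
            c.printable++;
        if (wcwidth(wc) >= 0)
            c.has_width++;
    }
    return c;
}
```

After setlocale(LC_ALL, "") in a UTF-8 locale, count_printable(0x1F600,
0x1F64F) shows how much of that block a given libc knows about; the two
counts normally track each other because of the -1 rule above.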

To get the behaviour you want, you may try to force the wcwidth replacement
which is based on (currently) Unicode 9.0.0. To do so, set the environment
variable
  gl_cv_func_wcwidth_works=no
at configure time.

Gnulib does not have an iswprint replacement at this point; so you might
change your code to use  'wcwidth(wc) >= 0'  instead of  'iswprint(wc)'.
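
A minimal sketch of that substitution, using the hypothetical name
printable_via_wcwidth(). One edge case is an addition not in the advice
above: POSIX has wcwidth(L'\0') return 0, while iswprint(L'\0') is false,
so the null character is excluded explicitly to keep the two in agreement.

```c
#define _XOPEN_SOURCE 700          /* wcwidth() is an XSI function */
#include <stdbool.h>
#include <wchar.h>

/* Hypothetical stand-in for iswprint(): built on wcwidth(), which gnulib
   can replace, rather than on iswprint(), which gnulib does not replace. */
bool printable_via_wcwidth(wchar_t wc)
{
    /* wcwidth() returns 0 for the null wide character; treat it as
       nonprintable, matching iswprint(). */
    if (wc == L'\0')
        return false;
    /* wcwidth() returns -1 for characters that are not printable. */
    return wcwidth(wc) >= 0;
}
```

When the project is configured with gl_cv_func_wcwidth_works=no, the call
to wcwidth() here resolves to the gnulib replacement, so the result follows
gnulib's Unicode tables instead of the platform's.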

Bruno

[1] http://www.unicode.org/versions/Unicode6.1.0/
[2] https://www.unicode.org/charts/PDF/Unicode-6.1/U61-1F600.pdf



