bug-gnulib

Re: why is MB_LEN_MAX so large (16) on glibc


From: Eric Blake
Subject: Re: why is MB_LEN_MAX so large (16) on glibc
Date: Wed, 13 May 2015 19:29:47 -0600

On 05/13/2015 06:30 PM, Bruno Haible wrote:

> The value of 4 is sufficient to accommodate all stateless encodings in
> use, including UTF-8 (which was restricted from max. 6 to 4 bytes by
> an ISO standard) and GB18030. But it's not necessarily future-proof.
> 
>> I was worried that it implied that wctomb() might convert a wide char to
>> _multiple_ encoded chars for some character/encoding combinations?

On Cygwin, where wchar_t is 2 bytes, we have the opposite problem: any
character outside the Basic Multilingual Plane of Unicode (that is,
> 0xffff) requires a surrogate pair of two wchar_t to represent a single
character, which violates the POSIX premise that one wchar_t holds one
character. It makes for some odd behavior with wctomb() and friends, but
it's the best that can be done.
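
To make the surrogate-pair point concrete, here is a minimal sketch of the standard UTF-16 split for a code point above 0xffff. The function name to_surrogate_pair is made up for illustration and is not a Cygwin or libc interface; fixed-width integer types stand in for a 2-byte wchar_t.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: split a code point outside the Basic Multilingual
 * Plane (0x10000..0x10FFFF) into a UTF-16 high/low surrogate pair. */
static void to_surrogate_pair(uint32_t cp, uint16_t *high, uint16_t *low)
{
    /* Only code points above 0xFFFF need (or can use) a surrogate pair. */
    assert(cp > 0xFFFF && cp <= 0x10FFFF);

    uint32_t v = cp - 0x10000;               /* 20 significant bits remain */
    *high = (uint16_t)(0xD800 + (v >> 10));  /* top 10 bits */
    *low  = (uint16_t)(0xDC00 + (v & 0x3FF)); /* bottom 10 bits */
}
```

For example, U+1F600 splits into 0xD83D followed by 0xDE00, so a single character occupies two 16-bit units.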

If the C11 char16_t and char32_t types take off (with the accompanying
explosion in function interfaces), then switching the world to char32_t
instead of wchar_t would be the sane approach for dealing with wide
characters. But I don't know if that is likely to happen.
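
As a rough illustration of why char32_t is the simpler model, the sketch below encodes one char32_t value to UTF-8 by hand. It assumes UTF-8 as the target encoding and deliberately avoids the locale-dependent libc converters; encode_utf8 is a made-up name, not a standard function. One char32_t always holds a whole code point, and the result never exceeds 4 bytes.

```c
#include <stddef.h>
#include <uchar.h>

/* Hypothetical helper: encode one code point as UTF-8.
 * Returns the number of bytes written (1..4), or 0 if c is not a
 * Unicode scalar value (a surrogate, or above 0x10FFFF). */
static size_t encode_utf8(char32_t c, unsigned char out[4])
{
    if (c < 0x80) {
        out[0] = (unsigned char)c;
        return 1;
    }
    if (c < 0x800) {
        out[0] = (unsigned char)(0xC0 | (c >> 6));
        out[1] = (unsigned char)(0x80 | (c & 0x3F));
        return 2;
    }
    if (c < 0x10000) {
        if (c >= 0xD800 && c <= 0xDFFF)
            return 0;  /* lone surrogates are not characters */
        out[0] = (unsigned char)(0xE0 | (c >> 12));
        out[1] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (c & 0x3F));
        return 3;
    }
    if (c <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (c >> 18));
        out[1] = (unsigned char)(0x80 | ((c >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (c & 0x3F));
        return 4;
    }
    return 0;  /* beyond the Unicode code space */
}
```

With this, U+1F600 becomes the four bytes F0 9F 98 80 from a single input value, with no surrogate handling needed on the caller's side.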


-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


