Re: why is MB_LEN_MAX so large (16) on glibc
From: Eric Blake
Subject: Re: why is MB_LEN_MAX so large (16) on glibc
Date: Wed, 13 May 2015 19:29:47 -0600
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.6.0
On 05/13/2015 06:30 PM, Bruno Haible wrote:
> The value of 4 is sufficient to accommodate all stateless encodings in
> use, including UTF-8 (which was restricted from max. 6 to 4 bytes by
> an ISO standard) and GB18030. But it's not necessarily future-proof.
>
>> I was worried that it implied that wctomb() might convert a wide char to
>> _multiple_ encoded chars for some character/encoding combinations?
On Cygwin, where wchar_t is 2 bytes, we have the opposite problem: any
character outside the Basic Multilingual Plane of Unicode (that is,
> 0xffff) requires a surrogate pair, i.e. two wchar_t values, to represent
a single character, which violates the POSIX premise that a wchar_t holds
a character. It makes for some odd behavior with wctomb() and friends,
but it's the best that can be done.
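To make the surrogate-pair arithmetic concrete, here is a minimal sketch of
how one code point above 0xffff is split into two 16-bit units and joined
back again. The function names to_surrogates() and from_surrogates() are
made up for illustration; they are not part of Cygwin or of any libc.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: split a code point above 0xFFFF into a UTF-16
   surrogate pair (high unit first, then low unit). */
void to_surrogates(uint32_t cp, uint16_t *hi, uint16_t *lo)
{
    assert(cp > 0xFFFF && cp <= 0x10FFFF);
    uint32_t v = cp - 0x10000;              /* 20 significant bits remain */
    *hi = (uint16_t)(0xD800 + (v >> 10));   /* top 10 bits */
    *lo = (uint16_t)(0xDC00 + (v & 0x3FF)); /* bottom 10 bits */
}

/* Hypothetical helper: recombine a surrogate pair into one code point. */
uint32_t from_surrogates(uint16_t hi, uint16_t lo)
{
    assert(hi >= 0xD800 && hi <= 0xDBFF);
    assert(lo >= 0xDC00 && lo <= 0xDFFF);
    return 0x10000 + (((uint32_t)(hi - 0xD800) << 10)
                      | (uint32_t)(lo - 0xDC00));
}
```

With a 2-byte wchar_t, each half of such a pair occupies its own wchar_t,
so neither element is a complete character by itself.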
If the C11 types char16_t and char32_t take off (with the accompanying
explosion in function interfaces), then switching the world to char32_t
instead of wchar_t would be the sane approach for dealing with wide
characters. But I don't know if that is likely to happen.
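As a rough illustration of why char32_t is attractive: one char32_t can
hold any Unicode code point, so a conversion to UTF-8 never has to deal
with half a character, and the output never exceeds 4 bytes. The sketch
below encodes by hand rather than through the locale-dependent c32rtomb();
utf8_encode() is a made-up name, and <uchar.h> is assumed to be available.

```c
#include <stddef.h>
#include <uchar.h>

/* Hypothetical helper: encode one code point as UTF-8 into out[].
   Returns the number of bytes written (1 to 4), or 0 if c is a
   surrogate or lies above 0x10FFFF. */
size_t utf8_encode(char32_t c, unsigned char out[4])
{
    if (c < 0x80) {
        out[0] = (unsigned char)c;
        return 1;
    }
    if (c < 0x800) {
        out[0] = (unsigned char)(0xC0 | (c >> 6));
        out[1] = (unsigned char)(0x80 | (c & 0x3F));
        return 2;
    }
    if (c < 0x10000) {
        if (c >= 0xD800 && c <= 0xDFFF)
            return 0;                       /* lone surrogate, not a character */
        out[0] = (unsigned char)(0xE0 | (c >> 12));
        out[1] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (c & 0x3F));
        return 3;
    }
    if (c <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (c >> 18));
        out[1] = (unsigned char)(0x80 | ((c >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (c & 0x3F));
        return 4;
    }
    return 0;                               /* beyond the Unicode range */
}
```

The same input held in a 2-byte wchar_t would need two elements and extra
state to pair them up before any bytes could be produced.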
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org