Re: u32_normalize UNINORM_NFKC on 0xD800
From: Bruno Haible
Subject: Re: u32_normalize UNINORM_NFKC on 0xD800
Date: Fri, 27 May 2011 01:49:25 +0200
User-agent: KMail/1.9.9
Simon Josefsson wrote:
> I'm doing some Unicode NFKC operations and noticing that u32_normalize
> fails for U+D800.
This is valid behaviour, because U+D800 is a "surrogate" code point
and therefore not a valid character code point.
See the Unicode standard, chapter 2 [1], pages 23..24:
Surrogate code points and other non-character code points "should never be
interchanged". This means, for libunistring, that they are invalid input
and invalid output in all functions taking or returning UTF-32 strings or
UTF-8 strings.
Character code points and code points that are in regions that may be assigned
in future Unicode versions must not be rejected; these are valid input.
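
To make the rule above concrete, here is a minimal sketch in C of the kind of check it implies for a UTF-32 string. The function names are hypothetical and this is not libunistring's actual code; it only assumes that a unit is acceptable when it lies in U+0000..U+10FFFF and is not a surrogate (U+D800..U+DFFF). It does not look at whether a code point is assigned in the current Unicode version, and it does not treat other non-character code points specially.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true for U+D800..U+DFFF, the range reserved
   for UTF-16 surrogates. */
static bool is_surrogate(uint32_t cp)
{
    return cp >= 0xD800 && cp <= 0xDFFF;
}

/* Hypothetical helper: true for any code point inside U+0000..U+10FFFF
   that is not a surrogate.  Assignment status is deliberately not
   checked, so code points reserved for future versions pass. */
static bool is_acceptable_code_point(uint32_t cp)
{
    return cp <= 0x10FFFF && !is_surrogate(cp);
}

/* Hypothetical helper: returns a pointer to the first unacceptable
   unit in s[0..n-1], or NULL if every unit is acceptable. */
static const uint32_t *first_invalid_unit(const uint32_t *s, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (!is_acceptable_code_point(s[i]))
            return &s[i];
    return NULL;
}
```

With this sketch, a string containing 0xD800 is reported as invalid at that unit, while a string containing only not-yet-assigned code points passes. libunistring's own u32_check() in <unistr.h> uses a similar return convention (NULL for a valid string, otherwise a pointer to the first invalid unit), and is the function to prefer in real code.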
Bruno
[1] http://www.unicode.org/versions/Unicode6.0.0/ch02.pdf
--
In memoriam Jeane Gardiner <http://en.wikipedia.org/wiki/Jeane_Gardiner>