Re: Locale-independent paragraph formatting
From: Gavin Smith
Subject: Re: Locale-independent paragraph formatting
Date: Fri, 10 Nov 2023 19:48:04 +0000

On Fri, Nov 10, 2023 at 08:47:10AM +0200, Eli Zaretskii wrote:
> > Does anybody know if we could just write 'a' instead of U'a' and rely
> > on it being converted?
> >
> > E.g. if you do
> >
> > char32_t c = 'a';
> >
> > then afterwards, c should be equal to 97 (ASCII value of 'a').
>
> Why not? What could be the problems with using this?

I think what was confusing me was the statement that char32_t held a UTF-32
encoded Unicode character. I then thought it would have a certain byte
order, so if the UTF-32 was big endian, the bytes would have the order
00 00 00 61, whereas the value 97 on a little endian machine would have
the order 61 00 00 00. However, it seems that UTF-32 here just means the
codepoint is stored as a 32-bit integer value, so in memory it has the
native byte order of the machine. A fixed byte order only matters when
the data is serialized as UTF-32BE or UTF-32LE.
As far as I can tell, the standard C integer conversions work when
assigning to or from char32_t, because it is just an unsigned integer type
(C11 defines it as the same type as uint_least32_t).
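
To make the point concrete, here is a minimal sketch of the two checks
discussed above. It assumes a C11 toolchain that provides <uchar.h> and an
ASCII-compatible execution character set. The function names are made up
for illustration and are not taken from any existing code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <uchar.h>   /* char32_t, available since C11 */

/* Hypothetical helper: returns 1 if assigning the plain character
   constant 'a' to a char32_t gives the same value as U'a', namely 97.
   This relies on the execution character set being ASCII-compatible. */
int plain_matches_u_prefixed(void)
{
    char32_t from_plain = 'a';   /* int constant converted to char32_t */
    char32_t from_u = U'a';      /* UTF-32 character constant */

    return from_plain == from_u && from_plain == 97;
}

/* Hypothetical helper: returns 1 if a char32_t holding U'a' has the same
   in-memory bytes as an ordinary integer holding 97, i.e. the value is
   stored in the machine's native byte order and not in a fixed one. */
int stored_in_native_byte_order(void)
{
    char32_t c = U'a';
    uint_least32_t n = 97;

    /* C11 defines char32_t as the same type as uint_least32_t. */
    assert(sizeof c == sizeof n);
    return memcmp(&c, &n, sizeof c) == 0;
}
```

If both functions return 1, writing 'a' in place of U'a' is harmless for
ASCII characters on that platform. The sketch says nothing about
non-ASCII characters, where a plain character constant depends on the
execution character set.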