Re: Unicode string literals
From: Paul Eggert
Subject: Re: Unicode string literals
Date: Fri, 1 May 2020 14:22:27 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.7.0

On 5/1/20 2:01 AM, Bruno Haible wrote:
> Did you mean (1) that the programmer shall define a macro, that indicates that
> their source code is UTF-8 encoded?
>
> Or did you mean (2) that gnulib shall define a macro, that shall _assume_ that
> the source code is UTF-8 encoded, and then expand to u8"xyz" instead of "xyz"?
Yes, I meant (2).
> For (2): what's the point? Once you assume that the source code is UTF-8
> encoded, ISO C11 section 6.4.5 says that u8"xyz" and "xyz" are the same:
> literals of type 'char *'.
I was thinking about the case where one develops and normally builds on systems
that assume UTF-8 source code (perhaps because the build system is old and just
compiles the bytes unchecked), but where on occasion a builder might translate
all the source code to (say) EUC-JP for whatever reason, and then compile on a
newer platform that supports the u8 prefix.
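
To make (2) concrete, here is a rough sketch of what such a macro might look
like. UTF8_LITERAL and eacute_is_utf8 are made-up names for illustration, not
anything gnulib provides, and the C11 version check is only one plausible way
to detect u8 support. The literal uses a universal character name so that the
example does not itself depend on the encoding of the source file.

```c
#include <string.h>

/* Hypothetical macro (not an existing gnulib name): expand to a u8
   literal when the compiler claims C11 or later, otherwise fall back
   to a plain literal and hope the bytes come out as UTF-8.  */
#if defined __STDC_VERSION__ && __STDC_VERSION__ >= 201112L
# define UTF8_LITERAL(s) u8##s
#else
# define UTF8_LITERAL(s) s
#endif

/* Return 1 if the literal for U+00E9 was stored as the two UTF-8
   bytes 0xC3 0xA9, and 0 otherwise.  The cast covers compilers where
   u8 literals have an unsigned character element type.  */
int
eacute_is_utf8 (void)
{
  const char *s = (const char *) UTF8_LITERAL ("\u00e9");
  return strlen (s) == 2
         && (unsigned char) s[0] == 0xC3
         && (unsigned char) s[1] == 0xA9;
}
```

With the u8 prefix in effect the stored bytes should be UTF-8 regardless of the
source or execution character set; in the fallback branch the result depends on
how the compiler encodes plain literals.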
Admittedly the scenario is unlikely. I suppose we should wait until a real need
arises before worrying about it.
This all reminds me of trigraphs somehow
<https://en.wikipedia.org/wiki/Digraphs_and_trigraphs>. What a pain that was,
and still is.