bug-ncurses

Re: Surrogate pairs for addwstr?


From: Bill Gray
Subject: Re: Surrogate pairs for addwstr?
Date: Mon, 11 Oct 2021 13:40:37 -0400
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.13.0

On 10/11/21 12:05 AM, Tim Allen wrote:
> On Sun, Oct 10, 2021 at 11:38:22AM -0400, Bill Gray wrote:
>>     The other way to put this would be to ask : if you're on a
>> system with 32-bit wchar_ts,  what should happen for this line?
>>
>>    mvaddwstr( 0, 2, L"\xd834\xdd1e Treble clef with a surrogate pair");
>
> Honestly, what I'd *expect* to happen is a compile-time or run-time
> error.

   As you thought,  it can't be a compile-time error in C,
because the string is not necessarily a Unicode one;  other
locales are supported.

   The run-time error is an interesting thought.  At least in
PDCurses,  addch() only fails if it can't scroll.  I could
imagine a "couldn't render that character" error condition
as well.  In PDCurses,  that would occur within waddch() and
then cause waddstr() to return ERR.
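
A minimal sketch of that error path, assuming a 32-bit wchar_t so that
any UTF-16 surrogate value counts as "can't render". sk_addch() and
sk_addwstr() are hypothetical stand-ins written for this illustration;
they are not PDCurses or ncurses functions, and SK_OK / SK_ERR merely
mimic the curses OK / ERR convention.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

#define SK_OK    0
#define SK_ERR (-1)

/* Hypothetical per-character routine: refuses UTF-16 surrogate values,
 * which are not valid characters when wchar_t holds UTF-32. */
static int sk_addch(wchar_t ch)
{
    unsigned long u = (unsigned long)ch;

    if (u >= 0xD800ul && u <= 0xDFFFul)
        return SK_ERR;          /* "couldn't render that character" */
    /* A real implementation would store the character in the window. */
    return SK_OK;
}

/* Hypothetical string routine: stops at the first failing character
 * and passes the error up to the caller. */
static int sk_addwstr(const wchar_t *s)
{
    assert(s != NULL);
    for (; *s != L'\0'; s++)
        if (sk_addch(*s) == SK_ERR)
            return SK_ERR;
    return SK_OK;
}
```

With this shape, a string that starts with the surrogate pair
0xD834 0xDD1E fails on its first character, and the caller sees the
failure instead of gibberish on screen.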

   I would still be arguing in favor of handling surrogates in
all cases,  but your point about them still not being handled
elsewhere changed my mind.  That's a tougher hurdle to get around.

   And thanks for the WCHAR_MAX == 65535 pointer.  I can't see
why that wouldn't work.
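
One way such a check might be written, as a sketch only. The macro
WCHAR_UNITS_ARE_16BIT and the function wchar_units_for() are names
invented for this example. The test uses "<= 0xFFFF" rather than
"== 65535" so that a signed 16-bit wchar_t would also be caught; that
variation is an assumption, not something from the thread.

```c
#include <assert.h>
#include <wchar.h>      /* wchar_t, WCHAR_MAX */

/* Hypothetical macro: 1 when wchar_t is too narrow for code points
 * above U+FFFF, so wide strings must use UTF-16 surrogate pairs. */
#if WCHAR_MAX <= 0xFFFF
#  define WCHAR_UNITS_ARE_16BIT 1
#else
#  define WCHAR_UNITS_ARE_16BIT 0
#endif

/* Number of wchar_t units needed to store one Unicode code point. */
static int wchar_units_for(unsigned long cp)
{
    assert(cp <= 0x10FFFFul);
    return (WCHAR_UNITS_ARE_16BIT && cp > 0xFFFFul) ? 2 : 1;
}
```

Because WCHAR_MAX is usable in #if, the decision is made at compile
time and costs nothing at run time.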

   In re "just use UTF8" : agreed,  yet another good reason to
do so.

Thanks!        -- Bill

> Printing gibberish is never particularly helpful, but encouraging people
> to assume wide-string literals (or wide-strings in general) use UTF-16
> encoding seems like a bad idea. Sure, you can make it work transparently
> for curses, but there are other libraries (like libc) that are likely to
> get tripped up, and that seems like a foot-gun waiting to happen. Even
> if you provide a utf16towcs() helper, people are going to forget to call
> it since the input and output types are both wchar_t*.
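
To make the hazard concrete, here is a sketch of what a helper like
utf16towcs() might look like. Only the name comes from the message; the
signature and the error convention (returning (size_t)-1) are
assumptions modelled loosely on mbstowcs(). Input and output are both
wchar_t*, so the compiler cannot tell a converted string from an
unconverted one.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: copy src to dst, merging UTF-16 surrogate pairs
 * into single code points when wchar_t is wide enough to hold them.
 * Returns the number of units written (excluding the terminator), or
 * (size_t)-1 if dst is too small. */
size_t utf16towcs(wchar_t *dst, const wchar_t *src, size_t dst_len)
{
    size_t n = 0;

    assert(dst != NULL && src != NULL);
    if (dst_len == 0)
        return (size_t)-1;

    while (*src != L'\0') {
        unsigned long u = (unsigned long)*src++;

        if (n + 1 >= dst_len)           /* keep room for the terminator */
            return (size_t)-1;
#if WCHAR_MAX > 0xFFFF
        if (u >= 0xD800ul && u <= 0xDBFFul) {       /* high surrogate */
            unsigned long lo = (unsigned long)*src;
            if (lo >= 0xDC00ul && lo <= 0xDFFFul) { /* low surrogate */
                u = 0x10000ul + ((u - 0xD800ul) << 10) + (lo - 0xDC00ul);
                src++;
            }
        }
#endif
        dst[n++] = (wchar_t)u;  /* unpaired surrogates pass through as-is */
    }
    dst[n] = L'\0';
    return n;
}
```

On a system with 16-bit wchar_t the function is a plain copy, which is
exactly why a forgotten call would go unnoticed there and only show up
as a bug on 32-bit systems.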

> The absolute simplest and safest thing a portable program could do is to
> restrict itself to the Basic Multilingual Plane. The second simplest and
> safest thing would probably be to store strings as UTF-8 (narrow) string
> literals, and provide some kind of utf8stowcs() that decodes to UTF-16
> or to UTF-32 depending on the value of WCHAR_MAX.
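
A sketch of how such a utf8stowcs() could work. Only the function name
and the idea of switching on WCHAR_MAX come from the message; the
signature, the helper utf8_next(), and the (size_t)-1 error convention
are assumptions made for this example. It rejects malformed, overlong,
and surrogate-encoding input rather than guessing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

#define UTF8_BAD 0xFFFFFFFFu    /* hypothetical "malformed input" marker */

/* Decode one UTF-8 sequence at *p and advance *p past it.
 * Returns the code point, or UTF8_BAD on malformed input. */
static uint32_t utf8_next(const unsigned char **p)
{
    const unsigned char *s = *p;
    uint32_t cp, min;
    int extra, i;

    if (s[0] < 0x80)                { cp = s[0];        extra = 0; min = 0; }
    else if ((s[0] & 0xE0) == 0xC0) { cp = s[0] & 0x1F; extra = 1; min = 0x80; }
    else if ((s[0] & 0xF0) == 0xE0) { cp = s[0] & 0x0F; extra = 2; min = 0x800; }
    else if ((s[0] & 0xF8) == 0xF0) { cp = s[0] & 0x07; extra = 3; min = 0x10000; }
    else return UTF8_BAD;

    for (i = 1; i <= extra; i++) {
        if ((s[i] & 0xC0) != 0x80)  /* also stops at an early NUL */
            return UTF8_BAD;
        cp = (cp << 6) | (s[i] & 0x3F);
    }
    /* Reject overlong forms, encoded surrogates, and values past U+10FFFF. */
    if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return UTF8_BAD;
    *p = s + extra + 1;
    return cp;
}

/* Decode NUL-terminated UTF-8 into wchar_t units: UTF-16 when wchar_t
 * is 16 bits wide, UTF-32 otherwise. Returns units written (excluding
 * the terminator), or (size_t)-1 on bad input or a too-small buffer. */
size_t utf8stowcs(wchar_t *dst, const char *src, size_t dst_len)
{
    const unsigned char *p = (const unsigned char *)src;
    size_t n = 0;

    assert(dst != NULL && src != NULL);
    if (dst_len == 0)
        return (size_t)-1;

    while (*p != '\0') {
        uint32_t cp = utf8_next(&p);
        if (cp == UTF8_BAD)
            return (size_t)-1;
#if WCHAR_MAX <= 0xFFFF
        if (cp > 0xFFFF) {              /* needs a surrogate pair */
            if (n + 2 >= dst_len)
                return (size_t)-1;
            cp -= 0x10000;
            dst[n++] = (wchar_t)(0xD800 + (cp >> 10));
            dst[n++] = (wchar_t)(0xDC00 + (cp & 0x3FF));
            continue;
        }
#endif
        if (n + 1 >= dst_len)           /* keep room for the terminator */
            return (size_t)-1;
        dst[n++] = (wchar_t)cp;
    }
    dst[n] = L'\0';
    return n;
}
```

A caller would write the treble clef (U+1D11E) as the narrow literal
"\xF0\x9D\x84\x9E", convert it with utf8stowcs(), and pass the result to
the wide-character curses functions. Unlike the wchar_t*-to-wchar_t*
case, the char* input type makes a forgotten conversion a compile-time
type mismatch.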


> Tim.



