bug-ncurses


From: Thomas Dickey
Subject: Re: UTF-8 multi-byte characters are not displayed properly on Windows consoles
Date: Sat, 14 Jan 2023 17:41:34 -0500
User-agent: Mutt/1.10.1 (2018-07-13)

On Fri, Jan 13, 2023 at 12:59:15AM +0800, LIU Hao wrote:
> Thank you, Thomas.
> 
> > This doesn't set the locale (which nano normally does).
> > Without setting a valid locale which uses UTF-8 encoding,
> > ncurses won't do much of any use.
> >          setlocale(LC_ALL, "");
> 
> Yeah, I thought ncurses just wrote a byte string to the console without being
> aware of its encoding; apparently I was wrong.

No: it converts the byte string into characters, stores those characters in
an array (WINDOW), and uses that array to decide how to update the terminal.

When it updates the terminal, it (depending on the encoding) transforms the
characters back into byte strings and writes _those_.
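The round trip described above (bytes decoded into characters, characters encoded back into bytes) can be approximated with the standard C functions `mbrtowc` and `wcrtomb`, which interpret bytes according to the current `LC_CTYPE` locale. This is a simplified sketch for illustration, not ncurses's actual code; `bytes_to_cells` and `cells_to_bytes` are hypothetical names, and the encoder assumes a stateless encoding such as UTF-8.

```c
#include <assert.h>
#include <limits.h>   /* MB_LEN_MAX */
#include <locale.h>   /* setlocale(), which callers must invoke first */
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Decode a NUL-terminated multibyte string into wide-character "cells",
 * using whatever LC_CTYPE locale is currently in effect.  Returns the
 * number of cells, or (size_t)-1 if the bytes are invalid or truncated
 * in this locale, or if more than max_cells cells would be needed. */
static size_t bytes_to_cells(const char *bytes, wchar_t *cells, size_t max_cells)
{
    mbstate_t st;
    size_t len = strlen(bytes);
    size_t n = 0;

    memset(&st, 0, sizeof st);
    while (len > 0) {
        wchar_t wc;
        size_t used = mbrtowc(&wc, bytes, len, &st);

        if (used == (size_t)-1 || used == (size_t)-2)
            return (size_t)-1;          /* invalid or incomplete sequence */
        if (n == max_cells)
            return (size_t)-1;          /* caller's array is too small */
        cells[n++] = wc;
        bytes += used;
        len -= used;
    }
    return n;
}

/* Encode cells back into a NUL-terminated multibyte string in the
 * current locale.  Returns the number of bytes written (excluding the
 * terminator), or (size_t)-1 if a cell cannot be represented in this
 * locale or out is too small.  Assumes a stateless encoding. */
static size_t cells_to_bytes(const wchar_t *cells, size_t ncells,
                             char *out, size_t out_size)
{
    mbstate_t st;
    char buf[MB_LEN_MAX];
    size_t pos = 0;

    assert(out_size > 0);
    memset(&st, 0, sizeof st);
    for (size_t i = 0; i < ncells; i++) {
        size_t k = wcrtomb(buf, cells[i], &st);

        if (k == (size_t)-1 || pos + k >= out_size)
            return (size_t)-1;          /* unrepresentable, or no room for NUL */
        memcpy(out + pos, buf, k);
        pos += k;
    }
    out[pos] = '\0';
    return pos;
}
```

In the default "C" locale the two bytes of "é" (`\xc3\xa9`) are not decoded as a single character; only after a UTF-8 locale has been selected (for example with `setlocale(LC_ALL, "")` in a UTF-8 environment) does `bytes_to_cells` return one cell for them.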

It's not just ncurses; see

https://pubs.opengroup.org/onlinepubs/7908799/xcurses/addstr.html

(though in either ncurses or X/Open, "characters" is sometimes ambiguous)

X/Open documents echochar

https://pubs.opengroup.org/onlinepubs/7908799/xcurses/echochar.html

which says that it echoes a "single-byte character" (and, IIRC,
SVr4 claims that function writes directly to the screen),
but ncurses does not make those distinctions.
I'm adding a to-do item to clarify this.
 
> For the record:
> 
>    `setlocale(LC_ALL, "")` does NOT set a UTF-8 locale on Windows. For
>    example, on my system it sets up the conventional DBCS locale and returns
>    `Chinese (Simplified)_China.936`, even when UTF-8 is in effect. With UCRT,
>    however, it is possible to request a UTF-8 locale with
>    `setlocale(LC_ALL, ".65001")`. This has the desired effect, and now UTF-8
>    strings work as expected.

Yes: Windows itself doesn't make the locale encoding work as it does in POSIX,
which is why I qualified my comment (valid, encoding, etc.).

I assumed that MSYS2 had a workaround :-)
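Following LIU Hao's observation, a program could honor the user's environment first and, only on Windows, fall back to the UCRT spelling `.65001` mentioned above. This is a sketch under assumptions: `request_utf8_locale` and `locale_name_is_utf8` are hypothetical helpers, and the name check is a heuristic that assumes UTF-8 locales are spelled with "UTF-8", "utf8", or "65001". The exact string UCRT returns after the fallback is not assumed here.

```c
#include <ctype.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>

/* Heuristic: does a locale name returned by setlocale() look like it
 * uses UTF-8?  Matches "utf-8"/"utf8" in any case and the Windows code
 * page number 65001.  An assumption about naming, not a guarantee. */
static int locale_name_is_utf8(const char *name)
{
    char lower[256];
    size_t i;

    if (name == NULL)
        return 0;
    for (i = 0; name[i] != '\0' && i + 1 < sizeof lower; i++)
        lower[i] = (char)tolower((unsigned char)name[i]);
    lower[i] = '\0';
    return strstr(lower, "utf-8") != NULL
        || strstr(lower, "utf8") != NULL
        || strstr(lower, "65001") != NULL;
}

/* Select the locale from the environment, as curses programs normally
 * do.  On Windows, setlocale(LC_ALL, "") may pick a legacy code page
 * (e.g. ".936"), so retry with ".65001", which UCRT accepts as a
 * request for UTF-8.  Returns the current locale name, or NULL. */
static const char *request_utf8_locale(void)
{
    const char *name = setlocale(LC_ALL, "");

#ifdef _WIN32
    if (!locale_name_is_utf8(name)) {
        const char *utf8 = setlocale(LC_ALL, ".65001");

        /* If the retry fails, re-query so the returned pointer is fresh. */
        name = (utf8 != NULL) ? utf8 : setlocale(LC_ALL, NULL);
    }
#endif
    return name;
}
```

A curses program would call this once, before `initscr()` or `newterm()`, in place of a bare `setlocale(LC_ALL, "")`.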
 
> There is another issue: when nano is told to read text from standard input,
> it stops responding as soon as the input ends, like this:
> 
>    ```
>    > echo | src\nano.exe -
> 
>    Reading data from keyboard; type ^D or ^D^D to finish.
>    Too many errors from stdin
> 
>    Buffer written to nano.11200.save
>    ```
> 
> The cause of this issue is described in the attached patch. Please take a
> look if you happen to have some time.

...on my to-do list (thanks)
 
> Thanks for your help!
> 
> 
> 
> -- 
> Best regards,
> LIU Hao
> 

> diff --git a/ncurses-6.4.orig/ncurses/win32con/win32_driver.c b/ncurses-6.4/ncurses/win32con/win32_driver.c
> index 45aadf2f59..354015ba62 100644
> --- a/ncurses-6.4.orig/ncurses/win32con/win32_driver.c
> +++ b/ncurses-6.4/ncurses/win32con/win32_driver.c
> @@ -2213,14 +2196,23 @@ InitConsole(void)
>       for (i = 0; i < NUMPAIRS; i++)
>           CON.pairs[i] = a;
>  
> -     CON.inp = GetStdHandle(STD_INPUT_HANDLE);
> -     CON.out = GetStdHandle(STD_OUTPUT_HANDLE);
> -
>       b = AllocConsole();
>  
>       if (!b)
>           b = AttachConsole(ATTACH_PARENT_PROCESS);
>  
> +     /* When the standard handles have been redirected (such as inside
> +      * a text editor or the less utility), keystrokes must be read from
> +      * the console rather than the redirected handle.  The standard
> +      * output handle suffers from a similar problem.
> +      * Neither handle is closed once opened.  The console shall be
> +      * considered reachable throughout the process.
> +      */
> +     CON.inp = CreateFile(TEXT("CONIN$"), GENERIC_READ | GENERIC_WRITE,
> +                          FILE_SHARE_READ, 0, OPEN_EXISTING, 0, 0);
> +     CON.out = CreateFile(TEXT("CONOUT$"), GENERIC_READ | GENERIC_WRITE,
> +                          FILE_SHARE_WRITE, 0, OPEN_EXISTING, 0, 0);
> +
>       if (getenv("NCGDB") || getenv("NCURSES_CONSOLE2")) {
>           T(("... will not buffer console"));
>           buffered = FALSE;
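The patch's comment describes the fix: when standard input is redirected, read keystrokes from the console device (`CONIN$`) rather than from the redirected handle. On POSIX systems the rough counterpart of the console is the controlling terminal, `/dev/tty`. The sketch below is a POSIX analog for illustration only; it is not part of the patch or of ncurses, and `open_keyboard_fd` is a hypothetical name.

```c
#include <fcntl.h>
#include <unistd.h>

/* POSIX analog of opening "CONIN$": if standard input is not a terminal
 * (for example, `echo | nano -`), read keystrokes from the controlling
 * terminal instead.  Returns a file descriptor, or -1 if the process
 * has no controlling terminal.  The caller owns any descriptor other
 * than STDIN_FILENO. */
static int open_keyboard_fd(void)
{
    if (isatty(STDIN_FILENO))
        return STDIN_FILENO;
    return open("/dev/tty", O_RDWR | O_NOCTTY);
}
```

With ncurses on POSIX, a stream opened this way (e.g. with `fdopen`) can be passed as the input argument of `newterm()`, so the curses session reads from the terminal while the program is still free to consume the piped data.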





-- 
Thomas E. Dickey <dickey@invisible-island.net>
https://invisible-island.net
