Re: UTF-8 multi-byte characters are not displayed properly on Windows consoles

From: Thomas Dickey
Subject: Re: UTF-8 multi-byte characters are not displayed properly on Windows consoles
Date: Sat, 14 Jan 2023 17:41:34 -0500
User-agent: Mutt/1.10.1 (2018-07-13)

On Fri, Jan 13, 2023 at 12:59:15AM +0800, LIU Hao wrote:
> Thank you, Thomas.
>
> > This doesn't set the locale (which nano normally does).
> > Without setting a valid locale which uses UTF-8 encoding,
> > ncurses won't do much of any use.
> > setlocale(LC_ALL, "");
>
> Yeah I thought ncurses just write a byte string to the console without being
> aware of its encoding; apparently I was wrong.
No - it converts the byte string into characters, stores those characters in
an array (WINDOW), and uses that array to decide how to update the terminal.
When it updates the terminal, it transforms the characters back into byte
strings (depending on the encoding...) and writes _those_.
It's not just ncurses - see
https://pubs.opengroup.org/onlinepubs/7908799/xcurses/addstr.html
(though in either ncurses or X/Open, "characters" sometimes is ambiguous)
X/Open documents echochar
https://pubs.opengroup.org/onlinepubs/7908799/xcurses/echochar.html
which says that it echoes a "single-byte character" (and iirc,
SVr4 claims that function writes directly to the screen),
but ncurses does not make those distinctions
-- adding a to-do to clarify
> For the record:
>
> `setlocale(LC_ALL, "")` does NOT set a UTF-8 locale on Windows. For
> example, on my system it sets up the conventional DBCS locale, and
> returns `Chinese (Simplified)_China.936`, even when UTF-8 is in effect.
> It is possible, however, to request a UTF-8 locale with
> `setlocale(LC_ALL, ".65001")` with UCRT. This has the desired effect,
> and now UTF-8 strings work as expected.
yes - Windows itself doesn't make locale encoding work as in POSIX,
which is why I qualified my comment (valid, encoding, etc).
I assumed that msys2 had a workaround :-)
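
For illustration, a rough sketch of how an application might combine the two calls discussed above: try the environment's locale first, and on Windows fall back to ".65001" as LIU Hao describes for UCRT. The helper names are hypothetical, and the probe (decoding the two-byte UTF-8 form of U+00E9) is only a heuristic I am assuming is good enough to tell UTF-8 from a DBCS or single-byte locale. Neither nano nor ncurses is known to do it this way.

```c
#include <locale.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical probe: returns nonzero if the current locale decodes the
 * bytes C3 A9 as the single character U+00E9, as a UTF-8 locale would.
 * A DBCS locale may also consume both bytes but yields another character. */
int locale_decodes_utf8(void)
{
    static const char probe[] = "\xC3\xA9";
    mbstate_t state;
    wchar_t wc = 0;

    memset(&state, 0, sizeof state);
    return mbrtowc(&wc, probe, 2, &state) == 2 && wc == 0xE9;
}

/* Hypothetical helper: returns the locale name on success, or NULL if no
 * UTF-8 locale could be set. In the NULL case the locale is left at
 * whatever the last successful setlocale() call selected. */
const char *setup_utf8_locale(void)
{
    const char *name = setlocale(LC_ALL, "");

    if (name != NULL && locale_decodes_utf8())
        return name;
#ifdef _WIN32
    /* Per the report above, UCRT accepts a code-page-only locale name. */
    name = setlocale(LC_ALL, ".65001");
    if (name != NULL && locale_decodes_utf8())
        return name;
#endif
    return NULL;
}
```

A caller would run setup_utf8_locale() once before initscr() and treat a NULL result as "no usable UTF-8 locale" instead of assuming that setlocale(LC_ALL, "") was enough.
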
> There is another issue: When nano is told to read text from standard input,
> it loses response as soon as input is over, like this:
>
> ```
> > echo | src\nano.exe -
>
> Reading data from keyboard; type ^D or ^D^D to finish.
> Too many errors from stdin
>
> Buffer written to nano.11200.save
> ```
>
> The cause of this issue is described in the attached patch. Please take a
> look if you happen to have some time.
...on my to-do list (thanks)
> Thanks for your help!
>
>
>
> --
> Best regards,
> LIU Hao
>
> diff --git a/ncurses-6.4.orig/ncurses/win32con/win32_driver.c b/ncurses-6.4/ncurses/win32con/win32_driver.c
> index 45aadf2f59..354015ba62 100644
> --- a/ncurses-6.4.orig/ncurses/win32con/win32_driver.c
> +++ b/ncurses-6.4/ncurses/win32con/win32_driver.c
> @@ -2213,14 +2196,23 @@ InitConsole(void)
> for (i = 0; i < NUMPAIRS; i++)
> CON.pairs[i] = a;
>
> - CON.inp = GetStdHandle(STD_INPUT_HANDLE);
> - CON.out = GetStdHandle(STD_OUTPUT_HANDLE);
> -
> b = AllocConsole();
>
> if (!b)
> b = AttachConsole(ATTACH_PARENT_PROCESS);
>
> + /* When the standard handles have been redirected (such as inside
> + * a text editor or the less utility), keystrokes must be read from
> + * the console rather than the redirected handle. The standard
> + * output handle suffers from a similar problem.
> + * Both handles are not closed once opened. The console shall be
> + * considered reachable throughout the process.
> + */
> + CON.inp = CreateFile(TEXT("CONIN$"), GENERIC_READ | GENERIC_WRITE,
> + FILE_SHARE_READ, 0, OPEN_EXISTING, 0, 0);
> + CON.out = CreateFile(TEXT("CONOUT$"), GENERIC_READ | GENERIC_WRITE,
> + FILE_SHARE_WRITE, 0, OPEN_EXISTING, 0, 0);
> +
> if (getenv("NCGDB") || getenv("NCURSES_CONSOLE2")) {
> T(("... will not buffer console"));
> buffered = FALSE;
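
The idea in the patch comment can be sketched outside ncurses roughly as follows. This is only an illustration under assumptions: stdin_is_redirected and open_console_input are made-up names, the CreateFile arguments simply mirror the ones in the quoted patch, and the non-Windows branch is there only so the check can be compiled and tried elsewhere.

```c
#include <stdio.h>
#ifdef _WIN32
#include <io.h>
#include <windows.h>
#else
#include <unistd.h>
#endif

/* Hypothetical helper: returns 1 when standard input is not an
 * interactive console or terminal (for example, when it is a pipe as in
 * "echo | nano -"), and 0 otherwise. */
int stdin_is_redirected(void)
{
#ifdef _WIN32
    return !_isatty(_fileno(stdin));
#else
    return !isatty(STDIN_FILENO);
#endif
}

#ifdef _WIN32
/* Hypothetical helper: open the console input buffer directly, so that
 * keystrokes can still be read while stdin is a pipe or file. Returns
 * INVALID_HANDLE_VALUE if no console is attached. */
HANDLE open_console_input(void)
{
    return CreateFile(TEXT("CONIN$"), GENERIC_READ | GENERIC_WRITE,
                      FILE_SHARE_READ, NULL, OPEN_EXISTING, 0, NULL);
}
#endif
```

With a pipe on standard input, GetStdHandle(STD_INPUT_HANDLE) refers to the pipe, which reaches end-of-file once the data is consumed; a handle opened on CONIN$ refers to the console regardless of redirection.
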
--
Thomas E. Dickey <dickey@invisible-island.net>
https://invisible-island.net