bug-ncurses

UTF-8 multi-byte characters are not displayed properly on Windows consoles


From: LIU Hao
Subject: UTF-8 multi-byte characters are not displayed properly on Windows consoles
Date: Thu, 12 Jan 2023 21:02:14 +0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.4.2

Hello Thomas E. Dickey,

Excuse me for the disruption. Thank you for your great work on ncurses. I'm writing to you directly because my message did not arrive at the GNU mailing list, nor did it bounce. Perhaps the list is subscribers-only?

There seems to be an issue with displaying UTF-8 strings in UTF-8 consoles on Windows 10. My original message follows. I hope it helps.

Have a nice day!


----- original message -----

Hello folks,

I'm a mingw-w64 developer and an MSYS2 contributor, and I maintain a GNU nano port to Windows [1]. First of all, thank you for the great work!

Since Windows 10, the Windows console has had UTF-8 support, although it has to be enabled explicitly in the system control panel. Once UTF-8 support has been enabled and the UTF-8 code page has been selected with the `chcp 65001` command, all standard C ctype functions can work on UTF-8 strings.

However, when GNU nano attempts to display a UTF-8 string, the string is interpreted byte by byte and comes out as gibberish. I have created this test case to demonstrate:

    ```
    #include <ncursesw/ncurses.h>

    int
    main(void)
      {
        initscr();
        addstr("»·");  // hex: C2 BB C2 B7
        refresh();
        getch();
      }
    ```

The string literal contains two characters encoded as four bytes (see the hex values in the comment). On Linux it is displayed properly, but on a Windows UTF-8 console I get `»·`. How should I fix it?


[1] https://github.com/lhmouse/nano-win


--
Best regards,
LIU Hao


