Re: c32width gives incorrect return values in C locale
From: Gavin Smith
Subject: Re: c32width gives incorrect return values in C locale
Date: Sat, 11 Nov 2023 22:15:39 +0000
On Sat, Nov 11, 2023 at 09:06:41PM +0100, Bruno Haible wrote:
> [CCing bug-gnulib]
> Indeed, the c32* functions by design work only on those Unicode characters
> that can be represented as multibyte sequences in the current locale.
>
> I'll document this better in the Gnulib manual.
>
> Since you want texinfo to work on UTF-8 encoded text with characters outside
> the repertoire of the current locale, you'll need the libunistring functions,
> documented in
> <https://www.gnu.org/software/libunistring/manual/html_node/uniwidth_002eh.html>.
> Namely, replace c32width with uc_width.
Thanks, that seems to work perfectly.
I also changed c32isupper to uc_is_upper. The gnulib manual stated
(node "isupper"):

    ‘c32isupper’
        This function operates in a locale dependent way, on 32-bit wide
        characters. In order to use it, you first have to convert from
        multibyte to 32-bit wide characters, using the ‘mbrtoc32’ function.
        It is provided by the Gnulib module ‘c32isupper’.
    ...
    ‘uc_is_upper’
        This function operates in a locale independent way, on Unicode
        characters. It is provided by the Gnulib module
        ‘unictype/ctype-upper’.

- and we wanted the "locale independent way".
I did not understand why uc_width was said to be "locale dependent":
"These functions are locale dependent."
- from
<https://www.gnu.org/software/libunistring/manual/html_node/uniwidth_002eh.html#index-uc_005fwidth>.
I also don't understand the purpose of the "encoding" argument -- can this
always be "UTF-8"?
I'm also unclear on the exact relationship between the types char32_t,
ucs4_t and uint32_t. For example, uc_width takes a ucs4_t argument
but u8_mbtouc writes to a char32_t variable. In the code I committed,
I used a cast to ucs4_t when calling uc_width.
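
To make the type question concrete, here is a minimal sketch in plain C11. It
does not use libunistring: my_ucs4_t, decode_utf8 and is_ascii_upper are
hypothetical names invented for this illustration, and my_ucs4_t is only
assumed to match how libunistring declares ucs4_t (as far as I know, a
typedef for uint32_t). C11 defines char32_t as uint_least32_t, so on
platforms where uint32_t exists the two normally have the same width and a
cast between them does not change the value.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* size_t */
#include <stdint.h>   /* uint32_t */
#include <uchar.h>    /* char32_t */

/* Hypothetical stand-in for libunistring's ucs4_t (assumed to be uint32_t). */
typedef uint32_t my_ucs4_t;

/* Fails at compile time on a platform where the two types differ in size. */
static_assert(sizeof(char32_t) == sizeof(my_ucs4_t),
              "char32_t and uint32_t differ in size on this platform");

/* Hypothetical consumer taking a ucs4_t-style argument, in the way that
   uc_width or uc_is_upper take one.  It only knows ASCII letters. */
static int is_ascii_upper(my_ucs4_t uc)
{
  return uc >= 'A' && uc <= 'Z';
}

/* Minimal UTF-8 decoder: stores one code point in *out and returns the
   number of bytes consumed, or 0 for a truncated or malformed sequence.
   It does not reject overlong forms or surrogates. */
static size_t decode_utf8(const unsigned char *s, size_t n, char32_t *out)
{
  size_t len, i;
  char32_t cp;

  if (n == 0)
    return 0;
  if (s[0] < 0x80)                { len = 1; cp = s[0]; }
  else if ((s[0] & 0xE0) == 0xC0) { len = 2; cp = s[0] & 0x1F; }
  else if ((s[0] & 0xF0) == 0xE0) { len = 3; cp = s[0] & 0x0F; }
  else if ((s[0] & 0xF8) == 0xF0) { len = 4; cp = s[0] & 0x07; }
  else
    return 0;                     /* stray continuation or invalid lead byte */
  if (n < len)
    return 0;                     /* truncated sequence */
  for (i = 1; i < len; i++)
    {
      if ((s[i] & 0xC0) != 0x80)
        return 0;                 /* not a continuation byte */
      cp = (cp << 6) | (s[i] & 0x3F);
    }
  *out = cp;
  return len;
}
```

A caller would decode into a char32_t variable and then pass it on with an
explicit cast, e.g. is_ascii_upper ((my_ucs4_t) c). Under the size assumption
checked above, the cast only documents the type change.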