Re: u8_strconv_to_locale() misbehaves on OSX (Travis CI runner)
From: Bruno Haible
Subject: Re: u8_strconv_to_locale() misbehaves on OSX (Travis CI runner)
Date: Thu, 08 Feb 2018 18:05:34 +0100
User-agent: KMail/5.1.3 (Linux/4.4.0-104-generic; KDE/5.18.0; x86_64; ; )
Hi Tim,
> locale_charset() returns with "UTF-8".
That is as it should be on Mac OS X.
> u8_strconv_to_locale() and u8_strconv_from_locale() seem not to work as
> expected:
>
>
> One problem seems to be that u8_strconv_to_locale() outputs decomposed
> characters, e.g. u8_strconv_to_locale(bücher.de) returns b"ucher.de.
>
> Hex/u32:
>
> Result:   U+0062 U+0022 U+0075 U+0063 U+0068 U+0065 U+0072 U+002e U+0064 U+0065
>
> Expected: U+0062 U+00fc U+0063 U+0068 U+0065 U+0072 U+002e U+0064 U+0065
This would indicate that locale_charset() returns "ASCII".
What happens then is this: u8_strconv_to_locale invokes
u8_strconv_to_encoding, which invokes mem_iconveha with transliterate=true,
which in turn appends '//TRANSLIT' to the target encoding when invoking
iconv_open. So you get transliteration, e.g. from 'ü' to '"u'.
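The transliteration step can be reproduced with plain iconv. This is a minimal sketch, not libunistring's code: the helper name `translit_convert` is hypothetical, and it assumes a platform iconv that accepts the `//TRANSLIT` suffix (GNU libiconv, as shipped on Mac OS X, and glibc both do). The exact replacement text is implementation-dependent; glibc's result, for instance, depends on the current locale.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not libunistring's API): convert the
   NUL-terminated UTF-8 string IN to TOCODE with "//TRANSLIT"
   appended, mirroring what mem_iconveha requests when
   transliterate=true.  The result is NUL-terminated in OUT.
   Returns 0 on success, -1 on failure with errno set. */
int translit_convert(const char *tocode, const char *in,
                     char *out, size_t outsize)
{
    char target[64];
    char *inptr = (char *) in;      /* iconv() takes a non-const pointer */
    size_t inleft = strlen(in);
    char *outptr = out;
    size_t outleft;
    iconv_t cd;
    int saved;

    if (outsize == 0
        || strlen(tocode) + sizeof "//TRANSLIT" > sizeof target) {
        errno = EINVAL;
        return -1;
    }
    strcpy(target, tocode);
    strcat(target, "//TRANSLIT");   /* e.g. "ASCII//TRANSLIT" */

    cd = iconv_open(target, "UTF-8");
    if (cd == (iconv_t) -1)
        return -1;

    outleft = outsize - 1;          /* keep one byte for the NUL */
    /* Convert all input, then flush any pending shift state. */
    if (iconv(cd, &inptr, &inleft, &outptr, &outleft) == (size_t) -1
        || iconv(cd, NULL, NULL, &outptr, &outleft) == (size_t) -1) {
        saved = errno;              /* iconv_close() may clobber errno */
        iconv_close(cd);
        errno = saved;
        return -1;
    }
    iconv_close(cd);
    *outptr = '\0';
    return 0;
}
```

With GNU libiconv, `translit_convert("ASCII", "bücher.de", buf, sizeof buf)` should give the `b"ucher.de` reported above; on Mac OS X the program must be linked with `-liconv`.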
> The second problem is that characters beyond 255 are translated into ?
> (U+003f).
This would indicate that locale_charset() returns "ISO-8859-1". The
question marks then come from the transliteration, again.
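Why exactly the characters beyond 255 are affected: ISO-8859-1's 256 byte values correspond one-to-one to U+0000..U+00FF, so 'ü' (U+00FC) survives as byte 0xFC, while anything above U+00FF has no slot at all. The sketch below is a simplified model that does not call iconv; the function name is hypothetical, and it assumes well-formed UTF-8 input. It emits '?' for every unrepresentable code point, which mimics the end result described here rather than the transliteration logic itself (a real //TRANSLIT implementation first tries an approximation).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model, not libiconv code: convert well-formed UTF-8
   to ISO-8859-1, replacing every code point above U+00FF with '?'.
   OUT must hold at least strlen(IN) + 1 bytes (the output is never
   longer than the input).  Returns the number of replaced code points. */
size_t utf8_to_latin1_model(const char *in, unsigned char *out)
{
    const unsigned char *p = (const unsigned char *) in;
    size_t replaced = 0;

    while (*p != '\0') {
        uint32_t cp;
        int len;

        /* Decode one UTF-8 sequence from its lead byte. */
        if (*p < 0x80)      { cp = *p;        len = 1; }
        else if (*p < 0xE0) { cp = *p & 0x1F; len = 2; }
        else if (*p < 0xF0) { cp = *p & 0x0F; len = 3; }
        else                { cp = *p & 0x07; len = 4; }
        for (int i = 1; i < len; i++)
            cp = (cp << 6) | (p[i] & 0x3F);
        p += len;

        if (cp <= 0xFF) {
            *out++ = (unsigned char) cp;   /* Latin-1 byte == code point */
        } else {
            *out++ = '?';                  /* no slot in ISO-8859-1 */
            replaced++;
        }
    }
    *out = '\0';
    return replaced;
}
```

For the input "bücher 中" this yields the bytes `b`, 0xFC, `cher ?` and reports one replacement.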
> Do you have any hints how to fix these problems ?
I would compile without -O and with -ggdb, then single-step through the code,
paying particular attention to the value of locale_charset() and to
the arguments of iconv_open().
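Alongside single-stepping (a gdb breakpoint on iconv_open shows the target encoding string it receives), it can help to log what the C library itself reports for the environment's locale. The sketch below is a hypothetical debugging aid, not part of gnulib or libunistring: it uses POSIX `setlocale` and `nl_langinfo(CODESET)` as a stand-in, and `locale_charset()` may normalize or override that name on some platforms, so the two values are worth comparing rather than assuming equal.

```c
#include <langinfo.h>
#include <locale.h>
#include <stdio.h>

/* Hypothetical debugging aid: select the locale named by the
   environment (LC_ALL, LC_CTYPE, LANG) and log the codeset POSIX
   associates with it.  The returned string may be overwritten by
   later nl_langinfo() or setlocale() calls. */
const char *report_locale_codeset(FILE *log)
{
    const char *loc = setlocale(LC_ALL, "");
    const char *codeset = nl_langinfo(CODESET);

    fprintf(log, "setlocale(LC_ALL, \"\") = %s\n",
            loc ? loc : "(failed, still \"C\")");
    fprintf(log, "nl_langinfo(CODESET)   = %s\n", codeset);
    return codeset;
}
```

If this prints an ASCII or ISO-8859-1 codeset on the CI runner, the environment's locale variables are the first thing to check.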
> I would expect u8_strconv_to_locale() to work in a defined manner on
> UTF-8 locales
That's certainly how it is intended to be.
Bruno