From: | Harald van Dijk |
Subject: | Re: POSIX bind_textdomain_codeset(): some invalid codeset arguments |
Date: | Fri, 13 May 2022 09:05:25 +0100 |
User-agent: | Mozilla/5.0 (X11; Linux x86_64; rv:100.0) Gecko/20100101 Thunderbird/100.0 |
On 12/05/2022 23:10, Steffen Nurpmeso wrote:
Harald van Dijk wrote in
 <bd336669-960b-1f5f-fffc-30905d4c8e82@gigawatt.nl>:
 |On 12/05/2022 18:19, Steffen Nurpmeso via austin-group-l at The Open
 |Group wrote:
 |> Bruno Haible wrote in
 |>  <4298913.vrqWZg68TM@omega>:
 |>|Steffen Nurpmeso wrote:
 |>|> ...
 |>|>| [.] "UTF-7"."
 |>|>
 |>|> That is overshoot.
 |>|
 |>|No. UTF-7 is invalid here because it produces output that is not NUL
 |>|terminated. See:
 |>|
 |>|$ printf 'ab\0' | iconv -t UTF-7 | od -t c
 |>|0000000 a b + A A A -
 |>|0000007
 |>|
 |>|strlen() on such a return value makes invalid memory accesses.
 |>|You can convince yourself by running
 |>|$ OUTPUT_CHARSET=UTF-7 valgrind ls --help
 |>
 |> This is then surely bogus? UTF-7 is a normal single byte
 |> character set and is to be terminated like anything else. Nothing
 |> in RFC 2152 nor RFC 3501 if you want makes me think something
 |> else.
 |
 |RFC 2152's rules 1 and 3 only allow specifying the listed characters as
 |their ASCII form. All other characters, including U+0000, must be
 |encoded using rule 2. GNU iconv is doing what the RFC specifies here.

No really, please. And please do not strip important content,
I didn't think I did. You didn't read the RFC properly. I replied to show where and how the RFC specifies exactly what GNU iconv does. The rest of your message looks like it is based on the false assumption that the RFC specifies something other than what it does, and it becomes irrelevant once that assumption is corrected. Looking in more detail, there is one thing I should have responded to. Included here.
UTF-7. Heck, how about that, for example:

  LC_ALL=C printf 'ab\0' | iconv -f iso-8859-1 -t utf-16 | od -t c
  0000000 \0 \0 a \0 b \0 \0 \0

Two leading NULs?
This is not what GNU iconv prints at all, at least not on my system, which just uses the GNU version unmodified. Rather, it prints
  0000000 377 376 a \0 b \0 \0 \0
  0000010

That is, it includes a BOM, just like it showed in your SunOS output. Both the GNU iconv that is shipped as part of GNU libc 2.35, and the GNU iconv that is shipped as part of GNU libiconv 1.16, print this. Those are the current releases. If you are testing an older release, or a modified version, that is important information missing from your message. If you are seeing the leading null bytes in a current version, you may want to report this, including steps on how to get a GNU iconv that behaves this way.
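For anyone who wants to cross-check both points without relying on a particular iconv build, here is a minimal sketch using Python's built-in codecs. This is an assumption-laden illustration only: Python's "utf-7" and "utf-16" codecs are a separate implementation from GNU iconv, so agreement here says nothing definitive about iconv itself. It only shows what another encoder does with the same input.

```python
import codecs

# Same input as printf 'ab\0': two ASCII letters followed by U+0000.
text = "ab\0"

# UTF-7: U+0000 is not emitted as a literal zero byte; the codec puts it
# into a '+'-introduced base64 run, so the output has no NUL terminator.
utf7 = text.encode("utf-7")
utf7_has_nul_byte = 0 in utf7

# UTF-16 without an explicit byte order: the codec prepends a BOM in the
# platform's native byte order before the encoded characters.
utf16 = text.encode("utf-16")
utf16_has_bom = utf16.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE))

print("UTF-7 :", utf7, "contains NUL byte:", utf7_has_nul_byte)
print("UTF-16:", utf16.hex(" "), "starts with BOM:", utf16_has_bom)
```

The UTF-7 check is the one that matters for strlen(): if no zero byte appears in the converted output, a caller treating it as a C string has nothing to stop on.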
i am neither Chinese nor Russian, and especially not one of the other 7 billion that do not count. (I said surely bogus because i alone see the shiny light of having found give-me-five GNU iconv errors. Or even beyond that.)
This makes absolutely zero sense. I am including it only to pre-empt you again claiming I am stripping important content.
Cheers,
Harald van Dijk