bug-gnu-libiconv


From: Bruno Haible
Subject: Re: [bug-gnu-libiconv] Invalid characters when converting from utf8 to iso-8859-15
Date: Thu, 18 Mar 2021 00:10:23 +0100
User-agent: KMail/5.1.3 (Linux/4.4.0-203-generic; KDE/5.18.0; x86_64; ; )

Hi,

Tom Sorensen wrote:
> Note -- this isn't just -15, but -1 as well, and possibly others.
> 
> I have a utf8 text file

The official name of the encoding that you mean is UTF-8, not UTF8.

> that contains <c2 98> and <c2 80>. When converted
> to iso-8859-15 via:
> iconv -c -f utf8 -t iso_8859-15//IGNORE input > output

If that worked for you, you must be using iconv from GNU libc, not from
GNU libiconv. The proper bug report address for GNU libc is at
http://www.gnu.org/software/libc/bugs.html
But since GNU libiconv and GNU libc are based on very similar conversion
tables, the answer to your question is the same for both implementations.

> The resulting file contains characters x98 and x80.

This is as expected. All charset conversion software treats the bytes
0x98 and 0x80 in ISO-8859-1 and ISO-8859-15 as equivalent to
U+0098 and U+0080, respectively. [1][2]
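
The mapping can be cross-checked outside of iconv. The following is a minimal
sketch that uses Python's built-in codecs, which are an independent
implementation and not GNU libiconv or GNU libc; the assumption is that they
agree with the conversion tables in [1] and [2] for these two bytes. The helper
name `convert` is hypothetical and exists only for this illustration.

```python
# Bytes from the report: the UTF-8 encodings of U+0098 and U+0080.
utf8_input = b"\xc2\x98\xc2\x80"


def convert(data: bytes, target: str) -> bytes:
    """Decode UTF-8 input and re-encode it in the target charset.

    Uses strict error handling, so an unmappable character would raise
    UnicodeEncodeError instead of being silently dropped.
    """
    return data.decode("utf-8").encode(target)


# The code points that the UTF-8 byte sequences stand for.
code_points = [f"U+{ord(ch):04X}" for ch in utf8_input.decode("utf-8")]

# Convert to both charsets mentioned in the thread.
latin1 = convert(utf8_input, "iso-8859-1")
latin9 = convert(utf8_input, "iso-8859-15")

print(code_points)        # ['U+0098', 'U+0080']
print(latin1.hex(" "))    # 98 80
print(latin9.hex(" "))    # 98 80
```

Neither conversion raises an error here, which matches the statement that
0x98 and 0x80 are regular table entries in both charsets, so `-c` or
`//IGNORE` has nothing to discard.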

> These are considered
> invalid by some programs that expect iso8859-15 encoding -- including iconv
> itself.

Can you substantiate this claim? What did you do, and what was the outcome?

> Running the file through iconv a second time

Which command line did you use the second time?

Bruno

[1] https://haible.de/bruno/charsets/conversion-tables/ISO-8859-1.html
[2] https://haible.de/bruno/charsets/conversion-tables/ISO-8859-15.html



