Re: iconv: document not-fixed bugs, using POSIX terminology
From: Noah Misch
Subject: Re: iconv: document not-fixed bugs, using POSIX terminology
Date: Sun, 3 Jan 2021 16:24:36 -0800
User-agent: Mutt/1.5.24 (2015-08-30)
On Sun, Jan 03, 2021 at 08:48:20PM +0100, Bruno Haible wrote:
> > iconv.m4 clears HAVE_ICONV if it detects iconv() bugs. Docs list those bugs
> > under "Portability problems fixed by Gnulib:". If changing HAVE_ICONV were
> > enough to qualify as a fix, then "This function is missing on some
> > platforms" would also belong under that heading. I'm proposing to move
> > those bugs under "not fixed by Gnulib".
>
> Good point. Yes, it doesn't belong under "fixed by Gnulib". But "not fixed by
> Gnulib" is also not the right category. I'll list these under "handled by
> Gnulib", with an extra explanation.
I like having the separate section. Since you're doing that, "This function
is missing on some platforms:" also belongs under "handled by", not under "not
fixed".
> > Two of those iconv() bugs involve what POSIX calls "non-identical
> > conversion". (GNU libc calls it "non-reversible conversion".) The gnulib
> > docs and code comments use terms "failures" and "conversion errors", but
> > these bugs don't entail the distinct POSIX concept of "error" or failure.
> > Hence, I propose standardizing on the term "non-identical conversion".
>
> "non-identical conversion" is a nonsensical term. Therefore it's better not
> to use it. (Two objects can be identical if they are in the same mathematical
> set. But when we use iconv, we are doing a mapping between the minimal byte
> sequences of the input character set and the minimal byte sequences of the
> destination character set; these are two different sets. If anyone uses
> the term "identical" here, it would mean that the sets have an intersection,
> and that the corresponding byte sequences are the same (e.g. like ISO-8859-1
> and UTF-8 have, as intersection, the set of 1-byte sequences with values >= 0,
> <= 0x7F).)
If https://pubs.opengroup.org/onlinepubs/9699919799/functions/iconv.html used
"identical" to mean "same byte sequence", it would be requiring an ISO-8859-1 =>
UTF-8 conversion to increment the iconv() return value upon converting 0xA1 to
0xC2 0xA1. Most implementations don't do that. (HP-UX does, and Gnulib calls
it a bug.)
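To make the 0xA1 case concrete, here is a minimal sketch that runs that
conversion and hands back iconv()'s return value. The helper name
latin1_to_utf8 is hypothetical, not part of Gnulib or any libc. The sketch
assumes the platform declares iconv() with a "char **" input argument (some
declare "const char **") and accepts the encoding names "ISO-8859-1" and
"UTF-8".

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper, for illustration only.
   Converts INLEN bytes of ISO-8859-1 at IN to UTF-8 in OUT (capacity OUTCAP)
   and stores the number of bytes written in *OUTLEN.
   Returns iconv()'s return value, i.e. the count of non-identical
   conversions performed, or (size_t) -1 with errno set on error.  */
size_t
latin1_to_utf8 (const char *in, size_t inlen,
                char *out, size_t outcap, size_t *outlen)
{
  assert (in != NULL && out != NULL && outlen != NULL);
  *outlen = 0;

  iconv_t cd = iconv_open ("UTF-8", "ISO-8859-1");
  if (cd == (iconv_t) -1)
    return (size_t) -1;

  /* iconv() advances these pointers and decrements these counts.  */
  char *inptr = (char *) in;   /* Assumption: iconv() takes char **.  */
  char *outptr = out;
  size_t inleft = inlen;
  size_t outleft = outcap;

  size_t res = iconv (cd, &inptr, &inleft, &outptr, &outleft);

  /* Preserve errno across iconv_close() so the caller sees iconv()'s error.  */
  int saved_errno = errno;
  iconv_close (cd);
  errno = saved_errno;

  *outlen = outcap - outleft;
  return res;
}
```

Calling latin1_to_utf8 ("\xA1", 1, buf, sizeof buf, &n) should leave the two
bytes 0xC2 0xA1 in buf. The return value is the point of interest: under the
"same byte sequence" reading it would have to be 1, whereas most
implementations return 0 here.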
https://pubs.opengroup.org/onlinepubs/9699919799/functions/iconv.html also
defines what it means to "fail" or experience an "error", and it classifies
"non-identical conversion" as neither of those things. Using a term unattested
in POSIX, like "non-reversible conversion", would solve that problem.
> --- a/modules/iconv-h
> +++ b/modules/iconv-h
> @@ -46,7 +46,9 @@ endif
> MOSTLYCLEANFILES += iconv.h iconv.h-t
>
> Include:
> -<iconv.h>
> +#if HAVE_ICONV_H
> +# include <iconv.h>
> +#endif
Doesn't the iconv-h module itself cause the <iconv.h> header to exist?