bug-gnulib

Re: iconv: document not-fixed bugs, using POSIX terminology


From: Noah Misch
Subject: Re: iconv: document not-fixed bugs, using POSIX terminology
Date: Sun, 3 Jan 2021 16:24:36 -0800
User-agent: Mutt/1.5.24 (2015-08-30)

On Sun, Jan 03, 2021 at 08:48:20PM +0100, Bruno Haible wrote:
> > iconv.m4 clears HAVE_ICONV if it detects iconv() bugs.  Docs list those bugs
> > under "Portability problems fixed by Gnulib:".  If changing HAVE_ICONV were
> > enough to qualify as a fix, then "This function is missing on some platforms"
> > would also belong under that heading.  I'm proposing to move those bugs under
> > "not fixed by Gnulib".
> 
> Good point. Yes, it doesn't belong under "fixed by Gnulib". But "not fixed by
> Gnulib" is also not the right category. I'll list these under "handled by
> Gnulib", with an extra explanation.

I like having the separate section.  Since you're doing that, "This function
is missing on some platforms:" also belongs under "handled by", not under "not
fixed".

> > Two of those iconv() bugs involve what POSIX calls "non-identical conversion".
> > (GNU libc calls it "non-reversible conversion".)  The gnulib docs and code
> > comments use terms "failures" and "conversion errors", but these bugs don't
> > entail the distinct POSIX concept of "error" or failure.  Hence, I propose
> > standardizing on the term "non-identical conversion".
> 
> "non-identical conversion" is a nonsensical term. Therefore it's better not
> to use it. (Two objects can be identical if they are in the same mathematical
> set. But when we use iconv, we are doing a mapping between the minimal byte
> sequences of the input character set and the minimal byte sequences of the
> destination character set; these are two different sets. If anyone uses
> the term "identical" here, it would mean that the sets have an intersection,
> and that the corresponding byte sequences are the same (e.g. like ISO-8859-1
> and UTF-8 have, as intersection, the set of 1-byte sequences with values >= 0,
> <= 0x7F).)

If https://pubs.opengroup.org/onlinepubs/9699919799/functions/iconv.html used
"identical" to mean "same byte sequence", it would be requiring an ISO-8859-1 =>
UTF-8 conversion to increment the iconv() return value upon converting 0xA1 to
0xC2 0xA1.  Most implementations don't do that.  (HP-UX does, and Gnulib calls
it a bug.)
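
To make that example concrete, here is a minimal sketch of the conversion in
question.  The helper name latin1_to_utf8 is hypothetical (it is not a Gnulib
or POSIX function), and the sketch assumes the platform's iconv_open() accepts
the encoding names "UTF-8" and "ISO-8859-1".  It passes iconv()'s return value
straight through, since that value is the count of non-identical conversions
under discussion.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: convert INLEN bytes of ISO-8859-1 text at IN to
   UTF-8, writing at most OUTCAP bytes to OUT.
   On success, store the number of bytes written in *OUTLEN and return
   the value iconv() returned, i.e. the number of non-identical
   conversions it reports.
   On error (including iconv_open() failure), return (size_t) -1 with
   errno set by the failing call.  */
size_t
latin1_to_utf8 (const char *in, size_t inlen,
                char *out, size_t outcap, size_t *outlen)
{
  assert (in != NULL && out != NULL && outlen != NULL);

  iconv_t cd = iconv_open ("UTF-8", "ISO-8859-1");
  if (cd == (iconv_t) -1)
    return (size_t) -1;

  /* POSIX declares the input pointer as char **, so drop the const.
     iconv() does not modify the input bytes.  */
  char *inptr = (char *) in;
  char *outptr = out;
  size_t inleft = inlen;
  size_t outleft = outcap;

  size_t res = iconv (cd, &inptr, &inleft, &outptr, &outleft);

  /* Preserve errno from iconv() across iconv_close().  */
  int saved_errno = errno;
  iconv_close (cd);
  errno = saved_errno;

  if (res != (size_t) -1)
    *outlen = outcap - outleft;
  return res;
}
```

Called on the single input byte 0xA1, this should produce the two output bytes
0xC2 0xA1.  I would expect the return value to be 0 on most implementations and
1 on an implementation that behaves like HP-UX, but the sketch does not depend
on either.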

https://pubs.opengroup.org/onlinepubs/9699919799/functions/iconv.html also
defines what it means to "fail" or experience an "error", and it classifies
"non-identical conversion" as neither of those things.  Using a term unattested
in POSIX, like "non-reversible conversion", would solve that problem.

> --- a/modules/iconv-h
> +++ b/modules/iconv-h
> @@ -46,7 +46,9 @@ endif
>  MOSTLYCLEANFILES += iconv.h iconv.h-t
>  
>  Include:
> -<iconv.h>
> +#if HAVE_ICONV_H
> +# include <iconv.h>
> +#endif

Doesn't this module cause the header to exist?


