bug-texinfo

Re: Using iconv in stand-alone info


From: Eli Zaretskii
Subject: Re: Using iconv in stand-alone info
Date: Wed, 23 Dec 2015 20:19:13 +0200

> Date: Wed, 23 Dec 2015 18:03:15 +0000
> From: Gavin Smith <address@hidden>
> Cc: Texinfo <address@hidden>
> 
> On 23 December 2015 at 17:38, Eli Zaretskii <address@hidden> wrote:
> > Attached.  OK to commit?
> 
> Thanks. It looks mostly OK; it would be good to have a comment
> explaining why we have to call iconv a second time.

OK, I will add comments before committing.

> There's one part that there could be a problem with, if I understand 
> correctly:
> 
> > @@ -918,7 +920,13 @@ copy_converting (long n)
> >            iconv_ret = iconv (iconv_to_utf8, &inptr, &bytes_left,
> >                               &utf8_char_ptr, &utf8_char_free);
> >            /* If we managed to write a character: */
> > -          if (utf8_char_ptr > utf8_char) break;
> > +          if (utf8_char_ptr > utf8_char)
> > +           {
> > +             if (iconv_ret == (size_t) -1)
> > +               iconv_ret = iconv (iconv_to_utf8, NULL, NULL,
> > +                                  &utf8_char_ptr, &utf8_char_free);
> > +             break;
> > +           }
> >          }
> >
> >        /* errno == E2BIG if iconv ran out of output buffer,
> 
> If it's true that iconv will delay writing to the output buffer until
> it sees the next character in case it is a combining character, then
> it's possible that this condition will never be satisfied.

No, not AFAIK.  'iconv' returns a value other than -1 only when it has
finished processing all of its input.  Calling it with NULL input
arguments forces it to output whatever it is still holding back, which
means that even a combining character will be output, as itself.  IOW,
if you have decomposed characters in the buffer and you process them
one by one, they will remain decomposed and will not be combined by
'iconv'.  But that's not a disaster (the display will be correct), and
in any case you cannot do better unless we add code that looks ahead
at the following characters and, if they are combining marks, passes
them to 'iconv' together with the preceding base character.
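
A minimal sketch of the one-character-at-a-time approach described
above, assuming an iconv that supports CP1255 (e.g. glibc's).  It is
not the code from the patch; the helper name convert_one_at_a_time
and the sample bytes are hypothetical.  Because CP1255 is a
single-byte encoding, feeding one byte means feeding one character,
and the NULL-argument call after each one forces out anything 'iconv'
is holding back:

```c
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: convert single-byte CP1255 input to UTF-8 one
   byte (= one character) at a time, flushing iconv after each one.
   Returns the number of UTF-8 bytes written to OUT, or -1 on error.
   Assumes a single-byte source encoding.  */
static long
convert_one_at_a_time (const char *in, size_t inlen, char *out,
                       size_t outsize)
{
  iconv_t cd = iconv_open ("UTF-8", "CP1255");
  if (cd == (iconv_t) -1)
    return -1;

  char *outptr = out;
  size_t out_left = outsize;
  int failed = 0;

  for (size_t i = 0; i < inlen; i++)
    {
      char *inptr = (char *) in + i;
      size_t in_left = 1;

      /* The converter may consume the byte yet hold the character
         back, in case a combining mark follows it.  */
      if (iconv (cd, &inptr, &in_left, &outptr, &out_left) == (size_t) -1
          /* NULL input: reset the state, writing out what is held.  */
          || iconv (cd, NULL, NULL, &outptr, &out_left) == (size_t) -1)
        {
          failed = 1;
          break;
        }
    }

  iconv_close (cd);
  return failed ? -1 : (long) (outptr - out);
}
```

Fed the CP1255 bytes 0xE4 0xCC (ה followed by the dagesh mark,
U+05BC), this should produce the four UTF-8 bytes D7 94 D6 BC: the
base letter and the mark come out as two separate characters, i.e.
they stay decomposed.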

> If you had any test files where characters are disappearing, it would
> be interesting if I could see them.

I have shown one case here:

  http://lists.gnu.org/archive/html/bug-wget/2015-12/msg00110.html

Decode the %NN URL-encoding into 8-bit bytes and convert the result
from CP1255 to UTF-8: the last character, ה, will disappear unless you
finish by calling 'iconv' with NULL arguments.

In this message:

  http://lists.gnu.org/archive/html/bug-wget/2015-12/msg00113.html

you will find a small C test program that exhibits the problem and its
solution.
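
A separate minimal sketch of the same problem and fix, assuming
glibc's iconv with CP1255 support.  It is not the test program from
the message above; the function name cp1255_to_utf8, its FLUSH flag,
and the sample input are hypothetical:

```c
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: convert INLEN bytes of CP1255 text in IN to
   UTF-8 in OUT (at most OUTSIZE bytes).  If FLUSH is nonzero, finish
   with iconv (cd, NULL, NULL, ...) so that a character the converter
   is still holding back gets written.  Returns the number of bytes
   written, or -1 on error.  */
static long
cp1255_to_utf8 (const char *in, size_t inlen, char *out, size_t outsize,
                int flush)
{
  iconv_t cd = iconv_open ("UTF-8", "CP1255");
  if (cd == (iconv_t) -1)
    return -1;

  char *inptr = (char *) in;
  char *outptr = out;
  size_t in_left = inlen, out_left = outsize;
  int failed = 0;

  /* Convert the whole input in one call.  With glibc this consumes
     every byte but may leave the last letter buffered in the
     conversion state, in case a combining mark were to follow.  */
  if (iconv (cd, &inptr, &in_left, &outptr, &out_left) == (size_t) -1)
    failed = 1;
  /* The NULL-input call resets the state, writing out anything
     still buffered.  */
  else if (flush
           && iconv (cd, NULL, NULL, &outptr, &out_left) == (size_t) -1)
    failed = 1;

  iconv_close (cd);
  return failed ? -1 : (long) (outptr - out);
}
```

With the bytes 0xEE 0xE4 ("מה", ending in ה) and FLUSH set, this
should yield D7 9E D7 94; with FLUSH clear, glibc is expected to
return only D7 9E, which is the disappearing-character symptom.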


