Re: Using iconv in stand-alone info

From: Gavin Smith
Subject: Re: Using iconv in stand-alone info
Date: Thu, 24 Dec 2015 19:45:56 +0000

> Here's what I came up with, please see if it looks better now.

It looks okay as far as I can tell without testing it, except for this addition:

>        else
>          {
>            utf8_char_ptr = utf8_char;
>            /* i is width of UTF-8 character */
>            degrade_utf8 (&utf8_char_ptr, &i);
> +         /* If we are done, make sure iconv flushes the last character.  */
> +         if (bytes_left <= 0)
> +           {
> +             utf8_char_ptr = utf8_char;
> +             i = 4;
> +             iconv (iconv_to_utf8, NULL, NULL,
> +                    &utf8_char_ptr, &utf8_char_free);
> +             if (utf8_char_ptr > utf8_char)
> +               {
> +                 utf8_char_ptr = utf8_char;
> +                 degrade_utf8 (&utf8_char_ptr, &i);
> +               }
> +           }
>          }

That's okay for that code path, but I wonder if we should also call
iconv to flush the last character after the main loop exits because of
this condition:

    if (iconv_ret != (size_t) -1)
        /* Success: all of input converted. */

I'm trying to read the libc manual closely and, actually, it's
probably not necessary:

     If all input from the input buffer is successfully converted and
     stored in the output buffer, the function returns the number of
     non-reversible conversions performed.  In all other cases the
     return value is `(size_t) -1' and `errno' is set appropriately.

So if there's one character held back waiting for a following
combining character, there won't be a non-negative return value
indicating that all of the input was converted.

But if that interpretation is correct, then why should the following
be necessary?

+      /* Make sure libiconv flushes out the last converted character.
+        This is required when the conversion is stateful, in which
+        case libiconv might not output the last character, waiting to
+        see whether it should be combined with the next one.  */
+      if (iconv_ret != (size_t) -1
+         && text_buffer_iconv (&output_buf, iconv_to_output,
+                               NULL, NULL) != (size_t) -1)

So maybe it is necessary after exiting the main loop, and the wording
in the manual is misleading.
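
To make the flush question concrete, here is a minimal sketch of the
pattern under discussion: convert the input, then call iconv with a
NULL input buffer so the converter can emit anything it is still
holding and return to its initial state. The helper name
convert_and_flush and its buffer handling are assumptions for
illustration, not code from info, and whether the second call really
outputs a held-back character depends on the iconv implementation.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper (not from the info sources): convert IN_LEN
   bytes at IN with CD into OUT, then flush the converter.  Returns
   the number of bytes written to OUT, or (size_t) -1 on error.  */
size_t
convert_and_flush (iconv_t cd, const char *in, size_t in_len,
                   char *out, size_t out_size)
{
  /* iconv takes char **, but it does not modify the input bytes.  */
  char *in_ptr = (char *) in;
  size_t in_left = in_len;
  char *out_ptr = out;
  size_t out_left = out_size;

  assert (cd != (iconv_t) -1);

  /* Main conversion.  A return of (size_t) -1 means that not all of
     the input was converted; errno says why.  */
  if (iconv (cd, &in_ptr, &in_left, &out_ptr, &out_left) == (size_t) -1)
    return (size_t) -1;

  /* NULL input: ask the converter to write out anything it still
     holds and to return to its initial state.  */
  if (iconv (cd, NULL, NULL, &out_ptr, &out_left) == (size_t) -1)
    return (size_t) -1;

  return out_size - out_left;
}
```

For a stateless pair of encodings the second call should write
nothing, so making it unconditionally after a successful conversion
looks cheap.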

Re this:

> there's a ping-pong between 2 separate conversions, and the assumption
> seems to be that each conversion advances the input pointer and the
> bytes-left variable according to what it produced.

I put the following comment in the code because I wasn't sure about this point:

     /* If file is not in UTF-8, we degrade to ASCII in two steps:
         first convert the character to UTF-8, then look up a replacement
         string.  Note that mixing iconv_to_output and iconv_to_utf8
         on the same input may not work well if the input encoding
         is stateful.  We could deal with this by always converting to
         UTF-8 first; then we could mix conversions on the UTF-8 stream. */
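
A rough sketch of the alternative mentioned at the end of that
comment: once the whole input has been converted to UTF-8 by a single
conversion, degradation can walk the UTF-8 bytes directly and no
second iconv descriptor touches the original input. The function
degrade_to_ascii and the three-entry repl_table below are hypothetical
stand-ins, not the real degrade_utf8 or its table.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical replacement table, for illustration only.  */
struct repl { const char *utf8; const char *ascii; };
static const struct repl repl_table[] = {
  { "\xe2\x80\x98", "'" },   /* U+2018 left single quotation mark */
  { "\xe2\x80\x99", "'" },   /* U+2019 right single quotation mark */
  { "\xc3\xa9", "e" },       /* U+00E9 e with acute accent */
};

/* Byte length of the UTF-8 sequence starting with lead byte C;
   1 for ASCII and for bytes that cannot start a sequence.  */
static size_t
utf8_len (unsigned char c)
{
  if (c < 0x80) return 1;
  if ((c & 0xE0) == 0xC0) return 2;
  if ((c & 0xF0) == 0xE0) return 3;
  if ((c & 0xF8) == 0xF0) return 4;
  return 1;
}

/* Copy the UTF-8 text IN to OUT as ASCII, replacing each non-ASCII
   character by its table entry, or by "?" if it has none.  OUT is
   NUL-terminated; returns the number of bytes written before the NUL.  */
size_t
degrade_to_ascii (const char *in, size_t in_len, char *out, size_t out_size)
{
  size_t i = 0, o = 0;

  if (out_size == 0)
    return 0;

  while (i < in_len)
    {
      size_t len = utf8_len ((unsigned char) in[i]);
      const char *r = "?";
      size_t rlen = 1, k;

      if (len > in_len - i)
        len = in_len - i;       /* truncated sequence at end of input */

      if ((unsigned char) in[i] < 0x80)
        r = in + i;             /* ASCII passes through unchanged */
      else
        for (k = 0; k < sizeof repl_table / sizeof repl_table[0]; k++)
          if (strlen (repl_table[k].utf8) == len
              && memcmp (in + i, repl_table[k].utf8, len) == 0)
            {
              r = repl_table[k].ascii;
              rlen = strlen (r);
              break;
            }

      if (o + rlen >= out_size)
        break;                  /* keep room for the terminating NUL */
      memcpy (out + o, r, rlen);
      o += rlen;
      i += len;
    }
  out[o] = '\0';
  return o;
}
```

The design point is that the input pointer only ever advances through
the UTF-8 buffer, so there is no ping-pong between two conversions
reading the same stateful input.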

> Having played with this code, I must say that I feel it's based on
> somewhat fragile assumptions whose validity is not clear to me.

It will take me some more time to respond to this. If you find code
that you think is correct and works, by all means please go ahead and
commit it.
