
[bug-gnulib] Re: iconv made easy


From: Simon Josefsson
Subject: [bug-gnulib] Re: iconv made easy
Date: Mon, 13 Dec 2004 22:02:08 +0100
User-agent: Gnus/5.110003 (No Gnus v0.3) Emacs/21.3.50 (gnu/linux)

Paul Eggert <address@hidden> writes:

> Bruno Haible <address@hidden> writes:
>
>>> I know the function doesn't handle embedded ASCII #0
>>
>> iconv() handles NUL bytes correctly; you don't need to handle them specially.
>
> I think he was aiming for convenience at the expense of generality.

Right.

> But personally I'm not sure it's worth it in this case; the caller can
> simply specify a length of strlen(string)+1.

It was the output length I was worried about, but input lengths are
also a problem.

> Here, the strlen cannot be avoided, but perhaps the length=-1 approach
> is still convenient enough to achieve Simon's goal of convenience.

Would work for me.

>> a) allocate an initial buffer and extend it as needed, stopping and
>>    restarting iconv() each time a realloc is needed,
>> b) call iconv() once to determine the length and then once again for
>>    filling the result string.
...
> But in this case isn't there a 3rd option that is even faster?
> Something like this:
>
>   c) Use MB_LEN_MAX to calculate an upper bound for the size of the
>      output buffer (from the input buffer size).  Allocate a buffer of
>      that size, invoke iconv(), and then realloc the buffer once
>      iconv() finishes and you know the correct size.
>
> This is nearly as simple as (b).  Overall, I'd expect it to be faster
> than either (a) or (b), assuming a decent memory allocator.

I like this.  But it would be quite wasteful on long strings;
MB_LEN_MAX is 16 on glibc hosts.

How about the following, which is a combination of all approaches:

 if inlen < 1024 then
   initial_buffer_length = MB_LEN_MAX * inlen
 else
   initial_buffer_length = [invoke iconv to find out output length]

 allocate an initial buffer (of size initial_buffer_length) and extend
 it as needed, stopping and restarting iconv() each time a realloc is
 needed

 realloc once again, when iconv is finished, to correct the buffer
 size.

This would be fast on short strings (one malloc, one iconv, one
realloc), reasonably fast on long strings (two calls to iconv), and it
would even cope with internal bugs where iconv exceeds the computed
upper bound.
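
A minimal C sketch of the combined scheme above, to make the control
flow concrete. The names iconv_alloc_sketch and count_output are
hypothetical (they are not gnulib functions), the 1024-byte threshold
is the one from the pseudocode, and the 4096-byte scratch buffer and
the doubling growth policy are assumptions made for illustration. It
takes an explicit input length and an already opened iconv_t, and it
assumes the POSIX prototype of iconv() with a char ** input argument.

```c
#include <errno.h>
#include <iconv.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: run the conversion into a scratch buffer only to
   count the output bytes.  Returns 0 on success, -1 with errno set.  */
static int
count_output (iconv_t cd, const char *in, size_t inlen, size_t *total)
{
  char scratch[4096];           /* assumed size; output is discarded */
  char *ip = (char *) in;       /* iconv() does not modify the input */
  size_t il = inlen;
  int flushing = 0;

  *total = 0;
  iconv (cd, NULL, NULL, NULL, NULL);   /* reset to the initial state */
  for (;;)
    {
      char *op = scratch;
      size_t ol = sizeof scratch;
      size_t r = (flushing ? iconv (cd, NULL, NULL, &op, &ol)
                  : iconv (cd, &ip, &il, &op, &ol));
      *total += sizeof scratch - ol;
      if (r == (size_t) -1)
        {
          if (errno != E2BIG)
            return -1;          /* EILSEQ, EINVAL, ... */
          continue;             /* scratch full: reuse it and go on */
        }
      if (flushing)
        return 0;
      flushing = 1;             /* input consumed: emit any shift sequence */
    }
}

/* Hypothetical entry point: convert INLEN bytes of IN and return a
   malloc'd result whose length is stored in *OUTLEN.  Returns NULL
   with errno set on failure.  */
char *
iconv_alloc_sketch (iconv_t cd, const char *in, size_t inlen, size_t *outlen)
{
  size_t size, used = 0, il = inlen;
  char *buf, *ip = (char *) in;
  int flushing = 0;

  if (inlen < 1024)
    size = inlen * MB_LEN_MAX;          /* cheap upper-bound guess */
  else if (count_output (cd, in, inlen, &size) < 0)
    return NULL;                        /* counting pass failed */
  if (size == 0)
    size = 1;                           /* avoid malloc (0) */

  buf = malloc (size);
  if (buf == NULL)
    return NULL;

  iconv (cd, NULL, NULL, NULL, NULL);   /* reset to the initial state */
  for (;;)
    {
      char *op = buf + used;
      size_t ol = size - used;
      size_t r = (flushing ? iconv (cd, NULL, NULL, &op, &ol)
                  : iconv (cd, &ip, &il, &op, &ol));
      used = op - buf;
      if (r == (size_t) -1)
        {
          char *nbuf = errno == E2BIG ? realloc (buf, size * 2) : NULL;
          if (nbuf == NULL)
            {
              int saved = errno;
              free (buf);
              errno = saved;
              return NULL;
            }
          buf = nbuf;                   /* estimate was too small: grow */
          size *= 2;
          continue;                     /* restart where iconv() stopped */
        }
      if (flushing)
        break;
      flushing = 1;
    }

  if (used > 0)
    {
      char *nbuf = realloc (buf, used); /* final realloc to the exact size */
      if (nbuf != NULL)
        buf = nbuf;
    }
  *outlen = used;
  return buf;
}
```

A caller would open a descriptor with iconv_open(), pass the input
with an explicit length (strlen(string)+1 if the terminating NUL
should be converted too), and free() the result. The result is not
NUL-terminated unless the input included the NUL.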

Thanks,
Simon




