
Please remove iconv_open (charset, "ASCII"); from unicode.c

From: John Kearney
Subject: Please remove iconv_open (charset, "ASCII"); from unicode.c
Date: Wed, 07 Mar 2012 05:47:22 +0100
User-agent: Mozilla/5.0 (X11; Linux i686; rv:10.0) Gecko/20120129 Thunderbird/10.0

Hi Chet, can you please remove the following from the unicode.c file:

localconv = iconv_open (charset, "ASCII");

This is an invalid fallback. It creates a translation configuration.
The primary attempt is UTF-8 to the destination codeset. If that
conversion fails, this tries ASCII to the destination codeset instead.
But the code still passes UTF-8 as input to iconv, so this is less
likely to encode successfully than a simple assignment. Consider
U+80: it becomes UTF-8 "\xc2\x80", which, because we tell iconv the
input is ASCII, becomes ASCII "\xc2\x80".

So this line takes a U+80 and turns it into a U+C2 and a U+80.
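
To illustrate the byte-level effect, here is a minimal sketch in C. It does not call iconv and is not the code from unicode.c; utf8_encode and bytes_as_code_points are hypothetical helpers written only for this example. The first encodes a code point as UTF-8, the second models what happens when those bytes are mislabelled as a single-byte codeset, so that each byte is read as its own character.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from bash): encode one code point as UTF-8.
   Returns the number of bytes written (1 to 4), or 0 if cp is a
   surrogate or lies above U+10FFFF. */
static size_t
utf8_encode (uint32_t cp, unsigned char out[4])
{
  assert (out != NULL);

  if (cp < 0x80)
    {
      out[0] = (unsigned char) cp;
      return 1;
    }
  if (cp < 0x800)
    {
      out[0] = (unsigned char) (0xC0 | (cp >> 6));
      out[1] = (unsigned char) (0x80 | (cp & 0x3F));
      return 2;
    }
  if (cp < 0x10000)
    {
      if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;               /* surrogates are not encodable */
      out[0] = (unsigned char) (0xE0 | (cp >> 12));
      out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
      out[2] = (unsigned char) (0x80 | (cp & 0x3F));
      return 3;
    }
  if (cp < 0x110000)
    {
      out[0] = (unsigned char) (0xF0 | (cp >> 18));
      out[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
      out[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
      out[3] = (unsigned char) (0x80 | (cp & 0x3F));
      return 4;
    }
  return 0;
}

/* Hypothetical helper: model the mislabelled input by treating every
   byte as a separate character.  Returns the number of code points. */
static size_t
bytes_as_code_points (const unsigned char *in, size_t n, uint32_t *out)
{
  size_t i;

  assert (in != NULL && out != NULL);
  for (i = 0; i < n; i++)
    out[i] = in[i];
  return n;
}
```

Calling utf8_encode (0x80, buf) fills buf with the two bytes 0xC2 0x80, and passing those two bytes to bytes_as_code_points yields two characters where there was one.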

The way I rewrote the iconv code made it cleaner, safer and quicker;
please consider using it. I avoided the need for the strcpy, among
other things.

On 02/21/2012 03:42 AM, Chet Ramey wrote:
> On 2/18/12 5:39 AM, John Kearney wrote:
>> Bash Version: 4.2 Patch Level: 10 Release Status: release
>> Description: Current u32toutf8 only encode values below 0xffff
>> correctly. wchar_t can be ambiguous size better in my opinion to
>> use unsigned long, or uint32_t, or something clearer.
> Thanks for the patch.  It's good to have a complete
> implementation, though as a practical matter you won't see UTF-8
> characters longer than four bytes.  I agree with you about the
> unsigned 32-bit int type; wchar_t is signed, even if it's 32 bits,
> on several systems I use.
> Chet
