Re: GNU sed version 4.2.1: on OS X, C locale gets aliased to UTF-8
From: Max Horn
Subject: Re: GNU sed version 4.2.1: on OS X, C locale gets aliased to UTF-8
Date: Mon, 11 Jun 2012 00:31:04 +0200
Hi again,
On 07.06.2012 at 14:07, Bruno Haible wrote:
[...]
>
>> But this is dangerous, because now UTF-8 is set but MB_CUR_MAX is 1
>> and various parts of sed interpret "Rémi Leblond" as an invalid
>> character sequence for a UTF-8 character set.
>
> Indeed, I can see how this inconsistency leads to bugs like the described
> ones.
>
> The fix could be to have two different locale_charset() functions,
> one that returns "US-ASCII" and another one that returns "UTF-8".
> The first one to be used when MB_CUR_MAX and mbrtowc() are used as
> well, the second one to be used by gettext(). But the separation
> line between the two cases is not yet clear to me. Any insights?
Hmm, that sounds quite complicated. Could you explain what this would gain
over simply mapping "US-ASCII" to "ASCII", or over the patch Paul
suggested:
> --- a/lib/localcharset.c
> +++ b/lib/localcharset.c
> @@ -542,5 +542,12 @@ locale_charset (void)
>    if (codeset[0] == '\0')
>      codeset = "ASCII";
> 
> +#ifdef DARWIN7
> +  /* MacOS X sets MB_CUR_MAX to 1 when LC_ALL=C, and "UTF-8"
> +     (the default codeset) does not work when MB_CUR_MAX is 1.  */
> +  if (strcmp (codeset, "UTF-8") == 0 && MB_CUR_MAX <= 1)
> +    codeset = "ASCII";
> +#endif
> +
>    return codeset;
> }
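The check in Paul's patch can be exercised in isolation. The following is a minimal sketch, not gnulib code: the name `adjust_codeset` is hypothetical, and it takes the codeset name and the MB_CUR_MAX value as parameters (instead of querying the current locale) so the logic can be tested on any platform. It also drops the `#ifdef DARWIN7` guard, which in the real patch confines the change to Mac OS X.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-alone version of the check in Paul's patch.
   An empty codeset falls back to "ASCII", as in locale_charset().
   If the name claims "UTF-8" but the locale only holds single-byte
   characters (mb_cur_max <= 1), report "ASCII" instead, so callers
   that also rely on MB_CUR_MAX and mbrtowc() get a consistent view. */
const char *
adjust_codeset (const char *codeset, size_t mb_cur_max)
{
  if (codeset[0] == '\0')
    codeset = "ASCII";
  if (strcmp (codeset, "UTF-8") == 0 && mb_cur_max <= 1)
    codeset = "ASCII";
  return codeset;
}
```

In a real program the second argument would be the `MB_CUR_MAX` macro from `<stdlib.h>`, evaluated after `setlocale()`; passing it explicitly just makes the decision testable.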
Cheers,
Max
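To see why the mismatch described in the quoted message breaks sed, here is a small self-contained illustration. It does not use sed's or gnulib's code; `utf8_char_len` and `count_bad_chars` are hypothetical helpers. The decoder follows the mbrtowc() return convention ((size_t) -1 for an invalid sequence, (size_t) -2 for an incomplete one) and, like a caller that trusts MB_CUR_MAX, never examines more than `mb_cur_max` bytes per character. It is deliberately simplified: it does not reject overlong three- and four-byte forms or surrogates.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: decode one UTF-8 character from S, looking at
   no more than MAX bytes.  Returns the number of bytes used,
   (size_t) -2 if the sequence is incomplete within MAX bytes, or
   (size_t) -1 if it is invalid (mimicking mbrtowc's convention).  */
size_t
utf8_char_len (const unsigned char *s, size_t max)
{
  size_t need, i;

  if (max == 0)
    return (size_t) -2;
  if (s[0] < 0x80)
    need = 1;
  else if (s[0] >= 0xC2 && s[0] <= 0xDF)
    need = 2;
  else if (s[0] >= 0xE0 && s[0] <= 0xEF)
    need = 3;
  else if (s[0] >= 0xF0 && s[0] <= 0xF4)
    need = 4;
  else
    return (size_t) -1;         /* stray continuation or bad lead byte */

  for (i = 1; i < need; i++)
    {
      if (i >= max)
        return (size_t) -2;     /* needs more bytes than we may examine */
      if ((s[i] & 0xC0) != 0x80)
        return (size_t) -1;     /* expected a continuation byte */
    }
  return need;
}

/* Hypothetical helper: scan STR as UTF-8, never looking at more than
   MB_CUR_MAX bytes per character, and count the positions where
   decoding fails.  On failure, skip one byte and continue.  */
size_t
count_bad_chars (const char *str, size_t mb_cur_max)
{
  const unsigned char *s = (const unsigned char *) str;
  size_t len = strlen (str);
  size_t bad = 0;

  while (len > 0)
    {
      size_t window = len < mb_cur_max ? len : mb_cur_max;
      size_t n = utf8_char_len (s, window);

      if (n == (size_t) -1 || n == (size_t) -2)
        {
          bad++;
          n = 1;
        }
      s += n;
      len -= n;
    }
  return bad;
}
```

With `mb_cur_max` set to 1, both bytes of "é" in "Rémi Leblond" are reported as bad (the lead byte as incomplete, the continuation byte as invalid); with 4, the same string decodes cleanly. A charset name of "UTF-8" is only safe to act on when MB_CUR_MAX leaves room for multibyte sequences.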