From: Akim Demaille
Subject: Re: [PATCH 10/11] quote consistently and make tests pass with new quoting from gnulib
Date: Wed, 25 Jan 2012 14:04:55 +0100
Hi Paul,
I'm sending this message to you as the main author of
the quotearg module. I am not sure which component should
be considered guilty here, but the problem is:
- independently of any LC_*, localcharset.c returns UTF-8
on OS X.
- If I instrument localcharset.c, I can see that the OS
returns "US-ASCII" as locale_codeset.
- localcharset's get_charset_aliases then maps US-ASCII
to UTF-8 (this is where it looks wrong to me, but...).
See the excerpt below. FWIW, I have also attached the
charset.alias file.
- so quotearg decides to use nice UTF-8 quotes (since
quote.c asks for locale-dependent quotes). See
gettext_quote below.
- so the test suite fails, since it expects plain old "'".
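
The alias step in the chain above can be sketched as follows. This is a minimal illustration, not the localcharset.c code: the table is abbreviated from the DARWIN7 excerpt quoted further down, and resolve_codeset is a hypothetical name. It assumes the table is a sequence of NUL-separated (alias, canonical) pairs in which a "*" alias matches any name.

```c
#include <string.h>

/* Abbreviated table: NUL-separated (alias, canonical) pairs.  The
   implicit terminating NUL of the literal ends the list.  The final
   "*" entry is a wildcard that matches every codeset name.  */
static const char alias_table[] =
  "ISO8859-1" "\0" "ISO-8859-1" "\0"
  "eucJP" "\0" "EUC-JP" "\0"
  "*" "\0" "UTF-8" "\0";

/* Hypothetical helper: return the canonical name for CODESET, or
   CODESET itself when no entry matches.  */
static const char *
resolve_codeset (const char *codeset)
{
  const char *p = alias_table;

  while (*p != '\0')
    {
      const char *alias = p;
      const char *canonical = alias + strlen (alias) + 1;

      /* An exact match or the wildcard both select this pair.  */
      if (strcmp (codeset, alias) == 0 || strcmp (alias, "*") == 0)
        return canonical;

      /* Skip to the next (alias, canonical) pair.  */
      p = canonical + strlen (canonical) + 1;
    }
  return codeset;
}
```

With such a table, any name without an explicit entry, including "US-ASCII", falls through to the wildcard and comes back as "UTF-8".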
What module would be considered faulty here? I can provide
a patch, but I would first like to know for which part :)
Thanks!
Akim
On 23 Jan 2012, at 16:06, Akim Demaille wrote:
>
> On 23 Jan 2012, at 15:34, Jim Meyering wrote:
>
>>> I had never realized that the tests are not specifying LC_ALL=C
>>> and they should. But even when I do, I still have nice quotes.
>>
>> Hi Akim,
>>
>> Maybe you need to set LANG to empty or to C?
>> glibc honors LANG (erroneously, imho)
>
> My tests were on OS X. LANG=C, or unset, does not
> change anything.
>
> Some digging led me into this:
>
>> # if defined DARWIN7
>>   /* To avoid the trouble of installing a file that is shared by many
>>      GNU packages -- many packaging systems have problems with this --,
>>      simply inline the aliases here. */
>>   cp = "ISO8859-1" "\0" "ISO-8859-1" "\0"
>>        "ISO8859-2" "\0" "ISO-8859-2" "\0"
>>        "ISO8859-4" "\0" "ISO-8859-4" "\0"
>>        "ISO8859-5" "\0" "ISO-8859-5" "\0"
>>        "ISO8859-7" "\0" "ISO-8859-7" "\0"
>>        "ISO8859-9" "\0" "ISO-8859-9" "\0"
>>        "ISO8859-13" "\0" "ISO-8859-13" "\0"
>>        "ISO8859-15" "\0" "ISO-8859-15" "\0"
>>        "KOI8-R" "\0" "KOI8-R" "\0"
>>        "KOI8-U" "\0" "KOI8-U" "\0"
>>        "CP866" "\0" "CP866" "\0"
>>        "CP949" "\0" "CP949" "\0"
>>        "CP1131" "\0" "CP1131" "\0"
>>        "CP1251" "\0" "CP1251" "\0"
>>        "eucCN" "\0" "GB2312" "\0"
>>        "GB2312" "\0" "GB2312" "\0"
>>        "eucJP" "\0" "EUC-JP" "\0"
>>        "eucKR" "\0" "EUC-KR" "\0"
>>        "Big5" "\0" "BIG5" "\0"
>>        "Big5HKSCS" "\0" "BIG5-HKSCS" "\0"
>>        "GBK" "\0" "GBK" "\0"
>>        "GB18030" "\0" "GB18030" "\0"
>>        "SJIS" "\0" "SHIFT_JIS" "\0"
>>        "ARMSCII-8" "\0" "ARMSCII-8" "\0"
>>        "PT154" "\0" "PT154" "\0"
>>        /*"ISCII-DEV" "\0" "?" "\0"*/
>>        "*" "\0" "UTF-8" "\0";
>> # endif
>
> which, IIUC, maps my "US-ASCII" (which is the
> answer on my system for locale_codeset in locale_charset)
> to UTF-8. And then, it seems to be hard-coded to use UTF-8
> quotes in quotearg.
>
>> /* MSGID approximates a quotation mark. Return its translation if it
>>    has one; otherwise, return either it or "\"", depending on S.
>>
>>    S is either clocale_quoting_style or locale_quoting_style. */
>> static char const *
>> gettext_quote (char const *msgid, enum quoting_style s)
>> {
>>   char const *translation = _(msgid);
>>   char const *locale_code;
>>
>>   if (translation != msgid)
>>     return translation;
>>
>>   /* For UTF-8 and GB-18030, use single quotes U+2018 and U+2019.
>>      Here is a list of other locales that include U+2018 and U+2019:
>>
>>         ISO-8859-7   0xA1         KOI8-T        0x91
>>         CP869        0x8B         CP874         0x91
>>         CP932        0x81 0x65    CP936         0xA1 0xAE
>>         CP949        0xA1 0xAE    CP950         0xA1 0xA5
>>         CP1250       0x91         CP1251        0x91
>>         CP1252       0x91         CP1253        0x91
>>         CP1254       0x91         CP1255        0x91
>>         CP1256       0x91         CP1257        0x91
>>         EUC-JP       0xA1 0xC6    EUC-KR        0xA1 0xAE
>>         EUC-TW       0xA1 0xE4    BIG5          0xA1 0xA5
>>         BIG5-HKSCS   0xA1 0xA5    EUC-CN        0xA1 0xAE
>>         GBK          0xA1 0xAE    Georgian-PS   0x91
>>         PT154        0x91
>>
>>      None of these is still in wide use; using iconv is overkill. */
>>   locale_code = locale_charset ();
>>   fprintf (stderr, "charset: %s\n", locale_code);
>
> I get "charset: UTF-8".
>
>>   if (STRCASEEQ (locale_code, "UTF-8", 'U','T','F','-','8',0,0,0,0))
>>     return msgid[0] == '`' ? "\xe2\x80\x98": "\xe2\x80\x99";
>>   if (STRCASEEQ (locale_code, "GB18030", 'G','B','1','8','0','3','0',0,0))
>>     return msgid[0] == '`' ? "\xa1\xae": "\xa1\xaf";
>>
>>   return (s == clocale_quoting_style ? "\"" : "'");
>> }
>
>
> My understanding is that there is nothing prepared for me to override
> this, since bison is using:
>
>> /* Return an unambiguous printable representation of NAME,
>>    allocated in slot N, suitable for diagnostics. */
>> char const *
>> quote_n (int n, char const *name)
>> {
>>   return quotearg_n_style (n, locale_quoting_style, name);
>> }
>
> I could add some dependency on LC_ALL here, but it looks wrong.
> It feels wrong that even with LC_CTYPE=C, I get UTF-8.
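
To make the locale dependency mentioned above concrete, here is a minimal sketch of a quote selection that consults the LC_CTYPE locale name before trusting the reported charset. It is only an illustration under assumptions: ctype_is_c_locale and pick_quote are hypothetical names, not gnulib functions, the charset comparison is case-sensitive for brevity, and nothing here settles which module should carry the fix.

```c
#include <locale.h>
#include <string.h>

/* Hypothetical helper: nonzero when LC_CTYPE is the "C" or "POSIX"
   locale, as reported by querying setlocale with a null pointer.  */
static int
ctype_is_c_locale (void)
{
  const char *name = setlocale (LC_CTYPE, NULL);

  return name != NULL
         && (strcmp (name, "C") == 0 || strcmp (name, "POSIX") == 0);
}

/* Hypothetical variant of the quote selection.  CHARSET is the name
   the charset lookup reported; OPENING selects the left quote when
   nonzero and the right quote otherwise.  */
static const char *
pick_quote (const char *charset, int opening)
{
  /* Use U+2018 / U+2019 only for a UTF-8 charset outside the C locale.  */
  if (!ctype_is_c_locale () && strcmp (charset, "UTF-8") == 0)
    return opening ? "\xe2\x80\x98" : "\xe2\x80\x99";

  /* Otherwise fall back to the plain ASCII apostrophe.  */
  return "'";
}
```

Under this sketch, a process running with LC_CTYPE=C gets "'" even when the charset lookup answers "UTF-8", which is the output the test suite expects.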
Attachment: charset.alias.txt (text document)