bug-gnulib

Re: MirBSD mbtowc bug? failure on test-wcrtomb


From: Bruno Haible
Subject: Re: MirBSD mbtowc bug? failure on test-wcrtomb
Date: Sat, 23 Oct 2010 14:06:09 +0200
User-agent: KMail/1.9.9

Eric Blake wrote:
> $ ./foo
> setlocale succeeded -> en_US.UTF-8
> nl_langinfo (CODESET) = |UTF-8|
> MB_CUR_MAX = 5
> strftime -> 8
>   46 65 62 72 75 61 72 79

That was certainly unexpected: we ask for a Japanese locale and get back
a locale whose second month is named "February". Eek. I'm applying the
workaround below.
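
In essence, the workaround treats a candidate locale as unusable for these
tests when nl_langinfo (CODESET) reports a codeset that signals a silent
fallback. A minimal sketch of that check as a standalone function; the name
codeset_indicates_fallback is hypothetical and does not appear in gnulib:

```c
#include <string.h>

/* Hypothetical helper, for illustration only.
   Return 1 if the codeset name CS suggests that setlocale() silently fell
   back to a default locale instead of the traditional locale that was
   requested, 0 otherwise.
   ""      : codeset unknown
   "ASCII" : plain C locale
   "646"   : OpenBSD 4.0 fallback
   "UTF-8" : MirBSD 10 fallback  */
int
codeset_indicates_fallback (const char *cs)
{
  return cs[0] == '\0'
         || strcmp (cs, "ASCII") == 0
         || strcmp (cs, "646") == 0
         || strcmp (cs, "UTF-8") == 0;
}
```

A caller would pass it the result of nl_langinfo (CODESET) after setlocale()
has reported success. The comparison is exact and case-sensitive, as in the
m4 tests.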

Thorsten Glaser wrote:
> Any call to setlocale() in MirBSD is a nop anyway¹.

Is that true? Do you mean, the programs
  ================================================================
  #include <locale.h>
  #include <stdio.h>
  int main ()
  {
    printf ("%s\n", setlocale (LC_MESSAGES, NULL));
    return 0;
  }
  ================================================================
and
  ================================================================
  #include <locale.h>
  #include <stdio.h>
  int main ()
  {
    setlocale (LC_ALL, "C");
    printf ("%s\n", setlocale (LC_MESSAGES, NULL));
    return 0;
  }
  ================================================================
print en_US.UTF-8 and not C or POSIX?

In that case, programs cannot even distinguish the C locale from other locales!
Fortunately GNU gettext already has a workaround against this.

> from what I gathered, back then, other implementations also fall back,
> although, admittedly, to the "C" locale.

Only OpenBSD and possibly Cygwin do. All other systems leave the locale
unchanged when setlocale is called with an unsupported locale identifier.
This is precisely what causes the trouble: MirBSD violates POSIX, and that
creates porting trouble for application writers.
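
For reference, the POSIX behaviour can be probed at run time. The following
is only a sketch: the function name setlocale_fails_cleanly is made up for
this illustration, and the locale name passed to it is up to the caller.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical probe, for illustration only.
   Return 1 if setlocale (CATEGORY, NAME) fails the way POSIX describes for
   an invalid locale: it returns NULL and leaves the locale unchanged.
   Return 0 if the call succeeds or if the locale changed despite the
   failure.  Return -1 if the probe itself could not run.
   The locale in effect before the call is restored afterwards.  */
int
setlocale_fails_cleanly (int category, const char *name)
{
  const char *cur = setlocale (category, NULL);
  char *before;
  int result;

  if (cur == NULL)
    return -1;
  /* Copy the name: later setlocale calls may overwrite the buffer.  */
  before = malloc (strlen (cur) + 1);
  if (before == NULL)
    return -1;
  strcpy (before, cur);

  if (setlocale (category, name) != NULL)
    result = 0;
  else
    {
      const char *after = setlocale (category, NULL);
      result = (after != NULL && strcmp (before, after) == 0);
    }

  setlocale (category, before);
  free (before);
  return result;
}
```

On a system that follows the POSIX wording, the probe is expected to return 1
for a locale name that is not installed. A system that accepts every name
makes it return 0.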

> > "If the string does not correspond to a valid locale, setlocale() shall return
> > a null pointer and the international environment is not changed. Otherwise,
> > setlocale() shall return the name of the locale just set."
> >
> > Returning a completely different string
> 
> That could be argumented away with canonicalisation ;)
> ...
> I think always returning success does, in a twisted sense, make
> sense for our environment

It could be understandable to "canonicalize" en_GB to en_US. But canonicalizing
ja_JP to en_US is far-fetched.

Do you also "canonicalize" en_US.ISO8859-1 to en_US.UTF-8? This would be
an even bigger bug, because the UTF-8 encoding is neither the same as nor an
extension of the requested ISO-8859-1 encoding.
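
To illustrate the incompatibility: every ISO-8859-1 byte from 0x80 upward
becomes two bytes in UTF-8, so text written in one encoding is misread in the
other. A small sketch; the function name latin1_to_utf8 is chosen here for
illustration:

```c
#include <stddef.h>

/* Illustration only.
   Store the UTF-8 encoding of the ISO-8859-1 byte C in OUT and return its
   length in bytes (1 or 2).  ISO-8859-1 byte values are identical to the
   Unicode code points U+0000..U+00FF.  */
size_t
latin1_to_utf8 (unsigned char c, unsigned char out[2])
{
  if (c < 0x80)
    {
      /* ASCII range: both encodings agree.  */
      out[0] = c;
      return 1;
    }
  /* 0x80..0xFF: UTF-8 needs a lead byte and a continuation byte.  */
  out[0] = (unsigned char) (0xC0 | (c >> 6));
  out[1] = (unsigned char) (0x80 | (c & 0x3F));
  return 2;
}
```

For example, the ISO-8859-1 byte 0xE9 (e with acute accent) becomes the two
bytes 0xC3 0xA9 in UTF-8, while a lone 0xE9 is not a valid UTF-8 sequence.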

Bruno


2010-10-23  Bruno Haible  <address@hidden>

        Tests: Fix LOCALE_JA on MirBSD 10.
        * m4/locale-ja.m4 (gt_LOCALE_JA): Reject a locale identifier that leads
        to a UTF-8 locale.
        * m4/locale-fr.m4 (gt_LOCALE_FR): Likewise.
        * m4/locale-zh.m4 (gt_LOCALE_ZH_CN): Likewise.
        Reported by Eric Blake.

--- m4/locale-fr.m4.orig        Sat Oct 23 13:27:25 2010
+++ m4/locale-fr.m4     Sat Oct 23 13:25:07 2010
@@ -1,4 +1,4 @@
-# locale-fr.m4 serial 11
+# locale-fr.m4 serial 12
 dnl Copyright (C) 2003, 2005-2010 Free Software Foundation, Inc.
 dnl This file is free software; the Free Software Foundation
 dnl gives unlimited permission to copy and/or distribute it,
@@ -31,11 +31,14 @@
      is empty, and the behaviour of Tcl 8.4 in this locale is not useful.
      On OpenBSD 4.0, when an unsupported locale is specified, setlocale()
      succeeds but then nl_langinfo(CODESET) is "646". In this situation,
-     some unit tests fail.  */
+     some unit tests fail.
+     On MirBSD 10, when an unsupported locale is specified, setlocale()
+     succeeds but then nl_langinfo(CODESET) is "UTF-8".  */
 #if HAVE_LANGINFO_CODESET
   {
     const char *cs = nl_langinfo (CODESET);
-    if (cs[0] == '\0' || strcmp (cs, "ASCII") == 0 || strcmp (cs, "646") == 0)
+    if (cs[0] == '\0' || strcmp (cs, "ASCII") == 0 || strcmp (cs, "646") == 0
+        || strcmp (cs, "UTF-8") == 0)
       return 1;
   }
 #endif
--- m4/locale-ja.m4.orig        Sat Oct 23 13:27:25 2010
+++ m4/locale-ja.m4     Sat Oct 23 13:26:36 2010
@@ -1,4 +1,4 @@
-# locale-ja.m4 serial 7
+# locale-ja.m4 serial 8
 dnl Copyright (C) 2003, 2005-2010 Free Software Foundation, Inc.
 dnl This file is free software; the Free Software Foundation
 dnl gives unlimited permission to copy and/or distribute it,
@@ -33,11 +33,14 @@
      is empty, and the behaviour of Tcl 8.4 in this locale is not useful.
      On OpenBSD 4.0, when an unsupported locale is specified, setlocale()
      succeeds but then nl_langinfo(CODESET) is "646". In this situation,
-     some unit tests fail.  */
+     some unit tests fail.
+     On MirBSD 10, when an unsupported locale is specified, setlocale()
+     succeeds but then nl_langinfo(CODESET) is "UTF-8".  */
 #if HAVE_LANGINFO_CODESET
   {
     const char *cs = nl_langinfo (CODESET);
-    if (cs[0] == '\0' || strcmp (cs, "ASCII") == 0 || strcmp (cs, "646") == 0)
+    if (cs[0] == '\0' || strcmp (cs, "ASCII") == 0 || strcmp (cs, "646") == 0
+        || strcmp (cs, "UTF-8") == 0)
       return 1;
   }
 #endif
@@ -52,7 +55,7 @@
   if (MB_CUR_MAX == 1)
     return 1;
   /* Check whether in a month name, no byte in the range 0x80..0x9F occurs.
-     This excludes the UTF-8 encoding.  */
+     This excludes the UTF-8 encoding (except on MirBSD).  */
   t.tm_year = 1975 - 1900; t.tm_mon = 2 - 1; t.tm_mday = 4;
   if (strftime (buf, sizeof (buf), "%B", &t) < 2) return 1;
   for (p = buf; *p != '\0'; p++)
--- m4/locale-zh.m4.orig        Sat Oct 23 13:27:25 2010
+++ m4/locale-zh.m4     Sat Oct 23 13:26:48 2010
@@ -1,4 +1,4 @@
-# locale-zh.m4 serial 6
+# locale-zh.m4 serial 7
 dnl Copyright (C) 2003, 2005-2010 Free Software Foundation, Inc.
 dnl This file is free software; the Free Software Foundation
 dnl gives unlimited permission to copy and/or distribute it,
@@ -34,11 +34,14 @@
      is empty, and the behaviour of Tcl 8.4 in this locale is not useful.
      On OpenBSD 4.0, when an unsupported locale is specified, setlocale()
      succeeds but then nl_langinfo(CODESET) is "646". In this situation,
-     some unit tests fail.  */
+     some unit tests fail.
+     On MirBSD 10, when an unsupported locale is specified, setlocale()
+     succeeds but then nl_langinfo(CODESET) is "UTF-8".  */
 #if HAVE_LANGINFO_CODESET
   {
     const char *cs = nl_langinfo (CODESET);
-    if (cs[0] == '\0' || strcmp (cs, "ASCII") == 0 || strcmp (cs, "646") == 0)
+    if (cs[0] == '\0' || strcmp (cs, "ASCII") == 0 || strcmp (cs, "646") == 0
+        || strcmp (cs, "UTF-8") == 0)
       return 1;
   }
 #endif
@@ -49,7 +52,7 @@
   if (strchr (getenv ("LC_ALL"), '.') == NULL) return 1;
 #endif
   /* Check whether in a month name, no byte in the range 0x80..0x9F occurs.
-     This excludes the UTF-8 encoding.  */
+     This excludes the UTF-8 encoding (except on MirBSD).  */
   t.tm_year = 1975 - 1900; t.tm_mon = 2 - 1; t.tm_mday = 4;
   if (strftime (buf, sizeof (buf), "%B", &t) < 2) return 1;
   for (p = buf; *p != '\0'; p++)


