bug-gnulib

Re: MirBSD mbtowc bug? failure on test-wcrtomb


From: Thorsten Glaser
Subject: Re: MirBSD mbtowc bug? failure on test-wcrtomb
Date: Fri, 22 Oct 2010 18:27:46 +0000 (UTC)

Eric Blake dixit:

> "If the string does not correspond to a valid locale, setlocale() shall return
> a null pointer and the international environment is not changed. Otherwise,
> setlocale() shall return the name of the locale just set."
>
> Returning a completely different string

That could be argued away as canonicalisation ;)

> (en_US.UTF-8), and in particular
> falling back on a locale that is not the C locale, is just crazy.

But that *is* the C locale, on MirBSD… since all locales are
equivalent, this one is, too.


I’m a bit torn here. Do I want to list every possible permutation
of a valid locale here? Any call to setlocale() in MirBSD is a nop
anyway¹. What would be “valid”? en_US.utf8? en_GB.UTF-8? *.UTF-8?

While we have the POSIX locale functions – well, some of them – they
were only intended to get UTF-8 support working in many applications.
It’s even documented (somewhere…) that applications that want to use,
for example, LC_MESSAGES must override setlocale() and the catgets
family of functions.

I think always returning success does, in a twisted sense, make
sense for our environment… although, when this was conceived, I
couldn’t have guessed that people would want to set things like
"ja_JP", especially WITHOUT (gasp!) UTF-8.

Even in “the” locale, if you feed something that’s not valid
UTF-8 to the mb[s]rtowc[s] family of functions, you get some
valid wide character data – and this, see ¹, is by design.
http://thread.gmane.org/gmane.os.miros.general/7938/focus=8088
was when this was discussed and we eventually got a range from
CSUR; http://thread.gmane.org/gmane.os.miros.general/8899 (in
German, sorry) has some comparison with a later Python extension
doing basically the same.
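
As a rough illustration of the idea (octets that are not valid UTF-8
still decode to some wide character instead of raising an error), here
is a sketch of a lenient decoder. It is not the MirBSD code. The range
MirBSD actually uses is not stated in this message, so ESCAPE_BASE is a
stand-in borrowed from Python's "surrogateescape" error handler, which
maps a stray octet 0xNN to U+DCNN. Both function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: illustrative base only, NOT the range MirBSD uses. */
#define ESCAPE_BASE 0xDC00u

/* Decode one well-formed UTF-8 sequence at s (at most n octets).
 * Returns its length (1..4) and stores the code point in *cp,
 * or returns 0 if the octets at s are not well-formed UTF-8. */
static size_t decode_one(const unsigned char *s, size_t n, uint32_t *cp)
{
    /* Smallest code point allowed per length (rejects overlong forms). */
    static const uint32_t min_cp[5] = { 0, 0, 0x80, 0x800, 0x10000 };
    size_t len, i;
    uint32_t c;

    if (n == 0)
        return 0;
    if (s[0] < 0x80) {
        *cp = s[0];
        return 1;
    }
    if ((s[0] & 0xE0) == 0xC0)      { len = 2; c = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; c = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; c = s[0] & 0x07; }
    else
        return 0;                   /* stray continuation or bad lead */
    if (n < len)
        return 0;                   /* truncated sequence */
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;
        c = (c << 6) | (s[i] & 0x3F);
    }
    if (c < min_cp[len] || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return 0;                   /* overlong, too large, or surrogate */
    *cp = c;
    return len;
}

/* Never fails: each undecodable octet becomes ESCAPE_BASE + octet.
 * out must have room for n entries. Returns the number written. */
size_t lenient_decode(const unsigned char *s, size_t n, uint32_t *out)
{
    size_t i = 0, count = 0;

    while (i < n) {
        uint32_t cp;
        size_t len = decode_one(s + i, n - i, &cp);

        if (len == 0) {             /* escape exactly one octet */
            cp = ESCAPE_BASE + s[i];
            len = 1;
        }
        out[count++] = cp;
        i += len;
    }
    return count;
}
```

Because every octet maps to exactly one output value when it cannot be
decoded, the original byte string can be recovered from the result,
which is the property that makes "always succeed" tolerable here.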


¹ Actually, it can be skipped entirely, since after the switch to
  the OPTU encoding scheme, no separate 7bit (8bit clean) locale
  is needed any more and we truly have only one. We never checked
  anything other than LC_CTYPE anyway and will probably not support
  anything else in the base system (which doesn’t exclude a possible
  libl10n.so or somesuch, though).

bye,
//mirabilos
-- 
I believe no one can invent an algorithm. One just happens to hit upon it
when God enlightens him. Or only God invents algorithms, we merely copy them.
If you don't believe in God, just consider God as Nature if you won't deny
existence.              -- Coywolf Qi Hunt


