bug-bash



From: Linda Walsh
Subject: Re: locale specific ordering in EN_US vs. characterset collation rules for UTF-8
Date: Fri, 28 Jun 2013 10:04:01 -0700
User-agent: Thunderbird



Paolo Bonzini wrote:
On 28/06/2013 07:04, Linda Walsh wrote:
>
> Chet Ramey wrote:
>> The world is larger than glibc and the glibc locale definitions.  We need
>> a solution that encompasses all of it.  That solution should, and maybe
>> will, include glibc, but that is not sufficient by itself.
> ----
>     I don't suppose it is possible to use the Unicode
> collation order when using unicode?

When matching regular expressions, people usually want to treat case
specially; for example [A-E] should exclude lowercase a/b/c/d/e.
Unfortunately, this is not the case when collating other things.  The
Unicode collation standard in fact says ("1.1 Multi-Level Comparison"):

   Case differences (uppercase versus lowercase), are typically
   ignored, if the base letters or their accents differ.

That's not the Unicode algorithm I referred to.

A-Z < a-z in the one I referred to.

The ordering looks very similar to 'C', with extensions to cover
accents... but the case for Latin letters in particular was ordered just as in
the 'C' locale.
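
A rough sketch of the distinction, in Python (chosen only for illustration;
nothing in this thread uses it). Python compares strings by code point, which
for ASCII letters gives the same order as the 'C' locale. The case-folding
sort key below is a hypothetical stand-in for a dictionary-style locale order,
not the Unicode Collation Algorithm, and the sample words are made up.

```python
import re

# Made-up sample data: mixed-case ASCII letters plus one accented letter.
words = ["b", "A", "a", "B", "\u00e9", "E"]

# Plain sorted() compares by code point. For ASCII letters this matches
# the 'C' locale: every one of A-Z sorts before every one of a-z.
codepoint_order = sorted(words)

# Assumption: a case-insensitive primary key roughly imitates a
# dictionary-style locale, where upper and lower case interleave.
# This is NOT the real Unicode Collation Algorithm.
folded_order = sorted(words, key=lambda s: (s.casefold(), s))

# Python's re module treats [A-E] as a code point range, so lowercase
# letters are excluded, which is what people usually expect.
upper_matches = re.fullmatch("[A-E]", "C") is not None
lower_matches = re.fullmatch("[A-E]", "c") is not None

print(codepoint_order)
print(folded_order)
print(upper_matches, lower_matches)
```

With a code point order, a range like [A-E] stays within the uppercase
letters; with an interleaved order, whether it also takes in lowercase
letters depends on the locale's collation rules.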




