bug-bash

Re: locale specific ordering in EN_US -- why is a<A<b<B<y<Y<z<Z?


From: Paolo Bonzini
Subject: Re: locale specific ordering in EN_US -- why is a<A<b<B<y<Y<z<Z?
Date: Thu, 27 Jun 2013 15:27:40 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130514 Thunderbird/17.0.6

On 27/06/2013 14:11, Johannes Meixner wrote:
> 
> Hello,
> 
> On Jun 27 10:48 Paolo Bonzini wrote (excerpt):
>> On 27/06/2013 09:33, Aharon Robbins wrote:
>>>
>>> Fortunately, gawk and grep are already there, and I think the sed in
>>> the git repo is as well.  Once Bash turns this on as default, the
>>> world will definitely be a better place, independent of GLIBC.
>>
>> I have already explained multiple times how this is completely
>> delusional.
>>
>> 1) grep, sed, coreutils and so on will only use representation-based
>> range interpretation (I prefer this more neutral term that also explains
>> what's going on) if you use gnulib's regex implementation.  And by
>> default, they use glibc (I just checked grep).
>>
>> 2) Even if you switched the default, you would be at the mercy of
>> distros.  Distros prefer to avoid glibc replacements in single packages,
>> because then all bugs have to be fixed in many different places.  In
>> fact, I checked grep and Fedora builds it with --without-included-regex.
> 
> 
> Right now I checked how grep is built in openSUSE via
> "configure --disable-silent-rules --without-included-regex"

Right thing to do, if you ask me...

> I do not care too much which kind of locale specific ordering
> or collating or regex behaviour is actually implemented
> as long as it works consistently in grep, gawk, sed, bash,...
> 
> I would very much appreciate it if grep, gawk, sed, bash,...
> could agree on one same behaviour and provide clear
> documentation for those who compile it what the
> "commonly accepted upstream behaviour" is so that
> the binaries get built with that same behaviour
> by all distributors who like to be in compliance
> with upstream decisions.

Right now only gawk is different from the others, and not in a very
clean manner:

#ifndef GAWK
              /* Defer to the system regex library about the meaning
                 of range expressions.  */
              regex_t re;
              char pattern[6] = { '[', 0, '-', 0, ']', 0 };
              char subject[2] = { 0, 0 };
              c1 = c;
              if (case_fold)
                {
                  c1 = tolower (c1);
                  c2 = tolower (c2);
                }

              pattern[1] = c1;
              pattern[3] = c2;
              regcomp (&re, pattern, REG_NOSUB);
              for (c = 0; c < NOTCHAR; ++c)
                {
                  if ((case_fold && isupper (c))
                      || (MB_CUR_MAX > 1 && btowc (c) == WEOF))
                    continue;
                  subject[0] = c;
                  if (regexec (&re, subject, 0, NULL, 0) != REG_NOMATCH)
                    setbit_case_fold_c (c, ccl);
                }
              regfree (&re);
#else
              c1 = c;
              if (case_fold)
                {
                  c1 = tolower (c1);
                  c2 = tolower (c2);
                }
              for (c = c1; c <= c2; c++)
                setbit_case_fold_c (c, ccl);
#endif

I would suggest that distros rip out the #else part of this #ifndef.
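
To see what the two branches actually disagree on, here is a minimal
standalone sketch of both interpretations. It is not taken from any of
the tools: the names range_by_codepoint, range_by_regex and
range_mismatches are hypothetical, it only looks at single bytes, and it
assumes neither endpoint is a character that is special inside a bracket
expression (such as ']' or '^').

```c
#include <assert.h>
#include <limits.h>
#include <regex.h>
#include <string.h>

#define NBYTES (UCHAR_MAX + 1)

/* Representation-based interpretation (hypothetical helper): every byte
   value from LO to HI inclusive belongs to the range.  */
void
range_by_codepoint (unsigned char lo, unsigned char hi,
                    unsigned char set[NBYTES])
{
  memset (set, 0, NBYTES);
  for (int c = lo; c <= hi; c++)
    set[c] = 1;
}

/* Locale-based interpretation (hypothetical helper): ask the system
   regex library which single bytes match [LO-HI].  Returns 0 on
   success, -1 if the pattern does not compile.  Byte 0 is skipped
   because it cannot appear in a NUL-terminated subject.  */
int
range_by_regex (unsigned char lo, unsigned char hi,
                unsigned char set[NBYTES])
{
  regex_t re;
  char pattern[6] = { '[', (char) lo, '-', (char) hi, ']', 0 };
  char subject[2] = { 0, 0 };

  memset (set, 0, NBYTES);
  if (regcomp (&re, pattern, REG_NOSUB) != 0)
    return -1;
  for (int c = 1; c < NBYTES; c++)
    {
      subject[0] = (char) c;
      if (regexec (&re, subject, 0, NULL, 0) == 0)
        set[c] = 1;
    }
  regfree (&re);
  return 0;
}

/* Count the bytes on which the two interpretations disagree in the
   current locale, or return -1 if the pattern does not compile.  */
int
range_mismatches (unsigned char lo, unsigned char hi)
{
  unsigned char by_value[NBYTES], by_locale[NBYTES];
  int n = 0;

  assert (lo <= hi);
  range_by_codepoint (lo, hi, by_value);
  if (range_by_regex (lo, hi, by_locale) != 0)
    return -1;
  for (int c = 1; c < NBYTES; c++)
    n += by_value[c] != by_locale[c];
  return n;
}
```

In the "C" locale range_mismatches ('a', 'z') should come back as 0.
After setlocale (LC_ALL, "") in some other locale, a nonzero result
means the system regex library does not interpret the range by byte
value there, which is exactly the case where the #else branch diverges
from the other tools.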

Paolo


