bug-gnulib

Re: Dealing with character ranges in grep


From: Aharon Robbins
Subject: Re: Dealing with character ranges in grep
Date: Mon, 13 Jun 2011 22:56:06 +0300
User-agent: Heirloom mailx 12.4 7/29/08

Hi All.

> Date: Thu, 09 Jun 2011 10:14:01 -0700
> From: Paul Eggert <address@hidden>
> To: Paolo Bonzini <address@hidden>
> CC: Aharon Robbins <address@hidden>, bug-grep <address@hidden>,
>         bug-gnulib <address@hidden>, address@hidden
> Subject: Re: Dealing with character ranges in grep
>
> On 06/08/2011 10:14 PM, Aharon Robbins wrote:
>
> > So, for the upcoming gawk 4.0, I decided (as Karl put it) to cut the
> > Gordian knot and make ranges behave like the C locale, the way it's long
> > been documented, and as most people expect.  Those who want the POSIX
> > behavior can still get it using --posix.
>
> This comment and the ensuing thread seem to be assuming old POSIX.
> In new POSIX, that is, in POSIX 1003.1-2008, the standardization committee
> removed the old, bogus requirement of using collating element order.
> The new rule is that the regular expression [a-z] has an unspecified
> behavior outside the C (or POSIX) locale.  So the new gawk behavior
> will conform to POSIX, even without the --posix option.
>
> I suggest that gawk's behavior for [a-z] be the same regardless of whether
> --posix is specified, and that this behavior be what users expect
> (namely, the ASCII character range).  This will be simpler.

This is now done and pushed.  I had to rearrange a chunk of the documentation,
too. :-)

With respect to the other issues raised, I think I will only express
the facts / my opinions as they relate to gawk, and leave everything
else alone.

1. Gawk's default is --with-included-regex.  Gawk's regex is based on
   GLIBC's, but with fixes I've accrued over the years.  Since I want gawk
   to work correctly everywhere, the default is to use the regex routines
   that I supply.

2. With respect to both equivalence classes and collating elements, I have
   to wonder if they are used much in practice.  I do not recall even a
   single email or bug report about the fact that gawk does not support
   either of these.

3. If I understand the conversation, the gist is that RE_RANGES_IGNORE_LOCALES
   is not needed, since the latest standard allows us to just fix the code
   to use Rational Range Interpretation.

   In principle, I'm all for this, but in practice, I'm going to leave gawk's
   code alone for now (there's always 4.1 :-).

   I do think it's worth taking this up with Uli, but that can be pursued
   separately.  In the worst case, RE_RANGES_IGNORE_LOCALES might be an
   acceptable addition if he (or the other maintainers) don't want to move
   off the current way of doing things.

4. If I can help get grep and sed to move to RRI, I'd like to do so. (I have
   preliminary patches for both.) But I'm not going to hold up the gawk release
   for those other programs.
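
As a rough illustration of what Rational Range Interpretation means for a range like [a-z]: membership is decided by comparing character code points rather than locale collation order. The sketch below is not gawk's or GLIBC's code; the helper names in_range_rri and match_bracket_range are made up for this example.

```python
def in_range_rri(ch, lo, hi):
    """Return True if ch lies in the range lo-hi by code point value.

    This mirrors how a range behaves in the C locale: only the numeric
    value of each character matters, not the locale's collation order.
    """
    return ord(lo) <= ord(ch) <= ord(hi)


def match_bracket_range(text, lo, hi):
    """Collect the characters of text that fall inside [lo-hi] (hypothetical helper)."""
    return [c for c in text if in_range_rri(c, lo, hi)]


if __name__ == "__main__":
    # Uppercase letters have lower code points than 'a', so they are excluded.
    print(match_bracket_range("aBcD", "a", "z"))
    # An accented letter such as U+00E9 sits above 'z' and is excluded as well.
    print(in_range_rri("\u00e9", "a", "z"))
```

Under collating-element order in some non-C locales, where upper and lower case letters interleave, a range such as [a-z] could also pick up uppercase letters, which is the surprise this approach avoids.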

Thanks again to everyone,

Arnold


