Re: Dealing with character ranges in grep
From: Jim Meyering
Subject: Re: Dealing with character ranges in grep
Date: Thu, 16 Jun 2011 07:58:05 +0200
Jim Meyering wrote:
> Bruno Haible wrote:
>> Paolo,
>>
>>> > [=e=] to match "e" as well as accented versions like é, è and ê).
>>> > That is the one feature that you get with glibc, and that you would
>>> > sacrifice when building --with-included-regex.
>>>
>>> I agree. It's up to distros to choose, of course.
>>
>> If you are on the point of sacrificing a glibc feature in many programs,
>> then IMO you should first talk with the glibc people to see what alternative
>> they can offer.
>
> People who build the tools currently have the choice of using
> --with-included-regex or
> --without-included-regex
>
> Note that putting equivalence classes (and backrefs) aside, the
> interpretation of ranges is done in dfa.c, which means the vast
> majority of range uses never even require use of regexp code.
>
> However, backreferences force these tools to skip the DFA-based
> optimization and resort to running the regexp code. In that case,
> there is a dichotomy. Adding a backreference to a range-including
> regexp would have the surprising consequence of changing how that range
> is interpreted when the tool is built to use glibc's regexp code.
>
> Thus, if we go this route, we are effectively saying
> that people who want self-consistent regex-handling
> in our tools must build with --with-included-regex or end
> up causing subtle problems.
>
> That's a big leap.
> I'm not saying I won't take upstream grep over the edge,
> but I'd like to hear what a few distro-maintainers think.
To clarify...
I like Arnold's proposal to make regex range handling sane
and locale-independent.
It goes like this (at least for gawk, grep and sed):
change how dfa.c interprets ranges like [a-z]
change how gnulib's reg* code handles ranges
Always use the included regex code (the one from gnulib),
so that its interpretation is consistent with that of dfa.c.
Grep's current upstream default is to build --without-included-regex,
which makes grep use glibc's regex code.
To make this proposed change go through, that configure-time option would
have to be eliminated, so that we always build with the gnulib-provided
regex code. Of course, if glibc ever changes, we can detect that and
automatically prefer it when possible.