bug-gnulib

Re: [PATCH] ensure that the regexp [b-a] is diagnosed as invalid


From: Jim Meyering
Subject: Re: [PATCH] ensure that the regexp [b-a] is diagnosed as invalid
Date: Wed, 03 Feb 2010 19:22:38 +0100

Jim Meyering wrote:
> Eric Blake wrote:
>> Jim Meyering <jim <at> meyering.net> writes:
>>> It adds a test to gl_REGEX that ensures that re_compile_pattern
>>> diagnoses [b-a] as invalid when using RE_SYNTAX_POSIX_EGREP.
>>
>> Where does POSIX state that this is invalid?
>
> Thanks for looking.
>
> I too verified (before embarking) that POSIX does not declare it invalid,
> merely unspecified. However, since gnulib's regex has rejected such
> ranges for a long time and sed, awk, perl, etc. act that way, I think
> it's the way to go.
>
> Note also that glibc's code appears to try to implement the same
> behavior (though conditional upon RE_NO_EMPTY_RANGES, which nearly
> everyone uses), but somehow that code does not function properly:
>
>       start_collseq = lookup_collation_sequence_value (start_elem);
>       end_collseq = lookup_collation_sequence_value (end_elem);
>       /* Check start/end collation sequence values.  */
>       if (BE (start_collseq == UINT_MAX || end_collseq == UINT_MAX, 0))
>         return REG_ECOLLATE;
>       if (BE ((syntax & RE_NO_EMPTY_RANGES) && start_collseq > end_collseq, 0))
>         return REG_ERANGE;
>
> I've just filed this glibc bug:
>
>     http://sourceware.org/bugzilla/show_bug.cgi?id=11244

Andreas Schwab noticed that

  (RE_SYNTAX_POSIX_EGREP & RE_NO_EMPTY_RANGES) == 0

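which explains the problem: with that syntax, the RE_NO_EMPTY_RANGES
guard in the glibc code above is always false, so REG_ERANGE is never
returned.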

In regcomp.c, there are two build_range_exp functions: one for _LIBC
and one for non-_LIBC builds.  The former contains the range test
above.  The latter, which is used by gnulib, does it this way
(regardless of the RE_NO_EMPTY_RANGES syntax bit):

    if (wcscoll (cmp_buf, cmp_buf + 4) > 0)
      return REG_ERANGE;

Since I want grep (and any other tool using gnulib's regex)
to diagnose out-of-order ranges consistently, not just on
x86_64-based and non-glibc systems, I can't leave this as-is.

Here's what I'm planning:

  - Revert this patch:
      ensure that the regexp [b-a] is diagnosed as invalid

  - Modify the !_LIBC build_range_exp function to take a new argument,
    syntax, and use that to guard the wcscoll test above.

  - Ensure that grep uses the RE_NO_EMPTY_RANGES syntax bit as needed.
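
As a rough sketch of the second item (not the actual gnulib change), the
guarded test could look something like the following.  The names
demo_check_range, DEMO_NO_EMPTY_RANGES, DEMO_NOERROR and DEMO_ERANGE are
hypothetical stand-ins for build_range_exp, RE_NO_EMPTY_RANGES and
REG_ERANGE, and the two small buffers replace the real cmp_buf layout:

```c
#include <wchar.h>

/* Hypothetical stand-ins for illustration only: the real syntax bit is
   RE_NO_EMPTY_RANGES and the real error code is REG_ERANGE.  */
#define DEMO_NO_EMPTY_RANGES 1UL
enum { DEMO_NOERROR = 0, DEMO_ERANGE = 1 };

/* Return DEMO_ERANGE if START collates after END and the caller's
   syntax asks for empty ranges to be rejected; otherwise accept.  */
static int
demo_check_range (unsigned long syntax, wchar_t start, wchar_t end)
{
  /* wcscoll compares NUL-terminated wide strings, so wrap each
     endpoint in a one-character string.  */
  wchar_t s[2] = { start, L'\0' };
  wchar_t e[2] = { end, L'\0' };

  if ((syntax & DEMO_NO_EMPTY_RANGES) && wcscoll (s, e) > 0)
    return DEMO_ERANGE;
  return DEMO_NOERROR;
}
```

With the bit set, demo_check_range (DEMO_NO_EMPTY_RANGES, L'b', L'a')
should report DEMO_ERANGE in the default "C" locale, while the same call
with a syntax of 0 accepts the range.  That is the reason for the third
item: a caller such as grep would have to set the bit to keep the
diagnostic.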

-----
Bottom line: with the above, we'll continue to use glibc's regex on non-x86_64.

Jim


P.S. I was a little dismayed to see that csplit, expr and nl
all use these regex syntax flags:

    RE_SYNTAX_POSIX_BASIC & ~RE_CONTEXT_INVALID_DUP & ~RE_NO_EMPTY_RANGES

and thus do not diagnose empty ranges.



