grep-devel

Re: Possible regression in grep 3.10


From: Jim Meyering
Subject: Re: Possible regression in grep 3.10
Date: Thu, 30 Mar 2023 09:18:50 -0700

On Thu, Mar 30, 2023 at 6:18 AM Felix Yan <felixonmars@archlinux.org> wrote:
> Hello,
>
> I noticed a possible regression in the update of grep 3.9 -> 3.10:
>
> "\d" inside a square bracket [] no longer matches against ASCII digits.
>
> 3.9:
> $ echo 123 | grep -P '[\d]'
> 123
>
> 3.10:
> $ echo 123 | grep -P '[\d]'

Thanks for the report.
That is indeed a regression.
To avoid it with 3.10, you would have to build against a PCRE2 new enough
to define PCRE2_EXTRA_ASCII_BSD, so that the inadequate workaround code in
pcresearch.c is #ifdef'd out.
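To make the failure concrete, here is a small Python model of the 3.10 workaround. It is a hypothetical sketch, not the code in pcresearch.c: it assumes the workaround is a plain textual substitution of \d by [0-9] (and \D by [^0-9]) with no tracking of bracket groups, and it uses Python's re module as a stand-in for PCRE2, which treats the constructs used here the same way. Under that assumption, '[\d]' becomes '[[0-9]]', i.e. a class containing "[" and 0-9 followed by a literal "]".

```python
import re
import warnings

def naive_rewrite(pattern: str) -> str:
    """Hypothetical model of the workaround: replace every \\d with [0-9]
    and every \\D with [^0-9], without tracking bracket context."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern[i] == "\\" and i + 1 < len(pattern):
            escape = pattern[i + 1]
            if escape == "d":
                out.append("[0-9]")
            elif escape == "D":
                out.append("[^0-9]")
            else:
                out.append(pattern[i:i + 2])  # keep other escapes as-is
            i += 2
        else:
            out.append(pattern[i])
            i += 1
    return "".join(out)

def matches(pattern: str, text: str) -> bool:
    # Python's re warns that "[[" may become nested-set syntax someday;
    # today it is a literal "[" inside the class, as in PCRE2.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", FutureWarning)
        return re.search(pattern, text) is not None

rewritten = naive_rewrite(r"[\d]")
print(rewritten)                  # [[0-9]]
print(matches(rewritten, "123"))  # False: needs a digit or "[" followed by "]"
print(matches(rewritten, "1]"))   # True
```

This reproduces the reported symptom: "123" contains no "]", so nothing matches.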

As for fixing this for those who don't build grep with bleeding-edge
PCRE2, at first it looked easy: when rewriting \d to [0-9], just track
whether we're currently in a bracket group (if so, emit 0-9, not [0-9]).
But what about \D -> [^0-9]? For that to work (emitting only ^0-9), the
\D would have to be the first two bytes after an opening "[". You might
say you can "hoist" the \D to the front. But what if the group is
complemented, i.e., it starts with a "^"? In that case, one could
transform the trivial "[^\D]" to "[0-9]", but the expansion would be much
longer for, e.g., "[^\Da]".
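The bracket-aware idea for \d can be sketched the same way, again with Python's re standing in for PCRE2. The name rewrite_d is hypothetical, and the sketch ignores PCRE corner cases such as \Q...\E quoting and POSIX classes like [:digit:]. It tracks whether the scanner is inside a bracket group, treating a leading "^" and then a leading "]" as part of the class body, and it copies \D through unchanged because of the problem described above.

```python
import re

def rewrite_d(pattern: str) -> str:
    """Hypothetical bracket-aware rewrite of \\d: emit [0-9] outside a
    bracket group and 0-9 inside one. \\D and all other escapes are
    copied unchanged."""
    out = []
    in_class = False
    i, n = 0, len(pattern)
    while i < n:
        c = pattern[i]
        if c == "\\" and i + 1 < n:
            if pattern[i + 1] == "d":
                out.append("0-9" if in_class else "[0-9]")
            else:
                out.append(pattern[i:i + 2])  # e.g. \D, \], \[ kept as-is
            i += 2
        elif c == "[" and not in_class:
            in_class = True
            out.append(c)
            i += 1
            # An optional "^" and then a leading "]" belong to the class body.
            if i < n and pattern[i] == "^":
                out.append("^")
                i += 1
            if i < n and pattern[i] == "]":
                out.append("]")
                i += 1
        elif c == "]" and in_class:
            in_class = False
            out.append(c)
            i += 1
        else:
            out.append(c)
            i += 1
    return "".join(out)

for pat in [r"[\d]", r"\d+", r"[a\d]", r"[^\d]", r"[^\D]"]:
    print(pat, "->", rewrite_d(pat))
print(re.search(rewrite_d(r"[\d]"), "123").group())  # 1
```

The same trick cannot yield ^0-9 for \D inside a class, since "^" complements a class only when it comes first, which is why [^\D] passes through untouched here.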

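So far, I think the solution must be to handle \D differently: e.g.,
convert "[a\Db]" to "(?:[^0-9]|[ab])", and "[^a\Db]" to
"(?:(?![ab])[0-9])" (a complemented group containing \D can match only
digits, so it needs an intersection, not an alternation). This is getting quite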
ugly and complicated for a workaround for a feature like \D that is used
very rarely. I wonder who would be affected (and would notice) if we were
to leave \D untouched by this workaround code.
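The \D expansion can also be sketched in Python, with re standing in for PCRE2. Both expand_class_with_D and same_on_ascii are hypothetical helpers, and the sketch handles only a single bracket group made of \D plus plain characters. By De Morgan's law, a complemented group such as "[^a\Db]" means "a digit that is not a or b", an intersection, so this sketch expresses it with a negative lookahead rather than an alternation. The check compares each rewrite against Python's own \D under re.ASCII on all 128 ASCII characters.

```python
import re

def expand_class_with_D(cls: str) -> str:
    """Hypothetical helper: rewrite ONE bracket group containing \\D,
    e.g. "[a\\Db]" or "[^a\\Db]", into an equivalent ASCII-only pattern.
    Assumes the body holds only \\D plus plain characters (no "^", "]",
    "-" or backslash)."""
    if not (cls.startswith("[") and cls.endswith("]")):
        raise ValueError("expected a single bracket group")
    body = cls[1:-1]
    negated = body.startswith("^")
    if negated:
        body = body[1:]
    rest = body.replace("\\D", "")
    if not negated:
        # [X\D] means "non-digit OR in X": a union, so an alternation works.
        return "(?:[^0-9]|[" + rest + "])" if rest else "[^0-9]"
    # [^X\D] means "digit AND not in X": an intersection, via a lookahead.
    return "(?:(?![" + rest + "])[0-9])" if rest else "[0-9]"

def same_on_ascii(original: str, rewritten: str) -> bool:
    # re.ASCII makes \D mean exactly [^0-9], the behavior the workaround wants.
    a = re.compile(original, re.ASCII)
    b = re.compile(rewritten, re.ASCII)
    return all(bool(a.fullmatch(chr(c))) == bool(b.fullmatch(chr(c)))
               for c in range(128))

for pat in [r"[a\Db]", r"[^a\Db]", r"[\D]", r"[^\D]", r"[^5\D]"]:
    new = expand_class_with_D(pat)
    print(pat, "->", new, same_on_ascii(pat, new))  # each line ends in True
```

Even in this restricted form the output is noticeably longer than the input, which illustrates the complexity concern.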


