bug-gnulib

Re: Changed behavior in sed 4.6


From: Jim Meyering
Subject: Re: Changed behavior in sed 4.6
Date: Thu, 20 Dec 2018 19:27:21 -0800

On Thu, Dec 20, 2018 at 2:49 PM Jan Palus <address@hidden> wrote:
> I've just happened to notice a difference in behavior between sed 4.5 and 4.6
> when building VirtualBox. It seems to be locale dependent:
>
> $ echo 'foo(bar '|LC_ALL=C sed -e 's/\([^*] *\)\bbar\b/\1foo */g'
> foo(bar
>
> $ echo 'foo(bar '|LC_ALL=C.UTF-8 sed -e 's/\([^*] *\)\bbar\b/\1foo */g'
> foo(foo *
>
> In 4.5 both results are the same -- same as the second output with
> LC_ALL=C.UTF-8.

Thanks a lot for that report.
This is indeed a regression. It also affects the just-released
grep-3.2, since the bug is in a file used by both: gnulib's dfa.c.
I tracked it down to this gnulib/lib/dfa.c commit: v0.1-2213-gae4b73e28
To back that out, I must first revert part of this fix-up patch:
v0.1-2281-g95cd86dd7

Here's a demonstrator with grep (it should match, but with 3.2 it does not):

$ echo 123-x|LC_ALL=C grep '.\bx'
$

To avoid the failure, one can:
- specify -P (for PCRE, a different matcher), or
- avoid the C locale and instead use a multi-byte locale like the
one you chose, which inhibits use of the DFA matcher, because \b's
definition requires multi-byte aware machinery not present in the DFA
matcher.
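
As a rough way to check a given installation against the demonstrator
above, something like the following sketch may help. The function name
probe_wb and its output strings are hypothetical, made up for this
illustration. It assumes a POSIX shell and a grep on PATH; the -P probe
only works if grep was built with PCRE support, and the C.UTF-8 locale
may not exist on every system (where it is missing, the C library
typically falls back to the C locale, so that probe then says nothing
new).

```shell
#!/bin/sh
# Hypothetical helper: run the '.\bx' demonstrator under a given locale.
# Usage: probe_wb LOCALE GREP-COMMAND [GREP-OPTIONS...]
probe_wb() {
    probe_locale=$1
    shift
    # The exit status of a pipeline is that of its last command (grep here).
    printf '123-x\n' | LC_ALL=$probe_locale "$@" -q '.\bx' 2>/dev/null
    case $? in
        0) echo "match" ;;      # expected behavior
        1) echo "no match" ;;   # the regression described above
        *) echo "error" ;;      # e.g. option not supported by this build
    esac
}

echo "C locale, default matcher: $(probe_wb C grep)"
echo "C locale, -P:              $(probe_wb C grep -P)"
echo "C.UTF-8, default matcher:  $(probe_wb C.UTF-8 grep)"
```

If the first line reports "no match" while the other two report
"match", the installed grep most likely has the dfa.c regression, and
either workaround should apply.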

I expect to revert the gnulib commits mentioned above, and then to
make new releases of both grep and sed.


