Accepting [xyz---abc] - three minus signs to mean one

From: Arnold Robbins
Subject: Accepting [xyz---abc] - three minus signs to mean one
Date: Thu, 21 Apr 2022 10:57:45 +0300
User-agent: Heirloom mailx 12.5 6/20/10


Way back in May of 2015, Nelson Beebe submitted the following
bug report for gawk:

> Date: Mon, 25 May 2015 14:21:04 -0600 (MDT)
> From: "Nelson H. F. Beebe" <beebe@math.utah.edu>
> To: "Arnold Robbins" <arnold@skeeve.com>
> Cc: beebe@math.utah.edu
> Subject: gawk-4.1.3 regexp error
> I just ran an old (1996--date) awk program with gawk-4.1.3 and got an
> error that can be exhibited like this:
>       % gawk '/[^0-9---]/ {print}'
>       gawk: cmd. line:1: error: tent of \{\}: /[^0-9---]/
> As far as I can see, that is a perfectly valid range expression, and
> using three hyphens to represent one hyphen is the traditional way
> to incorporate a hyphen in the expression.

The upshot was that regex didn't support this, and I didn't (at the
time) want to tackle trying to fix it.  (I did fix the error message,
at least.)

I submitted a bug report about it. At the time, Paul Eggert said the following:

> Date: Mon, 25 May 2015 23:53:31 -0700
> From: Paul Eggert <eggert@cs.ucla.edu>
> To: arnold@skeeve.com, 20657@debbugs.gnu.org
> Subject: Re: bug#20657: Traditional range expression not accepted in regex/dfa
> arnold@skeeve.com wrote:
> > The bugaboo here is the "---"; it's
> > a range expression consisting of minus through minus, and apparently long
> > ago was how one got a minus into a bracket expression.
> Actually, long ago expressions like '[^0-9-]' worked just as they do now,
> and it wasn't ever necessary to use trailing "---".  That being said,
> it is true that in 7th Edition Unix '[^0-9---]' meant the same thing as
> '[^0-9-]', so in that sense we have an incompatibility with 7th Edition
> Unix here.
> >     $ ./src/grep '[^0-9---]' /dev/null
> >     ./src/grep: Invalid range end
> >
> > The underlying regex and, I believe, dfa routines don't accept this.
> Yes, that's correct.  It's not a bug, though, as the regexp is ambiguous
> and does not conform to POSIX, which says the following about RE
> bracket expressions: "To use a <hyphen> as the starting range point,
> it shall either come first in the bracket expression or be specified
> as a collating symbol; for example, "[][.-.]-0]", which matches either
> a <right-square-bracket> or any character or collating element that
> collates between <hyphen> and 0, inclusive."
> <http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_03_05>
> In your correspondent's example, the hyphen is a starting range point
> but is neither first in the bracket expression nor is specified as a
> collating symbol, so the regexp doesn't conform to POSIX.
> Even though it's not a bug I suppose it wouldn't hurt to make the GNU
> matchers compatible with 7th Edition Unix here, if someone really wants
> to take that task on; it's not urgent, though.
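
To make the distinction above concrete, here is a minimal sketch in C
that feeds bracket expressions to the POSIX regcomp()/regexec() interface.
The helper name bracket_matches() and the BRACKET_DEMO_MAIN macro are made
up for this illustration; they are not part of gawk, grep, or Gnulib. The
two forms with the hyphen last ("[^0-9-]") or first ("[^-0-9]") should
compile anywhere. Whether "[^0-9---]" compiles depends on the regex
implementation and version, so the demo only reports what it sees.

```c
#include <regex.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if TEXT contains a match for the POSIX
   extended regex PATTERN, 0 if it does not, and -1 if regcomp() rejects
   the pattern (for example with "Invalid range end"). */
int bracket_matches(const char *pattern, const char *text)
{
    regex_t re;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    int rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}

#ifdef BRACKET_DEMO_MAIN
int main(void)
{
    /* The last pattern is the 7th Edition style under discussion; the
       result for it is implementation-dependent. */
    const char *patterns[] = { "[^0-9-]", "[^-0-9]", "[^0-9---]" };
    const char *texts[] = { "12-34", "12a34" };

    for (size_t i = 0; i < sizeof patterns / sizeof patterns[0]; i++)
        for (size_t j = 0; j < sizeof texts / sizeof texts[0]; j++)
            printf("%-10s on %-6s -> %d\n", patterns[i], texts[j],
                   bracket_matches(patterns[i], texts[j]));
    return 0;
}
#endif
```

Compiling with -DBRACKET_DEMO_MAIN gives a small test program. A result
of -1 for "[^0-9---]" means the local regcomp() still rejects the
three-hyphen form.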

I had some time yesterday, and feeling brave and a little stronger in
The Force than usual, I came up with the attached patch. It doesn't
break any of my tests.

As far as my testing indicates, dfa.c doesn't need a patch; it already
seems to accept "---" inside brackets as a single minus.

If there are no objections, can we get this into Gnulib?



Attachment: 3minus.diff
Description: Text Data
