

Re: why are \d and \D not implemented but don't throw errors in regex?

From: Peter Cordes
Subject: Re: why are \d and \D not implemented but don't throw errors in regex?
Date: Sat, 07 Dec 2013 19:33:40 -0400
User-agent: Mutt/1.5.18 (2008-05-17)

On Sat, Dec 07, 2013 at 11:06:22AM -0600, Craig Steffen wrote:
> Hi,
> I'm working on some bash scripts for work where I'm using a regular
> expression to grab a number from the output of another command.
> I've gotten fairly adept at using regular expressions, in perl mostly,
> but I just couldn't get it to work in bash.
> One reason was that the regex search is supposed to be a variable
> rather than an literal inside the [[ ]] expression.
> However, the second reason was that \d and \D are apparently not
> implemented, even though \s and \S are?  And furthermore, the match
> just silently fails without indicating anything is amiss.  After
> searching, [[:digit:]] does work instead of \d.

 That's the behaviour of the regex library used by most things other
than perl (which has its own regex engine).  For example, when you
search a man page with less(1), \s matches whitespace but \d matches
the letter d.  [[:digit:]] matches digits.
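
 A minimal sketch of the [[:digit:]] approach in bash, assuming the
goal is to pull the first number out of some command output.  The
sample string and the variable names (line, re, number) are made up
for illustration, not taken from the original script:

```bash
#!/usr/bin/env bash
# Hypothetical sample input standing in for another command's output.
line='disk usage: 4213 blocks'

# Keep the regex in a variable and expand it unquoted on the right-hand
# side of =~ so bash treats it as a regex, not a literal string.
re='([[:digit:]]+)'

if [[ $line =~ $re ]]; then
    # BASH_REMATCH[1] holds the text matched by the first (...) group.
    number=${BASH_REMATCH[1]}
    echo "matched: $number"
else
    echo "no match" >&2
fi
```

 Since [[:digit:]] is part of POSIX, this should behave the same with
any libc regex engine, which is not true of \d.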

 I agree your complaint seems valid, but it's the behaviour of the
regex engine built into GNU libc (in this case).  Bash on other
platforms would use the regex engine in their system libc.  (Unless
I'm mistaken in my assumption that bash doesn't have its own regex
engine.)

 It's really unfortunate that there are so many
not-universally-supported extensions to the regex language.  And as
you discovered, especially unfortunate that implementations that don't
support them just treat them as \-quoted literals, rather than
unsupported syntax.  There are probably things that depend on using
\something even when "something" isn't a special character.  However,
POSIX says:

   The interpretation of an ordinary character preceded by a
   backslash ( '\' ) is undefined.

 So anything that broke with a regex library that didn't just treat
\something as literal something would be the fault of whatever was
depending on that behaviour.  So it would probably actually be good if
the default behaviour of glibc was to report a regex compilation error
in that case, or maybe even better, print a warning like "\d: unknown
special character, treating as literal".

 Of course, POSIX doesn't specify either \s or \d, just the
[:space:], [:digit:], and other character classes that can be used
within [].
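
 As a rough illustration, the Perl-style escapes can be spelled with
POSIX classes inside brackets.  This is only a sketch: the variable
names and the sample input are hypothetical, and the mapping shown
covers just \s, \d and \D:

```bash
#!/usr/bin/env bash
# Portable POSIX spellings of the Perl-style escapes.
space='[[:space:]]'        # instead of \s
digit='[[:digit:]]'        # instead of \d
non_digit='[^[:digit:]]'   # instead of \D (negated bracket expression)

# Hypothetical sample input.
input='id=  42;'

# Some non-digits, at least one whitespace character, a run of digits
# (captured), then one trailing non-digit.
re="${non_digit}*${space}+(${digit}+)${non_digit}"

if [[ $input =~ $re ]]; then
    value=${BASH_REMATCH[1]}
    echo "value: $value"
fi
```

 Note that the class name only has its special meaning inside a
bracket expression, hence the doubled brackets in [[:digit:]].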

#define X(x,y) x##y
Peter Cordes ;  e-mail: X(address@hidden , des.ca)

"The gods confound the man who first found out how to distinguish the hours!
 Confound him, too, who in this place set up a sundial, to cut and hack
 my day so wretchedly into small pieces!" -- Plautus, 200 BC
