bug-autoconf
Re: GNU and BSD sed differences


From: Paul Eggert
Subject: Re: GNU and BSD sed differences
Date: Mon, 12 Dec 2005 10:48:11 -0800
User-agent: Gnus/5.1007 (Gnus v5.10.7) Emacs/21.4 (gnu/linux)

Werner LEMBERG <address@hidden> writes:

> I suggest to add that `\?', `\+', and `\|' should not be used in sed
> expressions

Thanks for suggesting that.  The problem is a bit more general, so I
installed the following:

2005-12-12  Paul Eggert  <address@hidden>

        * doc/autoconf.texi (Limitations of Usual Tools):
        Mention which characters can be escaped with \ in portable regular
        expressions used in grep, sed, expr.  Mention the leading ^ problem
        with expr.  Clean up some confusing wording.  Mention which
        grep options are portable.

--- autoconf.texi       2 Dec 2005 19:19:23 -0000       1.935
+++ autoconf.texi       12 Dec 2005 18:46:51 -0000      1.936
@@ -11891,6 +11891,10 @@ replacement @code{grep -E}.  Also, some 
 not work on long input lines.  To work around these problems, invoke
 @code{AC_PROG_EGREP} and then use @code{$EGREP}.
 
+Portable extended regular expressions should use @samp{\} only to escape
+characters in the string @samp{$()address@hidden|}.  For example, @address@hidden
+is not portable, even though it typically matches @address@hidden
+
 The empty alternative is not portable, use @samp{?} instead.  For
 instance with Digital Unix v5.0:
 
@@ -11945,8 +11949,15 @@ Avoid this portability problem by avoidi
 @item @command{expr} (@samp{:})
 @c ----------------------------
 @prindex @command{expr}
-Don't use @samp{\?}, @samp{\+} and @samp{\|} in patterns, as they are
-not supported on Solaris.
+Portable @command{expr} regular expressions should use @samp{\} to
+escape only characters in the string @samp{$()address@hidden@}}.
+For example, alternation, @samp{\|}, is common but Posix does not
+require its support, so it should be avoided in portable scripts.
+Similarly, @samp{\+} and @samp{\?} should be avoided.
+
+Portable @command{expr} regular expressions should not begin with
+@samp{^}.  Patterns are automatically anchored so leading @samp{^} is
+not needed anyway.
 
 The Posix standard is ambiguous as to whether
 @samp{expr 'a' : '\(b\)'} outputs @samp{0} or the empty string.
@@ -12045,6 +12056,12 @@ while @acronym{GNU} @command{find} repor
 @item @command{grep}
 @c -----------------
 @prindex @command{grep}
+Portable scripts can rely on the @command{grep} options @option{-c},
+@option{-l}, @option{-n}, and @option{-v}, but should avoid other
+options.  For example, don't use @option{-w}, as Posix does not require
+it and Irix 6.5.16m's @command{grep} does not support it.
+
+Some of the options required by Posix are not portable in practice.
 Don't use @samp{grep -q} to suppress output, because many @command{grep}
 implementations (e.g., Solaris) do not support @option{-q}.
 Don't use @samp{grep -s} to suppress output either, because Posix
@@ -12070,12 +12087,17 @@ grep 'foo
 bar' in.txt
 @end example
 
-Alternation, @samp{\|}, is common but Posix does not require its
+Traditional @command{grep} implementations (e.g., Solaris) do not
+support the @option{-E} or @samp{-F} options.  To work around these
+problems, invoke @code{AC_PROG_EGREP} and then use @code{$EGREP}, and
+similarly for @code{AC_PROG_FGREP} and @code{$FGREP}.
+
+Portable @command{grep} regular expressions should use @samp{\} only to
+escape characters in the string @samp{$()address@hidden@}}.  For example,
+alternation, @samp{\|}, is common but Posix does not require its
 support in basic regular expressions, so it should be avoided in
 portable scripts.  Solaris @command{grep} does not support it.
-
-Don't rely on @option{-w}, as Irix 6.5.16m's @command{grep} does not
-support it.
+Similarly, @samp{\+} and @samp{\?} should be avoided.
 
 
 @item @command{join}
@@ -12264,8 +12286,8 @@ Patterns should not include the separato
 of a character class.  In conformance with Posix, the Cray
 @command{sed} will reject @samp{s/[^/]*$//}: use @samp{s,[^/]*$,,}.
 
-Avoid empty patterns within parentheses (i.e., @samp{\(\)}).  Posix is
-silent on whether they are allowed, and Unicos 9 @command{sed} rejects
+Avoid empty patterns within parentheses (i.e., @samp{\(\)}).  Posix does
+not require support for empty patterns, and Unicos 9 @command{sed} rejects
 them.
 
 Unicos 9 @command{sed} loops endlessly on patterns like @samp{.*\n.*}.
@@ -12273,21 +12295,25 @@ Unicos 9 @command{sed} loops endlessly o
 Sed scripts should not use branch labels longer than 8 characters and
 should not contain comments.
 
-Don't include extra @samp{;}, as some @command{sed}, such as Net@acronym{BSD}
-1.4.2's, try to interpret the second as a command:
+Avoid redundant @samp{;}, as some @command{sed} implementations, such as
+Net@acronym{BSD} 1.4.2's, incorrectly try to interpret the second
+@samp{;} as a command:
 
 @example
 $ @kbd{echo a | sed 's/x/x/;;s/x/x/'}
 sed: 1: "s/x/x/;;s/x/x/": invalid command code ;
 @end example
 
-Input should have reasonably long lines, since some @command{sed} have
-an input buffer limited to 4000 bytes.
+Input should not have unreasonably long lines, since some @command{sed}
+implementations have an input buffer limited to 4000 bytes.
 
-Alternation, @samp{\|}, is common but Posix does not require its
+Portable @command{sed} regular expressions should use @samp{\} only to escape
+characters in the string @samp{$()address@hidden@}}.  For example,
+alternation, @samp{\|}, is common but Posix does not require its
 support, so it should be avoided in portable scripts.  Solaris
 @command{sed} does not support alternation; e.g., @samp{sed '/a\|b/d'}
 deletes only lines that contain the literal string @samp{a|b}.
+Similarly, @samp{\+} and @samp{\?} should be avoided.
 
 Anchors (@samp{^} and @samp{$}) inside groups are not portable.
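
Not part of the patch: a minimal sketch, in plain sh, of how a script
might avoid the constructs the new text warns about (`\|', `\+', `\?',
and a leading `^' in expr patterns).  The function names are made up
for illustration, and the `X' prefix in the expr call is an extra guard
against operands that look like options or operators; the patch itself
does not mention it.

```shell
#!/bin/sh
# Hypothetical helpers; none of these names come from Autoconf.

# Alternation: instead of sed '/a\|b/d', give one command per alternative.
delete_a_or_b() {
  sed -e '/a/d' -e '/b/d'
}

# One or more: instead of 'x\+', spell it out as 'xx*'.
squeeze_x() {
  sed 's/xx*/X/g'
}

# Zero or one: instead of 'colou\?r', handle each spelling separately.
normalize_colour() {
  sed -e 's/colour/COLOR/g' -e 's/color/COLOR/g'
}

# expr patterns are anchored at the start already, so no leading '^'.
# Exit status is 0 when the argument begins with "foo", 1 otherwise.
starts_with_foo() {
  expr "X$1" : 'Xfoo' >/dev/null
}
```

For instance, `printf 'a\nb\nc\n' | delete_a_or_b' leaves only the line
`c', and `starts_with_foo foobar' succeeds while `starts_with_foo barfoo'
fails.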
 



