Changes to grep/manual/html_node/Problematic-Expressions.html,v
From: Jim Meyering
Date: Wed, 22 Mar 2023 22:55:26 -0400 (EDT)
CVSROOT: /webcvs/grep
Module name: grep
Changes by: Jim Meyering <meyering> 23/03/22 22:55:22
Index: html_node/Problematic-Expressions.html
===================================================================
RCS file: /webcvs/grep/grep/manual/html_node/Problematic-Expressions.html,v
retrieving revision 1.1
retrieving revision 1.2
diff -u -b -r1.1 -r1.2
--- html_node/Problematic-Expressions.html 3 Sep 2022 19:33:14 -0000 1.1
+++ html_node/Problematic-Expressions.html 23 Mar 2023 02:55:21 -0000 1.2
@@ -1,11 +1,11 @@
-<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
+<!DOCTYPE html>
<html>
-<!-- Created by GNU Texinfo 6.8, https://www.gnu.org/software/texinfo/ -->
+<!-- Created by GNU Texinfo 7.0dev, https://www.gnu.org/software/texinfo/ -->
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<!-- This manual is for grep, a pattern matching engine.
-Copyright (C) 1999-2002, 2005, 2008-2022 Free Software Foundation,
+Copyright © 1999-2002, 2005, 2008-2023 Free Software Foundation,
Inc.
Permission is granted to copy, distribute and/or modify this document
@@ -14,10 +14,10 @@
Invariant Sections, with no Front-Cover Texts, and with no Back-Cover
Texts. A copy of the license is included in the section entitled
"GNU Free Documentation License". -->
-<title>Problematic Expressions (GNU Grep 3.8)</title>
+<title>Problematic Expressions (GNU Grep 3.10)</title>
-<meta name="description" content="Problematic Expressions (GNU Grep 3.8)">
-<meta name="keywords" content="Problematic Expressions (GNU Grep 3.8)">
+<meta name="description" content="Problematic Expressions (GNU Grep 3.10)">
+<meta name="keywords" content="Problematic Expressions (GNU Grep 3.10)">
<meta name="resource-type" content="document">
<meta name="distribution" content="global">
<meta name="Generator" content="makeinfo">
@@ -31,21 +31,9 @@
<link href="Basic-vs-Extended.html" rel="prev" title="Basic vs Extended">
<style type="text/css">
<!--
-a.copiable-anchor {visibility: hidden; text-decoration: none; line-height: 0em}
-a.summary-letter {text-decoration: none}
-blockquote.indentedblock {margin-right: 0em}
-div.display {margin-left: 3.2em}
-div.example {margin-left: 3.2em}
-kbd {font-style: oblique}
-pre.display {font-family: inherit}
-pre.format {font-family: inherit}
-pre.menu-comment {font-family: serif}
-pre.menu-preformatted {font-family: serif}
-span.nolinebreak {white-space: nowrap}
-span.roman {font-family: initial; font-weight: normal}
-span.sansserif {font-family: sans-serif; font-weight: normal}
-span:hover a.copiable-anchor {visibility: visible}
-ul.no-bullet {list-style: none}
+a.copiable-link {visibility: hidden; text-decoration: none; line-height: 0em}
+span:hover a.copiable-link {visibility: visible}
+ul.mark-bullet {list-style-type: disc}
-->
</style>
<link rel="stylesheet" type="text/css"
href="https://www.gnu.org/software/gnulib/manual.css">
@@ -54,139 +42,139 @@
</head>
<body lang="en">
-<div class="section" id="Problematic-Expressions">
-<div class="header">
+<div class="section-level-extent" id="Problematic-Expressions">
+<div class="nav-panel">
<p>
Next: <a href="Character-Encoding.html" accesskey="n" rel="next">Character
Encoding</a>, Previous: <a href="Basic-vs-Extended.html" accesskey="p"
rel="prev">Basic vs Extended Regular Expressions</a>, Up: <a
href="Regular-Expressions.html" accesskey="u" rel="up">Regular Expressions</a>
[<a href="index.html#SEC_Contents" title="Table of contents"
rel="contents">Contents</a>][<a href="Index.html" title="Index"
rel="index">Index</a>]</p>
</div>
<hr>
-<span id="Problematic-Regular-Expressions"></span><h3 class="section">3.7
Problematic Regular Expressions</h3>
+<h3 class="section" id="Problematic-Regular-Expressions"><span>3.7 Problematic
Regular Expressions<a class="copiable-link"
href="#Problematic-Regular-Expressions"> ¶</a></span></h3>
-<span id="index-invalid-regular-expressions"></span>
-<span id="index-unspecified-behavior-in-regular-expressions"></span>
-<p>Some strings are <em>invalid regular expressions</em> and cause
-<code>grep</code> to issue a diagnostic and fail. For example,
‘<samp>xy\1</samp>’
+<a class="index-entry-id" id="index-invalid-regular-expressions"></a>
+<a class="index-entry-id"
id="index-unspecified-behavior-in-regular-expressions"></a>
+<p>Some strings are <em class="dfn">invalid regular expressions</em> and cause
+<code class="command">grep</code> to issue a diagnostic and fail. For
example, ‘<samp class="samp">xy\1</samp>’
is invalid because there is no parenthesized subexpression for the
-back-reference ‘<samp>\1</samp>’ to refer to.
+back-reference ‘<samp class="samp">\1</samp>’ to refer to.
</p>
-<p>Also, some regular expressions have <em>unspecified behavior</em> and
-should be avoided even if <code>grep</code> does not currently diagnose
-them. For example, ‘<samp>xy\0</samp>’ has unspecified behavior
because
-‘<samp>0</samp>’ is not a special character and
‘<samp>\0</samp>’ is not a special
-backslash expression (see <a href="Special-Backslash-Expressions.html">Special
Backslash Expressions</a>).
+<p>Also, some regular expressions have <em class="dfn">unspecified
behavior</em> and
+should be avoided even if <code class="command">grep</code> does not currently
diagnose
+them. For example, ‘<samp class="samp">xy\0</samp>’ has
unspecified behavior because
+‘<samp class="samp">0</samp>’ is not a special character and
‘<samp class="samp">\0</samp>’ is not a special
+backslash expression (see <a class="pxref"
href="Special-Backslash-Expressions.html">Special Backslash Expressions</a>).
Unspecified behavior can be particularly problematic because the set
of matched strings might be only partially specified, or not be
specified at all, or the expression might even be invalid.
</p>
<p>The following regular expression constructs are invalid on all
platforms conforming to POSIX, so portable scripts can assume that
-<code>grep</code> rejects these constructs:
+<code class="command">grep</code> rejects these constructs:
</p>
-<ul>
-<li> A basic regular expression containing a back-reference
‘<samp>\<var>n</var></samp>’
-preceded by fewer than <var>n</var> closing parentheses. For example,
-‘<samp>\(a\)\2</samp>’ is invalid.
-
-</li><li> A bracket expression containing ‘<samp>[:</samp>’ that
does not start a
-character class; and similarly for ‘<samp>[=</samp>’ and
‘<samp>[.</samp>’. For
-example, ‘<samp>[a[:b]</samp>’ and
‘<samp>[a[:ouch:]b]</samp>’ are invalid.
+<ul class="itemize mark-bullet">
+<li>A basic regular expression containing a back-reference ‘<samp
class="samp">\<var class="var">n</var></samp>’
+preceded by fewer than <var class="var">n</var> closing parentheses. For
example,
+‘<samp class="samp">\(a\)\2</samp>’ is invalid.
+
+</li><li>A bracket expression containing ‘<samp
class="samp">[:</samp>’ that does not start a
+character class; and similarly for ‘<samp class="samp">[=</samp>’
and ‘<samp class="samp">[.</samp>’. For
+example, ‘<samp class="samp">[a[:b]</samp>’ and ‘<samp
class="samp">[a[:ouch:]b]</samp>’ are invalid.
</li></ul>
-<p>GNU <code>grep</code> treats the following constructs as invalid.
-However, other <code>grep</code> implementations might allow them, so
+<p>GNU <code class="command">grep</code> treats the following constructs as
invalid.
+However, other <code class="command">grep</code> implementations might allow
them, so
portable scripts should not rely on their being invalid:
</p>
-<ul>
-<li> Unescaped ‘<samp>\</samp>’ at the end of a regular expression.
+<ul class="itemize mark-bullet">
+<li>Unescaped ‘<samp class="samp">\</samp>’ at the end of a
regular expression.
-</li><li> Unescaped ‘<samp>[</samp>’ that does not start a bracket
expression.
+</li><li>Unescaped ‘<samp class="samp">[</samp>’ that does not
start a bracket expression.
-</li><li> A ‘<samp>\{</samp>’ in a basic regular expression that
does not start an
+</li><li>A ‘<samp class="samp">\{</samp>’ in a basic regular
expression that does not start an
interval expression.
-</li><li> A basic regular expression with unbalanced
‘<samp>\(</samp>’ or ‘<samp>\)</samp>’,
-or an extended regular expression with unbalanced ‘<samp>(</samp>’.
+</li><li>A basic regular expression with unbalanced ‘<samp
class="samp">\(</samp>’ or ‘<samp class="samp">\)</samp>’,
+or an extended regular expression with unbalanced ‘<samp
class="samp">(</samp>’.
-</li><li> In the POSIX locale, a range expression like
‘<samp>z-a</samp>’ that
-represents zero elements. A non-GNU <code>grep</code> might treat it as
+</li><li>In the POSIX locale, a range expression like ‘<samp
class="samp">z-a</samp>’ that
+represents zero elements. A non-GNU <code class="command">grep</code> might
treat it as
a valid range that never matches.
-</li><li> An interval expression with a repetition count greater than 32767.
+</li><li>An interval expression with a repetition count greater than 32767.
(The portable POSIX limit is 255, and even interval expressions with
smaller counts can be impractically slow on all known implementations.)
-</li><li> A bracket expression that contains at least three elements, the first
-and last of which are both ‘<samp>:</samp>’, or both
‘<samp>.</samp>’, or both
-‘<samp>=</samp>’. For example, a non-GNU <code>grep</code> might
treat
-‘<samp>[:alpha:]</samp>’ like
‘<samp>[[:alpha:]]</samp>’, or like
‘<samp>[:ahlp]</samp>’.
+</li><li>A bracket expression that contains at least three elements, the first
+and last of which are both ‘<samp class="samp">:</samp>’, or both
‘<samp class="samp">.</samp>’, or both
+‘<samp class="samp">=</samp>’. For example, a non-GNU <code
class="command">grep</code> might treat
+‘<samp class="samp">[:alpha:]</samp>’ like ‘<samp
class="samp">[[:alpha:]]</samp>’, or like ‘<samp
class="samp">[:ahlp]</samp>’.
</li></ul>
<p>The following constructs have well-defined behavior in GNU
-<code>grep</code>. However, they have unspecified behavior elsewhere, so
+<code class="command">grep</code>. However, they have unspecified behavior
elsewhere, so
portable scripts should avoid them:
</p>
-<ul>
-<li> Special backslash expressions like ‘<samp>\b</samp>’,
‘<samp>\<</samp>’, and ‘<samp>\]</samp>’.
-See <a href="Special-Backslash-Expressions.html">Special Backslash
Expressions</a>.
+<ul class="itemize mark-bullet">
+<li>Special backslash expressions like ‘<samp
class="samp">\b</samp>’, ‘<samp class="samp">\<</samp>’,
and ‘<samp class="samp">\]</samp>’.
+See <a class="xref" href="Special-Backslash-Expressions.html">Special
Backslash Expressions</a>.
-</li><li> A basic regular expression that uses ‘<samp>\?</samp>’,
‘<samp>\+</samp>’, or ‘<samp>\|</samp>’.
+</li><li>A basic regular expression that uses ‘<samp
class="samp">\?</samp>’, ‘<samp class="samp">\+</samp>’, or
‘<samp class="samp">\|</samp>’.
-</li><li> An extended regular expression that uses back-references.
+</li><li>An extended regular expression that uses back-references.
-</li><li> An empty regular expression, subexpression, or alternative. For
-example, ‘<samp>(a|bc|)</samp>’ is not portable; a portable
equivalent is
-‘<samp>(a|bc)?</samp>’.
+</li><li>An empty regular expression, subexpression, or alternative. For
+example, ‘<samp class="samp">(a|bc|)</samp>’ is not portable; a
portable equivalent is
+‘<samp class="samp">(a|bc)?</samp>’.
-</li><li> In a basic regular expression, an anchoring
‘<samp>^</samp>’ that appears
-directly after ‘<samp>\(</samp>’, or an anchoring
‘<samp>$</samp>’ that appears
-directly before ‘<samp>\)</samp>’.
+</li><li>In a basic regular expression, an anchoring ‘<samp
class="samp">^</samp>’ that appears
+directly after ‘<samp class="samp">\(</samp>’, or an anchoring
‘<samp class="samp">$</samp>’ that appears
+directly before ‘<samp class="samp">\)</samp>’.
-</li><li> In a basic regular expression, a repetition operator that
+</li><li>In a basic regular expression, a repetition operator that
directly follows another repetition operator.
-</li><li> In an extended regular expression, unescaped
‘<samp>{</samp>’
+</li><li>In an extended regular expression, unescaped ‘<samp
class="samp">{</samp>’
that does not begin a valid interval expression.
-GNU <code>grep</code> treats the ‘<samp>{</samp>’ as an ordinary
character.
+GNU <code class="command">grep</code> treats the ‘<samp
class="samp">{</samp>’ as an ordinary character.
-</li><li> A null character or an encoding error in either pattern or input
data.
-See <a href="Character-Encoding.html">Character Encoding</a>.
+</li><li>A null character or an encoding error in either pattern or input data.
+See <a class="xref" href="Character-Encoding.html">Character Encoding</a>.
-</li><li> An input file that ends in a non-newline character,
-where GNU <code>grep</code> silently supplies a newline.
+</li><li>An input file that ends in a non-newline character,
+where GNU <code class="command">grep</code> silently supplies a newline.
</li></ul>
<p>The following constructs have unspecified behavior, in both GNU
-and other <code>grep</code> implementations. Scripts should avoid
+and other <code class="command">grep</code> implementations. Scripts should
avoid
them whenever possible.
</p>
-<ul>
-<li> A backslash escaping an ordinary character, unless it is a
-back-reference like ‘<samp>\1</samp>’ or a special backslash
expression like
-‘<samp>\<</samp>’ or ‘<samp>\b</samp>’. See <a
href="Special-Backslash-Expressions.html">Special Backslash Expressions</a>.
For
-example, ‘<samp>\x</samp>’ has unspecified behavior now, and a
future version
-of <code>grep</code> might specify ‘<samp>\x</samp>’ to have a new
behavior.
+<ul class="itemize mark-bullet">
+<li>A backslash escaping an ordinary character, unless it is a
+back-reference like ‘<samp class="samp">\1</samp>’ or a special
backslash expression like
+‘<samp class="samp">\<</samp>’ or ‘<samp
class="samp">\b</samp>’. See <a class="xref"
href="Special-Backslash-Expressions.html">Special Backslash Expressions</a>.
For
+example, ‘<samp class="samp">\x</samp>’ has unspecified behavior
now, and a future version
+of <code class="command">grep</code> might specify ‘<samp
class="samp">\x</samp>’ to have a new behavior.
-</li><li> A repetition operator that appears directly after an anchor, or at
the
+</li><li>A repetition operator that appears directly after an anchor, or at the
start of a complete regular expression, parenthesized subexpression,
-or alternative. For example, ‘<samp>+|^*(+a|?-b)</samp>’ has
unspecified
-behavior, whereas ‘<samp>\+|^\*(\+a|\?-b)</samp>’ is portable.
+or alternative. For example, ‘<samp
class="samp">+|^*(+a|?-b)</samp>’ has unspecified
+behavior, whereas ‘<samp class="samp">\+|^\*(\+a|\?-b)</samp>’ is
portable.
-</li><li> A range expression outside the POSIX locale. For example, in some
-locales ‘<samp>[a-z]</samp>’ might match some characters that are
not
+</li><li>A range expression outside the POSIX locale. For example, in some
+locales ‘<samp class="samp">[a-z]</samp>’ might match some
characters that are not
lowercase letters, or might not match some lowercase letters, or might
-be invalid. With GNU <code>grep</code> it is not documented whether
+be invalid. With GNU <code class="command">grep</code> it is not documented
whether
these range expressions use native code points, or use the collating
-sequence specified by the <code>LC_COLLATE</code> category, or have some
+sequence specified by the <code class="env">LC_COLLATE</code> category, or
have some
other interpretation. Outside the POSIX locale, it is portable to use
-‘<samp>[[:lower:]]</samp>’ to match a lower-case letter, or
-‘<samp>[abcdefghijklmnopqrstuvwxyz]</samp>’ to match an ASCII
lower-case
+‘<samp class="samp">[[:lower:]]</samp>’ to match a lower-case
letter, or
+‘<samp class="samp">[abcdefghijklmnopqrstuvwxyz]</samp>’ to match
an ASCII lower-case
letter.
</li></ul>
</div>
<hr>
-<div class="header">
+<div class="nav-panel">
<p>
Next: <a href="Character-Encoding.html">Character Encoding</a>, Previous: <a
href="Basic-vs-Extended.html">Basic vs Extended Regular Expressions</a>, Up: <a
href="Regular-Expressions.html">Regular Expressions</a> [<a
href="index.html#SEC_Contents" title="Table of contents"
rel="contents">Contents</a>][<a href="Index.html" title="Index"
rel="index">Index</a>]</p>
</div>
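
A few of the constructs documented in the page above can be exercised from a shell. The following is a minimal sketch and is not part of the commit: the sample input strings are made up for illustration, and the exact error status may vary between grep implementations (POSIX only requires a value greater than 1).

```shell
#!/bin/sh
# Portable: a character class instead of a locale-dependent range like [a-z].
# Counts the input lines that contain at least one lowercase letter.
printf 'abc\nABC\n' | LC_ALL=C grep -c '[[:lower:]]'

# Portable: '(a|bc)?' instead of the empty alternative in '(a|bc|)'.
# Counts the lines consisting of x, then optionally a or bc, then y.
printf 'xay\nxbcy\nxy\nxzy\n' | grep -E -c '^x(a|bc)?y$'

# Invalid on POSIX-conforming platforms: back-reference \2 is preceded
# by only one closing parenthesis. grep should report an error, which
# shows up as an exit status greater than 1 (0 = match, 1 = no match).
status=0
printf 'aa\n' | grep '\(a\)\2' >/dev/null 2>&1 || status=$?
echo "exit status: $status"
```

Checking the exit status rather than the diagnostic text keeps the last check independent of the wording of any particular implementation's error message.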