From 72bdacccbd3e6cc3eb6e16549cf51ea9e7321ae2 Mon Sep 17 00:00:00 2001
From: Reuben Thomas
Date: Wed, 11 May 2022 11:47:00 +0100
Subject: [PATCH] doc/regex.texi: remove Emacs-specific documentation; match code
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Remove mention of both Emacs and non-Emacs syntax tables, as these are no
longer supported by the code; instead, fixed character classes are used.

Document the word character class (alnum + _).

Replace mentions of #defining emacs with RE_NO_GNU_OPS (which takes
effect in the opposite sense); merge the node “GNU Emacs Operators” into
“GNU Operators”.

For \` and \', refer to the “whole string” rather than the (Emacs)
“buffer”.

Leave a TODO to document the classes that can be used with \s and \S.
(This was not previously documented, and is best left to another
commit.)
---
 doc/regex.texi | 113 +++++++++++++------------------------------------
 1 file changed, 30 insertions(+), 83 deletions(-)

diff --git a/doc/regex.texi b/doc/regex.texi
index d21052282d..7015c8a651 100644
--- a/doc/regex.texi
+++ b/doc/regex.texi
@@ -108,8 +108,8 @@ Compiling}, for more information on compiling.
 Regex considers the current syntax to be a collection of bits; we refer
 to these bits as @dfn{syntax bits}. In most cases, they affect what
 characters represent what operators. We describe the meanings of the
-operators to which we refer in @ref{Common Operators}, @ref{GNU
-Operators}, and @ref{GNU Emacs Operators}.
+operators to which we refer in @ref{Common Operators}, and @ref{GNU
+Operators}.
 
 For reference, here is the complete list of syntax bits, in alphabetical
 order:
@@ -467,15 +467,15 @@ cases @code{RE_BK_PLUS_QM}, @code{RE_NO_BK_BRACES}, @code{RE_NO_BK_VAR},
 (@pxref{Match-non-word-constituent Operator}).
 
 @item
-@samp{\`} represents the match-beginning-of-buffer
-operator and @samp{\'} represents the match-end-of-buffer operator
-(@pxref{Buffer Operators}).
+@samp{\`} represents the match-beginning-of-string
+operator and @samp{\'} represents the match-end-of-string operator
+(@pxref{Whole-string Operators}).
 
 @item
-If Regex was compiled with the C preprocessor symbol @code{emacs}
-defined, then @samp{\s@var{class}} represents the match-syntactic-class
-operator and @samp{\S@var{class}} represents the
-match-not-syntactic-class operator (@pxref{Syntactic Class Operators}).
+@samp{\s@var{class}} represents the match-syntactic-class operator and
+@samp{\S@var{class}} represents the match-not-syntactic-class operator
+(@pxref{Syntactic Class Operators}), unless the syntax bit
+@code{RE_NO_GNU_OPS} is set.
 
 @end itemize
 
@@ -1243,22 +1243,24 @@ exactly the dual of @samp{^}'s; see the previous section. (That is,
 @node GNU Operators
 @chapter GNU Operators
 
-Following are operators that GNU defines (and POSIX doesn't).
+Following are operators that GNU defines (and POSIX doesn't) that you
+can use unless the syntax bit @code{RE_NO_GNU_OPS} is set.
 
 @menu
 * Word Operators::
-* Buffer Operators::
+* Whole-string Operators::
 @end menu
 
 @node Word Operators
 @section Word Operators
 
 The operators in this section require Regex to recognize parts of words.
-Regex uses a syntax table to determine whether or not a character is
-part of a word, i.e., whether or not it is @dfn{word-constituent}.
+Characters that are part of words, which are called
+@dfn{word-constituent}, are letters, digits, and the underscore
+(@samp{_}); more precisely, any character in the POSIX class
+@code{alnum} in the current locale, or underscore.
 
 @menu
-* Non-Emacs Syntax Tables::
 * Match-word-boundary Operator:: \b
 * Match-within-word Operator:: \B
 * Match-beginning-of-word Operator:: \<
@@ -1267,34 +1269,6 @@ part of a word, i.e., whether or not it is @dfn{word-constituent}.
 * Match-non-word-constituent Operator:: \W
 @end menu
 
-@node Non-Emacs Syntax Tables
-@subsection Non-Emacs Syntax Tables
-
-A @dfn{syntax table} is an array indexed by the characters in your
-character set. In the ASCII encoding, therefore, a syntax table
-has 256 elements. Regex always uses a @code{char *} variable
-@code{re_syntax_table} as its syntax table. In some cases, it
-initializes this variable and in others it expects you to initialize it.
-
-@itemize @bullet
-@item
-If Regex is compiled with the preprocessor symbols @code{emacs} and
-@code{SYNTAX_TABLE} both undefined, then Regex allocates
-@code{re_syntax_table} and initializes an element @var{i} either to
-@code{Sword} (which it defines) if @var{i} is a letter, number, or
-@samp{_}, or to zero if it's not.
-
-@item
-If Regex is compiled with @code{emacs} undefined but @code{SYNTAX_TABLE}
-defined, then Regex expects you to define a @code{char *} variable
-@code{re_syntax_table} to be a valid syntax table.
-
-@item
-@xref{Emacs Syntax Tables}, for what happens when Regex is compiled with
-the preprocessor symbol @code{emacs} defined.
-
-@end itemize
-
 @node Match-word-boundary Operator
 @subsection The Match-word-boundary Operator (@code{\b})
 
@@ -1347,74 +1321,47 @@ This operator (represented by @samp{\W}) matches any character that is
 not word-constituent.
 
 
-@node Buffer Operators
-@section Buffer Operators
+@node Whole-string Operators
+@section Whole-string Operators
 
-Following are operators which work on buffers. In Emacs, a @dfn{buffer}
-is, naturally, an Emacs buffer. For other programs, Regex considers the
-entire string to be matched as the buffer.
+Following are operators which work on the whole string.
 
 @menu
-* Match-beginning-of-buffer Operator:: \`
-* Match-end-of-buffer Operator:: \'
+* Match-beginning-of-string Operator:: \`
+* Match-end-of-string Operator:: \'
+* Syntactic Class Operators::
 @end menu
 
 
-@node Match-beginning-of-buffer Operator
-@subsection The Match-beginning-of-buffer Operator (@code{\`})
+@node Match-beginning-of-string Operator
+@subsection The Match-beginning-of-string Operator (@code{\`})
 
 @cindex @samp{\`}
 
 This operator (represented by @samp{\`}) matches the empty string at the
-beginning of the buffer.
+beginning of the string.
 
-@node Match-end-of-buffer Operator
-@subsection The Match-end-of-buffer Operator (@code{\'})
+@node Match-end-of-string Operator
+@subsection The Match-end-of-string Operator (@code{\'})
 
 @cindex @samp{\'}
 
 This operator (represented by @samp{\'}) matches the empty string at the
-end of the buffer.
-
-
-@node GNU Emacs Operators
-@chapter GNU Emacs Operators
-
-Following are operators that GNU defines (and POSIX doesn't)
-that you can use only when Regex is compiled with the preprocessor
-symbol @code{emacs} defined.
-
-@menu
-* Syntactic Class Operators::
-@end menu
+end of the string.
 
 
 @node Syntactic Class Operators
 @section Syntactic Class Operators
 
 The operators in this section require Regex to recognize the syntactic
-classes of characters. Regex uses a syntax table to determine this.
+classes of characters.
+@c TODO: What are the valid classes?
 
 @menu
-* Emacs Syntax Tables::
 * Match-syntactic-class Operator:: \sCLASS
 * Match-not-syntactic-class Operator:: \SCLASS
 @end menu
 
-@node Emacs Syntax Tables
-@subsection Emacs Syntax Tables
-
-A @dfn{syntax table} is an array indexed by the characters in your
-character set. In the ASCII encoding, therefore, a syntax table
-has 256 elements.
-
-If Regex is compiled with the preprocessor symbol @code{emacs} defined,
-then Regex expects you to define and initialize the variable
-@code{re_syntax_table} to be an Emacs syntax table. Emacs' syntax
-tables are more complicated than Regex's own (@pxref{Non-Emacs Syntax
-Tables}). @xref{Syntax, , Syntax, emacs, The GNU Emacs User's Manual},
-for a description of Emacs' syntax tables.
-
 @node Match-syntactic-class Operator
 @subsection The Match-syntactic-class Operator (@code{\s}@var{class})
 
-- 
2.25.1