emacs-diffs
feature/tree-sitter 773cce640f: Resolve FIXME's in tree-sitter manual se


From: Yuan Fu
Subject: feature/tree-sitter 773cce640f: Resolve FIXME's in tree-sitter manual sections
Date: Sat, 22 Oct 2022 21:51:32 -0400 (EDT)

branch: feature/tree-sitter
commit 773cce640fc5d67cb1a64622defa073d7ec5fcc4
Author: Yuan Fu <casouri@gmail.com>
Commit: Yuan Fu <casouri@gmail.com>

    Resolve FIXME's in tree-sitter manual sections
    
    Pattern vs query: a query consists of many patterns.  I tightened up
    the use of pattern vs query in the manual, now there shouldn't be
    ambiguities.
    
    * doc/lispref/modes.texi (Parser-based Font Lock):
    * doc/lispref/parsing.texi (Language Definitions): Resolve FIXME's.
---
 doc/lispref/modes.texi   | 138 ++++++++++++++++++++++-------------------------
 doc/lispref/parsing.texi |  75 +++++++++++++-------------
 2 files changed, 103 insertions(+), 110 deletions(-)

diff --git a/doc/lispref/modes.texi b/doc/lispref/modes.texi
index 24892077d1..3537d312f2 100644
--- a/doc/lispref/modes.texi
+++ b/doc/lispref/modes.texi
@@ -3904,10 +3904,17 @@ variables with regexp-based font lock, it uses similar customization
 schemes.  The tree-sitter counterpart of @var{font-lock-keywords} is
 @var{treesit-font-lock-settings}.
 
-@c FIXME: The ``query'' part here and thereafter comes ``out of the
-@c blue''.  There should be some text here explaining what those
-@c ``queries'' are and how are they related to fontifications, or a
-@c cross-reference to another place with such an explanation.
+In general, tree-sitter fontification works like the following: a Lisp
+program provides a @dfn{query} consisting of @dfn{patterns} with
+@dfn{capture names}.  Tree-sitter finds the nodes in the parse tree
+that match these patterns, tags the corresponding capture names onto
+the nodes, and returns them to the Lisp program.  The Lisp program
+takes these nodes and highlights the corresponding buffer text of
+each node depending on the tagged capture name of the node.  For
+example, a node tagged @code{font-lock-keyword} would simply be
+highlighted in @code{font-lock-keyword} face.  For more information on
+queries, patterns and capture names, see @ref{Pattern Matching}.
+
 @defun treesit-font-lock-rules :keyword value query...
 This function is used to set @var{treesit-font-lock-settings}.  It
 takes care of compiling queries and other post-processing, and outputs
@@ -3948,9 +3955,10 @@ Other keywords are optional:
 @item @tab @code{keep} @tab Fill-in regions without an existing face
 @end multitable
 
-@c FIXME: The ``capture names'' part should be expl,ained before it is
-@c first used: what it is and how it's related to fontifications.
-Capture names in @var{query} should be face names like
+Lisp programs mark patterns in the query with capture names (names
+that start with @code{@@}), and tree-sitter will return matched nodes
+with capture names tagged onto them.  For the purpose of
+fontification, capture names in @var{query} should be face names like
 @code{font-lock-keyword-face}.  The captured node will be fontified
 with that face.  Capture names can also be function names, in which
 case the function is called with 3 arguments: @var{start}, @var{end},
@@ -3966,9 +3974,8 @@ is a list that represents a decoration level.
 @code{font-lock-maximum-decoration} controls which levels are
 activated.
 
-@c FIXME: This should be rewritten using our style: ``each element of
-@c the list is a list of the form (FOO BAR BAZ), where FOO...'' etc.
-Inside each sublist are feature symbols, which correspond to the
+Each element of the list is a list of the form @w{@code{(@var{feature}
+@dots{})}}, where each @var{feature} corresponds to the
 @code{:feature} value of a query defined in
 @code{treesit-font-lock-rules}.  Removing a feature symbol from this
 list disables the corresponding query during font-lock.
@@ -3992,40 +3999,18 @@ For example, the value of this variable could be:
 Major modes should set this variable before calling
 @code{treesit-font-lock-enable}.
 
-@c FIXME: ``for further changes''?  This should clarify when this
-@c function has to be called.
 @findex treesit-font-lock-recompute-features
-In addition, for further changes to this variable to take effect, call
-@code{treesit-font-lock-recompute-features}.
+For this variable to take effect, a Lisp program should call
+@code{treesit-font-lock-recompute-features} (which resets
+@code{treesit-font-lock-settings} accordingly).
 @end defvar
 
 @defvar treesit-font-lock-settings
 A list of settings for tree-sitter based font lock.  The exact format
 of this variable is considered internal.  One should always use
 @code{treesit-font-lock-rules} to set this variable.
-
-@c FIXME: If the format is considered ``internal'', why do we need to
-@c describe it here?
-Each @var{setting} is of form
-
-@example
-(@var{query} @var{enable} @var{feature} @var{override})
-@end example
-
-@var{query} must be a compiled query (@pxref{Pattern Matching}).
-
-For @var{setting} to be activated for font-lock, @var{enable} must be
-@code{t}.  To disable this @var{setting}, set @var{enable} to
-@code{nil}.
-
-@var{feature} is the ``feature name'' of the query, users can control
-which features are enabled with @code{font-lock-maximum-decoration}
-and @code{treesit-font-lock-feature-list}.
-
-@var{override} is the override flag for this query.  Its value can be
-@code{t}, @code{nil}, @code{append}, @code{prepend}, or @code{keep}.
-@c FIXME: See where?
-See more in @code{treesit-font-lock-rules}.
+@c Because the format is internal, we don't document it here,
+@c though we do have explanations in the docstring.
 @end defvar
 
 Multi-language major modes should provide range functions in
@@ -4790,27 +4775,26 @@ a list of the form: @w{@code{(@var{language} . @var{rules})}}, where
 @var{language} is a language symbol, and @var{rules} is a list of the
 form @w{@code{(@var{matcher} @var{anchor} @var{offset})}}.
 
-@c FIXME: ``node''?
-First, Emacs passes the node at point to @var{matcher}; if it returns
-non-@code{nil}, this rule is applicable.  Then Emacs passes the node
-to @var{anchor}, which returns a buffer position.  Emacs takes the
-column number of that position, adds @var{offset} to it, and the
-result is the indentation column for the current line.
+First, Emacs passes the smallest tree-sitter node at the beginning of
+the current line to @var{matcher}; if it returns non-@code{nil}, this
+rule is applicable.  Then Emacs passes the node to @var{anchor}, which
+returns a buffer position.  Emacs takes the column number of that
+position, adds @var{offset} to it, and the result is the indentation
+column for the current line.
 
 The @var{matcher} and @var{anchor} are functions, and Emacs provides
 convenient defaults for them.
 
-@c FIXME: Clarify the following description.  In particular, how to
-@c find/compute ``the largest node'' and its ``parent''?
 Each @var{matcher} or @var{anchor} is a function that takes three
 arguments: @var{node}, @var{parent}, and @var{bol}.  The argument
 @var{bol} is the buffer position whose indentation is required: the
 position of the first non-whitespace character after the beginning of
 the line.  The argument @var{node} is the largest (highest-in-tree)
 node that starts at that position; and @var{parent} is the parent of
-@var{node}.  @var{matcher} should return non-@code{nil} if the rule is
-applicable, and @var{anchor} should return a buffer position that is
-the basis of the indentation.
+@var{node}.  Emacs finds @var{bol}, @var{node} and @var{parent} and
+passes them to each @var{matcher} and @var{anchor}.  @var{matcher}
+should return non-@code{nil} if the rule is applicable, and
+@var{anchor} should return a buffer position.
 @end defvar
 
 @defvar treesit-simple-indent-presets
@@ -4821,63 +4805,69 @@ available default functions are:
 
 @ftable @code
 @item no-node
-This matcher is a symbol that matches the case where @var{node} is
+This matcher is a function that matches the case where @var{node} is
 @code{nil}, i.e., there is no node that starts at @var{bol}.  This is
 the case when @var{bol} is on an empty line or inside a multi-line
 string, etc.
 
 @item parent-is
-This matcher is a function of one argument, @var{type}; it matches if
-the type of the parent node is @var{type}.
+This matcher is a function of one argument, @var{type}; it returns a
+function that given @w{@code{(@var{node} @var{parent} @var{bol})}},
+matches if @var{parent}'s type is @var{type}.
 
 @item node-is
-This matcher is a function of one argument, @var{type}; it matches if
-the node's type is @var{type}.
+This matcher is a function of one argument, @var{type}; it returns a
+function that given @w{@code{(@var{node} @var{parent} @var{bol})}},
+matches if @var{node}'s type is @var{type}.
 
-@c FIXME: The description of this matcher is unclear.  What is
-@c ``parent'' and what does it mean ``captures NODE''?
 @item query
-This matcher is a function of one argument, @var{query}; it matches if
-querying @var{parent} with @var{query} captures @var{node}.  The
-capture name does not matter.   @c Why is this bit important?
+This matcher is a function of one argument, @var{query}; it returns a
+function that given @w{@code{(@var{node} @var{parent} @var{bol})}},
+matches if querying @var{parent} with @var{query} captures @var{node}
+(@pxref{Pattern Matching}).
 
 @item match
 This matcher is a function of 5 arguments: @var{node-type},
 @var{parent-type}, @var{node-field}, @var{node-index-min}, and
-@var{node-index-max}).  It matches if @var{node}'s type is @var{node-type},
-@var{parent}'s type is @var{parent-type}, @var{node}'s field name in
-@var{parent} is @var{node-field}, and @var{node}'s index among its
-siblings is between @var{node-index-min} and @var{node-index-max}.  If
-@c FIXME: ``constraint''?
-the value of a constraint is nil, this matcher doesn't check for that
-constraint.  For example, to match the first child where parent is
+@var{node-index-max}.  It returns a function that given
+@w{@code{(@var{node} @var{parent} @var{bol})}}, matches if
+@var{node}'s type is @var{node-type}, @var{parent}'s type is
+@var{parent-type}, @var{node}'s field name in @var{parent} is
+@var{node-field}, and @var{node}'s index among its siblings is between
+@var{node-index-min} and @var{node-index-max}.  If the value of an
+argument is @code{nil}, this matcher doesn't check for that argument.
+For example, to match the first child where parent is
 @code{argument_list}, use
 
 @example
 (match nil "argument_list" nil nil 0 0)
 @end example
 
-@c FIXME: ``PARENT''? is that an argument of the anchor function
 @item first-sibling
-This anchor returns the start of the first child of @var{parent}.
+This anchor is a function that given @w{@code{(@var{node} @var{parent}
+@var{bol})}}, returns the start of the first child of @var{parent}.
 
 @item parent
-This anchor returns the start of @var{parent}. @c FIXME: Likewise.
+This anchor is a function that given @w{@code{(@var{node} @var{parent}
+@var{bol})}}, returns the start of @var{parent}.
 
 @item parent-bol
-This anchor returns the first non-space character on the line of
+This anchor is a function that given @w{@code{(@var{node} @var{parent}
+@var{bol})}}, returns the first non-space character on the line of
 @var{parent}.
 
-@c FIXME: ``NODE''?
 @item prev-sibling
-This anchor returns the start of the previous sibling of @var{node}.
+This anchor is a function that given @w{@code{(@var{node} @var{parent}
+@var{bol})}}, returns the start of the previous sibling of @var{node}.
 
 @item no-indent
-This anchor returns the start of @var{node}, i.e., no indent. @c ???
+This anchor is a function that given @w{@code{(@var{node} @var{parent}
+@var{bol})}}, returns the start of @var{node}.
 
 @item prev-line
-This anchor returns the first non-whitespace charater on the previous
-line.
+This anchor is a function that given @w{@code{(@var{node} @var{parent}
+@var{bol})}}, returns the first non-whitespace character on the
+previous line.
 @end ftable
 
 @end defvar
diff --git a/doc/lispref/parsing.texi b/doc/lispref/parsing.texi
index 9079e0f781..502a0e4f26 100644
--- a/doc/lispref/parsing.texi
+++ b/doc/lispref/parsing.texi
@@ -95,7 +95,7 @@ This means Emacs could not find the language definition library.
 @item (symbol-error @var{error-msg})
 This means Emacs could not find in the library the expected function
 that every language definition library should export.
-@item (version_mismatch @var{error-msg})
+@item (version-mismatch @var{error-msg})
 This means the version of language definition library is incompatible
 with that of the tree-sitter library.
 @end table
@@ -253,7 +253,7 @@ syntax tree effectively, you need to consult the @dfn{grammar file}.
 The grammar file is usually @file{grammar.js} in a language
 definition's project repository.  The link to a language definition's
 home page can be found on
-@uref{https://tree-sitter.github.io/tree-sitter, the tree-sitter's
+@uref{https://tree-sitter.github.io/tree-sitter, tree-sitter's
 homepage}.
 
 The grammar definition is written in JavaScript.  For example, the
@@ -405,11 +405,11 @@ returns non-@code{nil} if it is, @code{nil} otherwise.
 @end defun
 
 There is no need to explicitly parse a buffer, because parsing is done
-automatically and lazily.  A parser only parses when the mode queris
-for a node in its syntax tree.  Therefore, when a parser is first
-created, it doesn't parse the buffer; it waits until the mode queries
-for a node for the first time.  Similarly, when some change is made in
-the buffer, a parser doesn't re-parse immediately.
+automatically and lazily.  A parser only parses when a Lisp program
+queries for a node in its syntax tree.  Therefore, when a parser is
+first created, it doesn't parse the buffer; it waits until the Lisp
+program queries for a node for the first time.  Similarly, when some
+change is made in the buffer, a parser doesn't re-parse immediately.
 
 @vindex treesit-buffer-too-large
 When a parser does parse, it checks for the size of the buffer.
@@ -510,7 +510,7 @@ Example:
 @group
 ;; Find the node at point in a C parser's syntax tree.
 (treesit-node-at (point) 'c)
-  @result{} #<treesit-node from 1 to 4 in *scratch*>
+  @result{} #<treesit-node (primitive_type) in *scratch*>
 @end group
 @end example
 @end defun
@@ -606,7 +606,7 @@ This function finds the child of @var{node} whose field name is
 @group
 ;; Get the child that has "body" as its field name.
 (treesit-child-by-field-name node "body")
-  @result{} #<treesit-node from 3 to 11 in *scratch*>
+  @result{} #<treesit-node (compound_statement) in *scratch*>
 @end group
 @end example
 @end defun
@@ -644,20 +644,24 @@ does.
 
 By default, this function only traverses named nodes, but if @var{all}
 is non-@code{nil}, it traverses all the nodes.  If @var{backward} is
-@c FIXME: What does it mean to ``traverse backward''?
-non-nil, it traverses backwards.  If @var{limit} is non-@code{nil}, it
+non-@code{nil}, it traverses backwards (meaning visiting the last child first
+when traversing down the tree).  If @var{limit} is non-@code{nil}, it
 must be a number that limits the tree traversal to that many levels
 down the tree.
 @end defun
 
 @defun treesit-search-forward start predicate &optional all backward up
-@c FIXME: Explain better what is the differencve between this function
-@c and the previous one.
-This function is somewhat similar to @code{treesit-search-subtree}.
-It also traverse the parse tree and matches each node with
-@var{predicate} (except for @var{start}), where @var{predicate} can be
-a (case-insensitive) regexp or a function.  For a tree like the below
-where @var{start} is marked 1, this function traverses as numbered:
+While @code{treesit-search-subtree} traverses the subtree of a node,
+this function usually starts with a leaf node and traverses every node
+that comes after it in terms of buffer position.  It is useful for
+answering questions like ``what is the first node after @var{start} in
+the buffer that satisfies some condition?''
+
+Like @code{treesit-search-subtree}, this function also traverses the
+parse tree and matches each node with @var{predicate} (except for
+@var{start}), where @var{predicate} can be a (case-insensitive) regexp
+or a function.  For a tree like the below where @var{start} is marked
+1, this function traverses as numbered:
 
 @example
 @group
@@ -830,7 +834,7 @@ is not yet in its final form.
 
 @cindex tree-sitter extra node
 @cindex extra node, tree-sitter
-A node can be ``extra'': extra nodes represent things like comments,
+A node can be ``extra'': such nodes represent things like comments,
 which can appear anywhere in the text.
 
 @cindex tree-sitter node that has changes
@@ -1007,9 +1011,9 @@ root node with @var{query}, and returns the result.
 
 @heading More query syntax
 
-Besides node type and capture, tree-sitter's query syntax can express
-anonymous node, field name, wildcard, quantification, grouping,
-alternation, anchor, and predicate.
+Besides node type and capture, tree-sitter's pattern syntax can
+express anonymous node, field name, wildcard, quantification,
+grouping, alternation, anchor, and predicate.
 
 @subheading Anonymous node
 
@@ -1022,9 +1026,9 @@ pattern matching (and capturing) keyword @code{return} would be
 
 @subheading Wild card
 
-In a query pattern, @samp{(_)} matches any named node, and @samp{_}
-matches any named and anonymous node.  For example, to capture any
-named child of a @code{binary_expression} node, the pattern would be
+In a pattern, @samp{(_)} matches any named node, and @samp{_} matches
+any named and anonymous node.  For example, to capture any named child
+of a @code{binary_expression} node, the pattern would be
 
 @example
 (binary_expression (_) @@in_biexp)
@@ -1032,10 +1036,10 @@ named child of a @code{binary_expression} node, the pattern would be
 
 @subheading Field name
 
-It is possible to capture child nodes that have specific field names:
+It is possible to capture child nodes that have specific field names.
+In the pattern below, @code{declarator} and @code{body} are field
+names, indicated by the colon following them.
 
-@c FIXME: The significance of ``:'' should be explained, and also what
-@c are ``declarator'' and ``body''.
 @example
 @group
 (function_definition
@@ -1059,7 +1063,6 @@ Tree-sitter recognizes quantification operators @samp{*}, @samp{+} and
 @samp{*} matches the preceding pattern zero or more times, @samp{+}
 matches one or more times, and @samp{?} matches zero or one time.
 
-@c FIXME: ``pattern'' or :''query''?  Or maybe ``query pattern''?
 For example, the following pattern matches @code{type_declaration}
 nodes that has @emph{zero or more} @code{long} keyword.
 
@@ -1087,9 +1090,9 @@ express a comma separated list of identifiers, one could write
 @subheading Alternation
 
 Again, similar to regular expressions, we can express ``match anyone
-from this group of patterns'' in the query pattern.  The syntax is a
-list of patterns enclosed in square brackets.  For example, to capture
-some keywords in C, the query pattern would be
+from this group of patterns'' in a pattern.  The syntax is a list of
+patterns enclosed in square brackets.  For example, to capture some
+keywords in C, the pattern would be
 
 @example
 @group
@@ -1136,7 +1139,7 @@ nodes.
 @subheading Predicate
 
 It is possible to add predicate constraints to a pattern.  For
-example, with the following query pattern:
+example, with the following pattern:
 
 @example
 @group
@@ -1170,11 +1173,11 @@ names in other patterns.
 
 @heading S-expression patterns
 
-@cindex query patterns as sexps
+@cindex patterns as sexps
 @cindex patterns, tree-sitter, in sexp form
-Besides strings, Emacs provides a s-expression based syntax for query
+Besides strings, Emacs provides an s-expression based syntax for
 patterns.  It largely resembles the string-based syntax.  For example,
-the following pattern
+the following query
 
 @example
 @group


