[PATCH 14/17] multistart: use b4_accept instead of action post-processing

bison-patches

From:	Akim Demaille
Subject:	[PATCH 14/17] multistart: use b4_accept instead of action post-processing
Date:	Sun, 20 Sep 2020 10:37:46 +0200

For each start symbol, generate a parsing function with a richer
return value than the usual one of yyparse.  Reserve a place for the
returned semantic value, so that there is no need to pass a pointer as
an argument to "return" that value.  This also makes the call to the
parsing function independent of whether a given start symbol is typed.

For instance, if the grammar file contains:

    %type <int> expression
    %start input expression

(so "input" is valueless) we get

    typedef struct
    {
      int yystatus;
    } yyparse_input_t;

    yyparse_input_t yyparse_input (void);

    typedef struct
    {
      int yyvalue;
      int yystatus;
    } yyparse_expression_t;

    yyparse_expression_t yyparse_expression (void);
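
A caller is expected to check yystatus before reading yyvalue.  The
following is a minimal sketch of that calling convention, not generated
code: fake_parse_expression and format_result are hypothetical names,
and the stand-in "parser" merely reads a decimal integer instead of
running a grammar.

```c
#include <stdio.h>

/* Same shape as the return type shown above.  */
typedef struct
{
  int yyvalue;
  int yystatus;
} yyparse_expression_t;

/* Hypothetical stand-in for the generated yyparse_expression: it reads
   a decimal integer from TEXT instead of running a real parser.  */
static yyparse_expression_t
fake_parse_expression (const char *text)
{
  yyparse_expression_t res;
  res.yyvalue = 0;
  /* Status 0 means success, as with yyparse.  */
  res.yystatus = sscanf (text, "%d", &res.yyvalue) == 1 ? 0 : 1;
  return res;
}

/* Caller side: the value is meaningful only when the status is 0.  */
static int
format_result (const char *text, char *buf, size_t size)
{
  yyparse_expression_t res = fake_parse_expression (text);
  if (res.yystatus == 0)
    return snprintf (buf, size, "expression: %d", res.yyvalue);
  else
    return snprintf (buf, size, "expression: failure");
}
```

Since the value travels in the returned struct, the call looks the same
for a valueless start symbol; only the yyvalue member is absent.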

This commit also changes the implementation of the parser termination:
when there are multiple start symbols, the initial rules explicitly
call YYACCEPT.  They do that after having exported the start symbol's
value (if it is typed):

  switch (yyn)
    {
  case 1: /* $accept: YY_EXPRESSION expression $end  */
  { ((*yyvalue).TOK_expression) = (yyvsp[-1].TOK_expression); YYACCEPT; }
    break;

  case 2: /* $accept: YY_INPUT input $end  */
  { YYACCEPT; }
    break;
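
The data flow can be sketched outside of the skeleton as follows.  This
is an illustration under assumptions, not the generated parser:
value_t stands in for YYSTYPE, and run_start_action,
parse_expression_sketch, expression_result_t and the rule constants are
hypothetical names.

```c
/* Stand-in for YYSTYPE, with one member named as in the excerpt.  */
typedef union
{
  int TOK_expression;
} value_t;

/* Hypothetical rule numbers matching the two cases above.  */
enum { ACCEPT_EXPRESSION = 1, ACCEPT_INPUT = 2 };

/* Run the action of a start rule.  Return 1 if the parser accepts.  */
static int
run_start_action (int rule, value_t *yyvalue, const value_t *rhs)
{
  switch (rule)
    {
    case ACCEPT_EXPRESSION:
      /* Typed start symbol: export its value, then accept.  */
      yyvalue->TOK_expression = rhs->TOK_expression;
      return 1;
    case ACCEPT_INPUT:
      /* Valueless start symbol: just accept.  */
      return 1;
    default:
      return 0;
    }
}

typedef struct
{
  int yyvalue;
  int yystatus;
} expression_result_t;

/* Wrapper: reserve a value, let the start rule fill it, and copy the
   relevant member into the returned struct.  */
static expression_result_t
parse_expression_sketch (int parsed)
{
  expression_result_t res;
  value_t yyvalue;
  value_t top;
  top.TOK_expression = parsed;
  yyvalue.TOK_expression = 0;
  res.yystatus
    = run_start_action (ACCEPT_EXPRESSION, &yyvalue, &top) ? 0 : 1;
  res.yyvalue = yyvalue.TOK_expression;
  return res;
}
```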

I have tried several ways to deal with termination, and this one
appears to me to be the best.  It is also the most natural.

* src/scan-code.h, src/scan-code.l (obstack_for_actions): New.
* src/reader.c (grammar_rule_check_and_complete): Generate the actions
of the rules for each start symbol.

* data/skeletons/bison.m4 (b4_symbol_slot): New, with safer semantics
than type and type_tag.
* data/skeletons/yacc.c (b4_accept): New.
Generates the body of the action of the start rules.
(_b4_declare_sub_yyparse): For each start symbol define a dedicated
return type for its parsing function.
Adjust the declaration of its parsing function.
(_b4_define_sub_yyparse): Adjust the definition of the function.

* examples/c/lexcalc/parse.y: Check the case of valueless symbols.
* examples/c/lexcalc/lexcalc.test: Check start symbols.
---
 TODO                            |  4 +++
 data/README.md                  |  8 ++++-
 data/skeletons/bison.m4         | 14 ++++++++
 data/skeletons/yacc.c           | 57 ++++++++++++++++++++++++---------
 examples/c/lexcalc/lexcalc.test |  8 ++---
 examples/c/lexcalc/parse.y      | 30 +++++++++++------
 src/main.c                      |  1 +
 src/reader.c                    | 32 +++++++++++++++++-
 src/scan-code.h                 |  7 ++++
 src/scan-code.l                 | 18 +++++------
 10 files changed, 138 insertions(+), 41 deletions(-)

diff --git a/TODO b/TODO
index b7221d69..f908889c 100644
--- a/TODO
+++ b/TODO
@@ -240,6 +240,10 @@ functions.
 states/nstates, rules/nrules, ..., ritem/nritems
 Fix the latter.
 
+*** m4: slot, type, type_tag
+The meaning of type_tag varies depending on api.value.type.  We should avoid
+that and using clear definitions with stable semantics.
+
 * D programming language
 There's a number of features that are missing, here sorted in _suggested_
 order of implementation.
diff --git a/data/README.md b/data/README.md
index 50fbe525..59b220b0 100644
--- a/data/README.md
+++ b/data/README.md
@@ -142,11 +142,17 @@ The macro `b4_symbol(NUM, FIELD)` gives access to the following FIELDS:
   When api.value.type=union, the generated name for the union member.
   yytype_INT etc. for symbols that has_id, otherwise yytype_1 etc.
 
-- `type`
+- `type`: string
   If it has a semantic value, its type tag, or, if variant are used,
   its type.
   In the case of api.value.type=union, type is the real type (e.g. int).
 
+- `slot`: string
+  If it has a semantic value, the name of the union member (i.e., bounces to
+  either `type_tag` or `type`).  It would be better to fix our mess and
+  always use `type` for the true type of the member, and `type_tag` for the
+  name of the union member.
+
 - `has_printer`: 0, 1
 - `printer`: string
 - `printer_file`: string
diff --git a/data/skeletons/bison.m4 b/data/skeletons/bison.m4
index 37e2b849..ff923610 100644
--- a/data/skeletons/bison.m4
+++ b/data/skeletons/bison.m4
@@ -465,6 +465,19 @@ m4_case([$1],
 # but are S_YYEMPTY and symbol_kind::S_YYEMPTY in C++.
 m4_copy([b4_symbol_kind_base], [b4_symbol_kind])
 
+
+# b4_symbol_slot(NUM)
+# -------------------
+# The name of union member that contains the value of these symbols.
+# Currently, we are messy, this should actually be type_tag, but type_tag
+# has several meanings.
+m4_define([b4_symbol_slot],
+[m4_case(b4_percent_define_get([[api.value.type]]),
+         [union],   [b4_symbol([$1], [type_tag])],
+         [variant], [b4_symbol([$1], [type_tag])],
+         [b4_symbol([$1], [type])])])
+
+
 # b4_symbol(NUM, FIELD)
 # ---------------------
 # Fetch FIELD of symbol #NUM (or "orig NUM").  Fail if undefined.
@@ -475,6 +488,7 @@ m4_define([b4_symbol],
          [id],        [b4_symbol_token_kind([$1])],
          [kind_base], [b4_symbol_kind_base([$1])],
          [kind],      [b4_symbol_kind([$1])],
+         [slot],      [b4_symbol_slot([$1])],
          [_b4_symbol($@)])])
 
 
diff --git a/data/skeletons/yacc.c b/data/skeletons/yacc.c
index 5ccd1207..e0327638 100644
--- a/data/skeletons/yacc.c
+++ b/data/skeletons/yacc.c
@@ -116,6 +116,16 @@ m4_ifset([b4_parse_param], [b4_args(b4_parse_param), ])])
 ## ----------------- ##
 
 
+# b4_accept([SYMBOL-NUM])
+# -----------------------
+# Used in actions of the rules of accept, the initial symbol, to call
+# YYACCEPT.  If SYMBOL-NUM is specified, run "yyvalue->SLOT = $2;"
+# before, using the slot of SYMBOL-NUM.
+m4_define([b4_accept],
+[m4_ifval([$1],
+          [b4_symbol_value((*yyvalue), [$1]) = b4_rhs_value(2, 1, [$1]); ])YYACCEPT])
+
+
 # b4_lhs_value(SYMBOL-NUM, [TYPE])
 # --------------------------------
 # See README.
@@ -157,19 +167,38 @@ m4_define([b4_rhs_location],
 
 # _b4_declare_sub_yyparse(START-SYMBOL-NUM)
 # -----------------------------------
+# Define the return type of the parsing function for SYMBOL-NUM, and
+# declare its parsing function.
 m4_define([_b4_declare_sub_yyparse],
-[[int ]b4_prefix[parse_]_b4_symbol($1, id)[ (]m4_ifset([b4_parse_param], [b4_formals(b4_parse_param)], [void])[);]])
+[[
+// Return type when parsing one ]_b4_symbol($1, tag)[.
+typedef struct
+{]b4_symbol_if([$1], [has_type], [[
+  ]_b4_symbol($1, type)[ yyvalue;]])[
+  int yystatus;
+} ]b4_prefix[parse_]_b4_symbol($1, id)[_t;
+
+// Parse one ]_b4_symbol($1, tag)[.
+]b4_prefix[parse_]_b4_symbol($1, id)[_t ]b4_prefix[parse_]_b4_symbol($1, id)[ (]m4_ifset([b4_parse_param], [b4_formals(b4_parse_param)], [void])[);
+]])
 
 
 # _b4_define_sub_yyparse(START-SYMBOL-NUM, SWITCHING-TOKEN-SYMBOL-NUM)
 # --------------------------------------------------------------------
+# Define the parsing function for START-SYMBOL-NUM.
 m4_define([_b4_define_sub_yyparse],
-[[int
+[[
+]b4_prefix[parse_]_b4_symbol($1, id)[_t
 yyparse_]_b4_symbol($1, id)[ (]m4_ifset([b4_parse_param], [b4_formals(b4_parse_param)], [void])[)
 {
-  return yyparse_impl (]b4_symbol($2, id)[]m4_ifset([b4_parse_param],
-                                                    [[, ]b4_args(b4_parse_param)])[);
-}]])
+  ]b4_prefix[parse_]_b4_symbol($1, id)[_t yyres;
+  YYSTYPE yyvalue;
+  yyres.yystatus = yyparse_yyimpl (]b4_symbol($2, id)[, &yyvalue]m4_ifset([b4_parse_param],
+                           [[, ]b4_args(b4_parse_param)])[);]b4_symbol_if([$1], [has_type], [[
+  yyres.yyvalue = yyvalue.]b4_symbol($1, slot)[;]])[
+  return yyres;
+}
+]])
 
 
 # b4_declare_scanner_communication_variables
@@ -179,8 +208,8 @@ yyparse_]_b4_symbol($1, id)[ (]m4_ifset([b4_parse_param], [b4_formals(b4_parse_p
 m4_define([b4_declare_scanner_communication_variables], [[
 ]m4_ifdef([b4_start_symbols], [],
 [[/* Lookahead token kind.  */
-int yychar;]])[
-
+int yychar;
+]])[
 ]b4_pure_if([[
 /* The semantic value of the lookahead symbol.  */
 /* Default value used for initialization, for pacifying older GCCs
@@ -1560,7 +1589,7 @@ yypush_parse (yypstate *yyps]b4_pure_if([[,
 ]m4_ifdef([b4_start_symbols],
 [[
 static int
-yyparse_impl (int yychar]m4_ifset([b4_parse_param], [, b4_formals(b4_parse_param)])[);
+yyparse_yyimpl (int yychar, YYSTYPE *yyvalue]m4_ifset([b4_parse_param], [, b4_formals(b4_parse_param)])[);
 
 ]m4_map([_b4_define_sub_yyparse], m4_defn([b4_start_symbols]))[
 
@@ -1568,12 +1597,12 @@ int
 yyparse (]m4_ifset([b4_parse_param], [b4_formals(b4_parse_param)], [void])[)
 {
   /* ]b4_symbol(-2, id)[ causes a token to be read.  */
-  return yyparse_impl (]b4_symbol(-2, id)[]m4_ifset([b4_parse_param],
+  return yyparse_yyimpl (]b4_symbol(-2, id)[, YY_NULLPTR]m4_ifset([b4_parse_param],
                                                     [[, ]b4_args(b4_parse_param)])[);
 }
 
 static int
-yyparse_impl (int yychar]m4_ifset([b4_parse_param], [, b4_formals(b4_parse_param)])[)]],
+yyparse_yyimpl (int yychar, YYSTYPE *yyvalue]m4_ifset([b4_parse_param], [, b4_formals(b4_parse_param)])[)]],
 [[int
 yyparse (]m4_ifset([b4_parse_param], [b4_formals(b4_parse_param)], [void])[)]])])[
 {]b4_pure_if([b4_declare_scanner_communication_variables
@@ -1812,9 +1841,7 @@ yyread_pushed_token:]])[
     {
       if (yytable_value_is_error (yyn))
         goto yyerrlab;
-      yyn = -yyn;]m4_ifdef([b4_start_symbols], [[
-      if (yyr1[yyn] == YYNTOKENS)
-        YYACCEPT;]])[]b4_lac_if([[
+      yyn = -yyn;]b4_lac_if([[
       YY_LAC_ESTABLISH;]])[
       goto yyreduce;
     }
@@ -1844,9 +1871,7 @@ yyread_pushed_token:]])[
 yydefault:
   yyn = yydefact[yystate];
   if (yyn == 0)
-    goto yyerrlab;]m4_ifdef([b4_start_symbols], [[
-  else if (yyr1[yyn] == YYNTOKENS)
-    YYACCEPT;]])[
+    goto yyerrlab;
   goto yyreduce;
 
 
diff --git a/examples/c/lexcalc/lexcalc.test b/examples/c/lexcalc/lexcalc.test
index ef02f6c1..794676de 100644
--- a/examples/c/lexcalc/lexcalc.test
+++ b/examples/c/lexcalc/lexcalc.test
@@ -46,15 +46,15 @@ EOF
 run 1 'err: 1.1-11: error: division by zero'
 
 
-# Multistart: parse "line" instead of "input".
+# Multistart: parse "expression" instead of "input".
 cat >input <<EOF
 1+2*3
 EOF
-run 0 7 -l
+run 0 'expression: 7' -e
 
 cat >input <<EOF
 1
 2
 EOF
-run 1 '1
-err: 2.1: syntax error, unexpected number, expecting end of file' -l
+run 1 'expression: failure
+err: 2.1: syntax error, unexpected number, expecting end of file' -e
diff --git a/examples/c/lexcalc/parse.y b/examples/c/lexcalc/parse.y
index dfbf4349..b3aaf476 100644
--- a/examples/c/lexcalc/parse.y
+++ b/examples/c/lexcalc/parse.y
@@ -76,10 +76,10 @@
 ;
 
 %token <int> NUM "number"
-%type <int> exp
+%type <int> exp expression line
 %printer { fprintf (yyo, "%d", $$); } <int>
 
-%start input line
+%start input expression
 
 // Precedence (from lowest to highest) and associativity.
 %left "+" "-"
@@ -93,8 +93,12 @@ input:
 ;
 
 line:
-  exp EOL   { printf ("%d\n", $exp); }
-| error EOL { yyerrok; }
+  exp EOL   { $$ = $exp; printf ("%d\n", $$); }
+| error EOL { $$ = 0; yyerrok; }
+;
+
+expression:
+  exp EOL  { $$ = $exp; }
 ;
 
 exp:
@@ -129,16 +133,22 @@ int main (int argc, const char *argv[])
   int nerrs = 0;
   // Possibly enable parser runtime debugging.
   yydebug = !!getenv ("YYDEBUG");
+  int parse_expression_p = 0;
   // Enable parse traces on option -p.
-  int parse_line_p = 0;
   for (int i = 0; i < argc; ++i)
     if (1 < argc && strcmp (argv[1], "-p") == 0)
       yydebug = 1;
-    else if (strcmp (argv[i], "-l") == 0)
-      parse_line_p = 1;
-
-  if (parse_line_p)
-    yyparse_line (&nerrs);
+    else if (strcmp (argv[i], "-e") == 0)
+      parse_expression_p = 1;
+
+  if (parse_expression_p)
+    {
+      yyparse_expression_t res = yyparse_expression (&nerrs);
+      if (res.yystatus == 0)
+        printf ("expression: %d\n", res.yyvalue);
+      else
+        printf ("expression: failure\n");
+    }
   else
     yyparse_input (&nerrs);
   // Exit on failure if there were errors.
diff --git a/src/main.c b/src/main.c
index 946e32d7..56ca2848 100644
--- a/src/main.c
+++ b/src/main.c
@@ -90,6 +90,7 @@ main (int argc, char *argv[])
   uniqstrs_new ();
   muscle_init ();
   complain_init ();
+  code_scanner_init ();
 
   getargs (argc, argv);
 
diff --git a/src/reader.c b/src/reader.c
index e17b96f8..1daccc22 100644
--- a/src/reader.c
+++ b/src/reader.c
@@ -267,6 +267,8 @@ static void
 grammar_rule_check_and_complete (symbol_list *r)
 {
   const symbol *lhs = r->content.sym;
+  const symbol *first_rhs = r->next->content.sym;
+
   /* Type check.
 
      If there is an action, then there is nothing we can do: the user
@@ -276,7 +278,6 @@ grammar_rule_check_and_complete (symbol_list *r)
      value can't be used.  */
   if (!r->action_props.code && lhs->content->type_name)
     {
-      symbol *first_rhs = r->next->content.sym;
       /* If $$ is being set in default way, report if any type mismatch.  */
       if (first_rhs)
         {
@@ -312,6 +313,30 @@ grammar_rule_check_and_complete (symbol_list *r)
                   _("empty rule for typed nonterminal, and no action"));
     }
 
+  /* For each start symbol, build the action of its start rule.  Use
+     the same obstack as the one used by scan-code, which is in charge
+     of actions. */
+  const bool multistart = start_symbols && start_symbols->next;
+  if (multistart && lhs == acceptsymbol)
+    {
+      const symbol *start = r->next->next->content.sym;
+      if (start->content->type_name)
+        obstack_printf (obstack_for_actions,
+                        "{ ]b4_accept([orig %d])[; }",
+                        start->content->number);
+      else
+        obstack_printf (obstack_for_actions,
+                        "{ ]b4_accept[; }");
+      code_props_rule_action_init
+        (&r->action_props,
+         obstack_finish0 (obstack_for_actions),
+         r->rhs_loc, r,
+         /* name */ NULL,
+         /* type */ NULL,
+         /* is_predicate */ false);
+    }
+
+
   /* Check that symbol values that should be used are in fact used.
      Don't check the generated start rules.  It has no action, so some
      rhs symbols may appear unused, but the parsing algorithm ensures
@@ -772,6 +797,11 @@ create_start_rule (symbol *swtok, symbol *start)
   symbol_list *p = initial_rule;
   if (swtok)
     {
+      // Cannot create the action now, as the symbols have not yet
+      // been assigned their number (by symbol_pack), which we need to
+      // know the type name.  So the action is created in
+      // grammar_rule_check_and_complete, which is run after
+      // symbol_pack.
       p->next = symbol_list_sym_new (swtok, empty_loc);
       p = p->next;
     }
diff --git a/src/scan-code.h b/src/scan-code.h
index 3859e2b2..b564523d 100644
--- a/src/scan-code.h
+++ b/src/scan-code.h
@@ -34,6 +34,11 @@ struct symbol_list;
  */
 extern int max_left_semantic_context;
 
+/**
+ * The obstack used to store the translated actions.
+ */
+extern struct obstack *obstack_for_actions;
+
 /**
  * A code passage captured from the grammar file and possibly translated,
  * and/or properties associated with such a code passage.  Don't break
@@ -191,6 +196,8 @@ void code_props_translate_code (code_props *self);
  */
 void code_scanner_last_string_free (void);
 
+void code_scanner_init (void);
+
 /**
  * \pre
  *   - None.
diff --git a/src/scan-code.l b/src/scan-code.l
index 8e0358da..23436a1f 100644
--- a/src/scan-code.l
+++ b/src/scan-code.l
@@ -40,6 +40,8 @@
 #undef code_wrap
 #define code_wrap() 1
 
+struct obstack *obstack_for_actions = &obstack_for_string;
+
 /* The current calling start condition: SC_RULE_ACTION or
    SC_SYMBOL_ACTION. */
 # define YY_DECL static char *code_lex (code_props *self, int sc_context)
@@ -756,19 +758,10 @@ handle_action_at (symbol_list *rule, char *text, const location *at_loc)
 static char const *
 translate_action (code_props *self, int sc_context)
 {
-  static bool initialized = false;
-  if (!initialized)
-    {
-      obstack_init (&obstack_for_string);
-      yy_flex_debug = 0;
-      initialized = true;
-    }
-
   loc->start = loc->end = self->location.start;
   yy_switch_to_buffer (yy_scan_string (self->code));
   char *res = code_lex (self, sc_context);
   yy_delete_buffer (YY_CURRENT_BUFFER);
-
   return res;
 }
 
@@ -845,6 +838,13 @@ code_scanner_last_string_free (void)
   STRING_FREE ();
 }
 
+void
+code_scanner_init (void)
+{
+  obstack_init (&obstack_for_string);
+  yy_flex_debug = 0;
+}
+
 void
 code_scanner_free (void)
 {
-- 
2.28.0
