bug-bison
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: double-reads with push parser


From: Akim Demaille
Subject: Re: double-reads with push parser
Date: Sun, 7 Mar 2021 11:38:04 +0100

Hu Ryan,

> Le 7 mars 2021 à 01:33, Ryan <dev@splintermail.com> a écrit :
> 
> I've noticed a strange behavior with my push parser on bison 3.7+, where
> when I multiple identical lines through it, the first one parses
> correctly, but all future lines parse the first token of the line twice.
> 
> I'm not sure if this is a bug in bison or if there's something wrong
> with my push loop, but it definitely worked as I expect it to as of
> 3.6.4.
> 
> Code to reproduce and the output of that code is below.

Thanks a lot for the analysis.  I was able to spot when the regression was 
introduced, which is, unsurprisingly 
(https://git.savannah.gnu.org/cgit/bison.git/commit/?id=330552ea499ca474f65967160e9d4e50265f9631):

commit 330552ea499ca474f65967160e9d4e50265f9631
Author: Akim Demaille <akim.demaille@gmail.com>
Date:   Sun Jun 28 16:09:13 2020 +0200

    yacc.c: push: don't clear the parser state when accepting/rejecting
    
    Currently when a push parser finishes its parsing (i.e., it did not
    return YYPUSH_MORE), it also clears its state.  It is therefore
    impossible to see if it had parse errors.
    
    In the context of autocompletion, because error recovery might have
    fired, the parser is actually already in a different state.  For
    instance on `(1 + + <TAB>` in the bistromathic, because there's a
    `exp: "(" error ")"` recovery rule, `1 + +` tokens have already been
    popped, replaced by `error`, and autocompletions think we are ready
    for the closing ")".  So here, we would like to see if there was a
    syntax error, yet `yynerrs` was cleared.
    
    In the case of a successful parse, we still have a problem: if error
    recovery succeeded, we won't know it, since, again, `yynerrs` is
    clearer.
    
    It seems much more natural to leave the parser state available for
    analysis when there is a failure.
    
    To reuse the parser, we should either:
    
    1. provide an explicit means to reinitialize a parser state for future
       parses.
    
    2. automatically reset the parser state when it is used in a new
       parse.
    
    Option 2 requires to check whether we need to reinitialize the parser
    each time we call `yypush_parse`, i.e., each time we give a new token.
    This seems expensive compared to Option 1, but benchmarks revealed no
    difference.  Option 1 is incompatible with the documentation
    ("After `yypush_parse` returns a status other than `YYPUSH_MORE`, the
    parser instance `yyps` may be reused for a new parse.").
    
    So Option 2 wins, reusing the private `yynew` member to record that a
    parse was finished, and therefore that the state must reset in the
    next call to `yypull_parse`.
    
    While at it, this implementation now reuses the previously enlarged
    stacks from one parse to another.
    
    * data/skeletons/yacc.c (yypstate_new): Set up the stacks in their
    initial configurations (setting their bottom to the stack array), and
    use yypstate_clear to reset them (moving their top to their bottom).
    (yypstate_delete): Adjust.
    (yypush_parse): At the beginning, clear yypstate if needed, and at the
    end, record when yypstate needs to be clearer.
    
    * examples/c/bistromathic/parse.y (expected_tokens): Do not propose
    autocompletion when there are parse errors.
    * examples/c/bistromathic/bistromathic.test: Check that case.



Reading this commit again, I don't understand why I had changed the handling of 
yynew in yypush_parse this way.  The following patch does it better, IMHO.

I will push this into the maint(enance) branch, and will very soon release 
3.7.6 with this fix.  Could you please confirm that the Bison in this tarball 
does address your issue?  It is not 3.7.6, it is the current development branch 
(to become 3.8), but it should still be relevant to check your issue.

https://www.lrde.epita.fr/~akim/private/bison/bison-3.7.5.320-7fb5c.tar.gz
https://www.lrde.epita.fr/~akim/private/bison/bison-3.7.5.320-7fb5c.tar.lz
https://www.lrde.epita.fr/~akim/private/bison/bison-3.7.5.320-7fb5c.tar.xz

Thanks a lot!

        Akim


commit 7fb5cf6790fb0e58917892867742cc29c4263384
Author: Akim Demaille <akim.demaille@gmail.com>
Date:   Sun Mar 7 10:01:53 2021 +0100

    yacc: fix push parser
    
    When a pstate is used for multiple successive parses, some state may
    leak from a previous run into the following one.  That was introduced
    in 330552ea499ca474f65967160e9d4e50265f9631 "yacc.c: push: don't clear
    the parser state when accepting/rejecting".
    
    Reported by Ryan <dev@splintermail.com>
    https://lists.gnu.org/r/bug-bison/2021-03/msg00000.html
    
    * data/skeletons/yacc.c (yypush_parse): We reusing a pstate from a
    previous run, do behave as if it were the first run.
    * tests/push.at (Pstate reuse): Check this.

diff --git a/data/skeletons/yacc.c b/data/skeletons/yacc.c
index a59e6aef3..98671322d 100644
--- a/data/skeletons/yacc.c
+++ b/data/skeletons/yacc.c
@@ -1650,14 +1650,16 @@ yyparse (]m4_ifset([b4_parse_param], 
[b4_formals(b4_parse_param)], [void])[)]])]
 
   switch (yyps->yynew)
     {
-    case 2:
-      yypstate_clear (yyps);
-      goto case_0;
-
-    case_0:
     case 0:
       yyn = yypact[yystate];
       goto yyread_pushed_token;
+
+    case 2:
+      yypstate_clear (yyps);
+      break;
+
+    default:
+      break;
     }]])[
 
   YYDPRINTF ((stderr, "Starting parse\n"));
diff --git a/tests/push.at b/tests/push.at
index cd3e113ee..f4a259bcd 100644
--- a/tests/push.at
+++ b/tests/push.at
@@ -25,7 +25,7 @@ AT_BANNER([[Push Parsing Tests]])
 AT_SETUP([[Memory Leak for Early Deletion]])
 
 # Requires Valgrind.
-AT_BISON_OPTION_PUSHDEFS
+AT_BISON_OPTION_PUSHDEFS([%define api.push-pull push])
 AT_DATA_GRAMMAR([[input.y]],
 [[
 %{
@@ -144,8 +144,8 @@ AT_CLEANUP
 
 AT_SETUP([[Unsupported Skeletons]])
 
-AT_BISON_OPTION_PUSHDEFS
-AT_DATA([[input.y]],
+AT_BISON_OPTION_PUSHDEFS([%define api.push-pull push])
+AT_DATA_GRAMMAR([[input.y]],
 [[%glr-parser
 %define api.push-pull push
 %%
@@ -158,3 +158,125 @@ AT_BISON_CHECK([[input.y]], [[1]], [],
 ]])
 
 AT_CLEANUP
+
+
+## -------------- ##
+## Pstate reuse.  ##
+## -------------- ##
+
+AT_SETUP([[Pstate reuse]])
+
+# Make sure that when a single pstate is used for multiple successive
+# parses, no state from a previous run leaks into the following one.
+#
+# See https://lists.gnu.org/r/bug-bison/2021-03/msg00000.html.
+
+AT_BISON_OPTION_PUSHDEFS([%define api.push-pull push])
+AT_DATA_GRAMMAR([[input.y]],
+[[%code top {
+  #include <stdlib.h>
+  #include <string.h>
+
+  static char *string_concat (char *a, char *b);
+  ]AT_YYERROR_DECLARE[
+}
+
+%define parse.trace
+%define api.pure full
+%define api.push-pull push
+%expect 0
+
+%union {
+  char *sval;
+};
+%destructor { free ($$); } <sval>
+%printer { fprintf (yyo, "%s", $$); } <sval>
+
+%token <sval> RAW
+%token EOL
+
+%type <sval> text
+
+%%
+
+line
+  : text EOL  { printf ("text: %s\n", $1); free ($1); YYACCEPT; };
+
+text
+  : RAW       { $$ = $1; }
+  | text RAW  { $$ = string_concat ($1, $2); }
+  ;
+
+%%
+]AT_YYERROR_DEFINE[
+
+static char *
+string_concat (char *a, char *b)
+{
+  size_t la = strlen (a);
+  size_t lb = strlen (b);
+  char *res = malloc (la + lb + 1);
+  strcpy (res, a);
+  strcpy (res + la, b);
+  free (a);
+  free (b);
+  return res;
+}
+
+static int
+push (void *yyps, yytoken_kind_t kind, const char *str)
+{
+  YYSTYPE lval;
+  lval.sval = str ? strdup (str) : NULL;
+  switch (yypush_parse (yyps, kind, &lval))
+    {
+    case 0:
+      return 0;
+    case YYPUSH_MORE:
+      // parsing incomplete, but valid; parser not reset
+      return 0;
+    case 1:
+      // YYABORT or syntax invalid; parser is reset
+      fprintf (stderr, "invalid input, but no error was thrown\n");
+      return 1;
+    case 2:
+      // memory exhaustion; parser is reset
+      fprintf (stderr, "memory exhaustion during yypush_parse\n");
+      return 1;
+    }
+  return 1;
+}
+
+int main(void)
+{
+  yydebug = !!getenv ("YYDEBUG");
+  void *yyps = yypstate_new ();
+
+#define PUSH(Kind, Val)                         \
+  do {                                          \
+    if (push (yyps, Kind, Val))                 \
+      return 1;                                 \
+  } while (0)
+
+  PUSH (RAW, "te");
+  PUSH (RAW, "xt");
+  PUSH (EOL, NULL);
+
+  PUSH (RAW, "te");
+  PUSH (RAW, "xt");
+  PUSH (EOL, NULL);
+
+  yypstate_delete (yyps);
+
+  return 0;
+}
+]])
+
+AT_FULL_COMPILE([input])
+AT_CHECK([./input], 0,
+[[text: text
+text: text
+]])
+
+AT_BISON_OPTION_POPDEFS
+AT_CLEANUP




reply via email to

[Prev in Thread] Current Thread [Next in Thread]