LAC (lookahead correction) for syntax error handling

bison-patches
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
LAC (lookahead correction) for syntax error handling

From:	Joel E. Denny
Subject:	LAC (lookahead correction) for syntax error handling
Date:	Thu, 23 Dec 2010 23:51:59 -0500 (EST)
User-agent:	Alpine 2.00 (DEB 1167 2008-08-23)
LAC is the last major new feature I had planned for Bison 2.5.  Without 
LAC, IELR is guaranteed to behave like canonical LR only for syntactically 
correct sentences.  With LAC, IELR is also guaranteed to behave like 
canonical LR for syntactically incorrect sentences.  (However, LAC also 
fixes problems for canonical LR in some cases.)  Perhaps most notably, LAC 
fixes the expected token list in syntax error messages produced by 
%error-verbose.  See NEWS for more details.

To implement and document LAC, I pushed the following patches to master 
and something similar to branch-2.5.

Next, the manual needs some work.

>From bf35c71c5827d735c125ee25b048eabf40960a55 Mon Sep 17 00:00:00 2001
From: Joel E. Denny <address@hidden>
Date: Sat, 11 Dec 2010 11:13:33 -0500
Subject: [PATCH 1/3] parse.lac: implement as %define variable.

LAC = lookahead correction.  See discussion at
<http://lists.gnu.org/archive/html/bison-patches/2009-09/msg00034.html>.
However, one point there must be corrected: because of %nonassoc,
LAC is *not* always redundant for lr.type=canonical-lr.
* data/yacc.c: Accept values of "none" (default) or "full" for
parse.lac.  Accept %define parse.lac.es-capacity to specify
capacity of LAC's temporary exploratory stack.  It defaults to 20
and, for now, will not grow dynamically.
(b4_lac_flag, b4_lac_if): New m4 macros.  Evaluate as true for
parse.lac!=none.
(YYBACKUP): Invoke YY_LAC_DISCARD.
(YY_LAC_ESTABLISH, YY_LAC_DISCARD): New cpp macros that invoke
yy_lac and track when it needs to be invoked
(yy_lac): New function that, given the current stack, determines
whether a token can eventually be shifted.  Return status mimics
yyparse return status.
(yysyntax_error): Change yystate argument to yyssp so stack top
can be passed to yy_lac.  If LAC is requested, build expected
token list by invoking yy_lac for every token instead of just
checking the current state for lookaheads.  Return 2 if yy_lac
exhausts memory.
(yyparse, yypush_parse): Use local variable yy_lac_established and
cpp macros YY_LAC_ESTABLISH and YY_LAC_DISCARD to implement LAC.
Update yysyntax_error invocation.  Add yyexhaustedlab code if LAC
is requested.
* tests/conflicts.at (%nonassoc and eof): Extend to check the
effect of each of -Dlr.type=canonical-lr and -Dparse.lac=full.
(parse.error=verbose and consistent errors): Likewise.
(LAC: %nonassoc requires splitting canonical LR states): New test
group demonstrating how LAC can fix canonical LR.
* tests/input.at (LAC: Errors for %define): New test group.
* tests/regression.at (LAC: Exploratory stack): New test group.
(LAC: Memory exhaustion): New test group.
---
 ChangeLog           |   37 ++++
 data/yacc.c         |  277 ++++++++++++++++++++++++++---
 src/parse-gram.c    |  491 ++++++++++++++++++++++++++-------------------------
 src/parse-gram.h    |   12 +-
 tests/conflicts.at  |  185 +++++++++++++++++---
 tests/input.at      |   19 ++
 tests/regression.at |  183 +++++++++++++++++++
 7 files changed, 901 insertions(+), 303 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 154fd35..63d41ed 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,40 @@
+2010-12-11  Joel E. Denny  <address@hidden>
+
+       parse.lac: implement as %define variable.
+       LAC = lookahead correction.  See discussion at
+       <http://lists.gnu.org/archive/html/bison-patches/2009-09/msg00034.html>.
+       However, one point there must be corrected: because of %nonassoc,
+       LAC is *not* always redundant for lr.type=canonical-lr.
+       * data/yacc.c: Accept values of "none" (default) or "full" for
+       parse.lac.  Accept %define parse.lac.es-capacity to specify
+       capacity of LAC's temporary exploratory stack.  It defaults to 20
+       and, for now, will not grow dynamically.
+       (b4_lac_flag, b4_lac_if): New m4 macros.  Evaluate as true for
+       parse.lac!=none.
+       (YYBACKUP): Invoke YY_LAC_DISCARD.
+       (YY_LAC_ESTABLISH, YY_LAC_DISCARD): New cpp macros that invoke
+       yy_lac and track when it needs to be invoked
+       (yy_lac): New function that, given the current stack, determines
+       whether a token can eventually be shifted.  Return status mimics
+       yyparse return status.
+       (yysyntax_error): Change yystate argument to yyssp so stack top
+       can be passed to yy_lac.  If LAC is requested, build expected
+       token list by invoking yy_lac for every token instead of just
+       checking the current state for lookaheads.  Return 2 if yy_lac
+       exhausts memory.
+       (yyparse, yypush_parse): Use local variable yy_lac_established and
+       cpp macros YY_LAC_ESTABLISH and YY_LAC_DISCARD to implement LAC.
+       Update yysyntax_error invocation.  Add yyexhaustedlab code if LAC
+       is requested.
+       * tests/conflicts.at (%nonassoc and eof): Extend to check the
+       effect of each of -Dlr.type=canonical-lr and -Dparse.lac=full.
+       (parse.error=verbose and consistent errors): Likewise.
+       (LAC: %nonassoc requires splitting canonical LR states): New test
+       group demonstrating how LAC can fix canonical LR.
+       * tests/input.at (LAC: Errors for %define): New test group.
+       * tests/regression.at (LAC: Exploratory stack): New test group.
+       (LAC: Memory exhaustion): New test group.
+
 2010-11-21  Joel E. Denny  <address@hidden>
 
        build: use gnulib's new bootstrap_sync option.
diff --git a/data/yacc.c b/data/yacc.c
index 5cf15ee..5ba564e 100644
--- a/data/yacc.c
+++ b/data/yacc.c
@@ -38,6 +38,16 @@ b4_use_push_for_pull_if([
   b4_push_if([m4_define([b4_use_push_for_pull_flag], [[0]])],
              [m4_define([b4_push_flag], [[1]])])])
 
+# Check the value of %define parse.lac, where LAC stands for lookahead
+# correction.
+b4_percent_define_default([[parse.lac]], [[none]])
+b4_percent_define_default([[parse.lac.es-capacity]], [[20]])
+b4_percent_define_check_values([[[[parse.lac]], [[full]], [[none]]]])
+b4_define_flag_if([lac])
+m4_define([b4_lac_flag],
+          [m4_if(b4_percent_define_get([[parse.lac]]),
+                 [none], [[0]], [[1]])])
+
 m4_include(b4_pkgdatadir/[c.m4])
 
 ## ---------------- ##
@@ -629,7 +639,8 @@ do                                                          
\
     {                                                          \
       yychar = (Token);                                                \
       yylval = (Value);                                                \
-      YYPOPSTACK (1);                                          \
+      YYPOPSTACK (1);                                          \]b4_lac_if([[
+      YY_LAC_DISCARD ("YYBACKUP");                             \]])[
       goto yybackup;                                           \
     }                                                          \
   else                                                         \
@@ -813,9 +824,173 @@ int yydebug;
 
 #ifndef YYMAXDEPTH
 # define YYMAXDEPTH ]b4_stack_depth_max[
+#endif]b4_lac_if([[
+
+/* Establish the initial context for the current lookahead if no initial
+   context is currently established.
+
+   We define a context as a snapshot of the parser stacks.  We define
+   the initial context for a lookahead as the context in which the
+   parser initially examines that lookahead in order to select a
+   syntactic action.  Thus, if the lookahead eventually proves
+   syntactically unacceptable (possibly in a later context reached via a
+   series of reductions), the initial context can be used to determine
+   the exact set of tokens that would be syntactically acceptable in the
+   lookahead's place.  Moreover, it is the context after which any
+   further semantic actions would be erroneous because they would be
+   determined by a syntactically unacceptable token.
+
+   YY_LAC_ESTABLISH should be invoked when a reduction is about to be
+   performed in an inconsistent state (which, for the purposes of LAC,
+   includes consistent states that don't know they're consistent because
+   their default reductions have been disabled).  Iff there is a
+   lookahead token, it should also be invoked before reporting a syntax
+   error.  This latter case is for the sake of the debugging output.
+
+   For parse.lac=full, the implementation of YY_LAC_ESTABLISH is as
+   follows.  If no initial context is currently established for the
+   current lookahead, then check if that lookahead can eventually be
+   shifted if syntactic actions continue from the current context.
+   Report a syntax error if it cannot.  */
+#define YY_LAC_ESTABLISH                                       \
+do {                                                           \
+  if (!yy_lac_established)                                     \
+    {                                                          \
+      YYDPRINTF ((stderr,                                      \
+                  "LAC: initial context established for %s\n", \
+                  yytname[yytoken]));                          \
+      yy_lac_established = 1;                                  \
+      {                                                        \
+        int yy_lac_status =                                    \
+          yy_lac (yyssp, yytoken);                             \
+        if (yy_lac_status == 2)                                \
+          goto yyexhaustedlab;                                 \
+        if (yy_lac_status == 1)                                \
+          goto yyerrlab;                                       \
+      }                                                        \
+    }                                                          \
+} while (YYID (0))
+
+/* Discard any previous initial lookahead context because of Event,
+   which may be a lookahead change or an invalidation of the currently
+   established initial context for the current lookahead.
+
+   The most common example of a lookahead change is a shift.  An example
+   of both cases is syntax error recovery.  That is, a syntax error
+   occurs when the lookahead is syntactically erroneous for the
+   currently established initial context, so error recovery manipulates
+   the parser stacks to try to find a new initial context in which the
+   current lookahead is syntactically acceptable.  If it fails to find
+   such a context, it discards the lookahead.  */
+#if YYDEBUG
+# define YY_LAC_DISCARD(Event)                                           \
+do {                                                                     \
+  if (yy_lac_established)                                                \
+    {                                                                    \
+      if (yydebug)                                                       \
+        YYFPRINTF (stderr, "LAC: initial context discarded due to "      \
+                   Event "\n");                                          \
+      yy_lac_established = 0;                                            \
+    }                                                                    \
+} while (YYID (0))
+#else
+# define YY_LAC_DISCARD(Event) yy_lac_established = 0
 #endif
 
-
+/* Given the stack whose top is *YYSSP, return 0 iff YYTOKEN can
+   eventually (after perhaps some reductions) be shifted, and return 1
+   if not.  Return 2 if memory is exhausted.  */
+static int
+yy_lac (yytype_int16 *yyssp, int yytoken)
+{
+  yytype_int16 *yyes_prev = yyssp;
+  yytype_int16 address@hidden([[parse.lac.es-capacity]])address@hidden;
+  yytype_int16 *yyesp = yyes_prev;
+  YYDPRINTF ((stderr, "LAC: checking lookahead %s:", yytname[yytoken]));
+  if (yytoken == YYUNDEFTOK)
+    {
+      YYDPRINTF ((stderr, " Always Err\n"));
+      return 1;
+    }
+  while (1)
+    {
+      int yyrule = yypact[*yyesp];
+      if (yypact_value_is_default (yyrule)
+          || (yyrule += yytoken) < 0 || YYLAST < yyrule
+          || yycheck[yyrule] != yytoken)
+        {
+          yyrule = yydefact[*yyesp];
+          if (yyrule == 0)
+            {
+              YYDPRINTF ((stderr, " Err\n"));
+              return 1;
+            }
+        }
+      else
+        {
+          yyrule = yytable[yyrule];
+          if (yytable_value_is_error (yyrule))
+            {
+              YYDPRINTF ((stderr, " Err\n"));
+              return 1;
+            }
+          if (0 < yyrule)
+            {
+              YYDPRINTF ((stderr, " S%d\n", yyrule));
+              return 0;
+            }
+          yyrule = -yyrule;
+        }
+      {
+        YYSIZE_T yylen = yyr2[yyrule];
+        YYDPRINTF ((stderr, " R%d", yyrule - 1));
+        if (yyesp != yyes_prev)
+          {
+            YYSIZE_T yysize = yyesp - yyes + 1;
+            if (yylen < yysize)
+              {
+                yyesp -= yylen;
+                yylen = 0;
+              }
+            else
+              {
+                yylen -= yysize;
+                yyesp = yyes_prev;
+              }
+          }
+        if (yylen)
+          yyesp = yyes_prev -= yylen;
+      }
+      {
+        int yystate;
+        {
+          int yylhs = yyr1[yyrule] - YYNTOKENS;
+          yystate = yypgoto[yylhs] + *yyesp;
+          if (yystate < 0 || YYLAST < yystate
+              || yycheck[yystate] != *yyesp)
+            yystate = yydefgoto[yylhs];
+          else
+            yystate = yytable[yystate];
+        }
+        if (yyesp == yyes_prev)
+          {
+            yyesp = yyes;
+            *yyesp = yystate;
+          }
+        else
+          {
+            if (yyesp == yyes + (sizeof yyes / sizeof *yyes) - 1)
+              {
+                YYDPRINTF ((stderr, " (max stack size exceeded)\n"));
+                return 2;
+              }
+            *++yyesp = yystate;
+          }
+        YYDPRINTF ((stderr, " G%d", *yyesp));
+      }
+    }
+}]])[
+
 
 #if YYERROR_VERBOSE
 
@@ -904,15 +1079,18 @@ yytnamerr (char *yyres, const char *yystr)
 # endif
 
 /* Copy into *YYMSG, which is of size *YYMSG_ALLOC, an error message
-   about the unexpected token YYTOKEN while in state YYSTATE.
+   about the unexpected token YYTOKEN for the state stack whose top is
+   YYSSP.]b4_lac_if([[  In order to see if a particular token T is a
+   valid looakhead, invoke yy_lac (YYSSP, T).]])[
 
    Return 0 if *YYMSG was successfully written.  Return 1 if *YYMSG is
    not large enough to hold the message.  In that case, also set
    *YYMSG_ALLOC to the required number of bytes.  Return 2 if the
-   required number of bytes is too large to store.  */
+   required number of bytes is too large to store]b4_lac_if([[ or if
+   yy_lac returned 2]])[.  */
 static int
 yysyntax_error (YYSIZE_T *yymsg_alloc, char **yymsg,
-                int yystate, int yytoken)
+                yytype_int16 *yyssp, int yytoken)
 {
   YYSIZE_T yysize0 = yytnamerr (0, yytname[yytoken]);
   YYSIZE_T yysize = yysize0;
@@ -943,7 +1121,12 @@ yysyntax_error (YYSIZE_T *yymsg_alloc, char **yymsg,
      - Don't assume there isn't a lookahead just because this state is a
        consistent state with a default action.  There might have been a
        previous inconsistent state, consistent state with a non-default
-       action, or user semantic action that manipulated yychar.
+       action, or user semantic action that manipulated yychar.]b4_lac_if([[
+       In the first two cases, it might appear that the current syntax
+       error should have been detected in the previous state when yy_lac
+       was invoked.  However, at that time, there might have been a
+       different syntax error that discarded a different initial context
+       during error recovery, leaving behind the current lookahead.]], [[
      - Of course, the expected token list depends on states to have
        correct lookahead information, and it depends on the parser not
        to perform extra reductions after fetching a lookahead from the
@@ -951,26 +1134,39 @@ yysyntax_error (YYSIZE_T *yymsg_alloc, char **yymsg,
        (from LALR or IELR) and default reductions corrupt the expected
        token list.  However, the list is correct for canonical LR with
        one exception: it will still contain any token that will not be
-       accepted due to an error action in a later state.
+       accepted due to an error action in a later state.]])[
   */
   if (yytoken != YYEMPTY)
     {
-      int yyn = yypact[yystate];
+      int yyn = yypact[*yyssp];]b4_lac_if([[
+      YYDPRINTF ((stderr, "Constructing syntax error message\n"));]])[
       yyarg[yycount++] = yytname[yytoken];
       if (!yypact_value_is_default (yyn))
-        {
+        {]b4_lac_if([], [[
           /* Start YYX at -YYN if negative to avoid negative indexes in
              YYCHECK.  In other words, skip the first -YYN actions for
              this state because they are default actions.  */
           int yyxbegin = yyn < 0 ? -yyn : 0;
           /* Stay within bounds of both yycheck and yytname.  */
           int yychecklim = YYLAST - yyn + 1;
-          int yyxend = yychecklim < YYNTOKENS ? yychecklim : YYNTOKENS;
-          int yyx;
+          int yyxend = yychecklim < YYNTOKENS ? yychecklim : YYNTOKENS;]])[
+          int yyx;]b4_lac_if([[
+
+          for (yyx = 0; yyx < YYNTOKENS; ++yyx)
+            if (yyx != YYTERROR && yyx != YYUNDEFTOK)
+              {
+                {
+                  int yy_lac_status = yy_lac (yyssp, yyx);
+                  if (yy_lac_status == 2)
+                    return 2;
+                  if (yy_lac_status == 1)
+                    continue;
+                }]], [[
+
           for (yyx = yyxbegin; yyx < yyxend; ++yyx)
             if (yycheck[yyx + yyn] == yyx && yyx != YYTERROR
                 && !yytable_value_is_error (yytable[yyx + yyn]))
-              {
+              {]])[
                 if (yycount == YYERROR_VERBOSE_ARGS_MAXIMUM)
                   {
                     yycount = 1;
@@ -984,12 +1180,16 @@ yysyntax_error (YYSIZE_T *yymsg_alloc, char **yymsg,
                   return 2;
                 yysize = yysize1;
               }
-        }
+        }]b4_lac_if([[
+# if YYDEBUG
+      else if (yydebug)
+        YYFPRINTF (stderr, "No expected tokens.\n");
+# endif]])[
     }
 
   switch (yycount)
     {
-#define YYCASE_(N, S)                       \
+# define YYCASE_(N, S)                      \
       case N:                               \
         yyformat = S;                       \
       break
@@ -999,7 +1199,7 @@ yysyntax_error (YYSIZE_T *yymsg_alloc, char **yymsg,
       YYCASE_(3, YY_("syntax error, unexpected %s, expecting %s or %s"));
       YYCASE_(4, YY_("syntax error, unexpected %s, expecting %s or %s or %s"));
       YYCASE_(5, YY_("syntax error, unexpected %s, expecting %s or %s or %s or 
%s"));
-#undef YYCASE_
+# undef YYCASE_
     }
 
   yysize1 = yysize + yystrlen (yyformat);
@@ -1037,7 +1237,6 @@ yysyntax_error (YYSIZE_T *yymsg_alloc, char **yymsg,
   return 0;
 }
 #endif /* YYERROR_VERBOSE */
-
 
 ]b4_yydestruct_generate([b4_c_function_def])b4_push_if([], [[
 
@@ -1172,7 +1371,8 @@ b4_c_function_def([[yyparse]], [[int]], b4_parse_param)[
   YYLTYPE yypushed_loc = yylloc;]])
 ])],
   [b4_declare_parser_state_variables
-])[
+])b4_lac_if([[
+  int yy_lac_established = 0;]])[
   int yyn;
   int yyresult;
   /* Lookahead token as an internal (translated) token number.  */
@@ -1379,13 +1579,18 @@ yyread_pushed_token:]])[
   /* If the proper action on seeing token YYTOKEN is to reduce or to
      detect an error, take that action.  */
   yyn += yytoken;
-  if (yyn < 0 || YYLAST < yyn || yycheck[yyn] != yytoken)
-    goto yydefault;
+  if (yyn < 0 || YYLAST < yyn || yycheck[yyn] != yytoken)]b4_lac_if([[
+    {
+      YY_LAC_ESTABLISH;
+      goto yydefault;
+    }]], [[
+    goto yydefault;]])[
   yyn = yytable[yyn];
   if (yyn <= 0)
     {
       if (yytable_value_is_error (yyn))
-       goto yyerrlab;
+        goto yyerrlab;]b4_lac_if([[
+      YY_LAC_ESTABLISH;]])[
       yyn = -yyn;
       goto yyreduce;
     }
@@ -1399,7 +1604,8 @@ yyread_pushed_token:]])[
   YY_SYMBOL_PRINT ("Shifting", yytoken, &yylval, &yylloc);
 
   /* Discard the shifted token.  */
-  yychar = YYEMPTY;
+  yychar = YYEMPTY;]b4_lac_if([[
+  YY_LAC_DISCARD ("shift");]])[
 
   yystate = yyn;
   *++yyvsp = yylval;
@@ -1437,12 +1643,22 @@ yyreduce:
 ]b4_locations_if(
 [[  /* Default location.  */
   YYLLOC_DEFAULT (yyloc, (yylsp - yylen), yylen);]])[
-  YY_REDUCE_PRINT (yyn);
+  YY_REDUCE_PRINT (yyn);]b4_lac_if([[
+  {
+    int yychar_backup = yychar;
+    switch (yyn)
+      {
+        ]b4_user_actions[
+        default: break;
+      }
+    if (yychar_backup != yychar)
+      YY_LAC_DISCARD ("yychar change");
+  }]], [[
   switch (yyn)
     {
       ]b4_user_actions[
       default: break;
-    }
+    }]])[
   /* User semantic actions sometimes alter yychar, and that requires
      that yytoken be updated with the new translation.  We take the
      approach of translating immediately before every use of yytoken.
@@ -1493,11 +1709,14 @@ yyerrlab:
 #if ! YYERROR_VERBOSE
       yyerror (]b4_yyerror_args[YY_("syntax error"));
 #else
-# define YYSYNTAX_ERROR yysyntax_error (&yymsg_alloc, &yymsg, yystate, \
+# define YYSYNTAX_ERROR yysyntax_error (&yymsg_alloc, &yymsg, yyssp, \
                                         yytoken)
       {
         char const *yymsgp = YY_("syntax error");
-        int yysyntax_error_status = YYSYNTAX_ERROR;
+        int yysyntax_error_status;]b4_lac_if([[
+        if (yychar != YYEMPTY)
+          YY_LAC_ESTABLISH;]])[
+        yysyntax_error_status = YYSYNTAX_ERROR;
         if (yysyntax_error_status == 0)
           yymsgp = yymsg;
         else if (yysyntax_error_status == 1)
@@ -1602,7 +1821,11 @@ yyerrlab1:
       YYPOPSTACK (1);
       yystate = *yyssp;
       YY_STACK_PRINT (yyss, yyssp);
-    }
+    }]b4_lac_if([[
+
+  /* If the stack popping above didn't lose the initial context for the
+     current lookahead token, the shift below will for sure.  */
+  YY_LAC_DISCARD ("error recovery");]])[
 
   *++yyvsp = yylval;
 ]b4_locations_if([[
@@ -1633,7 +1856,7 @@ yyabortlab:
   yyresult = 1;
   goto yyreturn;
 
-#if !defined(yyoverflow) || YYERROR_VERBOSE
+#if ]b4_lac_if([[1]], [[!defined(yyoverflow) || YYERROR_VERBOSE]])[
 /*-------------------------------------------------.
 | yyexhaustedlab -- memory exhaustion comes here.  |
 `-------------------------------------------------*/
diff --git a/tests/conflicts.at b/tests/conflicts.at
index 655a666..16e1956 100644
--- a/tests/conflicts.at
+++ b/tests/conflicts.at
@@ -94,46 +94,52 @@ main (int argc, const char *argv[])
 }
 ]])
 
-# Specify the output files to avoid problems on different file systems.
-AT_BISON_CHECK([-o input.c input.y])
+m4_pushdef([AT_NONASSOC_AND_EOF_CHECK],
+[AT_BISON_CHECK([$1[ -o input.c input.y]])
 AT_COMPILE([input])
 
+m4_pushdef([AT_EXPECTING], [m4_if($2, [correct], [[, expecting $end]])])
+
 AT_PARSER_CHECK([./input '0<0'])
 AT_PARSER_CHECK([./input '0<0<0'], [1], [],
-         [syntax error, unexpected '<'
+         [syntax error, unexpected '<'AT_EXPECTING
 ])
 
 AT_PARSER_CHECK([./input '0>0'])
 AT_PARSER_CHECK([./input '0>0>0'], [1], [],
-         [syntax error, unexpected '>'
+         [syntax error, unexpected '>'AT_EXPECTING
 ])
 
 AT_PARSER_CHECK([./input '0<0>0'], [1], [],
-         [syntax error, unexpected '>'
+         [syntax error, unexpected '>'AT_EXPECTING
 ])
 
-# We must disable default reductions in inconsistent states in order to
-# have an explicit list of all expected tokens.  (However, unless we use
-# canonical LR, lookahead sets are merged for different left contexts,
-# so it is still possible to have extra incorrect tokens in the expected
-# list.  That just doesn't happen to be a problem for this test case.)
-
-AT_BISON_CHECK([-Dlr.default-reductions=consistent -o input.c input.y])
-AT_COMPILE([input])
-
-AT_PARSER_CHECK([./input '0<0'])
-AT_PARSER_CHECK([./input '0<0<0'], [1], [],
-         [syntax error, unexpected '<', expecting $end
-])
+m4_popdef([AT_EXPECTING])])
 
-AT_PARSER_CHECK([./input '0>0'])
-AT_PARSER_CHECK([./input '0>0>0'], [1], [],
-         [syntax error, unexpected '>', expecting $end
-])
+# Expected token list is missing.
+AT_NONASSOC_AND_EOF_CHECK([], [[incorrect]])
 
-AT_PARSER_CHECK([./input '0<0>0'], [1], [],
-         [syntax error, unexpected '>', expecting $end
-])
+# We must disable default reductions in inconsistent states in order to
+# have an explicit list of all expected tokens.
+AT_NONASSOC_AND_EOF_CHECK([[-Dlr.default-reductions=consistent]],
+                          [[correct]])
+
+# lr.default-reductions=consistent happens to work for this test case.
+# However, for other grammars, lookahead sets can be merged for
+# different left contexts, so it is still possible to have an incorrect
+# expected list.  Canonical LR is almost a general solution (that is, it
+# can fail only when %nonassoc is used), so make sure it gives the same
+# result as above.
+AT_NONASSOC_AND_EOF_CHECK([[-Dlr.type=canonical-lr]], [[correct]])
+
+# parse.lac=full is a completely general solution that does not require
+# any of the above sacrifices.  Of course, it does not extend the
+# language-recognition power of LALR to (IE)LR, but it does ensure that
+# the reported list of expected tokens matches what the given parser
+# would have accepted in place of the unexpected token.
+AT_NONASSOC_AND_EOF_CHECK([[-Dparse.lac=full]], [[correct]])
+
+m4_popdef([AT_NONASSOC_AND_EOF_CHECK])
 
 AT_CLEANUP
 
@@ -342,6 +348,18 @@ AT_CONSISTENT_ERRORS_CHECK([[%define lr.type 
canonical-lr]],
                            [AT_PREVIOUS_STATE_INPUT],
                            [[$end]], [[ab]])
 
+# Only LAC gets it right.
+AT_CONSISTENT_ERRORS_CHECK([[%define lr.type canonical-lr
+                             %define parse.lac full]],
+                           [AT_PREVIOUS_STATE_GRAMMAR],
+                           [AT_PREVIOUS_STATE_INPUT],
+                           [[$end]], [[b]])
+AT_CONSISTENT_ERRORS_CHECK([[%define lr.type ielr
+                             %define parse.lac full]],
+                           [AT_PREVIOUS_STATE_GRAMMAR],
+                           [AT_PREVIOUS_STATE_INPUT],
+                           [[$end]], [[b]])
+
 m4_popdef([AT_PREVIOUS_STATE_GRAMMAR])
 m4_popdef([AT_PREVIOUS_STATE_INPUT])
 
@@ -417,6 +435,16 @@ AT_CONSISTENT_ERRORS_CHECK([[%define lr.type 
canonical-lr]],
                            [AT_USER_ACTION_INPUT],
                            [[$end]], [[a]])
 
+AT_CONSISTENT_ERRORS_CHECK([[%define parse.lac full]],
+                           [AT_USER_ACTION_GRAMMAR],
+                           [AT_USER_ACTION_INPUT],
+                           [['b']], [[none]])
+AT_CONSISTENT_ERRORS_CHECK([[%define parse.lac full
+                             %define lr.default-reductions accepting]],
+                           [AT_USER_ACTION_GRAMMAR],
+                           [AT_USER_ACTION_INPUT],
+                           [[$end]], [[none]])
+
 m4_popdef([AT_USER_ACTION_GRAMMAR])
 m4_popdef([AT_USER_ACTION_INPUT])
 
@@ -426,6 +454,113 @@ AT_CLEANUP
 
 
 
+## ------------------------------------------------------- ##
+## LAC: %nonassoc requires splitting canonical LR states.  ##
+## ------------------------------------------------------- ##
+
+# This test case demonstrates that, when %nonassoc is used, canonical
+# LR(1) parser table construction followed by conflict resolution
+# without further state splitting is not always sufficient to produce a
+# parser that can detect all syntax errors as soon as possible on one
+# token of lookahead.  However, LAC solves the problem completely even
+# with minimal LR parser tables.
+
+AT_SETUP([[LAC: %nonassoc requires splitting canonical LR states]])
+
+AT_DATA_GRAMMAR([[input.y]],
+[[%code {
+  #include <stdio.h>
+  void yyerror (char const *);
+  int yylex (void);
+}
+
+%error-verbose
+%nonassoc 'a'
+
+%%
+
+start:
+  'a' problem 'a' // First context.
+| 'b' problem 'b' // Second context.
+| 'c' reduce-nonassoc // Just makes reduce-nonassoc useful.
+;
+
+problem:
+  look reduce-nonassoc
+| look 'a'
+| look 'b'
+;
+
+// For the state reached after shifting the 'a' in these productions,
+// lookahead sets are the same in both the first and second contexts.
+// Thus, canonical LR reuses the same state for both contexts.  However,
+// the lookahead 'a' for the reduction "look: 'a'" later becomes an
+// error action only in the first context.  In order to immediately
+// detect the syntax error on 'a' here for only the first context, this
+// canonical LR state would have to be split into two states, and the
+// 'a' lookahead would have to be removed from only one of the states.
+look:
+  'a' // Reduction lookahead set is always ['a', 'b'].
+| 'a' 'b'
+| 'a' 'c' // 'c' is forgotten as an expected token.
+;
+
+reduce-nonassoc: %prec 'a';
+
+%%
+
+void
+yyerror (char const *msg)
+{
+  fprintf (stderr, "%s\n", msg);
+}
+
+int
+yylex (void)
+{
+  char const *input = "aaa";
+  return *input++;
+}
+
+int
+main (void)
+{
+  return yyparse ();
+}
+]])
+
+# Show canonical LR's failure.
+AT_BISON_CHECK([[-Dlr.type=canonical-lr -o input.c input.y]],
+               [[0]], [[]],
+[[input.y: conflicts: 2 shift/reduce
+]])
+AT_COMPILE([[input]])
+AT_PARSER_CHECK([[./input]], [[1]], [[]],
+[[syntax error, unexpected 'a', expecting 'b'
+]])
+
+# It's corrected by LAC.
+AT_BISON_CHECK([[-Dlr.type=canonical-lr -Dparse.lac=full \
+                 -o input.c input.y]], [[0]], [[]],
+[[input.y: conflicts: 2 shift/reduce
+]])
+AT_COMPILE([[input]])
+AT_PARSER_CHECK([[./input]], [[1]], [[]],
+[[syntax error, unexpected 'a', expecting 'b' or 'c'
+]])
+
+# IELR is sufficient when LAC is used.
+AT_BISON_CHECK([[-Dlr.type=ielr -Dparse.lac=full -o input.c input.y]],
+               [[0]], [[]],
+[[input.y: conflicts: 2 shift/reduce
+]])
+AT_COMPILE([[input]])
+AT_PARSER_CHECK([[./input]], [[1]], [[]],
+[[syntax error, unexpected 'a', expecting 'b' or 'c'
+]])
+
+AT_CLEANUP
+
 ## ------------------------- ##
 ## Unresolved SR Conflicts.  ##
 ## ------------------------- ##
diff --git a/tests/input.at b/tests/input.at
index 241c4d0..2546685 100644
--- a/tests/input.at
+++ b/tests/input.at
@@ -1289,3 +1289,22 @@ input.y:5.19: invalid character after \-escape: \001
 ]])
 
 AT_CLEANUP
+
+## ------------------------- ##
+## LAC: Errors for %define.  ##
+## ------------------------- ##
+
+AT_SETUP([[LAC: Errors for %define]])
+
+AT_DATA([[input.y]],
+[[%%
+start: ;
+]])
+
+# parse.lac.* options are useless if LAC isn't actually activated.
+AT_BISON_CHECK([[-Dparse.lac.es-capacity-initial=1 input.y]],
+               [[1]], [],
+[[<command line>:2: %define variable `parse.lac.es-capacity-initial' is not 
used
+]])
+
+AT_CLEANUP
diff --git a/tests/regression.at b/tests/regression.at
index b3fdc29..65d4ac2 100644
--- a/tests/regression.at
+++ b/tests/regression.at
@@ -1469,3 +1469,186 @@ memory exhausted
 ]])
 
 AT_CLEANUP
+
+
+
+## ------------------------ ##
+## LAC: Exploratory stack.  ##
+## ------------------------ ##
+
+AT_SETUP([[LAC: Exploratory stack]])
+
+m4_pushdef([AT_LAC_CHECK], [
+
+AT_BISON_OPTION_PUSHDEFS([$1])
+
+AT_DATA_GRAMMAR([input.y],
+[[%code {
+  #include <stdio.h>
+  void yyerror (char const *);
+  int yylex (]AT_PURE_IF([[YYSTYPE *]], [[void]])[);
+}
+
+]$1[
+%define parse.error verbose
+%token 'c'
+
+%%
+
+// default reductions in inconsistent states
+// v   v v   v v v v   v v v v v v v
+S: A B A A B A A A A B A A A A A A A B C C A A A A A A A A A A A A B ;
+
+A: 'a' | /*empty*/ { printf ("inconsistent default reduction\n"); } ;
+B: 'b' ;
+C: /*empty*/ { printf ("consistent default reduction\n"); } ;
+
+%%
+
+void
+yyerror (char const *msg)
+{
+  fprintf (stderr, "%s\n", msg);
+}
+
+int
+yylex (]AT_PURE_IF([[YYSTYPE *v]], [[void]])[)
+{
+  static char const *input = "bbbbc";]AT_PURE_IF([[
+  *v = 0;]])[
+  return *input++;
+}
+
+int
+main (void)
+{
+  yydebug = 1;
+  return yyparse ();
+}
+]])
+
+# Give exactly the right amount of memory to be sure there's no
+# off-by-one error, for example.
+AT_BISON_CHECK([[-Dparse.lac=full -Dparse.lac.es-capacity=12 \
+                 -t -o input.c input.y]], [[0]], [],
+[[input.y: conflicts: 21 shift/reduce
+]])
+AT_COMPILE([[input]])
+AT_PARSER_CHECK([[./input > stdout.txt 2> stderr.txt]], [[1]])
+
+# Make sure syntax error doesn't forget that 'a' is expected.  It would
+# be forgotten without lookahead correction.
+AT_CHECK([[grep 'syntax error,' stderr.txt]], [[0]],
+[[syntax error, unexpected 'c', expecting 'a' or 'b'
+]])
+
+# Check number of default reductions in inconsistent states to be sure
+# syntax error is detected before unnecessary reductions are performed.
+AT_CHECK([[perl -0777 -ne 'print s/inconsistent default reduction//g;' \
+           < stdout.txt || exit 77]], [[0]], [[14]])
+
+# Check number of default reductions in consistent states to be sure
+# it is performed before the syntax error is detected.
+AT_CHECK([[perl -0777 -ne 'print s/\bconsistent default reduction//g;' \
+           < stdout.txt || exit 77]], [[0]], [[2]])
+
+AT_BISON_OPTION_POPDEFS
+])
+
+AT_LAC_CHECK([[%define api.push-pull pull]])
+AT_LAC_CHECK([[%define api.push-pull pull %define api.pure]])
+AT_LAC_CHECK([[%define api.push-pull both]])
+AT_LAC_CHECK([[%define api.push-pull both %define api.pure]])
+
+m4_popdef([AT_LAC_CHECK])
+
+AT_CLEANUP
+
+
+
+## ------------------------ ##
+## LAC: Memory exhaustion.  ##
+## ------------------------ ##
+
+AT_SETUP([[LAC: Memory exhaustion]])
+
+m4_pushdef([AT_LAC_CHECK], [
+
+AT_DATA_GRAMMAR([input.y],
+[[%code {
+  #include <stdio.h>
+  void yyerror (char const *);
+  int yylex (void);
+}
+
+%error-verbose
+
+%%
+
+S: A A A A A A A A A ;
+A: /*empty*/ | 'a' ;
+
+%%
+
+void
+yyerror (char const *msg)
+{
+  fprintf (stderr, "%s\n", msg);
+}
+
+int
+yylex (void)
+{
+  static char const *input = "]$1[";
+  return *input++;
+}
+
+int
+main (void)
+{
+  yydebug = 1;
+  return yyparse ();
+}
+]])
+
+AT_BISON_CHECK([[-Dparse.lac=full -Dparse.lac.es-capacity=8 \
+                 -t -o input.c input.y]], [[0]], [],
+[[input.y: conflicts: 8 shift/reduce
+]])
+AT_COMPILE([[input]])
+
+])
+
+# Check for memory exhaustion during parsing.
+AT_LAC_CHECK([[]])
+AT_PARSER_CHECK([[./input]], [[2]], [[]],
+[[Starting parse
+Entering state 0
+Reading a token: Now at end of input.
+LAC: initial context established for $end
+LAC: checking lookahead $end: R2 G3 R2 G5 R2 G6 R2 G7 R2 G8 R2 G9 R2 G10 R2 
G11 R2 (max stack size exceeded)
+memory exhausted
+Cleanup: discarding lookahead token $end ()
+Stack now 0
+]])
+
+# Induce an immediate syntax error with an undefined token, and check
+# for memory exhaustion while building syntax error message.
+AT_LAC_CHECK([[z]], [[0]])
+AT_PARSER_CHECK([[./input]], [[2]], [[]],
+[[Starting parse
+Entering state 0
+Reading a token: Next token is token $undefined ()
+LAC: initial context established for $undefined
+LAC: checking lookahead $undefined: Always Err
+Constructing syntax error message
+LAC: checking lookahead $end: R2 G3 R2 G5 R2 G6 R2 G7 R2 G8 R2 G9 R2 G10 R2 
G11 R2 (max stack size exceeded)
+syntax error
+memory exhausted
+Cleanup: discarding lookahead token $undefined ()
+Stack now 0
+]])
+
+m4_popdef([AT_LAC_CHECK])
+
+AT_CLEANUP
-- 
1.7.0.4


>From 107844a3eea478e1d61551e47a88ed73374724c9 Mon Sep 17 00:00:00 2001
From: Joel E. Denny <address@hidden>
Date: Sat, 11 Dec 2010 13:17:13 -0500
Subject: [PATCH 2/3] parse.lac: implement exploratory stack reallocations.

* data/yacc.c: Rename %define variable parse.lac.es-capacity to
parse.lac.es-capacity-initial.  Accept parse.lac.memory-trace
with values of "failures" (default) or "full".
(b4_declare_parser_state_variables): Add yyesa, yyes, and
yyes_capacity variables.
(YYSTACK_USE_ALLOCA): Ignore it if LAC requested.
(YYSTACK_ALLOC, YYSTACK_FREE, YYSTACK_ALLOC_MAXIMUM): Define if
LAC requested.
(YYCOPY_NEEDED): New cpp macro.
(YYCOPY): Define if LAC requested.
(yy_lac_stack_realloc): New function implementing stack
reallocations.  Use YYMAXDEPTH for maximum stack size given that
the stack should never need to grow larger than the main state
stack needs to grow without LAC.
(YY_LAC_ESTABLISH): Update yy_lac invocation.
(yy_lac): Add arguments for exploratory stack memory data
recorded in the main parser.  Invoke yy_lac_stack_realloc when
reallocation is necessary.
(yysyntax_error): Add the same new arguments and pass them to
yy_lac.
(yypstate_delete): Free yyes if necessary.
(yyesa, yyes, yyes_capacity): #define these to yypstate members
in the case of push parsing.
(yyparse, yypush_parse): Initialize yyes and yyes_capacity.
Update yysyntax_error invocations.  At yyreturn, free yyes if
necessary.
* src/parse-gram.y: %define parse.lac full.
* tests/input.at (LAC: errors for %define): Extend for
parse.lac-memory-trace.
* tests/regression.at (LAC: Exploratory stack): Extend to check
that stack reallocs happen when expected.
(LAC: Memory exhaustion): Update to use YYMAXDEPTH and
parse.lac.es-capacity-initial.
---
 ChangeLog           |   37 ++
 data/yacc.c         |  225 ++++++---
 src/parse-gram.c    | 1337 ++++++++++++++++++++++++++++++---------------------
 src/parse-gram.h    |   16 +-
 src/parse-gram.y    |    1 +
 tests/input.at      |    4 +
 tests/regression.at |   19 +-
 7 files changed, 1017 insertions(+), 622 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 63d41ed..997d59f 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,42 @@
 2010-12-11  Joel E. Denny  <address@hidden>
 
+       parse.lac: implement exploratory stack reallocations.
+       * data/yacc.c: Rename %define variable parse.lac.es-capacity to
+       parse.lac.es-capacity-initial.  Accept parse.lac.memory-trace
+       with values of "failures" (default) or "full".
+       (b4_declare_parser_state_variables): Add yyesa, yyes, and
+       yyes_capacity variables.
+       (YYSTACK_USE_ALLOCA): Ignore it if LAC requested.
+       (YYSTACK_ALLOC, YYSTACK_FREE, YYSTACK_ALLOC_MAXIMUM): Define if
+       LAC requested.
+       (YYCOPY_NEEDED): New cpp macro.
+       (YYCOPY): Define if LAC requested.
+       (yy_lac_stack_realloc): New function implementing stack
+       reallocations.  Use YYMAXDEPTH for maximum stack size given that
+       the stack should never need to grow larger than the main state
+       stack needs to grow without LAC.
+       (YY_LAC_ESTABLISH): Update yy_lac invocation.
+       (yy_lac): Add arguments for exploratory stack memory data
+       recorded in the main parser.  Invoke yy_lac_stack_realloc when
+       reallocation is necessary.
+       (yysyntax_error): Add the same new arguments and pass them to
+       yy_lac.
+       (yypstate_delete): Free yyes if necessary.
+       (yyesa, yyes, yyes_capacity): #define these to yypstate members
+       in the case of push parsing.
+       (yyparse, yypush_parse): Initialize yyes and yyes_capacity.
+       Update yysyntax_error invocations.  At yyreturn, free yyes if
+       necessary.
+       * src/parse-gram.y: %define parse.lac full.
+       * tests/input.at (LAC: errors for %define): Extend for
+       parse.lac-memory-trace.
+       * tests/regression.at (LAC: Exploratory stack): Extend to check
+       that stack reallocs happen when expected.
+       (LAC: Memory exhaustion): Update to use YYMAXDEPTH and
+       parse.lac.es-capacity-initial.
+
+2010-12-11  Joel E. Denny  <address@hidden>
+
        parse.lac: implement as %define variable.
        LAC = lookahead correction.  See discussion at
        <http://lists.gnu.org/archive/html/bison-patches/2009-09/msg00034.html>.
diff --git a/data/yacc.c b/data/yacc.c
index 5ba564e..d8db355 100644
--- a/data/yacc.c
+++ b/data/yacc.c
@@ -38,11 +38,14 @@ b4_use_push_for_pull_if([
   b4_push_if([m4_define([b4_use_push_for_pull_flag], [[0]])],
              [m4_define([b4_push_flag], [[1]])])])
 
-# Check the value of %define parse.lac, where LAC stands for lookahead
-# correction.
+# Check the value of %define parse.lac and friends, where LAC stands for
+# lookahead correction.
 b4_percent_define_default([[parse.lac]], [[none]])
-b4_percent_define_default([[parse.lac.es-capacity]], [[20]])
-b4_percent_define_check_values([[[[parse.lac]], [[full]], [[none]]]])
+b4_percent_define_default([[parse.lac.es-capacity-initial]], [[20]])
+b4_percent_define_default([[parse.lac.memory-trace]], [[failures]])
+b4_percent_define_check_values([[[[parse.lac]], [[full]], [[none]]]],
+                               [[[[parse.lac.memory-trace]],
+                                 [[failures]], [[full]]]])
 b4_define_flag_if([lac])
 m4_define([b4_lac_flag],
           [m4_if(b4_percent_define_get([[parse.lac]]),
@@ -217,7 +220,11 @@ m4_define([b4_declare_parser_state_variables], 
[b4_pure_if([[
     /* The locations where the error started and ended.  */
     YYLTYPE yyerror_range[3];]])[
 
-    YYSIZE_T yystacksize;]])
+    YYSIZE_T yystacksize;]b4_lac_if([[
+
+    yytype_int16 
address@hidden([[parse.lac.es-capacity-initial]])address@hidden;
+    yytype_int16 *yyes;
+    YYSIZE_T yyes_capacity;]])])
 
 
 ## --------------------------------------------------------- ##
@@ -411,10 +418,10 @@ typedef short int yytype_int16;
 }
 #endif
 
-#if ! defined yyoverflow || YYERROR_VERBOSE
+#if ]b4_lac_if([[1]], [[! defined yyoverflow || YYERROR_VERBOSE]])[
 
-]b4_push_if([],
-[[/* The parser invokes alloca or malloc; define the necessary symbols.  */
+/* The parser invokes alloca or malloc; define the necessary symbols.  */]dnl
+b4_push_if([], [b4_lac_if([], [[
 
 # ifdef YYSTACK_USE_ALLOCA
 #  if YYSTACK_USE_ALLOCA
@@ -437,10 +444,9 @@ typedef short int yytype_int16;
 #    endif
 #   endif
 #  endif
-# endif
+# endif]])])[
 
-]])dnl
-[# ifdef YYSTACK_ALLOC
+# ifdef YYSTACK_ALLOC
    /* Pacify GCC's `empty if-body' warning.  */
 #  define YYSTACK_FREE(Ptr) do { /* empty */; } while (YYID (0))
 #  ifndef YYSTACK_ALLOC_MAXIMUM
@@ -476,8 +482,9 @@ void *malloc (YYSIZE_T); /* INFRINGES ON USER NAME SPACE */
 void free (void *); /* INFRINGES ON USER NAME SPACE */
 #   endif
 #  endif
-# endif
-#endif /* ! defined yyoverflow || YYERROR_VERBOSE */
+# endif]b4_lac_if([[
+# define YYCOPY_NEEDED 1]])[
+#endif]b4_lac_if([], [[ /* ! defined yyoverflow || YYERROR_VERBOSE */]])[
 
 
 #if (! defined yyoverflow \
@@ -506,23 +513,7 @@ union yyalloc
      ((N) * (sizeof (yytype_int16) + sizeof (YYSTYPE)) \
       + YYSTACK_GAP_MAXIMUM)])[
 
-/* Copy COUNT objects from FROM to TO.  The source and destination do
-   not overlap.  */
-# ifndef YYCOPY
-#  if defined __GNUC__ && 1 < __GNUC__
-#   define YYCOPY(To, From, Count) \
-      __builtin_memcpy (To, From, (Count) * sizeof (*(From)))
-#  else
-#   define YYCOPY(To, From, Count)             \
-      do                                       \
-       {                                       \
-         YYSIZE_T yyi;                         \
-         for (yyi = 0; yyi < (Count); yyi++)   \
-           (To)[yyi] = (From)[yyi];            \
-       }                                       \
-      while (YYID (0))
-#  endif
-# endif
+# define YYCOPY_NEEDED 1
 
 /* Relocate STACK from its old location to the new one.  The
    local variables YYSIZE and YYSTACKSIZE give the old and new number of
@@ -542,6 +533,26 @@ union yyalloc
 
 #endif
 
+#if defined YYCOPY_NEEDED && YYCOPY_NEEDED
+/* Copy COUNT objects from FROM to TO.  The source and destination do
+   not overlap.  */
+# ifndef YYCOPY
+#  if defined __GNUC__ && 1 < __GNUC__
+#   define YYCOPY(To, From, Count) \
+      __builtin_memcpy (To, From, (Count) * sizeof (*(From)))
+#  else
+#   define YYCOPY(To, From, Count)             \
+      do                                       \
+       {                                       \
+         YYSIZE_T yyi;                         \
+         for (yyi = 0; yyi < (Count); yyi++)   \
+           (To)[yyi] = (From)[yyi];            \
+       }                                       \
+      while (YYID (0))
+#  endif
+# endif
+#endif /* !YYCOPY_NEEDED */
+
 /* YYFINAL -- State number of the termination state.  */
 #define YYFINAL  ]b4_final_state_number[
 /* YYLAST -- Last index in YYTABLE.  */
@@ -826,6 +837,68 @@ int yydebug;
 # define YYMAXDEPTH ]b4_stack_depth_max[
 #endif]b4_lac_if([[
 
+/* Given a state stack such that *YYBOTTOM is its bottom, such that
+   *YYTOP is either its top or is YYTOP_EMPTY to indicate an empty
+   stack, and such that *YYCAPACITY is the maximum number of elements it
+   can hold without a reallocation, make sure there is enough room to
+   store YYADD more elements.  If not, allocate a new stack using
+   YYSTACK_ALLOC, copy the existing elements, and adjust *YYBOTTOM,
+   *YYTOP, and *YYCAPACITY to reflect the new capacity and memory
+   location.  If *YYBOTTOM != YYBOTTOM_NO_FREE, then free the old stack
+   using YYSTACK_FREE.  Return 0 if successful or if no reallocation is
+   required.  Return 1 if memory is exhausted.  */
+static int
+yy_lac_stack_realloc (YYSIZE_T *yycapacity, YYSIZE_T yyadd,
+#if YYDEBUG
+                      char const *yydebug_prefix,
+                      char const *yydebug_suffix,
+#endif
+                      yytype_int16 **yybottom,
+                      yytype_int16 *yybottom_no_free,
+                      yytype_int16 **yytop, yytype_int16 *yytop_empty)
+{
+  YYSIZE_T yysize_old =
+    *yytop == yytop_empty ? 0 : *yytop - *yybottom + 1;
+  YYSIZE_T yysize_new = yysize_old + yyadd;
+  if (*yycapacity < yysize_new)
+    {
+      YYSIZE_T yyalloc = 2 * yysize_new;
+      yytype_int16 *yybottom_new;
+      /* Use YYMAXDEPTH for maximum stack size given that the stack
+         should never need to grow larger than the main state stack
+         needs to grow without LAC.  */
+      if (YYMAXDEPTH < yysize_new)
+        {
+          YYDPRINTF ((stderr, "%smax size exceeded%s", yydebug_prefix,
+                      yydebug_suffix));
+          return 1;
+        }
+      if (YYMAXDEPTH < yyalloc)
+        yyalloc = YYMAXDEPTH;
+      yybottom_new =
+        (yytype_int16*) YYSTACK_ALLOC (yyalloc * sizeof *yybottom_new);
+      if (!yybottom_new)
+        {
+          YYDPRINTF ((stderr, "%srealloc failed%s", yydebug_prefix,
+                      yydebug_suffix));
+          return 1;
+        }
+      if (*yytop != yytop_empty)
+        {
+          YYCOPY (yybottom_new, *yybottom, yysize_old);
+          *yytop = yybottom_new + (yysize_old - 1);
+        }
+      if (*yybottom != yybottom_no_free)
+        YYSTACK_FREE (*yybottom);
+      *yybottom = yybottom_new;
+      *yycapacity = 
yyalloc;]m4_if(b4_percent_define_get([[parse.lac.memory-trace]]),
+                                   [full], [[
+      YYDPRINTF ((stderr, "%srealloc to %lu%s", yydebug_prefix,
+                  (unsigned long int) yyalloc, yydebug_suffix));]])[
+    }
+  return 0;
+}
+
 /* Establish the initial context for the current lookahead if no initial
    context is currently established.
 
@@ -852,23 +925,23 @@ int yydebug;
    current lookahead, then check if that lookahead can eventually be
    shifted if syntactic actions continue from the current context.
    Report a syntax error if it cannot.  */
-#define YY_LAC_ESTABLISH                                       \
-do {                                                           \
-  if (!yy_lac_established)                                     \
-    {                                                          \
-      YYDPRINTF ((stderr,                                      \
-                  "LAC: initial context established for %s\n", \
-                  yytname[yytoken]));                          \
-      yy_lac_established = 1;                                  \
-      {                                                        \
-        int yy_lac_status =                                    \
-          yy_lac (yyssp, yytoken);                             \
-        if (yy_lac_status == 2)                                \
-          goto yyexhaustedlab;                                 \
-        if (yy_lac_status == 1)                                \
-          goto yyerrlab;                                       \
-      }                                                        \
-    }                                                          \
+#define YY_LAC_ESTABLISH                                         \
+do {                                                             \
+  if (!yy_lac_established)                                       \
+    {                                                            \
+      YYDPRINTF ((stderr,                                        \
+                  "LAC: initial context established for %s\n",   \
+                  yytname[yytoken]));                            \
+      yy_lac_established = 1;                                    \
+      {                                                          \
+        int yy_lac_status =                                      \
+          yy_lac (yyesa, &yyes, &yyes_capacity, yyssp, yytoken); \
+        if (yy_lac_status == 2)                                  \
+          goto yyexhaustedlab;                                   \
+        if (yy_lac_status == 1)                                  \
+          goto yyerrlab;                                         \
+      }                                                          \
+    }                                                            \
 } while (YYID (0))
 
 /* Discard any previous initial lookahead context because of Event,
@@ -898,13 +971,18 @@ do {                                                      
               \
 #endif
 
 /* Given the stack whose top is *YYSSP, return 0 iff YYTOKEN can
-   eventually (after perhaps some reductions) be shifted, and return 1
-   if not.  Return 2 if memory is exhausted.  */
+   eventually (after perhaps some reductions) be shifted, return 1 if
+   not, or return 2 if memory is exhausted.  As preconditions and
+   postconditions: *YYES_CAPACITY is the allocated size of the array to
+   which *YYES points, and either *YYES = YYESA or *YYES points to an
+   array allocated with YYSTACK_ALLOC.  yy_lac may overwrite the
+   contents of either array, alter *YYES and *YYES_CAPACITY, and free
+   any old *YYES other than YYESA.  */
 static int
-yy_lac (yytype_int16 *yyssp, int yytoken)
+yy_lac (yytype_int16 *yyesa, yytype_int16 **yyes,
+        YYSIZE_T *yyes_capacity, yytype_int16 *yyssp, int yytoken)
 {
   yytype_int16 *yyes_prev = yyssp;
-  yytype_int16 address@hidden([[parse.lac.es-capacity]])address@hidden;
   yytype_int16 *yyesp = yyes_prev;
   YYDPRINTF ((stderr, "LAC: checking lookahead %s:", yytname[yytoken]));
   if (yytoken == YYUNDEFTOK)
@@ -946,7 +1024,7 @@ yy_lac (yytype_int16 *yyssp, int yytoken)
         YYDPRINTF ((stderr, " R%d", yyrule - 1));
         if (yyesp != yyes_prev)
           {
-            YYSIZE_T yysize = yyesp - yyes + 1;
+            YYSIZE_T yysize = yyesp - *yyes + 1;
             if (yylen < yysize)
               {
                 yyesp -= yylen;
@@ -974,14 +1052,18 @@ yy_lac (yytype_int16 *yyssp, int yytoken)
         }
         if (yyesp == yyes_prev)
           {
-            yyesp = yyes;
+            yyesp = *yyes;
             *yyesp = yystate;
           }
         else
           {
-            if (yyesp == yyes + (sizeof yyes / sizeof *yyes) - 1)
+            if (yy_lac_stack_realloc (yyes_capacity, 1,
+#if YYDEBUG
+                                      " (", ")",
+#endif
+                                      yyes, yyesa, &yyesp, yyes_prev))
               {
-                YYDPRINTF ((stderr, " (max stack size exceeded)\n"));
+                YYDPRINTF ((stderr, "\n"));
                 return 2;
               }
             *++yyesp = yystate;
@@ -1081,7 +1163,7 @@ yytnamerr (char *yyres, const char *yystr)
 /* Copy into *YYMSG, which is of size *YYMSG_ALLOC, an error message
    about the unexpected token YYTOKEN for the state stack whose top is
    YYSSP.]b4_lac_if([[  In order to see if a particular token T is a
-   valid looakhead, invoke yy_lac (YYSSP, T).]])[
+   valid looakhead, invoke yy_lac (YYESA, YYES, YYES_CAPACITY, YYSSP, T).]])[
 
    Return 0 if *YYMSG was successfully written.  Return 1 if *YYMSG is
    not large enough to hold the message.  In that case, also set
@@ -1090,7 +1172,8 @@ yytnamerr (char *yyres, const char *yystr)
    yy_lac returned 2]])[.  */
 static int
 yysyntax_error (YYSIZE_T *yymsg_alloc, char **yymsg,
-                yytype_int16 *yyssp, int yytoken)
+                ]b4_lac_if([[yytype_int16 *yyesa, yytype_int16 **yyes,
+                YYSIZE_T *yyes_capacity, ]])[yytype_int16 *yyssp, int yytoken)
 {
   YYSIZE_T yysize0 = yytnamerr (0, yytname[yytoken]);
   YYSIZE_T yysize = yysize0;
@@ -1156,7 +1239,8 @@ yysyntax_error (YYSIZE_T *yymsg_alloc, char **yymsg,
             if (yyx != YYTERROR && yyx != YYUNDEFTOK)
               {
                 {
-                  int yy_lac_status = yy_lac (yyssp, yyx);
+                  int yy_lac_status = yy_lac (yyesa, yyes, yyes_capacity,
+                                              yyssp, yyx);
                   if (yy_lac_status == 2)
                     return 2;
                   if (yy_lac_status == 1)
@@ -1321,7 +1405,9 @@ b4_c_function_def([[yyparse]], [[int]], b4_parse_param)[
      stack still needs to be freed.  */
   if (!yyps->yynew && yyps->yyss != yyps->yyssa)
     YYSTACK_FREE (yyps->yyss);
-#endif
+#endif]b4_lac_if([[
+  if (!yyps->yynew && yyps->yyes != yyps->yyesa)
+    YYSTACK_FREE (yyps->yyes);]])[
   free (yyps);]b4_pure_if([], [[
   yypstate_allocated = 0;]])[
 }
@@ -1339,7 +1425,10 @@ b4_c_function_def([[yyparse]], [[int]], b4_parse_param)[
 #define yyls yyps->yyls
 #define yylsp yyps->yylsp
 #define yyerror_range yyps->yyerror_range]])[
-#define yystacksize yyps->yystacksize
+#define yystacksize yyps->yystacksize]b4_lac_if([[
+#define yyesa yyps->yyesa
+#define yyes yyps->yyes
+#define yyes_capacity yyps->yyes_capacity]])[
 
 
 /*---------------.
@@ -1405,7 +1494,12 @@ b4_c_function_def([[yyparse]], [[int]], b4_parse_param)[
   yyss = yyssa;
   yyvs = yyvsa;]b4_locations_if([[
   yyls = yylsa;]])[
-  yystacksize = YYINITDEPTH;
+  yystacksize = YYINITDEPTH;]b4_lac_if([[
+
+  yyes = yyesa;
+  yyes_capacity = sizeof yyesa / sizeof *yyes;
+  if (YYMAXDEPTH < yyes_capacity)
+    yyes_capacity = YYMAXDEPTH;]])[
 
   YYDPRINTF ((stderr, "Starting parse\n"));
 
@@ -1709,8 +1803,9 @@ yyerrlab:
 #if ! YYERROR_VERBOSE
       yyerror (]b4_yyerror_args[YY_("syntax error"));
 #else
-# define YYSYNTAX_ERROR yysyntax_error (&yymsg_alloc, &yymsg, yyssp, \
-                                        yytoken)
+# define YYSYNTAX_ERROR yysyntax_error (&yymsg_alloc, &yymsg, \]b4_lac_if([[
+                                        yyesa, &yyes, &yyes_capacity, \]])[
+                                        yyssp, yytoken)
       {
         char const *yymsgp = YY_("syntax error");
         int yysyntax_error_status;]b4_lac_if([[
@@ -1888,7 +1983,9 @@ yyreturn:
 #ifndef yyoverflow
   if (yyss != yyssa)
     YYSTACK_FREE (yyss);
-#endif]b4_push_if([[
+#endif]b4_lac_if([[
+  if (yyes != yyesa)
+    YYSTACK_FREE (yyes);]])b4_push_if([[
   yyps->yynew = 1;
 
 yypushreturn:]])[
diff --git a/src/parse-gram.y b/src/parse-gram.y
index d588dfa..8787882 100644
--- a/src/parse-gram.y
+++ b/src/parse-gram.y
@@ -73,6 +73,7 @@ static char const *char_name (char);
 %locations
 %pure-parser
 %define parse.error "verbose"
+%define parse.lac full
 %name-prefix="gram_"
 %expect 0
 
diff --git a/tests/input.at b/tests/input.at
index 2546685..6cfa2b1 100644
--- a/tests/input.at
+++ b/tests/input.at
@@ -1306,5 +1306,9 @@ AT_BISON_CHECK([[-Dparse.lac.es-capacity-initial=1 
input.y]],
                [[1]], [],
 [[<command line>:2: %define variable `parse.lac.es-capacity-initial' is not 
used
 ]])
+AT_BISON_CHECK([[-Dparse.lac.memory-trace=full input.y]],
+               [[1]], [],
+[[<command line>:2: %define variable `parse.lac.memory-trace' is not used
+]])
 
 AT_CLEANUP
diff --git a/tests/regression.at b/tests/regression.at
index 65d4ac2..b1555c5 100644
--- a/tests/regression.at
+++ b/tests/regression.at
@@ -1498,6 +1498,8 @@ AT_DATA_GRAMMAR([input.y],
 // default reductions in inconsistent states
 // v   v v   v v v v   v v v v v v v
 S: A B A A B A A A A B A A A A A A A B C C A A A A A A A A A A A A B ;
+//       ^                     ^                               ^
+// LAC reallocs
 
 A: 'a' | /*empty*/ { printf ("inconsistent default reduction\n"); } ;
 B: 'b' ;
@@ -1527,9 +1529,8 @@ main (void)
 }
 ]])
 
-# Give exactly the right amount of memory to be sure there's no
-# off-by-one error, for example.
-AT_BISON_CHECK([[-Dparse.lac=full -Dparse.lac.es-capacity=12 \
+AT_BISON_CHECK([[-Dparse.lac=full -Dparse.lac.es-capacity-initial=1 \
+                 -Dparse.lac.memory-trace=full \
                  -t -o input.c input.y]], [[0]], [],
 [[input.y: conflicts: 21 shift/reduce
 ]])
@@ -1552,6 +1553,11 @@ AT_CHECK([[perl -0777 -ne 'print s/inconsistent default 
reduction//g;' \
 AT_CHECK([[perl -0777 -ne 'print s/\bconsistent default reduction//g;' \
            < stdout.txt || exit 77]], [[0]], [[2]])
 
+# Check number of reallocs to be sure reallocated memory isn't somehow
+# lost between LAC invocations.
+AT_CHECK([[perl -0777 -ne 'print s/\(realloc//g;' < stderr.txt \
+           || exit 77]], [[0]], [[3]])
+
 AT_BISON_OPTION_POPDEFS
 ])
 
@@ -1579,6 +1585,7 @@ AT_DATA_GRAMMAR([input.y],
   #include <stdio.h>
   void yyerror (char const *);
   int yylex (void);
+  #define YYMAXDEPTH 8
 }
 
 %error-verbose
@@ -1611,7 +1618,7 @@ main (void)
 }
 ]])
 
-AT_BISON_CHECK([[-Dparse.lac=full -Dparse.lac.es-capacity=8 \
+AT_BISON_CHECK([[-Dparse.lac=full -Dparse.lac.es-capacity-initial=1 \
                  -t -o input.c input.y]], [[0]], [],
 [[input.y: conflicts: 8 shift/reduce
 ]])
@@ -1626,7 +1633,7 @@ AT_PARSER_CHECK([[./input]], [[2]], [[]],
 Entering state 0
 Reading a token: Now at end of input.
 LAC: initial context established for $end
-LAC: checking lookahead $end: R2 G3 R2 G5 R2 G6 R2 G7 R2 G8 R2 G9 R2 G10 R2 
G11 R2 (max stack size exceeded)
+LAC: checking lookahead $end: R2 G3 R2 G5 R2 G6 R2 G7 R2 G8 R2 G9 R2 G10 R2 
G11 R2 (max size exceeded)
 memory exhausted
 Cleanup: discarding lookahead token $end ()
 Stack now 0
@@ -1642,7 +1649,7 @@ Reading a token: Next token is token $undefined ()
 LAC: initial context established for $undefined
 LAC: checking lookahead $undefined: Always Err
 Constructing syntax error message
-LAC: checking lookahead $end: R2 G3 R2 G5 R2 G6 R2 G7 R2 G8 R2 G9 R2 G10 R2 
G11 R2 (max stack size exceeded)
+LAC: checking lookahead $end: R2 G3 R2 G5 R2 G6 R2 G7 R2 G8 R2 G9 R2 G10 R2 
G11 R2 (max size exceeded)
 syntax error
 memory exhausted
 Cleanup: discarding lookahead token $undefined ()
-- 
1.7.0.4


>From fcf834f9ecf080784b741782f4206df1e1a2957a Mon Sep 17 00:00:00 2001
From: Joel E. Denny <address@hidden>
Date: Sun, 19 Dec 2010 22:12:32 -0500
Subject: [PATCH 3/3] parse.lac: document.

* NEWS (2.5): Add entry for LAC, and mention LAC in entry for
other corrections to verbose syntax error messages.
* doc/bison.texinfo (Decl Summary): Rewrite entries for
lr.default-reductions and lr.type to be clearer, to mention
%nonassoc's effect on canonical LR, and to mention LAC.  Add entry
for parse.lac.
(Glossary): Add entry for LAC.
---
 ChangeLog         |   11 +++
 NEWS              |   72 ++++++++++++++++----
 doc/bison.texinfo |  190 ++++++++++++++++++++++++++++++++++++++++------------
 3 files changed, 214 insertions(+), 59 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 997d59f..aac9e77 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,14 @@
+2010-12-19  Joel E. Denny  <address@hidden>
+
+       parse.lac: document.
+       * NEWS (2.5): Add entry for LAC, and mention LAC in entry for
+       other corrections to verbose syntax error messages.
+       * doc/bison.texinfo (Decl Summary): Rewrite entries for
+       lr.default-reductions and lr.type to be clearer, to mention
+       %nonassoc's effect on canonical LR, and to mention LAC.  Add entry
+       for parse.lac.
+       (Glossary): Add entry for LAC.
+
 2010-12-11  Joel E. Denny  <address@hidden>
 
        parse.lac: implement exploratory stack reallocations.
diff --git a/NEWS b/NEWS
index c818839..5b763b9 100644
--- a/NEWS
+++ b/NEWS
@@ -117,6 +117,46 @@ Bison News
   These features are experimental.  More user feedback will help to
   stabilize them.
 
+** LAC (lookahead correction) for syntax error handling:
+
+  Canonical LR, IELR, and LALR can suffer from a couple of problems
+  upon encountering a syntax error.  First, the parser might perform
+  additional parser stack reductions before discovering the syntax
+  error.  Such reductions perform user semantic actions that are
+  unexpected because they are based on an invalid token, and they
+  cause error recovery to begin in a different syntactic context than
+  the one in which the invalid token was encountered.  Second, when
+  verbose error messages are enabled (with %error-verbose or `#define
+  YYERROR_VERBOSE'), the expected token list in the syntax error
+  message can both contain invalid tokens and omit valid tokens.
+
+  The culprits for the above problems are %nonassoc, default
+  reductions in inconsistent states, and parser state merging.  Thus,
+  IELR and LALR suffer the most.  Canonical LR can suffer only if
+  %nonassoc is used or if default reductions are enabled for
+  inconsistent states.
+
+  LAC is a new mechanism within the parsing algorithm that completely
+  solves these problems for canonical LR, IELR, and LALR without
+  sacrificing %nonassoc, default reductions, or state mering.  When
+  LAC is in use, canonical LR and IELR behave exactly the same for
+  both syntactically acceptable and syntactically unacceptable input.
+  While LALR still does not support the full language-recognition
+  power of canonical LR and IELR, LAC at least enables LALR's syntax
+  error handling to correctly reflect LALR's language-recognition
+  power.
+
+  Currently, LAC is only supported for deterministic parsers in C.
+  You can enable LAC with the following directive:
+
+    %define parse.lac full
+
+  See the documentation for `%define parse.lac' in the section `Bison
+  Declaration Summary' in the Bison manual for additional details.
+
+  LAC is an experimental feature.  More user feedback will help to
+  stabilize it.
+
 ** Unrecognized %code qualifiers are now an error not a warning.
 
 ** %define improvements.
@@ -225,11 +265,11 @@ Bison News
 
 ** Verbose syntax error message fixes:
 
-  When %error-verbose or `#define YYERROR_VERBOSE' is specified, syntax
-  error messages produced by the generated parser include the unexpected
-  token as well as a list of expected tokens.  The effect of %nonassoc
-  on these verbose messages has been corrected in two ways, but
-  additional fixes are still being implemented:
+  When %error-verbose or `#define YYERROR_VERBOSE' is specified,
+  syntax error messages produced by the generated parser include the
+  unexpected token as well as a list of expected tokens.  The effect
+  of %nonassoc on these verbose messages has been corrected in two
+  ways, but a complete fix requires LAC, described above:
 
 *** When %nonassoc is used, there can exist parser states that accept no
     tokens, and so the parser does not always require a lookahead token
@@ -248,16 +288,18 @@ Bison News
     tokens are now properly omitted from the list.
 
 *** Expected token lists are still often wrong due to state merging
-    (from LALR or IELR) and default reductions, which can both add and
-    subtract valid tokens.  Canonical LR almost completely fixes this
-    problem by eliminating state merging and default reductions.
-    However, there is one minor problem left even when using canonical
-    LR and even after the fixes above.  That is, if the resolution of a
-    conflict with %nonassoc appears in a later parser state than the one
-    at which some syntax error is discovered, the conflicted token is
-    still erroneously included in the expected token list.  We are
-    currently working on a fix to eliminate this problem and to
-    eliminate the need for canonical LR.
+    (from LALR or IELR) and default reductions, which can both add
+    invalid tokens and subtract valid tokens.  Canonical LR almost
+    completely fixes this problem by eliminating state merging and
+    default reductions.  However, there is one minor problem left even
+    when using canonical LR and even after the fixes above.  That is,
+    if the resolution of a conflict with %nonassoc appears in a later
+    parser state than the one at which some syntax error is
+    discovered, the conflicted token is still erroneously included in
+    the expected token list.  Bison's new LAC implementation,
+    described above, eliminates this problem and the need for
+    canonical LR.  However, LAC is still experimental and is disabled
+    by default.
 
 ** Destructor calls fixed for lookaheads altered in semantic actions.
 
diff --git a/doc/bison.texinfo b/doc/bison.texinfo
index 209bc5c..2d96352 100644
--- a/doc/bison.texinfo
+++ b/doc/bison.texinfo
@@ -5230,57 +5230,61 @@ Boolean.
 @findex %define lr.default-reductions
 @cindex delayed syntax errors
 @cindex syntax errors delayed
address@hidden @acronym{LAC}
address@hidden %nonassoc
 
 @itemize @bullet
 @item Language(s): all
 
address@hidden Purpose: Specifies the kind of states that are permitted to
address@hidden Purpose: Specify the kind of states that are permitted to
 contain default reductions.
-That is, in such a state, Bison declares the reduction with the largest
-lookahead set to be the default reduction and then removes that
+That is, in such a state, Bison selects the reduction with the largest
+lookahead set to be the default parser action and then removes that
 lookahead set.
-The advantages of default reductions are discussed below.
-The disadvantage is that, when the generated parser encounters a
-syntactically unacceptable token, the parser might then perform
-unnecessary default reductions before it can detect the syntax error.
-
-(This feature is experimental.
+(The ability to specify where default reductions should be used is
+experimental.
 More user feedback will help to stabilize it.)
 
 @item Accepted Values:
 @itemize
 @item @code{all}.
-For @acronym{LALR} and @acronym{IELR} parsers (@pxref{Decl
-Summary,,lr.type}) by default, all states are permitted to contain
-default reductions.
-The advantage is that parser table sizes can be significantly reduced.
-The reason Bison does not by default attempt to address the disadvantage
-of delayed syntax error detection is that this disadvantage is already
-inherent in @acronym{LALR} and @acronym{IELR} parser tables.
-That is, unlike in a canonical @acronym{LR} state, the lookahead sets of
-reductions in an @acronym{LALR} or @acronym{IELR} state can contain
-tokens that are syntactically incorrect for some left contexts.
+This is the traditional Bison behavior.
+The main advantage is a significant decrease in the size of the parser
+tables.
+The disadvantage is that, when the generated parser encounters a
+syntactically unacceptable token, the parser might then perform
+unnecessary default reductions before it can detect the syntax error.
+Such delayed syntax error detection is usually inherent in
address@hidden and @acronym{IELR} parser tables anyway due to
address@hidden state merging (@pxref{Decl Summary,,lr.type}).
+Furthermore, the use of @code{%nonassoc} can contribute to delayed
+syntax error detection even in the case of canonical @acronym{LR}.
+As an experimental feature, delayed syntax error detection can be
+overcome in all cases by enabling @acronym{LAC} (@pxref{Decl
+Summary,,parse.lac}, for details, including a discussion of the effects
+of delayed syntax error detection).
 
 @item @code{consistent}.
 @cindex consistent states
 A consistent state is a state that has only one possible action.
 If that action is a reduction, then the parser does not need to request
 a lookahead token from the scanner before performing that action.
-However, the parser only recognizes the ability to ignore the lookahead
-token when such a reduction is encoded as a default reduction.
-Thus, if default reductions are permitted in and only in consistent
-states, then a canonical @acronym{LR} parser reports a syntax error as
-soon as it @emph{needs} the syntactically unacceptable token from the
-scanner.
+However, the parser recognizes the ability to ignore the lookahead token
+in this way only when such a reduction is encoded as a default
+reduction.
+Thus, if default reductions are permitted only in consistent states,
+then a canonical @acronym{LR} parser that does not employ
address@hidden detects a syntax error as soon as it @emph{needs} the
+syntactically unacceptable token from the scanner.
 
 @item @code{accepting}.
 @cindex accepting state
-By default, the only default reduction permitted in a canonical
address@hidden parser is the accept action in the accepting state, which
-the parser reaches only after reading all tokens from the input.
-Thus, the default canonical @acronym{LR} parser reports a syntax error
-as soon as it @emph{reaches} the syntactically unacceptable token
-without performing any extra reductions.
+In the accepting state, the default reduction is actually the accept
+action.
+In this case, a canonical @acronym{LR} parser that does not employ
address@hidden detects a syntax error as soon as it @emph{reaches} the
+syntactically unacceptable token in the input.
+That is, it does not perform any extra reductions.
 @end itemize
 
 @item Default Value:
@@ -5400,17 +5404,23 @@ This can significantly reduce the complexity of 
developing of a grammar.
 @item @code{canonical-lr}.
 @cindex delayed syntax errors
 @cindex syntax errors delayed
-The only advantage of canonical @acronym{LR} over @acronym{IELR} is
-that, for every left context of every canonical @acronym{LR} state, the
-set of tokens accepted by that state is the exact set of tokens that is
-syntactically acceptable in that left context.
-Thus, the only difference in parsing behavior is that the canonical
address@hidden parser can report a syntax error as soon as possible
-without performing any unnecessary reductions.
address@hidden Summary,,lr.default-reductions}, for further details.
-Even when canonical @acronym{LR} behavior is ultimately desired,
address@hidden's elimination of duplicate conflicts should still
-facilitate the development of a grammar.
address@hidden @acronym{LAC}
address@hidden %nonassoc
+While inefficient, canonical @acronym{LR} parser tables can be an
+interesting means to explore a grammar because they have a property that
address@hidden and @acronym{LALR} tables do not.
+That is, if @code{%nonassoc} is not used and default reductions are left
+disabled (@pxref{Decl Summary,,lr.default-reductions}), then, for every
+left context of every canonical @acronym{LR} state, the set of tokens
+accepted by that state is guaranteed to be the exact set of tokens that
+is syntactically acceptable in that left context.
+It might then seem that an advantage of canonical @acronym{LR} parsers
+in production is that, under the above constraints, they are guaranteed
+to detect a syntax error as soon as possible without performing any
+unnecessary reductions.
+However, @acronym{IELR} parsers using @acronym{LAC} (@pxref{Decl
+Summary,,parse.lac}) are also able to achieve this behavior without
+sacrificing @code{%nonassoc} or default reductions.
 @end itemize
 
 @item Default Value: @code{lalr}
@@ -5448,7 +5458,7 @@ destroyed properly.  This option checks these constraints.
 @findex %define parse.error
 @itemize
 @item Languages(s):
-all.
+all
 @item Purpose:
 Control the kind of error messages passed to the error reporting
 function.  @xref{Error Reporting, ,The Error Reporting Function
@@ -5469,6 +5479,90 @@ ones.
 @c parse.error
 
 
address@hidden ================================================== parse.lac
address@hidden parse.lac
address@hidden %define parse.lac
address@hidden @acronym{LAC}
address@hidden lookahead correction
+
address@hidden
address@hidden Languages(s): C
+
address@hidden Purpose: Enable @acronym{LAC} (lookahead correction) to improve
+syntax error handling.
+
+Canonical @acronym{LR}, @acronym{IELR}, and @acronym{LALR} can suffer
+from a couple of problems upon encountering a syntax error.  First, the
+parser might perform additional parser stack reductions before
+discovering the syntax error.  Such reductions perform user semantic
+actions that are unexpected because they are based on an invalid token,
+and they cause error recovery to begin in a different syntactic context
+than the one in which the invalid token was encountered.  Second, when
+verbose error messages are enabled (with @code{%error-verbose} or
address@hidden YYERROR_VERBOSE}), the expected token list in the syntax
+error message can both contain invalid tokens and omit valid tokens.
+
+The culprits for the above problems are @code{%nonassoc}, default
+reductions in inconsistent states, and parser state merging.  Thus,
address@hidden and @acronym{LALR} suffer the most.  Canonical
address@hidden can suffer only if @code{%nonassoc} is used or if default
+reductions are enabled for inconsistent states.
+
address@hidden is a new mechanism within the parsing algorithm that
+completely solves these problems for canonical @acronym{LR},
address@hidden, and @acronym{LALR} without sacrificing @code{%nonassoc},
+default reductions, or state mering.  Conceptually, the mechanism is
+straight-forward.  Whenever the parser fetches a new token from the
+scanner so that it can determine the next parser action, it immediately
+suspends normal parsing and performs an exploratory parse using a
+temporary copy of the normal parser state stack.  During this
+exploratory parse, the parser does not perform user semantic actions.
+If the exploratory parse reaches a shift action, normal parsing then
+resumes on the normal parser stacks.  If the exploratory parse reaches
+an error instead, the parser reports a syntax error.  If verbose syntax
+error messages are enabled, the parser must then discover the list of
+expected tokens, so it performs a separate exploratory parse for each
+token in the grammar.
+
+There is one subtlety about the use of @acronym{LAC}.  That is, when in
+a consistent parser state with a default reduction, the parser will not
+attempt to fetch a token from the scanner because no lookahead is needed
+to determine the next parser action.  Thus, whether default reductions
+are enabled in consistent states (@pxref{Decl
+Summary,,lr.default-reductions}) affects how soon the parser detects a
+syntax error: when it @emph{reaches} an erroneous token or when it
+eventually @emph{needs} that token as a lookahead.  The latter behavior
+is probably more intuitive, so Bison currently provides no way to
+achieve the former behavior while default reductions are fully enabled.
+
+Thus, when @acronym{LAC} is in use, for some fixed decision of whether
+to enable default reductions in consistent states, canonical
address@hidden and @acronym{IELR} behave exactly the same for both
+syntactically acceptable and syntactically unacceptable input.  While
address@hidden still does not support the full language-recognition
+power of canonical @acronym{LR} and @acronym{IELR}, @acronym{LAC} at
+least enables @acronym{LALR}'s syntax error handling to correctly
+reflect @acronym{LALR}'s language-recognition power.
+
+Because @acronym{LAC} requires many parse actions to be performed twice,
+it can have a performance penalty.  However, not all parse actions must
+be performed twice.  Specifically, during a series of default reductions
+in consistent states and shift actions, the parser never has to initiate
+an exploratory parse.  Moreover, the most time-consuming tasks in a
+parse are often the file I/O, the lexical analysis performed by the
+scanner, and the user's semantic actions, but none of these are
+performed during the exploratory parse.  Finally, the base of the
+temporary stack used during an exploratory parse is a pointer into the
+normal parser state stack so that the stack is never physically copied.
+In our experience, the performance penalty of @acronym{LAC} has proven
+insignificant for practical grammars.
+
address@hidden Accepted Values: @code{none}, @code{full}
+
address@hidden Default Value: @code{none}
address@hidden itemize
address@hidden parse.lac
+
 @c ================================================== parse.trace
 @item parse.trace
 @findex %define parse.trace
@@ -11241,6 +11335,14 @@ performs some operation.
 @item Input stream
 A continuous flow of data between devices or programs.
 
address@hidden @acronym{LAC} (Lookahead Correction)
+A parsing mechanism that fixes the problem of delayed syntax error
+detection, which is caused by LR state merging, default reductions, and
+the use of @code{%nonassoc}.  Delayed syntax error detection results in
+unexpected semantic actions, initiation of error recovery in the wrong
+syntactic context, and an incorrect list of expected tokens in a verbose
+syntax error message.  @xref{Decl Summary,,parse.lac}.
+
 @item Language construct
 One of the typical usage schemas of the language.  For example, one of
 the constructs of the C language is the @code{if} statement.
@@ -11397,7 +11499,7 @@ grammatically indivisible.  The piece of text it 
represents is a token.
 @c LocalWords: hbox hss hfill tt ly yyin fopen fclose ofirst gcc ll lookahead
 @c LocalWords: nbar yytext fst snd osplit ntwo strdup AST Troublereporting th
 @c LocalWords: YYSTACK DVI fdl printindex IELR nondeterministic nonterminals ps
address@hidden LocalWords: subexpressions declarator nondeferred config libintl 
postfix
address@hidden LocalWords: subexpressions declarator nondeferred config libintl 
postfix LAC
 @c LocalWords: preprocessor nonpositive unary nonnumeric typedef extern rhs
 @c LocalWords: yytokentype filename destructor multicharacter nonnull EBCDIC
 @c LocalWords: lvalue nonnegative XNUM CHR chr TAGLESS tagless stdout api TOK
-- 
1.7.0.4
[Prev in Thread]
Current Thread
[Next in Thread]
LAC (lookahead correction) for syntax error handling, Joel E. Denny <=
Next by Date: doc: cleanup.
Next by thread: doc: cleanup.
Index(es):
- Date
- Thread