[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[PATCH] doc: cex documentation
From: |
Vincent Imbimbo |
Subject: |
[PATCH] doc: cex documentation |
Date: |
Sun, 28 Jun 2020 15:30:17 -0400 |
Hey Akim,
I think I found a good place for the cex documentation in bison.texi, but it
might still be a bit awkward. Tell me if you have any ideas for changes.
*NEWS, doc/bison.texi: Add documentation for conflict counterexample generation
---
NEWS | 24 +++++++++++++++++
doc/bison.texi | 71 +++++++++++++++++++++++++++++++++++++++++++++++---
2 files changed, 91 insertions(+), 4 deletions(-)
diff --git a/NEWS b/NEWS
index 60587f7f..6c5564a7 100644
--- a/NEWS
+++ b/NEWS
@@ -12,6 +12,30 @@ GNU Bison NEWS
similar to the `-ffile-prefix-map` in GCC. This option can be used to
make bison output reproducible.
+*** Counterexample Generation
+
+ When given --report=counterexample or -Wcounterexample, Bison will now output
+ counterexamples for conflicts in the grammar. These are strings in the
grammar
+ which can be parsed in two ways due to the conflict. For example:
+
+ Example exp '+' exp • '/' exp
+ First derivation exp ::=[ exp ::=[ exp '+' exp • ] '/' exp ]
+ Example exp '+' exp • '/' exp
+ Second derivation exp ::=[ exp '+' exp ::=[ exp • '/' exp ] ]
+
+ This is a shift/reduce conflict caused by none of the operators having
+ precedence, so the example can be parsed in the two ways shown.
+ When bison cannot find an example that can be derived in two ways,
+ it instead generates two examples that are the same up until the dot:
+
+ First example expr • ID $end
+ First derivation $accept ::=[ s ::=[ a ::=[ expr • ] ID ] $end ]
+ Second example expr • ID ',' ID $end
+ Second derivation $accept ::=[ s ::=[ a ::=[ expr ::=[ expr • ID ',' ]
] ID ] $end ]
+
+ In these cases, the parser usually doesn't have enough lookahead to
+ differentiate the two given examples.
+
** Changes
*** Relocatable installation
diff --git a/doc/bison.texi b/doc/bison.texi
index 678b6efe..9dd60851 100644
--- a/doc/bison.texi
+++ b/doc/bison.texi
@@ -9793,12 +9793,67 @@ calc.y:19.1-7: @dwarning{warning}: nonterminal useless
in grammar: useless [@dwa
19 | @dwarning{useless: STR;}
| @dwarning{^~~~~~~}
calc.y: @dwarning{warning}: 7 shift/reduce conflicts
[@dwarning{-Wconflicts-sr}]
+calc.y: @dwarning{warning}: rerun with option '-Wcounterexamples' to generate
conflict counterexamples [@dwarning{-Wother}]
@end example
-When given @option{--report=state}, in addition to @file{calc.tab.c}, it
-creates a file @file{calc.output} with contents detailed below. The
-order of the output and the exact presentation might vary, but the
-interpretation is the same.
+When given @option{-Wcounterexample}, bison will run a search for strings in
your grammar
+that better demonstrate you conflicts. Counterexample generation was initially
developed
+by Chinawat Isradisaikul and Andrew Myers (@pxref{Bibliography}). For
@file{calc.y},
+the first printed example is:
+
+@example
+Shift/reduce conflict on token '/':
+ Example exp '+' exp • '/' exp
+ First derivation exp ::=[ exp ::=[ exp '+' exp • ] '/' exp ]
+ Example exp '+' exp • '/' exp
+ Second derivation exp ::=[ exp '+' exp ::=[ exp • '/' exp ] ]
+@end example
+
+This shows two separate derivations in the grammar for the same exp: @samp{e1
+ e2 / e3}.
+The derivations show how your rules would parse the given example. Here,
+the first derivation completes a reduction when seeing @samp{/}, causing
@samp{e1 + e2}
+to be grouped as an @code{exp}. The second derivation shifts on @samp{/},
resulting in
+@samp{e2 / e3} being grouped as an @code{exp}. Therefore, it is easy to see
that adding
+@code{%precedence} directives would fix this conflict.
+
+Sometimes, the search will not find an example that can be derived in two ways.
+In these cases, counterexample generation will provide two examples that are
+the same up until the dot. Most notably, this will happen when your grammar
+requires a stronger parser (more lookahead, lr instead of lalr).
+The following example isn't LR(1):
+
+@example
+%token ID
+%%
+s: a ID
+a: expr
+expr: %empty | expr ID ','
+@end example
+
+@command{bison} reports:
+
+@example
+Shift/reduce conflict on token ID:
+ First example expr • ID $end
+ First derivation $accept ::=[ s ::=[ a ::=[ expr • ] ID ] $end ]
+ Second example expr • ID ',' ID $end
+ Second derivation $accept ::=[ s ::=[ a ::=[ expr ::=[ expr • ID ',' ] ]
ID ] $end ]
+@end example
+
+This conflict is caused by the parser not having enough information
+to know the difference between these two examples. The parser would need
+an additional lookahead token to know whether or not a comma follows
+the @samp{ID} after @code{expr}. These types of conflicts tend to be more
difficult to
+fix, and usually need a rework of the grammar. In this case, it can be
+fixed by changing around the recursion:
+@code{expr: ID | ',' expr ID}
+
+Counterexamples can also be written to a file with @option{--report=cex}
+
+Going back to the calc example, when given @option{--report=state},
+in addition to @file{calc.tab.c}, it creates a file @file{calc.output}
+with contents detailed below. The order of the output and the exact
+presentation might vary, but the interpretation is the same.
@noindent
@cindex token, useless
@@ -15266,6 +15321,14 @@ Elizabeth Scott, Adrian Johnstone, and Shamsa Sadaf
Hussain,
@cite{Tomita-Style Generalised LR Parsers}, Royal Holloway, University of
London, Department of Computer Science, TR-00-12 (December 2000).
@uref{http://www.cs.rhul.ac.uk/research/languages/publications/tomita_style_1.ps}
+
+@item [Isradisaikul, Myers 2015]
+Chinawat Isradisaikul, Andrew Myers,
+Finding Counterexamples from Parsing Conflicts,
+in @cite{Proceedings of the 36th ACM SIGPLAN Conference on
+Programming Language Design and Implementation} (PLDI ’15),
+ACM, pp.@: 555--564.
+@uref{https://www.cs.cornell.edu/andru/papers/cupex/cupex.pdf}
@end table
@node Index of Terms
--
2.24.3 (Apple Git-128)
- [PATCH] doc: cex documentation,
Vincent Imbimbo <=