Re: Conflict counterexamples

bison-patches
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: Conflict counterexamples

From:	Adrian Vogelsgesang
Subject:	Re: Conflict counterexamples
Date:	Sat, 6 Jun 2020 08:45:16 +0000
User-agent:	Microsoft-MacOutlook/10.10.16.200509
Hi Akim,

Thanks for your detailed replies!

> To me, expecting counterexamples from the CI is comparable to using the CI to 
> see if a C program compiles.
I guess I should have phrased this more concisely. I am not planning to abuse 
our CI to produce counter-examples.
However, the build scripts we are using for the CI and locally are the same.
On my local machine, I have to type “./bzl.py” to build our program, and it is 
essentially running the same steps as on the CI.
Now, if I want to get additional information locally, I quickly modify our 
Bazel config, to include an additional flag.
To make matters worse, our builds in Bazel are sanboxed/dockerized, also on the 
local machines, and it’s a pain to get any file out of those containers.
I could certainly adjust the files to get a file out of It  or a flag “Produce 
bison output file”, but it’s more complicated than just temporarily adding a 
“-Wcounterexample” to our build files.

> Besides, the HTML output […] Something the terminal cannot give you.
Wow, I totally missed the HTML output so far. I guess I will use this from now 
on :)

> conflicts and counterexamples are on a completely different level: they are 
> related to the automaton, its states and transitions. They don't have a 
> location in the source file.
Hm… maybe I misunderstood that then… In my mind, counterexamples also had an 
associated location. To, a conflict has a “conflict root”, i.e. the common rule 
from which on the derivations start to differ (not sure if there is a better, 
established term for that).

E.g., for the if-else example:
The “exp”-rule is probably part of a larger grammar and “exp” itself is not the 
grammar’s entry point. Instead, “program” is the grammar’s entry point. Still, 
the conflict is local to the “exp”, and hence I would mentally attach it to the 
“exp” node and not to the overall “program”. Yes, “program” is ambiguous, but 
that’s the case because “exp” is ambiguous, and hence I would attribute the 
error to “exp” and not at the “program”.
I would report the error at the line in which “exp” was defined.

> 2. One way to draw the derivations
> https://www.lrde.epita.fr/~akim/private/bison/deriv.pdf
> 3. Another one
> https://www.lrde.epita.fr/~akim/private/bison/deriv-2.pdf

Those renderings are really useful. Of all the variants (including the 
single-line color-coded variant which was so far my favorite), those trees are 
now my new favorite.
However, I don’t think they can be rendered in a good way as an ASCII art. As 
such, I would keep the “single-line color-coded” flavour for the console, and 
use the GraphViz variant e.g. in the HTML report.

>  I'll experiment with coloring the derivations too, with the same colors as 
> the corresponding example.
I guess this has potential.
I quickly went ahead and sketched what I currently have in my mind. You can 
find my approach attached to this email (maybe, if the mailing list is friendly 
to me) or alternatively on 
https://github.com/vogelsgesang/bison/blob/cd42b5e0146d25db17f213e74b12b0319256f43f/deriv_colored.svg
Not sure if the coloring I added would actually work in GraphViz, though…

>> In your example, it took me some time to actually spot them. I am not sure 
>> what all those other lines actually tell me, I would prefer to reduce the 
>> verbosity a bit.
> First legibility, then conciseness.
Now that I see the colors: agreed.
Without the colors, the line
> "if"  exp  "then"  "if"  exp  "then"  exp  •  "else"  exp
Was literally identical and appeared twice. Removing this seemingly duplicated 
lines is what I meant by “reducing verbosity”.
With the colors, it totally makes sense to repeat those lines.

>> I am currently wondering, if we might want to simplify this use case, too, 
>> and add hints like "to choose derivation one, add %prec ..." to the error 
>> message
>Wow... That's super ambitious. You'd have to foretell the impact of your 
>directive, which basically boils down to running the automaton construction 
>again.
>We can add recipes in the documentation though.
Yes, I see how this is probably pretty hard to do and not worth the effort. 
Just something I wanted to put out there, in the hope that maybe someone else 
has a good idea on how that would be more easily implementable…

Cheers,
Adrian


From: Akim Demaille <akim.demaille@gmail.com>
Date: Saturday, 6 June 2020 at 08:47
To: Adrian Vogelsgesang <avogelsgesang@tableau.com>
Cc: Bison Patches <bison-patches@gnu.org>, Bison Maintainers 
<Bison-Maintainers@gnu.org>, Vincent Imbimbo <vmi6@cornell.edu>
Subject: Re: Conflict counterexamples

Hi Adrian,

Thanks a lot for the feedback!

> Le 5 juin 2020 à 16:55, Adrian Vogelsgesang <avogelsgesang@tableau.com> a 
> écrit :
>
> Hi Akim, hi Vincent
>
>> 1. Where
> I would definitely prefer to have it on the command line instead of the 
> .output file.
>
> First, our CI and our build scripts at the project I am working on don't keep 
> the .output file around, but they report stderr/stdout. We could of course 
> adjust our build scripts, but stderr would make my life easier.

I don't see the role played by the CI here. Counterexample generation is 
useless unless you are working on the grammar itself. The CI should not play 
any role here, rather it should monitor %expect etc. to make sure no-one 
creates regressions. To me, expecting counterexamples from the CI is comparable 
to using the CI to see if a C program compiles.

Studying the counterexamples is something that should happen when you are 
preparing your commit, not when you push it.


> Furthermore for myself when debugging a grammar having to open the .output 
> file is always an extra step. I would prefer not to have to do that extra 
> step.

I understand that, but the output file contains very valuable information that 
the command line cannot give you. Besides, the HTML output (e.g., 
https://www.lrde.epita.fr/~akim/private/bison/parse-gram.html<https://www.lrde.epita.fr/~akim/private/bison/parse-gram.html>)
 makes it way easier to navigate through states, rules and symbols, thanks to 
hyperlinks. Something the terminal cannot give you.

> And last, but not least: Afaict, bison prints errors in a gcc-compatible 
> format. Many IDEs support directly highlighting the parts of your code which 
> correspond to a warning produced by gcc. It would be amazing if I could see 
> the errors, including counterexamples directly in vim or VisualStudio Code - 
> however I never tried and I am not sure if it would actually work

Where I truly value the diagnostics on stderr, with compatibility with GCC 
etc., is for things which are related to the specific parts of the input file. 
On this regards, conflicts and counterexamples are on a completely different 
level: they are related to the automaton, its states and transitions. They 
don't have a location in the source file.

That's exactly where the reports (*.output or *.html) shine: they show the 
automaton, something the grammar cannot give you.

I have not practiced -Wcounterexample in real life while working on a grammar, 
so I can certainly be biased; but so far, the reports are the single most 
valuable source of information to understand your parser, and debug/tune it. 
That's why I would expect anyone willing to debug her parser to look at the 
reports, and to expect the counterexamples to be there.

In my running example:

$ cat /tmp/foo.y
%%
exp
: "if" exp "then" exp
| "if" exp "then" exp "else" exp
| "num"

since the S/R conflict is in state 7, I would like the report to include facts 
about the conflict, e.g.:

State 7

1 exp: "if" exp "then" exp . [$end, "then", "else"]
2 | "if" exp "then" exp . "else" exp

"else" shift, and go to state 8

"else" [reduce using rule 1 (exp)]
$default reduce using rule 1 (exp)

Shift-reduce conflict on symbol "else"
Example "if" exp "then" "if" exp "then" exp • "else" exp
First derivation exp ::=[ "if" exp "then" exp ::=[ "if" exp "then" exp • ] 
"else" exp ]
Second derivation exp ::=[ "if" exp "then" exp ::=[ "if" exp "then" exp • 
"else" exp ] ]



>> 3. How
> Preferably in a format such that vim/VisualStudio Code can pick it up and 
> render it next to the corresponding lines.

I don't see what correspondance you mean. Conflicts don't have a source 
location.

It could make sense with one of my proposals showing one derivation at a time, 
then we could relate to the source lines of each of the rules that participates.

It would also make sense to show the path through the automaton states in the 
report files.

> Among the two styles which you shared, I personally prefer the first one,
> i.e.
> [ "if" exp "then" exp ::=[ "if" exp "then" exp • ] "else" exp ]
> vs.
> ["if" exp "then" exp ::=[ "if" exp "then" exp • "else" exp ] ]
> since that makes the ambiguity more obvious to me.

This is super frustrating... The mailing list removed everything from my 
message:
- it removed the colors
- it removed all the attachments

So, sure, I understand what you mean, since you couldn't actually see what I 
meant :(

See

1. The diagnostics with colors
https://www.lrde.epita.fr/~akim/private/bison/bison-cex-color.png<https://www.lrde.epita.fr/~akim/private/bison/bison-cex-color.png>

2. One way to draw the derivations
https://www.lrde.epita.fr/~akim/private/bison/deriv.pdf<https://www.lrde.epita.fr/~akim/private/bison/deriv.pdf>

3. Another one
https://www.lrde.epita.fr/~akim/private/bison/deriv-2.pdf<https://www.lrde.epita.fr/~akim/private/bison/deriv-2.pdf>

These graphs can certainly be rendered more nicely if we spend time tuning 
graphviz. We can look for means to embed them in the HTML output (well, very 
not straightforward, but doable). Colors might be useful.

As another example, here's the case of the AWK grammar with colored examples 
(different rendering from the toy example):

https://www.lrde.epita.fr/~akim/private/bison/awk.html<https://www.lrde.epita.fr/~akim/private/bison/awk.html>

I'll experiment with coloring the derivations too, with the same colors as the 
corresponding example.

> Those are the most valuable two lines to me and they should be right next to 
> each other. By comparing those two lines, the ambiguity becomes obvious.

In simple cases it is straightforward, agreed. But have a look at the AWK case: 
in more complex cases, the rewritings are too much visual noise to my eyes, and 
I almost need to draw the derivation tree to see what actually happens.


> In your example, it took me some time to actually spot them. I am not sure 
> what all those other lines actually tell me, I would prefer to reduce the 
> verbosity a bit.

First legibility, then conciseness.

> To make it even simpler to spot the ambiguities, one could align the two 
> derivation under each over horizontally, such that the different position of 
> the "]" becomes more obvious.

I'm not sure I see what you mean here. The derivations may go through very 
different symbols, so except for the root, I don't see what could be aligned. 
Aligning with the symbols of the counterexample makes a lot of sense though. 
But we might go through intermediate symbols with long names which would 
require a lot of horizontal space.


> The 2nd example makes it easier to understand the "state of the parse stack" 
> than the first example.
> E.g., it was easier to me to understand which of the derivations corresponds 
> to a reduce ("finish this level and go back up"), and which one corresponds 
> to a shift ("stay on the same level").
> The ambiguity on the other hand is less obvious since it only becomes 
> apparent after mentally combining multiple lines.

I had the opposite problem :) I needed to unfold the lines to see the trees. 
Vincent's rendering of derivation trees is new to me, I never used/saw that 
before. I'm used to the classical derivation trees, or leftmost derivation 
sequences. And I believe that's the common knowledge. Vincent's approach is 
super nice to fit all the needed data in a one liner, but the mixture is hard 
to read is rich cases.


> Usually I care more about the "what is ambiguous" than about "is it a reduce 
> or a shift".
> That's because to fix my grammar I have to understand "why is it ambiguous?" 
> first.

Actually we're not just talking about ambiguities, but of conflicts.

> Only after that I can consider my options ("introduce a new keyword", 
> "restructure my language", "use %prec to resolve the conflict", "change 
> operator precedence", ...).
> Personally, I usually settle for one of the first two options, i.e. changing 
> the language. And for those options I never actually care "which one is the 
> shift, which one is the reduce".

Sorry, I don't see what you mean. I have shown other ways to display the 
derivations, and nothing about the LR automaton. None of my proposals were 
about shifts and reduces, so I don't understand what you are referring to.

> The distinction "which one is the reduce, which one is the shift" is really 
> only necessary if I want to resolve the conflict by annotating my grammar 
> with corresponding precedence rules.

Or massaging the grammar.

> I am currently wondering, if we might want to simplify this use case, too, 
> and add hints like "to choose derivation one, add %prec ..." to the error 
> message

Wow... That's super ambitious. You'd have to foretell the impact of your 
directive, which basically boils down to running the automaton construction 
again.

We can add recipes in the documentation though.

Cheers!
deriv_colored.svg
Description: deriv_colored.svg
[Prev in Thread]
Current Thread
[Next in Thread]
Re: Conflict counterexamples, Adrian Vogelsgesang, 2020/06/05
- Re: Conflict counterexamples, Akim Demaille, 2020/06/06
  - Re: Conflict counterexamples, Adrian Vogelsgesang <=
    - Re: Conflict counterexamples, Akim Demaille, 2020/06/07
Prev by Date: Re: Conflict counterexamples
Next by Date: all: show the rules in user actions
Previous by thread: Re: Conflict counterexamples
Next by thread: Re: Conflict counterexamples
Index(es):
- Date
- Thread