bison-patches
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: LR(1) paser generator based on Pager's algorithm


From: Xin Chen
Subject: Re: LR(1) paser generator based on Pager's algorithm
Date: Sun, 01 Apr 2007 23:26:38 -1000

Joel,

> Yes, I consciously eliminated "aaaa" by specifying %left 'a'.
> If I wanted to accept "aaaa", that's correct, but I don't....

You're right. I realized this after I sent out the email. It's like when we 
anounced "%left a", we also automatically denounced the rights to accept the 
string "aaaa". We made one choice and thus gave up another.

> I'm imagining a grammar author who wants to re-use A in several 
> places (in 
> perhaps a much larger grammar than the toy example I gave) but 
> finds it 
> creates a S/R conflict in one place.  He decides that the reduce is 
> always 
> correct at that point in the language, so he adds %left 'a'.  Since 
> grammar authors seem to think naturally in LR(1) not LALR(1), he 
> might not 
> notice that this %left 'a' should have any effect on the part of 
> the 
> language ("baab") that it ends up breaking.

That's right.

> In my mind, the major advantage of switching from LALR(1) to LR(1) 
> would 
> be to make grammar development conceptually easier by eliminating 
> all the 
> mysteries of LALR(1).  If the spread of an existing conflict can 
> still 
> happen without warning, will the developer know to rewrite the 
> grammar or 
> to turn on GLR or whatever?  Is it a good idea to given him false 
> confidence?

In my impression, and as Paul said, the major advantage of LR(1) over LALR(1) 
is the elimination of R/R but not S/R conflicts. To elimnate the mystery part 
of S/R conflicts, we need to rely on heuristics like the precedence rule, in 
this case, more critera in addition to weak compatibility. Note that weak 
compatibility is strictly defined for canonical LR(1) grammars, and does not 
guarantee proper result on non-LR(1) grammars.

> > shift/shift conflict.
> 
> I assume that's a typo.

No this is not a typo. If you read my short report comparison.doc, shift/shift 
conflicts occur when applying the unit production removal algorithm on 
non-LR(1) grammars. That's a typical example of applying algorithms 
specifically designed for LR(1) grammars on non-LR(1) grammars. What this shows 
is that we can't simply rely on such an algorithm in such situation, or at 
least we need additional heuristic.

> > As for Dr. Pager's algorithm, I guess it can be patched like 
> this: in 
> > grammars where precedence rule is used to solve shift/reduce 
> conflict, 
> > combine two states iff they are weakly compatble AND none has a 
> > shift/reduce conflict.
> 
> So, if we decide we can't merge them, how many predecessor states 
> will we 
> have to split in order to successfully separate the merged 
> lookaheads?  

In this case, based on your example, we just don't merge states 5 and 7 since 
state 5 has a S/R conflict resolved by precedence rule (note the criteria is 
not just there is a S/R conflict here, but also that there is a S/R conflict 
resolved using the precedence rule), and the resulted parsing machine is no 
different from the canonical LR(1) parsing machine, and so it can correctly 
recognize the string "baab". And we do not need to consider 
to split any predecessor states at all, since up to now everything was totally 
fine. In my impression, this can intuitively cover all such cases, and this 
problem is already solved. A strict proof may be needed. Then as Knuth said, be 
alerted, since "I have proved it, but did not try it"..

> The beauty of Pager's weak compatibility definition is that it (for 
> LR(1) 
> grammars) foresees whether merging successor states can produce 
> real 
> conflicts by checking the current states for potential conflicts.  
> Unfortunately for my problem, the very mechanism of this foresight 
> is to 
> allow merging when potential conflicts already existed in the 
> current 
> states, and those may be real conflicts for non-LR(1) grammars.

Right. Dr. Pager's algorithm was designed for strict LR(1) grammars. We need 
something additional to solve the problem that you pointed out. I don't know if 
what I proposed can be a real solution, maybe you can think about it and tell 
me.

> Pager's lane-tracing algorithm proposes splitting after 
> constructing LR(0) 
> states, but supposedly that's much slower.

I didn't look at that paper carefully although I do have a copy. It may be 
interesting to actually implement it to see the performance. As I tried the 
original Knuth LR(1) algorithm is running pretty fast, you just need to find 
the proper data structure. But according to page 261 of this "general practical 
method" paper, this general practical method is easier to understand 
conceptually and easier to implement. Besides Dr. Pager's work, the
re is another algorithm by David Spector (proposed around 1981, then refined in 
1988), which is similar to the lane-tracing algorithm in that it splits from 
LR(0) state machine. In his 1988 paper David Spector mentioned about Pager's 
"general practical method", and interestingly stated that "unfortunately it is 
too slow". David Spector's papers can be found in the online digital library of 
ACM.

> > These are just the things came to my mind for now.
> 
> I appreciate your and Paul's comments.
> 

Thanks for your interesting problem.

Xin




reply via email to

[Prev in Thread] Current Thread [Next in Thread]