help-bison
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: OT: LALR(1)


From: Russell Lewis
Subject: Re: OT: LALR(1)
Date: Wed, 14 Aug 2002 08:14:15 -0700
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.9) Gecko/20020408

Magnus Ekhall wrote:

Hi!

Aslightly off topic short question:
What is the abbreviation LALR(1) really short for?

/Magnus


_______________________________________________
address@hidden http://mail.gnu.org/mailman/listinfo/help-bison

Look Ahead Left Right parser, with (1) token of look ahead.

Now, to explain what that means...somebody sent me a link a few weeks back to an excellent online book (in PDF, IIRC) that explained it all well, but I don't have the link accessible right at the moment.

Basically, my general understanding is that LR parser parses the tokens from left to right. It has to parse a set of tokens, and then immediately replace the sequence of tokens with some other, higher-level token. Then you start over from the left side and re-parse. You continue this process until you have reduced the entire sequence down to a single end-state token. So you get rules like this:

   a => b c
   b => d e
   c => f g

   If you start with this token sequence:
       d e f g
   it gets parsed like this:
       b f g    ( b => d e )
       b c       ( c => f g )
       a          ( a => b c )
   and so you've parsed the sequence.

The problem you get with LR parsers is that you can't make choices based on any tokens that follow AFTER your current token. So a grammar like this can't be parsed by an LR parser:
   a => b c
   a => d e
   b => f g
   c => h i
   d => f g
   e => j k

   Take the example string:
       f g j k
The parser reads the 'f' and 'g' tokens. At that time, it must either reduce them down to a 'b' or down to a 'd', but it can't decide which is right. Thus, an LR parser is stuck.

However, an LALR(1) parser is allowed to look ahead at 1 additional token and make a decision based on that. Thus, it will be able to parse the string above. It looks at the 'j' token and decides that it has to interpret the 'f g' string must be interpreted as 'd', not as 'b'. So it makes the conversion:
   f g j k
   d j k ( d => f g )
   d e    ( e => j k )
   a       ( a => d e )

Now, the trouble with LALR(1) parsers is that they have ONLY one token of lookahead. So while they are a lot more flexible than LR, they still have troubles.

P.S. Bison doesn't actually "start over from the beginning" after reducing a string of tokens into a single token. If you look at the mechanics of how parsing happens (a DFA, beyond the scope of this email), you will find that reparsing will follow the exact pattern you'd followed before, up to a certain point. Thus, bison keeps a stack of states that's it's been in and simply backs up a bit after it does a token conversion.





reply via email to

[Prev in Thread] Current Thread [Next in Thread]