help-bison

Re: Which lexer do people use?


From: Akim Demaille
Subject: Re: Which lexer do people use?
Date: Mon, 6 Jul 2020 06:32:32 +0200

Hi Adrian,

> On 4 Jul 2020, at 21:30, Adrian Vogelsgesang <avogelsgesang@tableau.com>
> wrote:
> 
> Also, it allowed us to embed a few hacks directly inside the scanner: E.g. in 
> a few places our grammar is not actually LR1. Only in very few edge cases, 
> though, so that we don’t want to use GLR. Hence, our scanner does a lookahead 
> and, e.g., upon encountering the token “WITH” looks at the following token. 
> If the next token is “TIMESTAMP”, it produces “WITH_LA” instead of just 
> “WITH”. Thereby, we get 1 look-ahead from the scanner. Combined with the 1 
> lookahead provided by bison, we can now parse our LR2 grammar.
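
For readers who have not seen this trick, here is a minimal sketch of such a
token filter in Python. It is an illustration only, not Adrian's scanner: the
function name `add_lookahead` and the use of plain strings as tokens are
assumptions made for the example. The filter sits between the scanner and the
parser and peeks one token ahead.

```python
from typing import Iterable, Iterator


def add_lookahead(tokens: Iterable[str]) -> Iterator[str]:
    """Rewrite WITH to WITH_LA when the next token is TIMESTAMP.

    Hypothetical helper: token kinds are plain strings here.
    """
    it = iter(tokens)
    pending = next(it, None)
    while pending is not None:
        # Peek at the token after `pending` without handing it to the parser yet.
        following = next(it, None)
        if pending == "WITH" and following == "TIMESTAMP":
            yield "WITH_LA"
        else:
            yield pending
        pending = following
```

The parser still sees TIMESTAMP as its own token. Only the kind of the
preceding WITH changes, which is what gives the grammar its extra token of
lookahead.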

Bison has the same problem.  It's quite ironic that yacc cannot parse
yacc's "natural" grammar (but then of course, S. Johnson had to write
his parser by hand, so he might not have noticed).

Since the semicolon at the end of rules is optional, it's easy to see
that splitting this into rules requires LR(2):

exp: exp '+' term exp: term
ID : ID  ID  ID   ID : ID

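It's the colon _two_ tokens ahead that tells the parser that the current
rhs of a rule has ended: on seeing an ID, it cannot know whether that ID
belongs to the current rhs or starts the next rule until it has also seen
the token after it.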

We actually scan "ID :" as a single token
(https://github.com/akimd/bison/blob/3e6e51cf5c932453ce5614865c5729abac15ec39/src/scan-gram.l#L433).
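
To show why this helps, here is a rough Python sketch. It is not Bison's
scanner: the token names, the regular expression, and the functions
`scan_grammar` and `split_rules` are all made up for the example. Once an
identifier and its colon arrive as one token, splitting the token stream into
rules needs no lookahead beyond the current token.

```python
import re

# Hypothetical token set for a tiny yacc-like grammar file.
TOKEN = re.compile(r"""
    (?P<ID_COLON>[A-Za-z_][A-Za-z0-9_]*\s*:)   # identifier plus its colon, one token
  | (?P<ID>[A-Za-z_][A-Za-z0-9_]*)
  | (?P<CHAR>'[^']')
  | (?P<SEMI>;)
  | (?P<WS>\s+)
""", re.VERBOSE)


def scan_grammar(text):
    """Return a list of (kind, value) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at offset {pos}")
        pos = m.end()
        kind = m.lastgroup
        if kind == "WS":
            continue
        value = m.group()
        if kind == "ID_COLON":
            value = value[:-1].rstrip()  # keep only the identifier
        tokens.append((kind, value))
    return tokens


def split_rules(tokens):
    """Group tokens into (lhs, rhs) pairs; each ID_COLON starts a new rule."""
    rules = []
    for kind, value in tokens:
        if kind == "ID_COLON":
            rules.append((value, []))
        elif kind == "SEMI":
            continue  # the semicolon is optional, so it carries no information here
        else:
            if not rules:
                raise ValueError("symbol before the first rule")
            rules[-1][1].append(value)
    return rules
```

On the example above, `split_rules(scan_grammar("exp: exp '+' term exp: term"))`
yields two rules, one with rhs `exp '+' term` and one with rhs `term`.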

> Not sure if this would have been possible also with flex – but given we have 
> a hand-rolled parser it was straightforward.

Flex does provide an operator that might help, depending on your case:

'r/s'
     an 'r' but only if it is followed by an 's'.  The text matched by
     's' is included when determining whether this rule is the longest
     match, but is then returned to the input before the action is
     executed.  So the action only sees the text matched by 'r'.  This
     type of pattern is called "trailing context".  (There are some
     combinations of 'r/s' that flex cannot match correctly.  *Note
     Limitations::, regarding dangerous trailing context.)
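
As a loose analogy only (this is Python, not flex), a regex lookahead
assertion behaves much like trailing context: the text after the keyword is
inspected but not consumed. The sketch below assumes a made-up token set and a
hypothetical function `scan_sql`. One difference to keep in mind is that
Python tries the alternatives in order and does not apply flex's
longest-match rule.

```python
import re

# Hypothetical token set; WITH_LA plays the role of a flex rule with
# trailing context, i.e. "WITH, but only if TIMESTAMP follows".
SCANNER = re.compile(r"""
    (?P<WITH_LA>WITH(?=\s+TIMESTAMP\b))   # lookahead is checked, not consumed
  | (?P<WITH>WITH\b)
  | (?P<TIMESTAMP>TIMESTAMP\b)
  | (?P<ID>[A-Za-z_][A-Za-z0-9_]*)
  | (?P<WS>\s+)
""", re.VERBOSE)


def scan_sql(text):
    """Return a list of (kind, text) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = SCANNER.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at offset {pos}")
        pos = m.end()
        if m.lastgroup != "WS":
            tokens.append((m.lastgroup, m.group()))
    return tokens
```

Because the lookahead consumes nothing, TIMESTAMP is scanned again as an
ordinary token on the next iteration, just as flex returns the trailing
context to the input before running the action.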

In Bison, we addressed this with start conditions.

Cheers!

