help-bison

Re: Lexical feedback and lookahead


From: Tim Van Holder
Subject: Re: Lexical feedback and lookahead
Date: Tue, 19 Jul 2005 15:43:31 +0200
User-agent: Mozilla Thunderbird 1.0.2 (Windows/20050317)

Evan Lavelle wrote:
> I've got a problem where I need some communication back from Bison to
> Flex (in this case, I need Flex to return two different tokens for the
> same input, depending on context).
> 
> The procedure is something like:
> 
> 1 - parser determines context and sets global flag for lexer
> 
> 2 - lexer checks flag, reads a name, and returns a token depending on
> the flag
> 
> In general, this doesn't work because of Bison's lookahead - as often as
> not, the lexer has already returned a token before Bison has set the flag.
> 
> Is there a general solution to this problem? 'yyclearin' is no good; it
> just over-writes 'yychar'. YYBACKUP doesn't help, either, because you
> can't use it if there's already something in yychar. Ideally, I need to
> check if there's anything in yychar, and then push it back somehow.
> 
> Any ideas?

The trick is usually to set the flag 'early'.

Take this case: in COBOL there are intrinsic functions whose names are
reserved only when preceded by the reserved word FUNCTION. So we want
the lexer to return those names as ordinary identifiers normally, but
as specific keyword tokens right after FUNCTION.

So what you would be inclined to write is a grammar rule like:

xxx
: FUNCTION { recognize_function_names = true; } valid_function
;

valid_function : FOO | BAR ;

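(where the lexer clears the flag itself after returning a function
name).
As you noticed, this does not work because of lookahead: by the time
the action runs, Bison may already have fetched the function name as
its lookahead token, while the flag was still unset.
But, also because of lookahead, this DOES work: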

xxx
: { recognize_function_names = true; } FUNCTION valid_function
;

In this case, Bison gets here because it has already seen the FUNCTION
keyword as its lookahead token, but has not yet asked the lexer for the
name that follows. That is exactly the moment when you want to tell the
lexer to recognize the next input differently.
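On the lexer side, the flag only needs to be checked where identifiers are classified. Below is a minimal C sketch. It assumes hypothetical token codes (IDENTIFIER, FUNCTION, FOO, BAR) that a real project would take from the Bison-generated header, and a hypothetical helper classify_word() that a Flex identifier rule would call with yytext. The sketch reads recognize_function_names and clears it after returning a function name, as described above. Exact uppercase matching is a simplification.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical token codes; a real project would take these from the
   header Bison generates (e.g. with -d). */
enum { IDENTIFIER = 258, FUNCTION, FOO, BAR };

/* Set by the grammar action, read (and cleared) by the lexer. */
bool recognize_function_names = false;

/* Hypothetical helper: classify one word the way a Flex identifier
   rule might, given its text. */
int classify_word(const char *text)
{
    assert(text != NULL);

    /* FUNCTION itself is always a reserved word. */
    if (strcmp(text, "FUNCTION") == 0)
        return FUNCTION;

    if (recognize_function_names) {
        int tok = 0;
        if (strcmp(text, "FOO") == 0)
            tok = FOO;
        else if (strcmp(text, "BAR") == 0)
            tok = BAR;
        if (tok != 0) {
            /* The lexer clears the flag itself once it has returned
               a function name. */
            recognize_function_names = false;
            return tok;
        }
    }

    /* Otherwise the word is an ordinary identifier. */
    return IDENTIFIER;
}
```

In a Flex scanner, a rule such as `[A-Za-z][A-Za-z0-9-]* { return classify_word(yytext); }` (the pattern is assumed for illustration) would route every word through this check. Because the lexer clears the flag after use, a later FOO in the same statement lexes as a plain identifier again.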

Or, if your lexer does identifier/keyword recognition based on table
lookups, perhaps this is closer:

xxx
: { register_function_names(); }
  FUNCTION
  { unregister_function_names(); }
  valid_function
;

It looks "wrong" but it works.
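For the table-driven variant, a minimal C sketch might look like the following. Only register_function_names() and unregister_function_names() come from the rule above. The keyword table, its fields, the token codes, and lookup_word() are hypothetical. The sketch models only the lexer's table. The generated parser decides when each action runs relative to the lexer's reads, as discussed above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token codes (normally from the Bison-generated header). */
enum { IDENTIFIER = 258, FUNCTION, FOO, BAR };

struct keyword {
    const char *name;  /* spelling in the source text */
    int token;         /* token code returned to the parser */
    bool intrinsic;    /* true for intrinsic function names */
    bool active;       /* only active entries are recognized */
};

/* Hypothetical keyword table: FUNCTION is always reserved, the
   intrinsic function names start out inactive. */
static struct keyword keywords[] = {
    { "FUNCTION", FUNCTION, false, true  },
    { "FOO",      FOO,      true,  false },
    { "BAR",      BAR,      true,  false },
};

#define N_KEYWORDS (sizeof keywords / sizeof keywords[0])

/* Switch every intrinsic function name on or off. */
static void set_function_names(bool on)
{
    for (size_t i = 0; i < N_KEYWORDS; i++)
        if (keywords[i].intrinsic)
            keywords[i].active = on;
}

/* Called from the grammar actions around FUNCTION. */
void register_function_names(void)   { set_function_names(true); }
void unregister_function_names(void) { set_function_names(false); }

/* Table lookup used by the lexer's identifier rule. */
int lookup_word(const char *text)
{
    assert(text != NULL);
    for (size_t i = 0; i < N_KEYWORDS; i++)
        if (keywords[i].active && strcmp(keywords[i].name, text) == 0)
            return keywords[i].token;
    return IDENTIFIER;
}
```

This sketch toggles an `active` bit rather than inserting and deleting entries. The table stays static, and the lexer's lookup loop is identical whether or not the function names are currently reserved.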



