bison-patches
Re: RFC: custom error messages


From: Christian Schoenebeck
Subject: Re: RFC: custom error messages
Date: Fri, 03 Jan 2020 13:08:41 +0100

On Freitag, 3. Januar 2020 11:07:05 CET Akim Demaille wrote:
> One severe issue brought to my attention by Rici Lake (unfortunately
> privately, although he had written a very nice and detailed mail with
> all the details) is that this would break several existing parsers
> that expect yytname to be this way.  For instance he pointed to
> 
> https://git.gnupg.org/cgi-bin/gitweb.cgi?p=libksba.git;a=blob;f=src/asn1-parse.y;h=5bff15cd8db64786f7c9e2ef000aeddd583cfdc0;hb=HEAD#l856
> currently not responding, but the code is:
> | for (k = 0; k < YYNTOKENS; k++)
> |   {
> |     if (yytname[k] && yytname[k][0] == '\"'
> |         && !strncmp (yytname[k] + 1, string, len)
> |         && yytname[k][len + 1] == '\"' && !yytname[k][len + 2])
> |       return yytoknum[k];
> |   }

Looks like the use case here is to distinguish non-terminals from terminal 
symbols. That could be addressed by introducing some official API function:

bool yy_is_non_terminal(enum yysymbolid id);

and/or:

bool yy_is_terminal(enum yysymbolid id);

Then those double quotes could simply be dropped. Or was there any other use 
case for looking at those double quote characters?
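
A minimal sketch of how such predicates might look, assuming the generated
enum lists all terminals before all non-terminals (which matches how Bison
numbers symbols internally, with tokens occupying 0..YYNTOKENS-1). The enum
members, the non-terminal names and the SKETCH_* constants are made up for
illustration; none of this is an existing Bison API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical generated enum: terminals first, then non-terminals. */
enum yysymbolid {
    IDENTIFIER, SWITCH, IF, CONST,   /* terminals */
    statement, expression,           /* non-terminals (invented names) */
    SKETCH_NSYMBOLS                  /* total number of symbols */
};

/* Number of terminals; plays the role of YYNTOKENS in this sketch. */
enum { SKETCH_NTOKENS = 4 };

bool yy_is_terminal(enum yysymbolid id)
{
    /* Precondition: id is a valid symbol number. */
    assert((int)id >= 0 && (int)id < SKETCH_NSYMBOLS);
    return (int)id < SKETCH_NTOKENS;
}

bool yy_is_non_terminal(enum yysymbolid id)
{
    return !yy_is_terminal(id);
}
```

With this layout the check is a single comparison, so a caller would no
longer need to inspect the first character of a name in yytname.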

> I think he is right, hence the call to yysyntax_error_arguments which
> returns the list of expected/unexpected tokens.

Actually I had a general-purpose push API in mind. Your suggestion would 
limit retrieving the "next expected symbols" solely to error message purposes. 
Why not make that a general-purpose function instead, one that users could 
call at any time with the current parser state:

// returns NULL terminated list
const enum yysymbolid* yynextsymbols(const yystate* currentParserState);

Because there are other important use cases that I pointed out to you: 
auto-completion features, e.g. interactive command line shells where the user 
can auto-complete the currently incomplete command by hitting the tab key, or 
a programming language code editor GUI/IDE where the user would get an 
unobtrusive popup with potential code completions while typing. In these use 
cases you are not (necessarily) addressing syntax errors. The parser might 
very well be in some valid state.
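
To illustrate the completion use case, here is a rough sketch of how a shell
or editor might filter the expected symbols by what the user has typed so
far. Everything in it is an assumption: the toy yystate simply carries a
precomputed list, the name table is invented, and I use a sentinel value
(YYSYM_END) to end the list, since 0 would already be a valid symbol ID in
such an enum.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical symbol IDs; YYSYM_END is an assumed list terminator. */
enum yysymbolid { YYSYM_END = -1, IDENTIFIER, SWITCH, IF, CONST };

/* Toy parser state: only remembers which symbols may come next. */
typedef struct yystate {
    const enum yysymbolid *expected;
} yystate;

static const char *const sketch_names[] = {
    "identifier", "switch", "if", "const"
};

const char *yysymbolname(enum yysymbolid id)
{
    return sketch_names[id];
}

const enum yysymbolid *yynextsymbols(const yystate *s)
{
    return s->expected;
}

/* Store up to max names of expected symbols that start with prefix
   into out and return how many were stored. */
size_t complete(const yystate *s, const char *prefix,
                const char **out, size_t max)
{
    assert(s != NULL && prefix != NULL && out != NULL);
    size_t n = 0;
    size_t plen = strlen(prefix);
    const enum yysymbolid *p;
    for (p = yynextsymbols(s); *p != YYSYM_END && n < max; ++p) {
        const char *name = yysymbolname(*p);
        if (strncmp(name, prefix, plen) == 0)
            out[n++] = name;
    }
    return n;
}

/* Demo state in which "switch", "if" or "const" may follow. */
static const enum yysymbolid demo_expected[] = {
    SWITCH, IF, CONST, YYSYM_END
};
static const yystate demo_state = { demo_expected };
```

Note that no syntax error is involved here: complete() can be called while
the parser is in a perfectly valid state.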

For that purpose, and to continue the idea about a general purpose push API, 
it would be very useful to have a function for duplicating the current parser 
state:

yystate* yydupstate(const yystate* parserState);

and one function to push parse on a specific parser state:

bool yypushparse(yystate* parserState, char nextchar);

The latter returns false on parser errors. That way people would have a very 
flexible and powerful API for all kinds of use cases, because by being able to 
duplicate states you can have "throw away" parser states, where you can try 
out things without touching the "official" parser state. For instance I am 
using that to auto-correct user typos in some parsers (that is, guessing what 
the user had in mind on syntax errors by some limited brute-force attempts by 
the parser on throw-away parser states).
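
A small sketch of the throw-away state idea, under heavy simplification: the
toy "parser" below only tracks parenthesis nesting, so its state is a single
integer and duplicating it is a plain struct copy. A real generated parser
would have to deep-copy its stacks. The struct layout and the would_accept()
helper are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy parser state: nesting depth of '(' ... ')'. */
typedef struct yystate {
    int depth;
} yystate;

/* Return a heap-allocated copy of the state, or NULL on failure. */
yystate *yydupstate(const yystate *parserState)
{
    yystate *copy = malloc(sizeof *copy);
    if (copy != NULL)
        *copy = *parserState;
    return copy;
}

/* Feed one character; return false on a parse error. */
bool yypushparse(yystate *parserState, char nextchar)
{
    if (nextchar == '(') {
        parserState->depth++;
        return true;
    }
    if (nextchar == ')' && parserState->depth > 0) {
        parserState->depth--;
        return true;
    }
    return false;   /* unbalanced ')' or unknown character */
}

/* Try a character on a scratch copy; the official state is untouched. */
bool would_accept(const yystate *official, char candidate)
{
    assert(official != NULL);
    yystate *scratch = yydupstate(official);
    if (scratch == NULL)
        return false;
    bool ok = yypushparse(scratch, candidate);
    free(scratch);
    return ok;
}
```

A typo corrector could call would_accept() for each candidate replacement
and only push the winning character onto the official state.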

But there are many other use cases for this as well, for instance 
multi-threaded parsing tasks where each thread would get its own parser state 
and each thread might e.g. be working on a different branch of a grammar tree 
to reduce the latency (overall response time) of a parser system.

> I can't make up my mind on whether returning the list of expected
> tokens as strings (as exemplified above), or simply as their symbol
> numbers.  Symbol numbers are more efficient, yet they are the
> *internal* symbol numbers, not the ones the user is exposed to.

I would suggest both. It would make sense to auto generate an enum list for 
all symbols like:

enum yysymbolid {
    IDENTIFIER,
    SWITCH,
    IF,
    CONST,
    ...
};

and probably use that numeric type for most Bison APIs for performance 
reasons. That type could also be condensed to a smaller type if requested 
(e.g. for embedded systems):

enum yysymbolid : uint8_t {
    IDENTIFIER,
    SWITCH,
    IF,
    CONST,
    ...
};

But there should still be a way for people to conveniently convert that back 
to its original string representation from source.y:

const char* yysymbolname(enum yysymbolid);
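
One possible shape for that function, as a sketch only: a table generated
next to the enum and indexed by symbol ID. The table contents and the
YYSYMBOL_COUNT member are assumptions; returning NULL for an out-of-range ID
is just one choice a real implementation might make.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical generated enum with a trailing count member. */
enum yysymbolid { IDENTIFIER, SWITCH, IF, CONST, YYSYMBOL_COUNT };

/* Names as they might be spelled in source.y (invented here). */
static const char *const sketch_symbol_names[YYSYMBOL_COUNT] = {
    [IDENTIFIER] = "IDENTIFIER",
    [SWITCH]     = "switch",
    [IF]         = "if",
    [CONST]      = "const",
};

const char *yysymbolname(enum yysymbolid id)
{
    if ((int)id < 0 || (int)id >= YYSYMBOL_COUNT)
        return NULL;                          /* unknown symbol ID */
    assert(sketch_symbol_names[id] != NULL);  /* table must be complete */
    return sketch_symbol_names[id];
}
```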

Happy new 2k20 BTW!  ;-)

Best regards,
Christian Schoenebeck




