Re: RFC: custom error messages
From: Christian Schoenebeck
Subject: Re: RFC: custom error messages
Date: Fri, 03 Jan 2020 13:08:41 +0100
On Friday, 3 January 2020 11:07:05 CET Akim Demaille wrote:
> One severe issue brought to my attention by Rici Lake (unfortunately
> privately, although he had written a very nice and detailed mail with
> all the details) is that this would break several existing parsers
> that expect yytname to be this way. For instance he pointed to
>
> https://git.gnupg.org/cgi-bin/gitweb.cgi?p=libksba.git;a=blob;f=src/asn1-parse.y;h=5bff15cd8db64786f7c9e2ef000aeddd583cfdc0;hb=HEAD#l856
> currently not responding, but the code is:
> |   for (k = 0; k < YYNTOKENS; k++)
> |     {
> |       if (yytname[k] && yytname[k][0] == '\"'
> |           && !strncmp (yytname[k] + 1, string, len)
> |           && yytname[k][len + 1] == '\"' && !yytname[k][len + 2])
> |         return yytoknum[k];
> |     }
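To make the dependency concrete, here is a self-contained sketch of that lookup run against hand-written stand-ins for the generated tables. The table contents, the token numbers and the `lookup_token` helper name are assumptions for illustration only; what matters is that string aliases are stored in yytname with their surrounding double quotes, which is exactly what the loop relies on.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for Bison-generated tables: the first
   YYNTOKENS entries of yytname are terminals, and string aliases keep
   their surrounding double quotes. */
#define YYNTOKENS 5
static const char *const yytname[] = {
  "$end", "error", "$undefined", "\"BEGIN\"", "\"END\"",
  "module", "definitions"          /* nonterminals follow the tokens */
};
static const int yytoknum[YYNTOKENS] = { 0, 256, 257, 258, 259 };

/* Return the external token number whose quoted alias equals the first
   LEN characters of STRING, or -1 if there is none.  Same test as the
   libksba loop quoted above. */
static int lookup_token (const char *string, size_t len)
{
  for (int k = 0; k < YYNTOKENS; k++)
    if (yytname[k] && yytname[k][0] == '"'
        && !strncmp (yytname[k] + 1, string, len)
        && yytname[k][len + 1] == '"' && !yytname[k][len + 2])
      return yytoknum[k];
  return -1;
}
```

Dropping the quotes from yytname would make every such lookup return -1.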
Looks like the use case here is to distinguish non-terminals from terminal
symbols. That could be addressed by introducing an official API function:
bool yy_is_non_terminal(enum yysymbolid id);
and/or:
bool yy_is_terminal(enum yysymbolid id);
Then those double quotes could simply be dropped. Or was there any other use
case for looking at those double quote characters?
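Assuming the generated symbol enum keeps Bison's internal numbering, where the YYNTOKENS terminals come first and the nonterminals follow, both predicates reduce to a single comparison. The enumerator names and the value of YYNTOKENS below are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical generated enum: terminals first, then nonterminals. */
enum yysymbolid {
  YYSYMBOL_END, YYSYMBOL_ERROR, YYSYMBOL_UNDEF,
  YYSYMBOL_IDENTIFIER, YYSYMBOL_IF,          /* terminals */
  YYSYMBOL_program, YYSYMBOL_statement,      /* nonterminals */
  YYNSYMBOLS
};
#define YYNTOKENS 5   /* number of terminals, as in Bison's tables */

/* Terminals occupy the symbol numbers [0, YYNTOKENS). */
bool yy_is_terminal (enum yysymbolid id)
{
  return (int) id < YYNTOKENS;
}

/* Nonterminals occupy [YYNTOKENS, YYNSYMBOLS). */
bool yy_is_non_terminal (enum yysymbolid id)
{
  return (int) id >= YYNTOKENS && id < YYNSYMBOLS;
}
```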
> I think he is right, hence the call to yysyntax_error_arguments which
> returns the list of expected/unexpected tokens.
Actually I had a general-purpose push API in mind. Your suggestion would
limit retrieving the "next expected symbols" to error-message purposes only.
Why not make it a general-purpose function instead, one that users could call
at any time with the current parser state:
// returns NULL terminated list
const enum yysymbolid* yynextsymbols(const yystate* currentParserState);
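One way such a function could work, sketched against a toy dense LR action table rather than Bison's compressed yypact/yytable encoding: a terminal is expected in a state if the table holds a non-error action for it. Since the list holds enum values rather than pointers, this sketch ends it with a sentinel enumerator instead of NULL. The `yystate` struct, the table contents and `YYSYMBOL_NONE` are all assumptions.

```c
#include <stddef.h>

enum yysymbolid {
  YYSYMBOL_IDENTIFIER, YYSYMBOL_IF, YYSYMBOL_LPAREN, YYSYMBOL_SEMI,
  YYNTOKENS,              /* number of terminals */
  YYSYMBOL_NONE = -1      /* list terminator */
};

/* Hypothetical parser state: just the LR state on top of the stack. */
typedef struct { int top; } yystate;

#define YYNSTATES 3
/* Toy dense action table: 0 means "syntax error", anything else is a
   shift or reduce action (the exact encoding does not matter here). */
static const int yyaction[YYNSTATES][YYNTOKENS] = {
  /* IDENT IF  (   ; */
  {  1,    2,  0,  0 },   /* state 0: start of a statement */
  {  0,    0,  0,  3 },   /* state 1: after an identifier */
  {  0,    0,  4,  0 },   /* state 2: after "if" */
};

/* Fill a static buffer with the terminals acceptable in the current
   state, terminated by YYSYMBOL_NONE.  Not reentrant: a real API would
   probably let the caller supply the buffer. */
const enum yysymbolid *yynextsymbols (const yystate *currentParserState)
{
  static enum yysymbolid result[YYNTOKENS + 1];
  size_t n = 0;
  for (int t = 0; t < YYNTOKENS; t++)
    if (yyaction[currentParserState->top][t] != 0)
      result[n++] = (enum yysymbolid) t;
  result[n] = YYSYMBOL_NONE;
  return result;
}
```

Because the function only reads the state, it can be called whether or not the parser is in an error condition.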
There are other important use cases, which I pointed out to you:
auto-completion features, e.g. interactive command-line shells where the user
can complete the current, unfinished command by hitting the Tab key, or a
code editor/IDE for a programming language where the user gets an unobtrusive
popup with potential code completions while typing. In these use cases you
are not (necessarily) dealing with syntax errors; the parser may very well be
in a valid state.
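For the shell case, the completion step on top of such a list can be as simple as prefix-matching the names of the expected symbols against the word being typed. The `complete` helper and the keyword list in the usage below are hypothetical; in practice the candidates would be the names of the symbols the parser expects in its current state.

```c
#include <stddef.h>
#include <string.h>

/* Copy into OUT the names from CANDIDATES (a NULL-terminated list) that
   start with PREFIX, and return how many there are.  OUT must have room
   for as many entries as CANDIDATES. */
size_t complete (const char *const *candidates, const char *prefix,
                 const char **out)
{
  size_t n = 0, len = strlen (prefix);
  for (size_t i = 0; candidates[i]; i++)
    if (strncmp (candidates[i], prefix, len) == 0)
      out[n++] = candidates[i];
  return n;
}
```

With candidates `select`, `set` and `show`, typing `se` yields two completions (show a popup), while `sh` yields exactly one (complete it in place).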
For that purpose, and to continue the idea of a general-purpose push API,
it would be very useful to have a function for duplicating the current parser
state:
yystate* yydupstate(const yystate* parserState);
and a function to push-parse on a specific parser state:
bool yypushparse(yystate* parserState, char nextchar);
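A push parser's state is essentially its state and value stacks plus some bookkeeping, so `yydupstate` would amount to a deep copy. This sketch assumes a hypothetical `yystate` with heap-allocated stacks and plain `int` semantic values, plus a hypothetical `yyfreestate` to release copies; semantic values that own memory would likely need a user-supplied copy hook, which is not shown.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical push-parser state: parallel state and value stacks. */
typedef struct {
  size_t size, capacity;
  int *states;   /* LR state numbers */
  int *values;   /* semantic values (plain ints in this sketch) */
} yystate;

/* Return an independent copy of PARSERSTATE, or NULL on allocation
   failure (assumes capacity > 0).  Later pushes on either copy do not
   affect the other. */
yystate *yydupstate (const yystate *parserState)
{
  yystate *copy = malloc (sizeof *copy);
  if (!copy)
    return NULL;
  *copy = *parserState;
  copy->states = malloc (parserState->capacity * sizeof *copy->states);
  copy->values = malloc (parserState->capacity * sizeof *copy->values);
  if (!copy->states || !copy->values)
    {
      free (copy->states);
      free (copy->values);
      free (copy);
      return NULL;
    }
  memcpy (copy->states, parserState->states,
          parserState->size * sizeof *copy->states);
  memcpy (copy->values, parserState->values,
          parserState->size * sizeof *copy->values);
  return copy;
}

/* Release a state obtained from yydupstate. */
void yyfreestate (yystate *parserState)
{
  if (parserState)
    {
      free (parserState->states);
      free (parserState->values);
      free (parserState);
    }
}
```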
yypushparse() would return false on parse errors. That way people would have
a very flexible and powerful API for all kinds of use cases: being able to
duplicate states means you can have "throw-away" parser states on which you
can try things out without touching the "official" parser state. For
instance, I am using that to auto-correct user typos in some parsers, i.e.
guessing what the user had in mind on a syntax error by letting the parser
make a limited number of brute-force attempts on throw-away parser states.
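The typo-correction idea can be illustrated with a toy push parser whose only "grammar" is a fixed set of keywords. When a character is rejected, each lowercase letter is tried in its place on a throw-away copy of the state, so the real state is never modified. Everything here (the keyword list, the struct layout, `yycomplete`, `parse_with_correction`) is hypothetical, and copying by plain struct assignment only works because this toy state holds no pointers; a real parser state would need something like the proposed `yydupstate`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAXLEN 16
/* Toy "grammar": the input must be exactly one of these keywords. */
static const char *const keywords[] = { "select", "insert", "delete", NULL };

/* Toy push-parser state: the characters accepted so far. */
typedef struct { char buf[MAXLEN + 1]; size_t len; } yystate;

/* Feed one character; return false (leaving the state unchanged) if no
   keyword starts with the extended input. */
static bool yypushparse (yystate *s, char nextchar)
{
  if (s->len == MAXLEN)
    return false;
  for (size_t i = 0; keywords[i]; i++)
    if (strncmp (keywords[i], s->buf, s->len) == 0
        && keywords[i][s->len] == nextchar)
      {
        s->buf[s->len++] = nextchar;
        s->buf[s->len] = '\0';
        return true;
      }
  return false;
}

/* True if the input so far forms a complete keyword. */
static bool yycomplete (const yystate *s)
{
  for (size_t i = 0; keywords[i]; i++)
    if (strcmp (keywords[i], s->buf) == 0)
      return true;
  return false;
}

/* Parse INPUT; on the first rejected character, try each lowercase
   letter in its place on a throw-away copy.  On success, store the
   accepted (possibly corrected) keyword in OUT and return true. */
static bool parse_with_correction (const char *input, char out[MAXLEN + 1])
{
  yystate s = { "", 0 };
  for (size_t i = 0; input[i]; i++)
    {
      if (yypushparse (&s, input[i]))
        continue;
      for (char c = 'a'; c <= 'z'; c++)
        {
          yystate trial = s;   /* throw-away copy; s stays untouched */
          if (!yypushparse (&trial, c))
            continue;
          size_t j = i + 1;
          while (input[j] && yypushparse (&trial, input[j]))
            j++;
          if (!input[j] && yycomplete (&trial))
            {
              strcpy (out, trial.buf);
              return true;
            }
        }
      return false;   /* no single-letter substitution helps */
    }
  if (!yycomplete (&s))
    return false;
  strcpy (out, s.buf);
  return true;
}
```

For example, `selext` is corrected to `select` and `inzert` to `insert`, while `xyz` is rejected because no single substitution leads to a keyword.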
But there are many other use cases for this as well, for instance
multithreaded parsing tasks where each thread gets its own parser state and
might, e.g., work on a different branch of the grammar tree to reduce the
latency (overall response time) of a parsing system.
> I can't make up my mind on whether returning the list of expected
> tokens as strings (as exemplified above), or simply as their symbol
> numbers. Symbol numbers are more efficient, yet they are the
> *internal* symbol numbers, not the ones the user is exposed to.
I would suggest both. It would make sense to auto-generate an enum for
all symbols, like:
enum yysymbolid {
IDENTIFIER,
SWITCH,
IF,
CONST,
...
};
and probably use that numeric type for most Bison APIs, for performance
reasons. That type could also be condensed to a smaller type on request
(e.g. for embedded systems):
enum yysymbolid : uint8_t {
IDENTIFIER,
SWITCH,
IF,
CONST,
...
};
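An enum with a fixed underlying type, as in the declaration above, is standard C++11 but only arrived in C with C23. As a sketch of a fallback for older C compilers, assuming generated tables may use a separate storage type: keep the enum for naming and store symbol numbers as `uint8_t`, with a compile-time check that all symbols fit. The names `yysymbolid_t`, `YYNSYMBOLS` and `yyexpected` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum yysymbolid { IDENTIFIER, SWITCH, IF, CONST, YYNSYMBOLS };

/* Compact storage type for symbol numbers, e.g. in generated tables. */
typedef uint8_t yysymbolid_t;
static_assert (YYNSYMBOLS <= UINT8_MAX + 1, "too many symbols for uint8_t");

/* Hypothetical table of expected symbols, one byte per entry. */
static const yysymbolid_t yyexpected[] = { IDENTIFIER, IF };
```

Values read back from such a table convert implicitly to `enum yysymbolid` for use with the rest of the API.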
But there should still be a convenient way for people to convert such a value
back to its original string representation from source.y:
const char* yysymbolname(enum yysymbolid);
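The reverse mapping could be a lookup in a generated table of names indexed by the enum. The table, its contents and the NULL return for out-of-range values are assumptions of this sketch; Bison's existing yytname table plays a similar role, but it is indexed by internal symbol numbers.

```c
#include <stddef.h>

/* Hypothetical generated enum, as in the proposal above. */
enum yysymbolid { IDENTIFIER, SWITCH, IF, CONST, YYNSYMBOLS };

/* Hypothetical generated table of spellings from source.y, indexed by
   enum yysymbolid. */
static const char *const yysymbolnames[YYNSYMBOLS] = {
  "IDENTIFIER", "SWITCH", "IF", "CONST"
};

/* Return the source spelling of ID, or NULL if ID is out of range. */
const char *yysymbolname (enum yysymbolid id)
{
  return (unsigned) id < YYNSYMBOLS ? yysymbolnames[id] : NULL;
}
```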
Happy new 2k20 BTW! ;-)
Best regards,
Christian Schoenebeck