Re: RFC: custom error messages
From: Akim Demaille
Subject: Re: RFC: custom error messages
Date: Sun, 5 Jan 2020 17:52:43 +0100
Hi Christian,
Sorry I missed your message. For some reason the title of the thread
was broken in the other answers.
> On 3 Jan 2020, at 13:08, Christian Schoenebeck <address@hidden> wrote:
>
> On Friday, 3 January 2020 11:07:05 CET Akim Demaille wrote:
>> One severe issue brought to my attention by Rici Lake (unfortunately
>> privately, although he had written a very nice and detailed mail with
>> all the details) is that this would break several existing parsers
>> that expect yytname to be this way. For instance he pointed to
>>
>> https://git.gnupg.org/cgi-bin/gitweb.cgi?p=libksba.git;a=blob;f=src/asn1-parse.y;h=5bff15cd8db64786f7c9e2ef000aeddd583cfdc0;hb=HEAD#l856
>> currently not responding, but the code is:
>> | for (k = 0; k < YYNTOKENS; k++)
>> |   {
>> |     if (yytname[k] && yytname[k][0] == '\"'
>> |         && !strncmp (yytname[k] + 1, string, len)
>> |         && yytname[k][len + 1] == '\"' && !yytname[k][len + 2])
>> |       return yytoknum[k];
>> |   }
>
> Looks like the use case here is to distinguish non-terminals from terminal
> symbols. That could be addressed by introducing some official API function:
>
> bool yy_is_non_terminal(enum yysymbolid id);
>
> and/or:
>
> bool yy_is_terminal(enum yysymbolid id);
Not exactly. The test here is to tell the difference between
string aliases ("break" represented as "\"break\"") and plain symbols
(TOK_BREAK, represented as "TOK_BREAK"). The difference between terminals
and nonterminals is handled by the loop itself: it stops before YYNTOKENS,
and from YYNTOKENS on, it's only nonterminals.
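To make the distinction concrete, here is a minimal sketch of that kind of lookup. Everything in it is hypothetical: `demo_tname` and `DEMO_NTOKENS` merely stand in for a generated `yytname` and `YYNTOKENS`, and the helper names are made up for the illustration. It only shows why such code depends on the quotes being part of the stored name.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for a generated name table: terminals first,
   then nonterminals.  A real table comes from the generated parser.  */
enum { DEMO_NTOKENS = 4 };
static const char *const demo_tname[] = {
  "$end", "TOK_BREAK", "\"break\"", "\"if\"",  /* terminals: [0, DEMO_NTOKENS) */
  "$accept", "stmt",                           /* nonterminals */
  NULL
};

/* True if NAME is spelled like a string alias, i.e. wrapped in
   double quotes.  */
static bool
demo_is_string_alias (const char *name)
{
  assert (name != NULL);
  size_t len = strlen (name);
  return 2 <= len && name[0] == '"' && name[len - 1] == '"';
}

/* Index of the terminal whose string alias is KEYWORD (given without
   the quotes), or -1 if there is none.  Only terminals are scanned,
   because the loop stops before DEMO_NTOKENS.  */
static int
demo_find_alias (const char *keyword)
{
  assert (keyword != NULL);
  size_t len = strlen (keyword);
  for (int k = 0; k < DEMO_NTOKENS; k++)
    {
      const char *name = demo_tname[k];
      if (demo_is_string_alias (name)
          && strlen (name) == len + 2
          && strncmp (name + 1, keyword, len) == 0)
        return k;
    }
  return -1;
}
```

With this table, `demo_find_alias ("break")` finds the alias entry, while `demo_find_alias ("TOK_BREAK")` finds nothing because that name is not quoted. If the quotes were dropped from the table, this style of lookup would stop working.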
Anyway, as I mentioned I don't want to support this. And I will not
make it easier.
> Then those double quotes could simply be dropped. Or was there any other use
> case for looking at those double quote characters?
I definitely want to get rid of these quotes! But not with 'verbose'
error messages, only with 'custom' and 'rich'.
>> I think he is right, hence the call to yysyntax_error_arguments which
>> returns the list of expected/unexpected tokens.
>
> Actually I had a general purpose push API in mind. Your suggestion would
> limit retrieving the "next expected symbols" solely to error message
> purposes.
Yes, I'm focusing on improving the error messages, which is probably
the most common request in recent years.
> Why not make that a general-purpose function instead, one that users could
> call at any time with the current parser state:
>
> // returns NULL terminated list
> const enum yysymbolid* yynextsymbols(const yystate* currentParserState);
I don't want to have to deal with allocating space. Your proposal
needs to allocate space. Hence the clumsy interface I provided :)
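For illustration only, one common way to avoid allocating is to let the caller supply the array. The sketch below is not the actual signature of yysyntax_error_arguments; `demo_expected_symbols` and its data are hypothetical and only show the trade-off: the caller owns the storage, and the return value tells it whether the buffer was large enough.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: symbol numbers acceptable in some parser state.  A real
   parser would compute them from its tables.  */
enum { DEMO_EXPECTED_COUNT = 4 };
static const int demo_expected[DEMO_EXPECTED_COUNT] = { 3, 7, 9, 12 };

/* Store at most CAPACITY expected symbol numbers into OUT and return
   how many are available in total, which may exceed CAPACITY.  With
   OUT == NULL nothing is stored and only the count is returned.  */
static int
demo_expected_symbols (int out[], int capacity)
{
  assert (capacity >= 0);
  if (out)
    for (int i = 0; i < DEMO_EXPECTED_COUNT && i < capacity; i++)
      out[i] = demo_expected[i];
  return DEMO_EXPECTED_COUNT;
}
```

A caller with `int buf[2]` gets the first two symbols and learns from the return value that more were available. It is clumsier than a returned NULL-terminated list, but nobody has to free anything.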
> Because there are other important use cases that I pointed out to you:
> auto completion features; e.g. interactive command line shells where the user
> can auto complete the currently incomplete command by hitting tab key, or a
> programming language code editor GUI/IDE where the user would get a non-
> obtrusive popup while typing for potential code completions. In these use
> cases you are not (necessarily) addressing syntax errors. The parser might be
> very well in some valid state.
I see your point.
> For that purpose, and to continue the idea about a general purpose push API,
> it would be very useful to have a function for duplicating the current parser
> state:
>
> yystate* yydupstate(const yystate* parserState);
Wow, you're talking about massive surgery in yacc.c. Roughly, it means
no longer using local variables for the stacks, which is what the
push-interface does (I'm talking about api.push here).
Or are you referring to push-parsers when you say "push API"?
> and one function to push parse on a specific parser state:
>
> bool yypushparse(yystate* parserState, char nextchar);
>
> The latter returning false on parser errors. That way people would have a
> very flexible and powerful API for all kinds of use cases. Because by being
> able to duplicate states, you can have "throw away" parser states, where you
> can try out things without touching the "official" parser state. For
> instance I am using that to auto correct user typos in some parsers (that is
> guessing what user had in mind on syntax errors by some limited brute force
> attempts by parser on throw-away parser states).
That might be doable with api.push. I don't see that coming for
the pull interface.
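As a rough sketch of why this presupposes a heap-allocated state: once the stack lives in an object instead of in local variables, duplicating it is a deep copy. The `demo_pstate` type and functions below are hypothetical and are not Bison's push-parser API. They track only a stack of state numbers, whereas a real parser also carries semantic values (and possibly locations), which are much harder to copy in general.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parser state: the stack lives in a heap object rather
   than in local variables of the parse function.  */
typedef struct
{
  int *states;      /* stack of automaton state numbers */
  size_t size;      /* number of entries in use */
  size_t capacity;  /* number of entries allocated */
} demo_pstate;

/* Allocate an empty state, or return NULL if out of memory.  */
static demo_pstate *
demo_pstate_new (size_t capacity)
{
  assert (capacity > 0);
  demo_pstate *ps = malloc (sizeof *ps);
  if (!ps)
    return NULL;
  ps->states = malloc (capacity * sizeof *ps->states);
  if (!ps->states)
    {
      free (ps);
      return NULL;
    }
  ps->size = 0;
  ps->capacity = capacity;
  return ps;
}

static void
demo_pstate_delete (demo_pstate *ps)
{
  if (ps)
    {
      free (ps->states);
      free (ps);
    }
}

/* Push one state number; return 0 on success, -1 if the stack is full.  */
static int
demo_pstate_push (demo_pstate *ps, int state)
{
  assert (ps != NULL);
  if (ps->size == ps->capacity)
    return -1;
  ps->states[ps->size++] = state;
  return 0;
}

/* Deep copy: the duplicate can be modified and thrown away without
   touching the original.  Returns NULL if out of memory.  */
static demo_pstate *
demo_pstate_dup (const demo_pstate *ps)
{
  assert (ps != NULL);
  demo_pstate *copy = demo_pstate_new (ps->capacity);
  if (!copy)
    return NULL;
  memcpy (copy->states, ps->states, ps->size * sizeof *ps->states);
  copy->size = ps->size;
  return copy;
}
```

A "throw-away" attempt would then duplicate the state, push speculative input onto the copy, and delete the copy afterwards, leaving the original untouched.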
> But there are many other use cases as well for this: for instance multi-
> threaded parsing tasks where each thread would get its own parser state and
> each thread e.g. might be working on a different branch of a grammar tree to
> reduce latency (overall response time) of a parser system.
Again, that's the kind of thing for api.pure, not the regular
yacc.c.
>> I can't make up my mind on whether returning the list of expected
>> tokens as strings (as exemplified above), or simply as their symbol
>> numbers. Symbol numbers are more efficient, yet they are the
>> *internal* symbol numbers, not the ones the user is exposed to.
>
> I would suggest both. It would make sense to auto generate an enum list for
> all symbols like:
>
> enum yysymbolid {
> IDENTIFIER,
> SWITCH,
> IF,
> CONST,
> ...
> };
> and use that numeric type probably for most Bison APIs for performance
> reasons. That type could also be condensed to a smaller type if requested
> (i.e. for embedded systems):
>
> enum yysymbolid : uint8_t {
> IDENTIFIER,
> SWITCH,
> IF,
> CONST,
> ...
> };
>
> But there should still be a way for people to convert that
> conveniently to its original string representation from source.y:
>
> const char* yysymbolname(enum yysymbolid);
Yes, of course. That's not "both", that's just what I refer
to by "exposing the numbers". "yysymbolname(x)" is currently
just "yytname[x]".
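In other words, such an accessor would be a thin wrapper over the name table. A minimal sketch, assuming a hypothetical enum and table (`demo_symbolid` and `demo_names` are made up here; in a generated parser the table would be yytname):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical symbol enum, in the spirit of the proposed yysymbolid.  */
enum demo_symbolid
{
  DEMO_IDENTIFIER,
  DEMO_SWITCH,
  DEMO_IF,
  DEMO_CONST,
  DEMO_NSYMBOLS
};

/* Hypothetical name table indexed by symbol number.  String aliases
   keep their quotes, plain symbols do not.  */
static const char *const demo_names[DEMO_NSYMBOLS] = {
  "IDENTIFIER", "\"switch\"", "\"if\"", "\"const\""
};

/* The accessor is nothing more than a bounds-checked table lookup.  */
static const char *
demo_symbolname (enum demo_symbolid id)
{
  assert (0 <= (int) id && (int) id < DEMO_NSYMBOLS);
  return demo_names[id];
}
```

The function adds a bounds check and a stable name, nothing more.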
> Happy new 2k20 BTW! ;-)
Thanks! Best wishes to you!