RFC: a name for the error token
From: Akim Demaille
Subject: RFC: a name for the error token
Date: Sun, 26 Apr 2020 18:40:53 +0200
Hi,
We currently have several ways for the scanner to report an error to the
parser:
1. return the undefined token (YYUNDEF)
2. return an unknown token kind
3. return the error token
1 and 2 are basically indistinguishable: any token kind which is not known is
mapped to the YYSYMBOL_YYUNDEF symbol kind by YYTRANSLATE. The only difference
arises if you are using api.token.raw, in which case the token kind and the
symbol kind coincide, and YYTRANSLATE is the identity. In that case it is no
longer valid to return unknown token kinds (it is undefined behavior); you must
return YYSYMBOL_YYUNDEF (aka YYUNDEF). In both cases the parser emits an error
message and then enters error recovery.
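To make cases 1 and 2 concrete, here is a minimal hand-written model of the token-to-symbol translation that YYTRANSLATE performs when api.token.raw is not used. The TOY_* names, their values, and toy_translate are hypothetical stand-ins for what Bison generates (Bison uses a lookup table, not a switch); the only point is that every unknown token kind lands on YYUNDEF.

```c
/* Hypothetical model of Bison-style token translation (not generated code). */

/* Token kinds, as returned by the scanner. */
enum toy_token_kind {
  TOY_YYEOF   = 0,
  TOY_ERRCODE = 256,  /* the error token, whose name is under discussion */
  TOY_YYUNDEF = 257,
  TOY_NUM     = 258   /* a user token */
};

/* Symbol kinds, as used internally by the parser. */
enum toy_symbol_kind {
  TOY_SYMBOL_YYEOF   = 0,
  TOY_SYMBOL_ERROR   = 1,
  TOY_SYMBOL_YYUNDEF = 2,
  TOY_SYMBOL_NUM     = 3
};

/* Map a token kind to a symbol kind.  Anything the grammar does not know,
   including YYUNDEF itself, becomes the YYUNDEF symbol, which is why the
   parser cannot tell case 1 from case 2. */
static enum toy_symbol_kind toy_translate(int token)
{
  switch (token) {
  case TOY_YYEOF:   return TOY_SYMBOL_YYEOF;
  case TOY_ERRCODE: return TOY_SYMBOL_ERROR;
  case TOY_NUM:     return TOY_SYMBOL_NUM;
  default:          return TOY_SYMBOL_YYUNDEF;
  }
}
```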
3. Until recently the error token behaved like YYUNDEF, but since my recent
changes (https://lists.gnu.org/r/bison-patches/2020-04/msg00145.html) returning
it no longer makes the parser emit an error message: the parser directly enters
error recovery.
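Case 3 suits a scanner that has already diagnosed the problem itself and does not want the parser to add a second, vaguer message. The sketch below is a hypothetical hand-written scanner over a string; toy_yylex, the TOK_* constants, and error_count are illustrative names, not Bison's API.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical token kinds; a real scanner would use the generated ones. */
enum { TOK_EOF = 0, TOK_ERROR = 256, TOK_UNDEF = 257, TOK_NUM = 258 };

static int error_count = 0;

/* Scan one token from *p.  On an invalid character, report it here and
   return the error token, so the parser goes straight to error recovery. */
static int toy_yylex(const char **p)
{
  while (**p == ' ')
    ++*p;
  if (**p == '\0')
    return TOK_EOF;
  if (isdigit((unsigned char) **p)) {
    while (isdigit((unsigned char) **p))
      ++*p;
    return TOK_NUM;
  }
  fprintf(stderr, "invalid character: '%c'\n", **p);
  ++error_count;
  ++*p;                 /* skip the offending character */
  return TOK_ERROR;
}
```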
There is one problem left: choosing a name for the error token. Currently it is
"YYERRCODE", but that is an ugly name. Since it was never documented (it will be
in 3.6), we have an opportunity to find a good name for it.
Actually, because some people have used it in the past and expected an error
message, we should keep a backward-compatibility macro that maps YYERRCODE to
YYUNDEF.
So, what name for the error token?
a. There's one quite obvious name: YYERROR. Unfortunately it collides with the
YYERROR macro. We could play #define tricks around user actions so that YYERROR
means the macro only there, but that does not feel right.
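For option (a), the #define trick might look like the sketch below: the token constant is named YYERROR, and the YYERROR macro is defined only around the user action, then undefined again. The goto-based macro body is modelled loosely on the one the C skeleton uses; toy_action, the label, and the enum are hypothetical. It works, but every action would need this wrapping.

```c
/* Hypothetical token enum where the error token is spelled YYERROR. */
enum toy_token { TOY_YYEOF = 0, YYERROR = 256 };

/* A stand-in for a user action: negative values trigger error recovery. */
static int toy_action(int value)
{
  /* Inside the action, YYERROR means "jump to error recovery". */
#define YYERROR goto yyerrorlab
  if (value < 0)
    YYERROR;
#undef YYERROR
  /* Outside the action, YYERROR is the token constant again. */
  return 0;
yyerrorlab:
  return 1;
}
```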
b. We could use a name such as YYERROR_TOKEN, but I don't like it much, as it
follows a completely different naming scheme from the other tokens (user tokens
such as NUM, or special tokens such as YYEOF). Besides, it would not match the
name of the symbol kind (YYSYMBOL_YYERROR) unless we also renamed that to
YYSYMBOL_YYERROR_TOKEN. Which is... erk.
c. In the grammar, the error token is spelled "error", so it would make a lot
of sense to just name it "error" and "YYSYMBOL_error", but that would infringe
on the user's name space.
d. So it could simply be "YYerror", which does show it's a built-in symbol (like
YYEOF and YYUNDEF), yet it does not follow the convention of uppercase for
tokens. Its symbol kind would be YYSYMBOL_YYerror, of course.
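Under option (d), the generated declarations could look roughly like this, together with the backward-compatibility alias for YYERRCODE mentioned earlier. The numeric values and the NUM token are illustrative assumptions, not necessarily what Bison will emit.

```c
/* Hypothetical sketch of generated token kinds under option (d). */
enum yytokentype {
  YYEOF   = 0,    /* end of file */
  YYerror = 256,  /* the error token: built-in, hence the YY prefix */
  YYUNDEF = 257,  /* invalid token */
  NUM     = 258   /* a user token */
};

/* Matching symbol kinds. */
enum yysymbol_kind_t {
  YYSYMBOL_YYEOF   = 0,
  YYSYMBOL_YYerror = 1,
  YYSYMBOL_YYUNDEF = 2,
  YYSYMBOL_NUM     = 3
};

/* Backward compatibility: scanners that returned the undocumented YYERRCODE
   expected an error message, so alias it to YYUNDEF, not to YYerror. */
#define YYERRCODE YYUNDEF
```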
I have been thinking about this issue for weeks, and the more I think about it,
the more I believe (d) is the least ugly approach.
But maybe someone has a better option?