bison-patches

Re: RFC: custom error messages


From: Rici Lake
Subject: Re: RFC: custom error messages
Date: Mon, 6 Jan 2020 13:23:27 -0500

Hi, Akim.

Sorry for the slow response, and thanks for circulating this comprehensive
proposal. I certainly have no argument with any of it, but I wanted to
share my particular perspective on token aliases (i.e., quoted strings). I
think I alluded to this in my email from last year, or whenever it was that
you initially raised this issue.

For as long as I have been aware, the `"literal"` syntax has been described
in the bison manual as a "literal string token"
<http://dinosaur.compilertools.net/bison/bison_6.html#SEC40> and the
motivating example then and now is, in fact, a literal string (`"<="`).
Like character tokens, literal string tokens provide a way to make the
grammar reflect the actual tokens used in the language being parsed. It is
similar to the notation used in ISO EBNF (although ISO EBNF does not
distinguish between single- and double-quoted forms).
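
To make that original usage concrete, here is the sort of grammar fragment I have in mind (a toy example of my own, not taken from the manual):

```
/* Literal string tokens name the actual operators of the parsed language. */
%token LE "<="
%token GE ">="
%%
relation: expr "<=" expr | expr ">=" expr | expr '<' expr;
expr: '0';
%%
```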

Unfortunately, no mechanism was ever proposed which allowed for the
automated extraction of these keyword literals to be used in an
automatically-generated scanner description. But various ad hoc mechanisms
have been developed for particular projects, and some of them are still
extant.

At some point in the development of bison, it seems to have occurred to the
team that literal string tokens could also be used to give human-readable
and translatable names to non-keyword tokens (`"identifier"`). This
secondary use is obviously (it seems to me) incompatible with the first
use, because the alias definition provides no way to differentiate
between translatable descriptions and keyword tokens; any
such distinction is the responsibility of the grammar implementor. Without
such a mechanism, neither automated scanner generation nor automated
translation is really possible. (It's possible to fake automated
translation by adding identity translations of all keywords, but that is not
really satisfactory since a keyword might also be a translatable
non-keyword term in another part of the program.)
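
The ambiguity is easy to see in a declaration block like the following (an invented example); nothing in the syntax tells a tool which aliases are keywords to be matched literally and which are merely descriptions:

```
%token KW_IF "if"           /* literal keyword: the scanner should match the text `if' */
%token IDENT "identifier"   /* description: the scanner must not match the text `identifier' */
%token ARROW "->"           /* literal operator */
%token NUM   "number"       /* description */
```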

Now, it appears that the secondary usage has become the primary usage.
That's a natural progression, of course, and it may well reflect the real
preference of the user community as a whole and not just the segment
reflected here. Of course, it is important to improve error message
readability and translatability; all of us should agree about that. But
automated scanner generation is also a useful facility, particularly in an
educational context, and it would surely be better if both goals could be
accommodated.

So I think that there is still time to consider the wider question of how a
bison grammar might be able to show both literal keywords and
human-readable token descriptions in a way that is useful for both
applications. As a side-benefit, this might also make grammars easier to
read by humans, because the current mechanism does not make it clear to a
human reader in all cases whether a quoted string is intended to be
inserted literally into the parsed text, or whether it is necessary to hunt
through the grammar's token declarations to find the named token for which
the quoted string is an alias. (I've been fooled by this several times
while trying to read grammars, particularly when only an extract of the
grammar is presented.)

I don't really have a proposal here. Most of the automated scanner
generators I've written assume that a quoted string is a literal unless it
has at least one whitespace character, but that's not very satisfactory.
I've also attempted to parse the bison grammar description to find the
tokens which are explicitly declared, using the convention that only
implicitly declared tokens should be used to automatically generate scanner
patterns; that's more work and possibly more precise, but it doesn't work
well with existing bison grammar files in which all tokens are declared in
order to give token names to manually-generated scanner descriptions.
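
For what it's worth, the whitespace heuristic I mentioned amounts to little more than this (a sketch in C; the generators I've actually written differ in the details):

```c
#include <stdbool.h>
#include <string.h>

/* Guess whether a token alias is a literal keyword/operator that the
   generated scanner should match verbatim, or a human-readable
   description.  Anything containing whitespace is treated as a
   description; everything else is treated as a literal.  */
static bool
alias_is_literal (const char *alias)
{
  return strpbrk (alias, " \t\n") == NULL;
}
```

So `"<="` and `"if"` come out as literals, while `"end of file"` comes out as a description; but `"identifier"` is misclassified, which is exactly why the heuristic is unsatisfactory.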

I have to say that my experience with automatically-generated scanner
files, despite all of the above issues, has been pretty uniformly positive.
It makes it much easier to prototype and test grammar fragments. (It's
possible that I have a particular bias in this use case.)

As a final note, once automated scanner generation is on the table, it
could also become useful to provide some way of telling the scanner
generator what scanner pattern expressions should be used for non-keyword
tokens. (One way of doing this with (f)lex-generated scanners is to use a
(f)lex macro as the scanner pattern. That can be simply integrated in the
bison literal string, by using, for example, `{identifier}` as the token
alias, which can then be translated for human consumption. That works in
many simple applications, but I find myself increasingly hesitant to
propose hacks as solutions.)
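
Concretely, the hack looks something like this (names invented for illustration): the alias is at once a (f)lex macro reference and, once translated for display, a readable description.

```
/* grammar.y: the alias doubles as a (f)lex pattern macro.  */
%token IDENT "{identifier}"
%token NUM   "{number}"
```

```
/* scanner.l, definitions section: the macros the aliases refer to.  */
identifier   [A-Za-z_][A-Za-z_0-9]*
number       [0-9]+
```

The generated scanner rule can then be literally `{identifier}  return IDENT;`, while the error-message side translates or strips the braces before showing the name to the user.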

That was, I think, a very long non-response to your RFC. But I hope it was
in some way useful. Let me add one more consideration, which is possibly
more relevant, about whether quotes should or should not be included in
name tables.

I think that the uses of the (or a) token name table which appear in the
RFC assume that the token names will be used at runtime. In such use cases,
the quotes are certainly an inconvenience, and removing them is an
unnecessarily annoying project.

But another use case for the/a token name table is at code-generation time
(whether the code being generated is a scanner or a literal error message,
or something else). In this use case, it is extremely useful to have the
quoted form, which is a valid C string literal whose value will be the
token name when compiled. The quoting functions in bison cheerfully handle
a number of corner cases which are awkward to implement, and which are
often not even thought about (such as tokens which happen to include
trigraphs). With unquoted names, the code generator has to duplicate all of
this logic.
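
A trivial sketch of what I mean (hypothetical generator code of my own, not anything in bison): because each string-aliased entry of yytname is already a valid C string literal, a generator emitting C can paste it into its output verbatim, without re-implementing the escaping.

```c
#include <stdio.h>

/* Emit a C table of display names for the string-aliased tokens.
   Each such yytname entry already reads like "\"identifier\"", i.e. a
   complete C string literal, so it can be written out unchanged.  */
static void
emit_name_table (FILE *out, char const *const *yytname, int ntokens)
{
  fprintf (out, "static char const *const token_display[] = {\n");
  for (int i = 0; i < ntokens; ++i)
    if (yytname[i] && yytname[i][0] == '"')
      fprintf (out, "  [%d] = %s,\n", i, yytname[i]);  /* no re-quoting needed */
  fprintf (out, "};\n");
}
```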

That's not necessarily an argument for automatic quotation of names, as in
the current name table. For one thing, it is entirely possible that the
code generation context is not C code (although other contexts, like (f)lex
patterns, are at least mostly compatible). A middle road approach might be
to separately bundle and document the internal quotation library.

Finally, thanks a lot for your many valuable contributions, and my best
wishes for the new year.
Rici.

On Fri, Jan 3, 2020 at 5:07, Akim Demaille (<address@hidden>) wrote:

> Hi all,
>
> There is quite some demand to provide the users with more freedom
> to customize the error messages.  There's also demand for us to stop
> double-quoting the token names in the table of the symbol names.  For
> instance in Bison's own parser we have
>
> | static const char *const yytname[] =
> | {
> |   "\"end of file\"", "error", "$undefined", "\"string\"", "\"%token\"",
> |   "\"%nterm\"", "\"%type\"", "\"%destructor\"", "\"%printer\"",
> |   "\"%left\"", "\"%right\"", "\"%nonassoc\"", "\"%precedence\"",
> |   "\"%prec\"", "\"%dprec\"", "\"%merge\"", "\"%code\"",
> |   "\"%default-prec\"", "\"%define\"", "\"%defines\"",
> "\"%error-verbose\"",
> |   "\"%expect\"", "\"%expect-rr\"", "\"%<flag>\"", "\"%file-prefix\"",
> |   "\"%glr-parser\"", "\"%initial-action\"", "\"%language\"",
> |   "\"%name-prefix\"", "\"%no-default-prec\"", "\"%no-lines\"",
> |   "\"%nondeterministic-parser\"", "\"%output\"", "\"%pure-parser\"",
> |   "\"%require\"", "\"%skeleton\"", "\"%start\"", "\"%token-table\"",
> |   "\"%verbose\"", "\"%yacc\"", "\"{...}\"", "\"%?{...}\"",
> |   "\"[identifier]\"", "\"character literal\"", "\":\"", "\"epilogue\"",
> |   "\"=\"", "\"identifier\"", "\"identifier:\"", "\"%%\"", "\"|\"",
> |   "\"%{...%}\"", "\";\"", "\"<tag>\"", "\"<*>\"", "\"<>\"",
> "\"integer\"",
> |   "\"%param\"", "\"%union\"", "\"%empty\"", "$accept", "input",
> |   "prologue_declarations", "prologue_declaration", "$@1", "params",
> |   "grammar_declaration", "code_props_type", "union_name",
> |   "symbol_declaration", "$@2", "$@3", "precedence_declarator", "tag.opt",
> |   "generic_symlist", "generic_symlist_item", "tag", "nterm_decls",
> |   "token_decls", "token_decl.1", "token_decl", "int.opt",
> |   "token_decls_for_prec", "token_decl_for_prec.1", "token_decl_for_prec",
> |   "symbol_decls", "symbol_decl.1", "grammar",
> |   "rules_or_grammar_declaration", "rules", "$@4", "rhses.1", "rhs",
> |   "named_ref.opt", "variable", "value", "id", "id_colon", "symbol",
> |   "string_as_id", "string_as_id.opt", "epilogue.opt", YY_NULLPTR
> | };
>
> This double-escaping wrecks non-ASCII string literals, such as
> mathematical symbols written in UTF-8.  It also is a problem when you
> want to translate your symbols.
>
> About one year ago I made a first attempt to address this:
> https://lists.gnu.org/r/bison-patches/2018-12/msg00088.html.  That
> series of patches completely changed the semantics of yytname (by
> removing this double-quotation).  I was quite happy with the result (I
> liked a lot the fact that there was just one implementation, not a
> host of alternatives, as that's the kind of things that make Bison
> hard to maintain).  I thought we were done, but there were a few
> problems.
>
> One severe issue brought to my attention by Rici Lake (unfortunately
> privately, although he had written a very nice and detailed mail with
> all the details) is that this would break several existing parsers
> that expect yytname to be this way.  For instance he pointed to
>
>
> https://git.gnupg.org/cgi-bin/gitweb.cgi?p=libksba.git;a=blob;f=src/asn1-parse.y;h=5bff15cd8db64786f7c9e2ef000aeddd583cfdc0;hb=HEAD#l856
>
> The site is currently not responding, but the code is:
>
> | for (k = 0; k < YYNTOKENS; k++)
> |   {
> |     if (yytname[k] && yytname[k][0] == '\"'
> |         && !strncmp (yytname[k] + 1, string, len)
> |         && yytname[k][len + 1] == '\"' && !yytname[k][len + 2])
> |       return yytoknum[k];
> |   }
>
> While I was okay that my changes would modify some error messages of
> some parsers, I did not want to break parsers.  Note that the core
> problem here is that some people use yytname to write their scanners.
> In other words, they use the string literal not as a means to
> customize the error messages, but to encode in some way their scanner.
>
> We cannot satisfy both needs (better error messages and helping the
> definition of the scanner) at the same time.  I fully support the use
> of string literals for better error messages, and will not help to use
> them for the scanner.  I will pay attention not to break existing
> code, but I will not make any efforts to make it easier, nor to be
> portable across skeletons (in the future though, Bison will help these
> users, but by a totally different means).
>
> I looked at how people use yytnamerr on GitHub (see at the end of this
> message) and I would summarize that:
>
> - some users "fight" the double quotation there
>
> - some users want to translate the token names
>
> - some users want to customize their error messages and try to
>   shoehorn that into yytnamerr.  The example of PHP is particularly
>   enlightening
>
> I think we can satisfy everybody by offering alternatives to "%define
> parse.error verbose".
>
> With "%define parse.error rich" (for lack of a better idea,
> suggestions most welcome), you get something equivalent to 'verbose'
> but (i) there is no double quotation, (ii) there is support for symbol
> translation (as in
> https://lists.gnu.org/r/bison-patches/2018-12/msg00088.html).  This
> would _not_ support yytnamerr.
>
> With "%define parse.error custom", the user must define the
> yyreport_syntax_error function, which is fully responsible to emit the
> syntax error message.  In the branch yyreport_syntax_error (see
> https://github.com/akimd/bison/pull/16), the example looks like this:
>
> | int
> | yyreport_syntax_error (const yysyntax_error_context_t *ctx)
> | {
> |   enum { YYERROR_VERBOSE_ARGS_MAXIMUM = 10 };
> |   /* Arguments of yyformat: reported tokens (one for the "unexpected",
> |      one per "expected"). */
> |   char const *arg[YYERROR_VERBOSE_ARGS_MAXIMUM];
> |   int n = yysyntax_error_arguments (ctx, arg, sizeof arg / sizeof *arg);
> |   if (n == -2)
> |     return 2;
> |   fprintf (stderr, "SYNTAX ERROR on token [%s]", arg[0]);
> |   if (1 < n)
> |     {
> |       fprintf (stderr, " (expected:");
> |       for (int i = 1; i < n; ++i)
> |         fprintf (stderr, " [%s]", arg[i]);
> |       fprintf (stderr, ")");
> |     }
> |   fprintf (stderr, "\n");
> |   return 0;
> | }
>
> Again, this would not use yytnamerr; it's up to the user to implement
> and call whatever transformation she wants on the error message.  And
> there would be no double quotation in yytname.
>
> Christian made a strong case in favor of what he called "push" API
> rather than pull:
>
> May 11th 2019, Christian Schoenebeck <address@hidden>
> (https://lists.gnu.org/r/help-bison/2019-05/msg00041.html):
>
> > My old concerns still apply, that is I still find a push design
> > (that is you call Bison API functions you need explicitly in your
> > yyerror() implementation to get just the information you need for
> > the error message) more appropriate, flexible, powerful and API
> > stable than a pull design (where you would get a predefined list of
> > information passed to your yyerror_whatever() implementation; like
> > eat it or leave it).
> >
> > The push design has various advantages. E.g. Bison can easily be
> > extended with new API functions if somebody needs another
> > information, without breaking the previous API and people's existing
> > yyerror() implementations. No matter what kind of arguments Bison
> > would pass to that suggested yyerror_*() function today, I am sure
> > later on people would ask about different information to be provided
> > by Bison as well, or in a different way. Some of them might even be
> > exotic from your point of view, but legitimate for certain use
> > cases, some might become replaced & deprecated, etc. Plus developers
> > could duplicate and modify the parser state without touching the
> > current state to see potential future information, etc.
>
> I think he is right, hence the call to yysyntax_error_arguments which
> returns the list of expected/unexpected tokens.
>
> I can't make up my mind on whether to return the list of expected
> tokens as strings (as exemplified above), or simply as their symbol
> numbers.  Symbol numbers are more efficient, yet they are the
> *internal* symbol numbers, not the ones the user is exposed to.
>
> I'm not sure I really want the unexpected token to be in the same
> array.  Maybe some day we would like to give $undefined a string
> semantic value, in which we would store the unexpected string:
>
> $ echo "1+&" | ./_build/g9d/examples/c/calc/calc
> SYNTAX ERROR on string [&] (expected: ["number"] ['('])
>
> would look better than
>
> $ echo "1+&" | ./_build/g9d/examples/c/calc/calc
> SYNTAX ERROR on token [$undefined] (expected: ["number"] ['('])
>
> If you are familiar with the current implementation of 'verbose' error
> reporting in yacc.c, you'll see that you cannot emulate it with the
> custom API I propose.  That's because the 'verbose' implementation
> goes to a lot of trouble so that the error message is built and
> stored in yyparse's stack frame (using alloca).  That requires a
> two-step calling convention, where one would first ask "how much
> space do I need for the error message" and then say "I allocated
> this much space here, please strcpy the error message here".  This is
> too complex; I don't think we should support such a protocol.
>
>
>
> I also plan to get rid of YYERROR_VERBOSE.  Juggling with CPP and M4
> at the same time is a nightmare.  YYERROR_VERBOSE was deprecated
> looong ago (Bison 2.6, 2012-07-19, claims it has been deprecated since
> 1.875
> https://git.savannah.gnu.org/cgit/bison.git/commit/NEWS?id=258cddbc3).
>
>
> I would really appreciate comments on this proposal.  Sorry for making
> it so long.
>
> Cheers!
>
>
>
>
> # hayate891/linuxsampler -- Parser for NKSP real-time instrument script language
>
> They have single quotes in their string literals, yet they don't want
> the outer quotes (which are not removed by the default yytnamerr if
> there are single quotes).  IOW, would not be needed with my changes.
>
>
> https://github.com/hayate891/linuxsampler/blob/831838989ea8066d3875057e84e7585a1fec4160/linuxsampler/trunk/src/scriptvm/parser.y#L806
> ```
> /// Custom implementation of yytnamerr() to ensure quotation is always
> /// stripped from token names before printing them to error messages.
> int InstrScript_tnamerr(char* yyres, const char* yystr) {
>   if (*yystr == '"') {
>       int yyn = 0;
>       char const *yyp = yystr;
>       for (;;)
>         switch (*++yyp)
>           {
> /*
>           case '\'':
>           case ',':
>             goto do_not_strip_quotes;
>           case '\\':
>             if (*++yyp != '\\')
>               goto do_not_strip_quotes;
> */
>             /* Fall through.  */
>           default:
>             if (yyres)
>               yyres[yyn] = *yyp;
>             yyn++;
>             break;
>
>           case '"':
>             if (yyres)
>               yyres[yyn] = '\0';
>             return yyn;
>           }
> /*
>     do_not_strip_quotes: ;
> */
>     }
>
>   if (! yyres)
>     return (int) yystrlen (yystr);
>
>   return int( yystpcpy (yyres, yystr) - yyres );
> }
> ```
>
>
> # Ortelius maps your microservice configurations with their relationships to the applications that use them
>
>
> https://github.com/ortelius/ortelius/blob/3c23d26b34d88b5e2398d341b5c9a2a9cd00a525/dmengine/dmapi/dm.y#L342
>
> Display literal tokens with quotes around them (e.g., as [unexpected
> "action"], not as [unexpected "\"action\""].
>
> Would not be needed with straight display.
>
> ```
> %token <bval> T_BOOL    "boolean value"
> %token <ival> NUM       "integer value"
> %token <str> STR STR2   "string literal"
> %token <str> IDENT      "identifier"
> %token <str> DOLIDENT   "$identifier"
> %token T_ACTION         "\"action\""
> %token T_BREAK          "\"break\""
> %token T_CASE           "\"case\""
> %token T_CATCH          "\"catch\""
> ```
>
> ```
> case '\\':
>         ++yyp;
>         /* RHT mod - allow double-quotes to be escaped */
>         if((*yyp != '\\') && (*yyp != '"')) {
>                 goto do_not_strip_quotes;
>         }
> ```
>
>
> # PHP
>
>
> https://github.com/php/php-src/blob/e1559b51cdde65400434653c55790279a15475d5/Zend/zend_language_parser.y#L1320
>
> ```
> %token <ast> T_LNUMBER   "integer number (T_LNUMBER)"
> %token <ast> T_DNUMBER   "floating-point number (T_DNUMBER)"
> %token <ast> T_STRING    "identifier (T_STRING)"
> %token <ast> T_VARIABLE  "variable (T_VARIABLE)"
> %token <ast> T_INLINE_HTML
> %token <ast> T_ENCAPSED_AND_WHITESPACE  "quoted-string and whitespace (T_ENCAPSED_AND_WHITESPACE)"
> %token <ast> T_CONSTANT_ENCAPSED_STRING "quoted-string (T_CONSTANT_ENCAPSED_STRING)"
> %token <ast> T_STRING_VARNAME "variable name (T_STRING_VARNAME)"
> %token <ast> T_NUM_STRING "number (T_NUM_STRING)"
> ```
>
> They maintain state between the different calls to yytnamerr to know
> whether it's the unexpected token or one of the expected ones.  To this
> end they use `CG(parse_error)`, changed in yytnamerr.
>
>
> https://github.com/php/php-src/blob/2dd2dcaf9c8bcdda9c687660da887c5fddeb7448/Zend/zend_globals_macros.h#L34
> ```
> # define CG(v) (compiler_globals.v)
> ```
>
> ```
> /* Copy to YYRES the contents of YYSTR after stripping away unnecessary
>    quotes and backslashes, so that it's suitable for yyerror.  The
>    heuristic is that double-quoting is unnecessary unless the string
>    contains an apostrophe, a comma, or backslash (other than
>    backslash-backslash).  YYSTR is taken from yytname.  If YYRES is
>    null, do not copy; instead, return the length of what the result
>    would have been.  */
> static YYSIZE_T zend_yytnamerr(char *yyres, const char *yystr)
> {
>         /* CG(parse_error) states:
>          * 0 => yyres = NULL, yystr is the unexpected token
>          * 1 => yyres = NULL, yystr is one of the expected tokens
>          * 2 => yyres != NULL, yystr is the unexpected token
>          * 3 => yyres != NULL, yystr is one of the expected tokens
>          */
>         if (yyres && CG(parse_error) < 2) {
>                 CG(parse_error) = 2;
>         }
>
>         if (CG(parse_error) % 2 == 0) {
>                 /* The unexpected token */
>                 char buffer[120];
>                 const unsigned char *end, *str, *tok1 = NULL, *tok2 = NULL;
>                 unsigned int len = 0, toklen = 0, yystr_len;
>
>                 CG(parse_error)++;
>
>                 if (LANG_SCNG(yy_text)[0] == 0 &&
>                         LANG_SCNG(yy_leng) == 1 &&
>                         strcmp(yystr, "\"end of file\"") == 0) {
>                         if (yyres) {
>                                 yystpcpy(yyres, "end of file");
>                         }
>                         return sizeof("end of file")-1;
>                 }
>
>                 str = LANG_SCNG(yy_text);
>                 end = memchr(str, '\n', LANG_SCNG(yy_leng));
>                 yystr_len = (unsigned int)yystrlen(yystr);
>
>                 if ((tok1 = memchr(yystr, '(', yystr_len)) != NULL
>                         && (tok2 = zend_memrchr(yystr, ')', yystr_len)) != NULL) {
>                         toklen = (tok2 - tok1) + 1;
>                 } else {
>                         tok1 = tok2 = NULL;
>                         toklen = 0;
>                 }
>
>                 if (end == NULL) {
>                         len = LANG_SCNG(yy_leng) > 30 ? 30 : LANG_SCNG(yy_leng);
>                 } else {
>                         len = (end - str) > 30 ? 30 : (end - str);
>                 }
>                 if (yyres) {
>                         if (toklen) {
>                                 snprintf(buffer, sizeof(buffer), "'%.*s' %.*s", len, str, toklen, tok1);
>                         } else {
>                                 snprintf(buffer, sizeof(buffer), "'%.*s'", len, str);
>                         }
>                         yystpcpy(yyres, buffer);
>                 }
>                 return len + (toklen ? toklen + 1 : 0) + 2;
>         }
>
>         /* One of the expected tokens */
>         if (!yyres) {
>                 return yystrlen(yystr) - (*yystr == '"' ? 2 : 0);
>         }
>
>         if (*yystr == '"') {
>                 YYSIZE_T yyn = 0;
>                 const char *yyp = yystr;
>
>                 for (; *++yyp != '"'; ++yyn) {
>                         yyres[yyn] = *yyp;
>                 }
>                 yyres[yyn] = '\0';
>                 return yyn;
>         }
>         yystpcpy(yyres, yystr);
>         return strlen(yystr);
> }
> ```
>
> CG(parse_error) is reset when the error message is reported:
>
>
> https://github.com/php/php-src/blob/a40a69fdd058cdcb7da5d4527ea6c7dd261417b7/Zend/zend.c#L1115
> ```
> ZEND_COLD void zenderror(const char *error) /* {{{ */
> {
>         CG(parse_error) = 0;
>
>         if (EG(exception)) {
>                 /* An exception was thrown in the lexer, don't throw another in the parser. */
>                 return;
>         }
>
>         zend_throw_exception(zend_ce_parse_error, error, 0);
> }
> /* }}} */
>
> ```
>
>
> # PolishReadabilityChecker
>
> They want i18n.  I don't see where the other tokens are defined though.
>
>
> https://github.com/stasiekz/PolishReadabilityChecker/blob/86ec1caec23352ac170dc33e12de02a6ce47c044/third_party/corpus2/poliqarp-library/sakura/parser.y#L55
>
> ```
> static size_t yytnamerr(char *yyres, const char *yystr)
> {
>    const char *translation = _(yystr);
>    size_t length = strlen(translation);
>    if (yyres != NULL)
>       strcpy(yyres, translation);
>    return length;
> }
> #if 0
>    M_("$undefined");
>    M_("$end");
> #endif
> ```
>
>
>
> # Daruma BASIC for Raspberry Pi and Mac OS X
>
> They want I18n.
>
>
> https://github.com/toshinagata/darumabasic/blob/3979d68493eceafcd4cbc6410d7d52fbe6b8c9ba/common/transmessage.c
>
> ```
> const char *err_msg[] = {
>         /* BS_M_ONLY_ONE_DO_COND */
>         "only one WHILE/UNTIL condition may appear with DO or LOOP keyword",
>
>         /* BS_M_UNDEF_SYMBOL */
>         "undefined symbol %s",
> ```
>
> ```
> const char *translated_err_msg[] = {
>         /* BS_M_ONLY_ONE_DO_COND */
>         "DO...LOOPのWHILE/UNTIL条件は一つまでです",
>
>         /* BS_M_UNDEF_SYMBOL */
>         "%sは未定義です",
>
> ```
> ```
> unsigned int
> my_yytnamerr(char *yyres, const char *yystr)
> {
>         /*  This is a patch to remove 'BS_' from the token name  */
>         if (strncmp(yystr, "BS_", 3) == 0)
>                 yystr += 3;
>         /*  End patch  */
> ```
>
>
>
> https://github.com/toshinagata/darumabasic/blob/3979d68493eceafcd4cbc6410d7d52fbe6b8c9ba/common/grammar.y#L24
> ```
> /*  Reserved words  */
> %token BS_IF BS_THEN BS_ELSE BS_ELSEIF BS_ENDIF
> %token BS_DO BS_WHILE BS_UNTIL BS_LOOP
> ```
>
>
> # Intel SPMD Program Compiler
>
> https://github.com/ispc/ispc/blob/0198f9f8e3b64716e65e56a3861972c71c6c6019/src/parse.yy#L2092
>
> yytnamerr starts with an addition to be able to rename some of the
> tokens.  Otherwise, back to the default implementation.
>
> This is C++, yet they use yacc.c.
>
> ```
> static int
> lYYTNameErr (char *yyres, const char *yystr)
> {
>   extern std::map<std::string, std::string> tokenNameRemap;
>   Assert(tokenNameRemap.size() > 0);
>   if (tokenNameRemap.find(yystr) != tokenNameRemap.end()) {
>       std::string n = tokenNameRemap[yystr];
>       if (yyres == NULL)
>           return n.size();
>       else
>           return yystpcpy(yyres, n.c_str()) - yyres;
>   }
>
>   if (*yystr == '"')
>     {
>       YYSIZE_T yyn = 0;
>       char const *yyp = yystr;
>
>       for (;;)
>         switch (*++yyp)
>           {
>           case '\'':
>           case ',':
>             goto do_not_strip_quotes;
>
>           case '\\':
>             if (*++yyp != '\\')
>               goto do_not_strip_quotes;
>             /* Fall through.  */
>           default:
>             if (yyres)
>               yyres[yyn] = *yyp;
>             yyn++;
>             break;
>
>           case '"':
>             if (yyres)
>               yyres[yyn] = '\0';
>             return yyn;
>           }
>     do_not_strip_quotes: ;
>     }
>
>   if (! yyres)
>     return yystrlen (yystr);
>
>   return yystpcpy (yyres, yystr) - yyres;
> }
> ```
>
>
>
> https://github.com/ispc/ispc/blob/017e80e7c0a4187b44e37aed3c235520fccaf088/src/lex.ll#L98
> ```
> void ParserInit() {
>     tokenToName[TOKEN_ASSERT] = "assert";
>     tokenToName[TOKEN_BOOL] = "bool";
>     tokenToName[TOKEN_BREAK] = "break";
>     tokenToName[TOKEN_CASE] = "case";
>     tokenToName[TOKEN_CDO] = "cdo";
>     tokenToName[TOKEN_CFOR] = "cfor";
>     tokenToName[TOKEN_CIF] = "cif";
> ```
>
>
> https://github.com/ispc/ispc/blob/017e80e7c0a4187b44e37aed3c235520fccaf088/src/lex.ll#L218
> ```
>     tokenNameRemap["TOKEN_ASSERT"] = "\'assert\'";
>     tokenNameRemap["TOKEN_BOOL"] = "\'bool\'";
>     tokenNameRemap["TOKEN_BREAK"] = "\'break\'";
>     tokenNameRemap["TOKEN_CASE"] = "\'case\'";
>     tokenNameRemap["TOKEN_CDO"] = "\'cdo\'";
>     tokenNameRemap["TOKEN_CFOR"] = "\'cfor\'";
> ```
>
>
> # Ruby
>
>
> https://github.com/ruby/ruby/blob/0c2d731ef210c9121e2a97cc5b0d7594a80389f3/parse.y#L12687
>
>
> ```
> /*
>  * strip enclosing double-quotes, same as the default yytnamerr except
>  * for that single-quotes matching back-quotes do not stop stripping.
>  *
>  *  "\"`class' keyword\"" => "`class' keyword"
>  */
> RUBY_FUNC_EXPORTED size_t
> rb_yytnamerr(struct parser_params *p, char *yyres, const char *yystr)
> {
>     YYUSE(p);
>     if (*yystr == '"') {
>         size_t yyn = 0, bquote = 0;
>         const char *yyp = yystr;
>
> ```
>
>

