bison-patches

RFC: renaming the symbol "types" as "kinds"


From: Akim Demaille
Subject: RFC: renaming the symbol "types" as "kinds"
Date: Sat, 4 Apr 2020 14:48:31 +0200

Long ago Bison introduced the enum yytokentype:

  enum yytokentype
  {
    GRAM_EOF = 0,
    STRING = 3,
    TSTRING = 4,
    PERCENT_TOKEN = 5,

to replace the old #defines:

#define GRAM_EOF 0
#define STRING 3
#define TSTRING 4
#define PERCENT_TOKEN 5

The name of the enum, "yytokentype", was built from "token type",
which is used in many places in the documentation, for instance:

> The Bison representation for a terminal symbol is also called a
> “token type”.  Token types as well can be represented as C-like
> identifiers.

or

> The return value of the lexical analyzer function is a numeric code
> which represents a token type.  

or again

> The basic way to declare a token type name (terminal symbol) is as
> follows:
> 
>     %token NAME
> 
>   Bison will convert this into a definition in the parser, so that the
> function ‘yylex’ (if it is in this file) can use the name NAME to stand
> for this token type’s code.


The parser actually deals with two numbering schemes for symbols.  The
"external token number" is the code returned by yylex (that's yytokentype).
Because we leave room for people who want to return chars from yylex,
the non-char tokens are actually numbered starting at 256.  Using these
numbers directly would leave many holes in the parser tables, so
"external token numbers" are "translated" (by the "yytranslate" function)
into "internal symbol numbers", which form the second scheme.
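
To make the two schemes concrete, here is a minimal sketch of the
translation step.  It is not the generated code: the names token_number,
symbol_number and translate, and the token values 258 to 260, are made
up for this illustration, and a switch stands in for the table lookup
a real parser would use.

```c
/* External token numbers, as yylex would return them.  Named tokens
   sit above the char range (0..255), hence the large values.  These
   names and values are hypothetical.  */
enum token_number
{
  TOK_EOF = 0,
  TOK_STRING = 258,
  TOK_TSTRING = 259,
  TOK_PERCENT_TOKEN = 260
};

/* Internal symbol numbers: dense, so they can index parser tables
   without leaving holes.  */
enum symbol_number
{
  SYM_EOF = 0,
  SYM_ERROR = 1,
  SYM_UNDEF = 2,
  SYM_STRING = 3,
  SYM_TSTRING = 4,
  SYM_PERCENT_TOKEN = 5
};

/* Toy counterpart of yytranslate: map an external token number to
   an internal symbol number.  Unknown codes map to SYM_UNDEF.  */
static enum symbol_number
translate (int token)
{
  switch (token)
    {
    case TOK_EOF:           return SYM_EOF;
    case TOK_STRING:        return SYM_STRING;
    case TOK_TSTRING:       return SYM_TSTRING;
    case TOK_PERCENT_TOKEN: return SYM_PERCENT_TOKEN;
    default:                return SYM_UNDEF;
    }
}
```

With these definitions, translate (TOK_STRING) yields SYM_STRING, and
any code the grammar does not know yields SYM_UNDEF.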

So far, internal symbol numbers were not user-facing, and they were plain
ints.  In Bison 3.6, they will be exposed to users when they build
their custom error messages.  I introduced yysymbol_type_t to this
end:

> enum yysymbol_type_t
> {
>   YYSYMBOL_YYEMPTY = -2,
>   YYSYMBOL_YYEOF = 0,
>   YYSYMBOL_YYERROR = 1,
>   YYSYMBOL_YYUNDEF = 2,
>   YYSYMBOL_STRING = 3,
>   YYSYMBOL_TSTRING = 4,
>   YYSYMBOL_PERCENT_TOKEN = 5,
>   [...]
> };
> typedef enum yysymbol_type_t yysymbol_type_t;

That's been done in all the skeletons [1].  In C++, D, and Java,
yysymbol_type_t becomes yysymbol_type_type, SymbolType and SymbolType,
respectively.

I feel we are creating confusion between "type" as in "typing"
and "type" as in "different sorts of symbols".  Also, yysymbol_type_type
looks weird, and should not be confused with symbol_type, which denotes
a "full" symbol (symbol type, semantic value and location).

So I think we should replace all our uses of "token type" (doc and
code, in a backward compatible manner of course) with "token kind".
Likewise, yysymbol_type_t would become yysymbol_kind_t.
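
As an illustration of what "backward compatible" could mean on the
token side, one option is to keep the existing enum tag and add the
new name as a typedef.  This is only a sketch under that assumption:
yytoken_kind_t and is_string are hypothetical names chosen for the
example, not something this proposal has settled on.

```c
/* The existing enum keeps its historical tag, so user code that
   spells out 'enum yytokentype' continues to compile.  */
enum yytokentype
{
  GRAM_EOF = 0,
  STRING = 3,
  TSTRING = 4,
  PERCENT_TOKEN = 5
};

/* Hypothetical new spelling, introduced as an alias rather than as
   a distinct type.  */
typedef enum yytokentype yytoken_kind_t;

/* New-style code can use the "kind" name...  */
static int
is_string (yytoken_kind_t kind)
{
  return kind == STRING;
}
```

Since the typedef names the same type, a variable declared as
enum yytokentype can be passed to is_string without any conversion.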

What do people think about this change?  Would you have another suggestion?
"Token kind" appears (once...) in the POSIX Yacc documentation
(https://pubs.opengroup.org/onlinepubs/9699919799/utilities/yacc.html):

        The yylex() function is an integer-valued function that
        returns a token number representing the kind of token read.

Thanks in advance.


[1]
C/C++: https://lists.gnu.org/r/bison-patches/2020-04/msg00002.html
Java: https://lists.gnu.org/r/bison-patches/2020-04/msg00029.html
D: https://lists.gnu.org/archive/html/bison-patches/2020-04/msg00030.html



