help-bison

Re: Syntax error messages


From: Hans Åberg
Subject: Re: Syntax error messages
Date: Fri, 1 Oct 2021 09:37:52 +0200

> On 28 Sep 2021, at 14:10, Christian Schoenebeck <schoenebeck@crudebyte.com> 
> wrote:
> 
> On Montag, 27. September 2021 22:07:33 CEST Hans Åberg wrote:
>>>> In order to generate better syntax error messages writing out the input
>>>> line with the error and a line with a marker underneath, I thought of
>>>> checking how Bison does it, but I could not find the place in its
>>>> sources. —Specifically, a suggestion is to tweak YY_INPUT in the lexer
>>>> to buffer one input line at a time, but Bison does not seem to do that.
>>>
>>> No, I keep track of the byte offset in the file, and print from the file,
>>> which I reopen to quote the source.
>> OK. I thought of this method, but then it does not work with streams.
> 
> In the past at least, builtin location support did not work well for me. So 
> I'm usually overriding location data type and behaviour with custom type 
> declaration, plus implementation on lexer side.
> 
> I also prefer this data type presentation:
> 
> // custom Bison location type to support raw byte positions
> struct _YYLTYPE {
>    int first_line;
>    int first_column;
>    int last_line;
>    int last_column;
>    int first_byte;
>    int length_bytes;
> };
> #define YYLTYPE _YYLTYPE
> #define YYLTYPE_IS_DECLARED 1
> 
> // override Bison's default location passing to support raw byte positions
> #define YYLLOC_DEFAULT(Cur, Rhs, N)                         \
> do                                                          \
>  if (N)                                                    \
>    {                                                       \
>      (Cur).first_line   = YYRHSLOC(Rhs, 1).first_line;     \
>      (Cur).first_column = YYRHSLOC(Rhs, 1).first_column;   \
>      (Cur).last_line    = YYRHSLOC(Rhs, N).last_line;      \
>      (Cur).last_column  = YYRHSLOC(Rhs, N).last_column;    \
>      (Cur).first_byte   = YYRHSLOC(Rhs, 1).first_byte;     \
>      (Cur).length_bytes = (YYRHSLOC(Rhs, N).first_byte  -  \
>                            YYRHSLOC(Rhs, 1).first_byte) +  \
>                            YYRHSLOC(Rhs, N).length_bytes;  \
>    }                                                       \
>  else                                                      \
>    {                                                       \
>      (Cur).first_line   = (Cur).last_line   =              \
>        YYRHSLOC(Rhs, 0).last_line;                         \
>      (Cur).first_column = (Cur).last_column =              \
>        YYRHSLOC(Rhs, 0).last_column;                       \
>      (Cur).first_byte   = YYRHSLOC(Rhs, 0).first_byte;     \
>      (Cur).length_bytes = YYRHSLOC(Rhs, 0).length_bytes;   \
>    }                                                       \
> while (0)
> 
> Because sometimes you need high level column & line span, and sometimes you 
> rather need low level raw byte position & byte length in the input data 
> stream.

For the purpose of writing out the line in the error messages, this method 
(using C++) did not work out well, because I have two parsers, one for the 
language and one for directives, and it turns out to be difficult to pass the 
location information back to the top parser.

So instead, in addition to the input stream stack, I added two more stacks: one
for the current stream position and one for the current stream line position.
Because of the lexer buffering, both are computed in the lexer. They are thus
properties attached to the input streams, not to the parser locations.
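
A minimal sketch of such per-stream bookkeeping follows. It assumes a Flex-style
lexer that hands over each matched token's text and length; the names
StreamState, stream_states and advance are hypothetical and not taken from
Bison, Flex, or the actual implementation discussed here.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-stream bookkeeping; not part of Bison or Flex.
struct StreamState {
    std::size_t pos = 0;         // byte offset of the next unread byte
    std::size_t line_start = 0;  // byte offset where the current line begins
    std::size_t line = 1;        // 1-based line number
};

// One entry per open input stream, kept parallel to the input stream stack.
std::vector<StreamState> stream_states;

// Update the state for one matched token. Because the lexer reads ahead into
// its buffer, the position is derived from the matched text, not from the
// underlying stream.
void advance(StreamState& s, const char* text, std::size_t len) {
    for (std::size_t i = 0; i < len; ++i) {
        ++s.pos;
        if (text[i] == '\n') {
            ++s.line;
            s.line_start = s.pos;  // the next line starts right after '\n'
        }
    }
}
```

With Flex, one plausible place to call this is the YY_USER_ACTION macro, passing
yytext and yyleng together with the state on top of the stack.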

In the Bison location type, I use the line number and, for columns, the number
of UTF-8 characters. An ASCII caret marking the error is surprisingly accurate
even in the presence of non-ASCII characters. But perhaps one should have a
method to mark the error on the line itself, not underneath.
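
The caret line can be sketched as below, assuming the input line is valid UTF-8
and the error position is known as a byte offset into that line. The function
names utf8_columns and caret_line are hypothetical. The sketch counts encoded
characters only, so wide or combining characters and tabs may still shift the
marker.

```cpp
#include <cstddef>
#include <string>

// Number of UTF-8 encoded characters in the first n bytes of s: every byte
// except a continuation byte (bit pattern 10xxxxxx) starts a new character.
std::size_t utf8_columns(const std::string& s, std::size_t n) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < n && i < s.size(); ++i)
        if ((static_cast<unsigned char>(s[i]) & 0xC0) != 0x80)
            ++count;
    return count;
}

// Build the marker line: one space per character before the error, then '^'.
std::string caret_line(const std::string& line, std::size_t error_byte) {
    return std::string(utf8_columns(line, error_byte), ' ') + '^';
}
```

Printing the offending line followed by caret_line(line, error_byte) then gives
the two-line form of the error message.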




